WorldWideScience

Sample records for length dna sequences

  1. Sequence-Dependent Persistence Length of Long DNA

    Science.gov (United States)

    Chuang, Hui-Min; Reifenberger, Jeffrey G.; Cao, Han; Dorfman, Kevin D.

    2017-12-01

    Using a high-throughput genome-mapping approach, we obtained circa 50 million measurements of the extension of internal human DNA segments in a 41 nm × 41 nm nanochannel. The underlying DNA sequences, obtained by mapping to the reference human genome, are 2.5-393 kilobase pairs long and contain percent GC contents between 32.5% and 60%. Using Odijk's theory for a channel-confined wormlike chain, these data reveal that the DNA persistence length increases by almost 20% as the percent GC content increases. The increased persistence length is rationalized by a model, containing no adjustable parameters, that treats the DNA as a statistical terpolymer with a sequence-dependent intrinsic persistence length and a sequence-independent electrostatic persistence length.
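
    The percent GC content that this abstract uses as its sequence variable can be computed directly from a mapped segment. The following is a minimal sketch, not the authors' pipeline: the function names, the example sequence, and the choice to ignore ambiguous bases such as N are all assumptions made for illustration. Only the 32.5%-60% range comes from the abstract.

```python
def gc_percent(seq: str) -> float:
    """Percent GC of a DNA sequence, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


def in_reported_range(pct: float, low: float = 32.5, high: float = 60.0) -> bool:
    """True if pct lies inside the GC range quoted in the abstract."""
    return low <= pct <= high


# Hypothetical 20-base fragment; the two N bases are skipped in the count.
example = "ATGCGCGCTTAAGGCCNNAT"
pct = gc_percent(example)
print(round(pct, 1), in_reported_range(pct))  # 55.6 True
```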

  2. Full-Length Mitochondrial-DNA Sequencing on the PacBio RSII.

    Science.gov (United States)

    Vossen, Rolf H A M; Buermans, Henk P J

    2017-01-01

    Conventional mitochondrial-DNA (MT DNA) sequencing approaches use Sanger sequencing of 20-40 partially overlapping PCR fragments per individual, which is a time- and resource-consuming process. We have developed a high-throughput, accurate, fast, and cost-effective human MT DNA sequencing approach. In this setup we first generate long-range PCR products for two partially overlapping 7.7 and 9.2 kb MT DNA-specific amplicons, add sample-specific barcodes, and sequence these on the PacBio RSII system to obtain full-length MT DNA sequences for genotyping/haplotyping purposes.

  3. Sequencing strategy of mitochondrial HV1 and HV2 DNA with length heteroplasmy

    DEFF Research Database (Denmark)

    Rasmussen, Erik Michael; Sørensen, E; Eriksen, Birthe

    2002-01-01

    We describe a method to obtain reliable mitochondrial DNA (mtDNA) sequences downstream of the homopolymeric stretches with length heteroplasmy in the sequencing direction. The method is based on the use of junction primers that bind to a part of the homopolymeric stretch and the first 2-4 bases... downstream of the homopolymeric region. This junction primer method gave clear and unambiguous results using samples from 21 individuals with length heteroplasmy in the hypervariable regions HV1, HV2 or both. The method is of special value for forensic casework, because sequencing of both strands of an mt...

  4. Investigation of length heteroplasmy in mitochondrial DNA control region by massively parallel sequencing.

    Science.gov (United States)

    Lin, Chun-Yen; Tsai, Li-Chin; Hsieh, Hsing-Mei; Huang, Chia-Hung; Yu, Yu-Jen; Tseng, Bill; Linacre, Adrian; Lee, James Chun-I

    2017-09-01

    Accurate sequencing of the control region of the mitochondrial genome is notoriously difficult due to the presence of polycytosine bases, termed C-tracts. The precise number of bases that constitute a C-tract and the bases beyond the polycytosines may not be accurately defined when analyzing Sanger sequencing data separated by capillary electrophoresis. Massively parallel sequencing has the potential to resolve such poor definition and provides the opportunity to discover variants due to length heteroplasmy. In this study, the control region of mitochondrial genomes from 20 samples was sequenced using both standard Sanger methods with separation by capillary electrophoresis and massively parallel DNA sequencing technology. After comparison of the two sets of generated sequences, all sequences were concordant except for the C-tracts where length heteroplasmy was observed. Sequences of three segments, 16184-16193, 303-315 and 568-573, with C-tracts in HVI, II and III can be clearly defined from the massively parallel sequencing data using the program SEQ Mapper. Multiple sequence variants were observed in the length of C-tracts longer than 7 bases. Our report illustrates the accurate designation of all the length variants leading to heteroplasmy in the control region of the mitochondrial genome that can be determined by SEQ Mapper based on data generated by massively parallel DNA sequencing. Copyright © 2017 Elsevier B.V. All rights reserved.
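
    The notion of a C-tract "longer than 7 bases" can be made concrete with a short run-length scan. This is only an illustrative sketch using a regular expression. It is not SEQ Mapper, whose interface is not described in the abstract. The function name, the example read, and the 1-based coordinate convention are assumptions.

```python
import re


def find_c_tracts(seq: str, min_len: int = 8) -> list[tuple[int, int, int]]:
    """Return (start, end, length) for each run of C with at least min_len bases.

    Coordinates are 1-based and inclusive, relative to the start of seq.
    The default min_len=8 corresponds to tracts "longer than 7 bases".
    """
    pattern = re.compile(r"C{%d,}" % min_len)
    return [
        (m.start() + 1, m.end(), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical read: a 9-C tract, a short 4-C run (ignored), and a 10-C tract.
read = "AA" + "C" * 9 + "TA" + "C" * 4 + "T" + "C" * 10 + "GC"
print(find_c_tracts(read))  # [(3, 11, 9), (19, 28, 10)]
```

    Comparing the tract lengths found across many reads of the same region would expose length variants, but how such counts are turned into heteroplasmy calls is beyond what the abstract states.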

  5. DNA interactions with a Methylene Blue redox indicator depend on the DNA length and are sequence specific.

    Science.gov (United States)

    Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E

    2010-06-01

    A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different length and guanine (G) content were immobilized onto gold electrodes in their folded states through the alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was electrochemically studied and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short, 20-nt-long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of the electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was opposite: a part of the MB molecules bound to the long-sequence DNA duplex seems to be electrochemically mute due to the long ET distance. The increasing electrochemical response from MB bound to the short-length DNA hybrid contrasts with the decreasing signal from MB bound to the long-length DNA hybrid and allows an "off"-"on" genosensor development.

  6. Length-independent DNA packing into nanopore zero-mode waveguides for low-input DNA sequencing

    Science.gov (United States)

    Larkin, Joseph; Henley, Robert Y.; Jadhav, Vivek; Korlach, Jonas; Wanunu, Meni

    2017-12-01

    Compared with conventional methods, single-molecule real-time (SMRT) DNA sequencing exhibits longer read lengths, less GC bias, and the ability to read DNA base modifications. However, reading DNA sequence from sub-nanogram quantities is impractical owing to inefficient delivery of DNA molecules into the confines of zero-mode waveguides—zeptolitre optical cavities in which DNA sequencing proceeds. Here, we show that the efficiency of voltage-induced DNA loading into waveguides equipped with nanopores at their floors is five orders of magnitude greater than existing methods. In addition, we find that DNA loading is nearly length-independent, unlike diffusive loading, which is biased towards shorter fragments. We demonstrate here loading and proof-of-principle four-colour sequence readout of a polymerase-bound 20,000-base-pair-long DNA template within seconds from a sub-nanogram input quantity, a step towards low-input DNA sequencing and mammalian epigenomic mapping of native DNA samples.

  7. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and are therefore an important group for understanding the evolution of vertebrate genomes, including the human genome. Our laboratory has proposed elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (the first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  8. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Directory of Open Access Journals (Sweden)

    Lunner Sigbjørn

    2009-10-01

    Background: Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping distinguish between duplicate genome regions and determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results: High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion: This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This

  9. cDNA Library Enrichment of Full Length Transcripts for SMRT Long Read Sequencing.

    Science.gov (United States)

    Cartolano, Maria; Huettel, Bruno; Hartwig, Benjamin; Reinhardt, Richard; Schneeberger, Korbinian

    2016-01-01

    The utility of genome assemblies does not only rely on the quality of the assembled genome sequence, but also on the quality of the gene annotations. The Pacific Biosciences Iso-Seq technology is a powerful support for accurate eukaryotic gene model annotation as it allows for direct readout of full-length cDNA sequences without the need for noisy short read-based transcript assembly. We propose the implementation of the TeloPrime Full Length cDNA Amplification kit to the Pacific Biosciences Iso-Seq technology in order to enrich for genuine full-length transcripts in the cDNA libraries. We provide evidence that TeloPrime outperforms the commonly used SMARTer PCR cDNA Synthesis Kit in identifying transcription start and end sites in Arabidopsis thaliana. Furthermore, we show that TeloPrime-based Pacific Biosciences Iso-Seq can be successfully applied to the polyploid genome of bread wheat (Triticum aestivum) not only to efficiently annotate gene models, but also to identify novel transcription sites, gene homeologs, splicing isoforms and previously unidentified gene loci.

  10. Sequencing of mitochondrial HV1 and HV2 DNA with length heteroplasmy

    DEFF Research Database (Denmark)

    Rasmussen, E. Michael; Eriksen, Birthe; Larsen, Hans Jakob

    2003-01-01

    This study presents a fast method for sequencing the poly C/G regions in HV1 and HV2 in the mitochondrial DNA (mtDNA)...

  11. Sequencing of first-strand cDNA library reveals full-length transcriptomes.

    Science.gov (United States)

    Agarwal, Saurabh; Macfarlan, Todd S; Sartor, Maureen A; Iwase, Shigeki

    2015-01-21

    Massively parallel strand-specific sequencing of RNA (ssRNA-seq) has emerged as a powerful tool for profiling complex transcriptomes. However, many current methods for ssRNA-seq suffer from the underrepresentation of both the 5' and 3' ends of RNAs, which can be attributed to second-strand cDNA synthesis. The 5' and 3' ends of RNA harbour crucial information for gene regulation; namely, transcription start sites (TSSs) and polyadenylation sites. Here we report a novel ssRNA-seq method that does not involve second-strand cDNA synthesis, as we Directly Ligate sequencing Adaptors to the First-strand cDNA (DLAF). This novel method with fewer enzymatic reactions results in a higher quality of the libraries than the conventional method. Sequencing of DLAF libraries followed by a novel analysis pipeline enables the profiling of both 5' ends and polyadenylation sites at near-base resolution. Therefore, DLAF offers the first genomics tool to obtain the 'full-length' transcriptome with a single library.

  12. 5'-end sequences of budding yeast full-length cDNA clones and quality scores - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Budding yeast cDNA sequencing project. Data name: 5'-end sequences of budding yeast full-length cDNA clones and quality scores. ...or-capping method, the sequence quality score generated by the Phred software, and links to SGD, dbEST and U...es. FASTA format. Quality: Phred's quality score...

  13. Incorporation of guanosine gels into sieving matrices for length- and sequence-based separation of DNA in capillary electrophoresis.

    Science.gov (United States)

    Dong, Yingying; McGown, Linda B

    2011-05-01

    Sieving gels are used in capillary gel electrophoresis to resolve DNA strands of different lengths. For complex samples, however, such as those encountered in metagenomic analysis of microbial communities or biofilms, length-based separation may mask the true genetic diversity of the community since different organisms may contribute same-length DNA with different sequences. There is a need, therefore, for DNA separations based on both the length and sequence. Previous work has demonstrated the ability of guanosine gels (G-gels) to separate four single-stranded DNA 76-mers that differ by only a few A/G base substitutions. The goal of the present work is to determine whether G-gels could be combined with commercial sieving gels in order to simultaneously separate DNA based on both length and sequence. The results are given for the four 76-mers and for a standard dsDNA ladder. Commercial sieving gels were used alone and in combination with G-gels. For the 76-mers, the combined medium was less efficient than the G-gel alone but was able to achieve partial resolution. The combined medium was at least as effective as the sieving gel alone at resolving the denatured DNA ladder and showed indications of sequence-based resolution as well, as supported by MALDI-MS. The results show that the combined sieving gel/G-gel medium retains the selectivity of the individual media, providing a promising approach to simultaneous length- and sequence-based DNA separation for metagenomic analysis of complex systems. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. An efficient method for generation and subcloning of tandemly repeated DNA sequences with defined length, orientation and spacing.

    Science.gov (United States)

    Jiang, S W; Trujillo, M A; Eberhardt, N L

    1996-08-15

    Tandemly repeated DNA sequences generated from single synthetic oligonucleotide monomers are useful for many purposes. With conventional ligation procedures, low yields and random orientation of oligomers make cloning of defined repeated sequences difficult. We solved these problems using 2 bp overhangs to direct orientation and random incorporation of linkers containing restriction sites during ligation. Ligation products are amplified by PCR using the linker oligonucleotides as primers. Restriction digestion of the PCR products generates multimer distributions whose length is controlled by the monomer/linker ratio. The concatenated DNA fragments of defined length, orientation and spacing can be directly used for subcloning or other applications without further treatment.

  15. DNA Sequencing

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps of: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions favoring primer extension to form nucleic acid fragments complementary to the DNA to be sequenced; labelling the nucleic acid fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  16. Saccharomyces cerevisiae Hrq1 helicase activity is affected by the sequence but not the length of single-stranded DNA.

    Science.gov (United States)

    Rogers, Cody M; Bochman, Matthew L

    2017-05-13

    Mutations in the human RecQ4 DNA helicase are associated with three different diseases characterized by genomic instability. To gain insight into how RecQ4 dysfunction leads to these pathologies, several groups have used the Saccharomyces cerevisiae RecQ4 homolog Hrq1 as an experimental model. Hrq1 displays many of the same functions as RecQ4 in vivo and in vitro. However, there is some disagreement in the literature about the effects of single-stranded DNA (ssDNA) length on Hrq1 helicase activity and the ability of Hrq1 to anneal complementary ssDNA oligonucleotides into duplex DNA. Here, we present a side-by-side comparison of Hrq1 and RecQ4 helicase activity, demonstrating that in both cases, long random-sequence 3' ssDNA tails inhibit DNA unwinding in vitro in a length-dependent manner. This appears to be due to the formation of secondary structures in the random-sequence ssDNA because Hrq1 preferentially unwound poly(dT)-tailed forks independent of ssDNA length. Further, RecQ4 is capable of ssDNA strand annealing and annealing-dependent strand exchange, but Hrq1 lacks these activities. These results establish the importance of DNA sequence in Hrq1 helicase activity, and the absence of Hrq1 strand annealing activity explains the previously identified discrepancies between S. cerevisiae Hrq1 and human RecQ4. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    Science.gov (United States)

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  18. [Construction and sequence analysis of a normalized full-length cDNA library of Dendrobium officinale].

    Science.gov (United States)

    Jiang, Min; Wang, Jiang; Wen, Guo-Song; Xu, Shao-Zhong; Zha, Ying-Hong; Rong, Tian-Ju; Qian, Xiong

    2013-02-01

    In order to obtain functional genes, a normalized stem cDNA library was constructed from the medicinal plant Dendrobium officinale. SMART (switching mechanism at 5' end of RNA transcript) cDNA synthesis combined with DSN (duplex-specific nuclease) normalization was applied to construct the normalized full-length cDNA library of D. officinale. The titer of the cDNA library was about 1.3 × 10^6 cfu/mL and the average insert size was about 1.5 kb, with a high recombination rate (93.9%). A total of 163 randomly selected positive clones were sequenced from one end. Bioinformatic analysis indicated that 147 of 150 high-quality unique sequences matched corresponding homologous proteins, and that they participated in various biological processes based on GO (gene ontology). There were 8 clones with a complete coding sequence, which were presumed to be full-length genes. These preliminary results showed that we successfully constructed a normalized full-length cDNA library of D. officinale which could be used to screen for functional genes related to the metabolic pathways of medicinal ingredients.

  19. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Background: Castor seeds are a major source for ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergic proteins in seeds. Results: Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion: Our results suggest that transcriptional regulation of FAD2 and FAH12 genes may be one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, whose whole sequence is being generated by shotgun sequencing at

  20. Construction and EST sequencing of full-length, drought stress cDNA libraries for common beans (Phaseolus vulgaris L.).

    Science.gov (United States)

    Blair, Matthew W; Fernandez, Andrea C; Ishitani, Manabu; Moreta, Danilo; Seki, Motoaki; Ayling, Sarah; Shinozaki, Kazuo

    2011-11-25

    Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. In addition the library has a large number of

  1. Construction and EST sequencing of full-length, drought stress cDNA libraries for common beans (Phaseolus vulgaris L.)

    Directory of Open Access Journals (Sweden)

    Blair Matthew W

    2011-11-01

    Background: Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Results: Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. Conclusions: The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole

  2. Molecular cloning and nucleotide sequence of full-length cDNA for sweet potato catalase mRNA.

    Science.gov (United States)

    Sakajo, S; Nakamura, K; Asahi, T

    1987-06-01

    A nearly full-length cDNA clone for catalase (pCAS01) was obtained through immunological screening of cDNA expression library constructed from size-fractionated poly(A)-rich RNA of wounded sweet potato tuberous roots by Escherichia coli expression vector-primed cDNA synthesis. Two additional catalase cDNA clones (pCAS10 and pCAS13), which contained cDNA inserts slightly longer than that of pCAS01 at their 5'-termini, were identified by colony hybridization of another cDNA library. Those three catalase cDNAs contained primary structures not identical, but closely related, to one another based on their restriction enzyme and RNase cleavage mapping analyses, suggesting that microheterogeneity exists in catalase mRNAs. The cDNA insert of pCAS13 carried the entire catalase coding capacity, since the RNA transcribed in vitro from the cDNA under the SP6 phage promoter directed the synthesis of a catalase polypeptide in the wheat germ in vitro translation assay. The nucleotide sequencing of these catalase cDNAs indicated that 1900-base catalase mRNA contained a coding region of 1476 bases. The amino acid sequence of sweet potato catalase deduced from the nucleotide sequence was 35 amino acids shorter than rat liver catalase [Furuta, S., Hayashi, H., Hijikata, M., Miyazawa, S., Osumi, T. & Hashimoto, T. (1986) Proc. Natl Acad. Sci. USA 83, 313-317]. Although these two sequences showed only 38% homology, the sequences around the amino acid residues implicated in catalytic function, heme ligand or heme contact had been well conserved during evolution.

  3. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Directory of Open Access Journals (Sweden)

    Bendahmane Abdelhafid

    2011-05-01

    Background: Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family, which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Results: We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but

  4. Human uroporphyrinogen III synthase: Molecular cloning, nucleotide sequence, and expression of a full-length cDNA

    International Nuclear Information System (INIS)

    Tsai, Shihfeng; Bishop, D.F.; Desnick, R.J.

    1988-01-01

    Uroporphyrinogen III synthase, the fourth enzyme in the heme biosynthetic pathway, is responsible for conversion of the linear tetrapyrrole, hydroxymethylbilane, to the cyclic tetrapyrrole, uroporphyrinogen III. The deficient activity of URO-synthase is the enzymatic defect in the autosomal recessive disorder congenital erythropoietic porphyria. To facilitate the isolation of a full-length cDNA for human URO-synthase, the human erythrocyte enzyme was purified to homogeneity and 81 nonoverlapping amino acids were determined by microsequencing the N terminus and four tryptic peptides. Two synthetic oligonucleotide mixtures were used to screen 1.2 × 10^6 recombinants from a human adult liver cDNA library. Eight clones were positive with both oligonucleotide mixtures. Of these, dideoxy sequencing of the 1.3 kilobase insert from clone pUROS-2 revealed 5' and 3' untranslated sequences of 196 and 284 base pairs, respectively, and an open reading frame of 798 base pairs encoding a protein of 265 amino acids with a predicted molecular mass of 28,607 Da. The isolation and expression of this full-length cDNA for human URO-synthase should facilitate studies of the structure, organization, and chromosomal localization of this heme biosynthetic gene as well as the characterization of the molecular lesions causing congenital erythropoietic porphyria.
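
    The figures quoted in this abstract can be cross-checked with simple arithmetic. The short Python sketch below is an illustration added here, not part of the original record. It assumes that the 798-base-pair open reading frame includes the stop codon; the function name is hypothetical.

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = orf_bp // 3
    # The stop codon occupies one codon but adds no amino acid.
    return codons - 1 if includes_stop else codons


# Values quoted in the abstract (base pairs).
five_prime_utr, orf, three_prime_utr = 196, 798, 284

protein_aa = orf_protein_length(orf)                # 798 / 3 = 266 codons, minus the stop
insert_bp = five_prime_utr + orf + three_prime_utr  # total sequenced insert

print(protein_aa)  # 265
print(insert_bp)   # 1278, i.e. about 1.3 kb
```

    Under that assumption the 265-residue protein and the roughly 1.3 kilobase insert both follow from the reported untranslated and coding lengths.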

  5. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Directory of Open Access Journals (Sweden)

    Reginaldo M Kuroshu

    BACKGROUND: Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. METHODOLOGY: We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. CONCLUSIONS: The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  6. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Science.gov (United States)

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  7. Complementary DNA-amplified fragment length polymorphism ...

    African Journals Online (AJOL)

    owner

    2011-05-09

    May 9, 2011 ... Complementary DNA-amplified fragment length polymorphism (cDNA-AFLP) technology was used to analyze ... that 9 of the studied expressed sequence tags (ESTs) are related to protein modification, 12 ESTs are involved in the .... primers were used during the first strand synthesis of our cDNA synthesis ...

  8. Sequencing and comparative genomics analysis in Senecio scandens Buch.-Ham. ex D. Don, based on full-length cDNA library.

    Science.gov (United States)

    Qian, Gang; Ping, Junjiao; Zhang, Zhen; Xu, Delin

    2014-09-03

    Senecio scandens Buch.-Ham. ex D. Don, an important antibacterial source of Chinese traditional medicine, has a widespread distribution in a few ecological habitats of China. We generated a full-length complementary DNA (cDNA) library from a sample of elite individuals with superior antibacterial properties, with satisfactory parameters such as library storage (4.30 × 10^6 CFU), efficiency of titre (1.30 × 10^6 CFU/mL), transformation efficiency (96.35%), full-length ratio (64.00%) and redundancy ratio (3.28%). The BLASTN search revealed the facile formation of counterparts between the experimental sample and Arabidopsis thaliana in view of high-homology cDNA sequence (90.79%) with e-values cDNA clones consist of the major of functional genes identified by a large set of microarray data from the present experimental material. For other Compositae species, a large set of full-length cDNA clones reported in the present article will serve as a useful resource to facilitate further research on the transferability of expressed sequence tag-derived simple sequence repeats (EST-SSR) development, comparative genomics and novel transcript profiles.

  9. A theoretical study of the possible use of electroosmotic flow to extend the read length of DNA sequencing by end-labeled free solution electrophoresis.

    Science.gov (United States)

    McCormick, Laurette C; Slater, Gary W

    2006-05-01

    End-labeled free solution electrophoresis (ELFSE) provides a means of separating DNA with free-solution CE, eliminating the need for gels and polymer solutions which increase the run time and can be difficult to load into a capillary. In free-solution electrophoresis, DNA is normally free-draining and all fragments reach the detector at the same time, whereas ELFSE uses an uncharged label molecule attached to each DNA fragment in order to render the electrophoretic mobility size-dependent. With ELFSE, however, the larger molecules are not separated enough (limiting the read length in the case of ssDNA sequencing) while the smaller ones are overseparated; the larger ones are too fast while the shorter ones are too slow, which is the opposite of traditional gel-based methods. In this article, we show how an EOF could be used to overcome these problems and extend the DNA sequencing read length of ELFSE. This counterflow would allow the larger, previously unresolved molecules more time to separate and thereby increase the read length. Through our theoretical investigation, we predict that an EOF mobility of approximately the same magnitude as that of unlabeled DNA would provide the best results for the regime where all molecules move in the same direction. Even better resolution would be possible for smaller values of EOF which allow different directions of migration; however, the migration times then would become too large. The flow would need to be well controlled since the gain in read length decreases as the magnitude of the counterflow increases; an EOF mobility double that of unlabeled DNA would no longer increase the read length, although ELFSE would still benefit from a reduction in migration time.
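
    The qualitative argument in this abstract can be illustrated numerically. The Python sketch below is an assumption-laden illustration, not the authors' model: it uses the commonly quoted free-draining ELFSE expression mu(N) = mu0 * N / (N + alpha), which the abstract itself does not state, and subtracts a uniform EOF mobility as a counterflow. The label drag `ALPHA`, the fragment sizes, and both function names are hypothetical.

```python
def elfse_mobility(n_bases: int, alpha: float, mu0: float = 1.0) -> float:
    """Mobility of a DNA fragment carrying an uncharged drag label.

    mu0 is the free-solution mobility of unlabeled DNA; alpha is the label's
    drag expressed in equivalent DNA bases (an assumed value is used below).
    """
    return mu0 * n_bases / (n_bases + alpha)


def net_mobility(n_bases: int, alpha: float, mu_eof: float, mu0: float = 1.0) -> float:
    """Net mobility against an EOF counterflow; positive means electrophoresis wins."""
    return elfse_mobility(n_bases, alpha, mu0) - mu_eof


ALPHA = 30.0                    # assumed label drag, for illustration only
sizes = [30, 100, 300, 1000]    # fragment lengths in bases

for mu_eof in (0.0, 0.8, 1.0):  # EOF mobility in units of mu0
    row = [round(net_mobility(n, ALPHA, mu_eof), 3) for n in sizes]
    print(mu_eof, row)
```

    In this toy model, an EOF mobility equal to that of unlabeled DNA makes every net mobility negative, so all fragments travel with the flow and the largest fragments move most slowly, which gives them more time to separate. A smaller EOF (0.8 here) sends short and long fragments in opposite directions, the mixed-direction regime the abstract mentions.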

  10. 5'-end sequences of budding yeast full-length cDNA clones - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

  11. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution.

    Science.gov (United States)

    Modahl, Cassandra M; Mackessy, Stephen P

    2016-06-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. 
This approach, requiring only venom, provides

  12. Increased mRNA expression of a laminin-binding protein in human colon carcinoma: Complete sequence of a full-length cDNA encoding the protein

    International Nuclear Information System (INIS)

    Yow, Hsiukang; Wong, Jau Min; Chen, Hai Shiene; Lee, C.; Steele, G.D. Jr.; Chen, Lanbo

    1988-01-01

    Reliable markers to distinguish human colon carcinoma from normal colonic epithelium are needed particularly for poorly differentiated tumors where no useful marker is currently available. To search for markers the authors constructed cDNA libraries from human colon carcinoma cell lines and screened for clones that hybridize to a greater degree with mRNAs of colon carcinomas than with their normal counterparts. Here they report one such cDNA clone that hybridizes with a 1.2-kilobase (kb) mRNA, the level of which is ∼9-fold greater in colon carcinoma than in adjacent normal colonic epithelium. Blot hybridization of total RNA from a variety of human colon carcinoma cell lines shows that the level of this 1.2-kb mRNA in poorly differentiated colon carcinomas is as high as or higher than that in well-differentiated carcinomas. Molecular cloning and complete sequencing of cDNA corresponding to the full-length open reading frame of this 1.2-kb mRNA unexpectedly show it to contain all the partial cDNA sequence encoding 135 amino acid residues previously reported for a human laminin receptor. The deduced amino acid sequence suggests that this putative laminin-binding protein from human colon carcinomas consists of 295 amino acid residues with interesting features. There is an unusual C-terminal 70-amino acid segment, which is trypsin-resistant and highly negatively charged

  13. Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns

    Directory of Open Access Journals (Sweden)

    Hayashizaki Yoshihide

    2009-06-01

    Background: Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs) for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable carrying out comparative genomics in cereals. Results: As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts within the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate (Ka/Ks) of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the genes with the higher variability expressed in these tissues are under adaptive selection. Conclusion: We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the

  14. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum).

    Science.gov (United States)

    Ke, Tao; Dong, Caihua; Mao, Han; Zhao, Yingzhong; Chen, Hong; Liu, Hongyan; Dong, Xuyan; Tong, Chaobo; Liu, Shengyi

    2011-12-24

    Sesame (Sesamum indicum) is one of the most important oilseed crops with high oil contents and rich nutrient value. However, genetic improvement efforts in sesame could not get benefit from molecular biology technology due to poor DNA and RNA sequence resources. In this study, we carried out a large scale of expressed sequence tags (ESTs) sequencing from developing sesame seeds and further conducted analysis on seed storage products-related genes. A normalized and full-length enriched cDNA library from 5 ~ 30 days old immature seeds was constructed and randomly sequenced, leading to generation of 41,248 expressed sequence tags (ESTs) which then formed 4,713 contigs and 27,708 singletons with 44.9% uniESTs being putative full-length open reading frames. Approximately 26,091 of all these uniESTs have significant matches to the counterparts in Nr database of GenBank, and 21,628 of them were assigned to one or more Gene ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified including some conservative transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL), WRINKLED1 (WRI1) and majority of them were found for the first time in sesame seeds. One hundred and 17 ESTs were identified possibly involved in biosynthesis of sesame lignans, sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, which accounts for one third of total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular, candidate genes involved in biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and other oilseed plants and serve as an abundant

  15. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum)

    Directory of Open Access Journals (Sweden)

    Ke Tao

    2011-12-01

    Background: Sesame (Sesamum indicum) is one of the most important oilseed crops with high oil contents and rich nutrient value. However, genetic improvement efforts in sesame could not get benefit from molecular biology technology due to poor DNA and RNA sequence resources. In this study, we carried out a large scale of expressed sequence tags (ESTs) sequencing from developing sesame seeds and further conducted analysis on seed storage products-related genes. Results: A normalized and full-length enriched cDNA library from 5 ~ 30 days old immature seeds was constructed and randomly sequenced, leading to generation of 41,248 expressed sequence tags (ESTs) which then formed 4,713 contigs and 27,708 singletons with 44.9% uniESTs being putative full-length open reading frames. Approximately 26,091 of all these uniESTs have significant matches to the counterparts in Nr database of GenBank, and 21,628 of them were assigned to one or more Gene ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified including some conservative transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL), WRINKLED1 (WRI1) and majority of them were found for the first time in sesame seeds. One hundred and 17 ESTs were identified possibly involved in biosynthesis of sesame lignans, sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, which accounts for one third of total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). Conclusions: This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular, candidate genes involved in biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and

  16. Construction of a full-length cDNA library and preliminary analysis of expressed sequence tags from lymphocytes of half-pipe snowboarding athletes.

    Science.gov (United States)

    Zhao, Y H; Zhang, Z B; Zhao, C Q; Zhang, Y; Wang, Y F; Guan, W J; Zhu, Z Q

    2015-10-21

    The genes of top athletes are a valuable genetic resource for the human race, and could be exploited to identify novel genes related to sports ability, as well as other functions. We analyzed the expressed sequence tags from top half-pipe snowboarding athletes using the SMART complementary DNA (cDNA) library construction method to elucidate the characteristics of the athlete genome and the differential expression of the genes it contains. Overall, we established a full-length cDNA library from the lymphocytes of half-pipe snowboarding athletes and analyzed the inserted gene fragments. We also classified those genes according to molecular function, biological characteristics, cellular composition, protein types, and signal paths. A total of 201 functional genes were noted, which were distributed in 27 pathways. TXN, MDH1, ARL1, ARPC3, ACTG1, and other genes measured in sequence may be associated with physical ability. This suggests that the SMART cDNA library constructed from the genetic material from top athletes is an effective tool for preserving genetic sports resources and providing genetic markers of physical ability for athlete selection.

  17. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  18. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Directory of Open Access Journals (Sweden)

    Changqing Liu

    2013-05-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, which codes for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  19. Construction of a full-length enriched cDNA library and preliminary analysis of expressed sequence tags from Bengal Tiger Panthera tigris tigris.

    Science.gov (United States)

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-05-24

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, which codes for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  20. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Science.gov (United States)

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, which codes for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  1. Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

    Directory of Open Access Journals (Sweden)

    Holt Robert A

    2010-04-01

    Background: Salmonids are one of the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results: From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions: 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate.
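
    To make the dN/dS comparison concrete, the Python sketch below computes the ratio for two hypothetical salmon paralogs, each compared with a pike ortholog. The abstract does not say which estimator the authors used; a Nei-Gojobori-style calculation with the Jukes-Cantor correction is substituted here purely as an illustration, and every count in the example is invented.

```python
import math


def jukes_cantor(p: float) -> float:
    """Convert a proportion of differing sites into substitutions per site."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dN/dS from counts of differences and sites in each class."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0.0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


# Invented counts: each paralog is compared with the same pike out-group gene.
omega_a = dn_ds(nonsyn_diffs=12, nonsyn_sites=700, syn_diffs=30, syn_sites=300)
omega_b = dn_ds(nonsyn_diffs=30, nonsyn_sites=700, syn_diffs=30, syn_sites=300)

print(f"paralog A: {omega_a:.2f}, paralog B: {omega_b:.2f}")
```

    With these made-up counts both ratios stay below 1, the usual signature of purifying selection, while paralog B has the higher ratio. That is the kind of asymmetry between duplicates that the abstract describes.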

  2. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Full Text Available Abstract Background Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to the lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with the objective of isolating as many functional genes as possible from J. curcas by large-scale sequencing of expressed sequence tags (ESTs). Results A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 10^6 clones, and the average insert size of the clones was 2.1 kb. In total, 12,084 ESTs were sequenced to an average high-quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2 to 23, and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes, which were annotated by BLASTX. The annotation showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes, which did not show similarity with any genes in the public database, may encode unique genes. Functional classification revealed unigenes related to a broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions The high quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding
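    The counts reported in this abstract are linked by simple bookkeeping: each contig or singleton counts as one unigene, and every EST that is not a singleton sits in a contig. The following is a minimal sketch of that arithmetic; the function name and dictionary keys are hypothetical and are not part of the authors' pipeline.

```python
def summarize_est_assembly(total_ests, contigs, singletons):
    """Derive unigene and contig-membership figures from EST clustering counts."""
    unigenes = contigs + singletons            # one unigene per contig or singleton
    ests_in_contigs = total_ests - singletons  # every non-singleton EST is in a contig
    return {
        "unigenes": unigenes,
        "ests_in_contigs": ests_in_contigs,
        "mean_ests_per_contig": ests_in_contigs / contigs,
    }


# Figures taken from the abstract above.
summary = summarize_est_assembly(total_ests=12084, contigs=2258, singletons=4751)
print(summary["unigenes"])         # 7009
print(summary["ests_in_contigs"])  # 7333
```

    The three annotation classes reported (3982, 2836 and 191 unigenes) also sum to the same 7009 unigenes.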

  3. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  4. DNA Sequencing apparatus

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles C.

    1992-01-01

    An automated DNA sequencing apparatus having a reactor for providing at least two series of DNA products formed from a single primer and a DNA strand, each DNA product of a series differing in molecular weight and having a chain terminating agent at one end; separating means for separating the DNA products to form a series of bands, the intensity of substantially all nearby bands in a different series being different; band reading means for determining the position an... This invention was made with government support including a grant from the U.S. Public Health Service, contract number AI-06045. The U.S. government has certain rights in the invention.

  5. Using DNA looping to measure sequence dependent DNA elasticity

    Science.gov (United States)

    Kandinov, Alan; Raghunathan, Krishnan; Meiners, Jens-Christian

    2012-10-01

    We are using tethered particle motion (TPM) microscopy to observe protein-mediated DNA looping in the lactose repressor system in DNA constructs with varying AT / CG content. We use these data to determine the persistence length of the DNA as a function of its sequence content and compare the data to direct micromechanical measurements with constant-force axial optical tweezers. The data from the TPM experiments show a much smaller sequence effect on the persistence length than the optical tweezers experiments.

  6. Image analysis for DNA sequencing

    International Nuclear Information System (INIS)

    Palaniappan, K.; Huang, T.S.

    1991-01-01

    This paper reports that there is a great deal of interest in automating the process of DNA (deoxyribonucleic acid) sequencing to support the analysis of genomic DNA such as the Human and Mouse Genome projects. In one class of gel-based sequencing protocols, autoradiograph images are generated in the final step and usually require manual interpretation to reconstruct the DNA sequence represented by the image. The need to handle a large volume of sequence information necessitates automation of the manual autoradiograph reading step through image analysis in order to reduce the length of time required to obtain sequence data and reduce transcription errors. Various adaptive image enhancement, segmentation and alignment methods were applied to autoradiograph images. The methods are adaptive to the local characteristics of the image such as noise, background signal, or presence of edges. Once the two-dimensional data is converted to a set of aligned one-dimensional profiles, waveform analysis is used to determine the location of each band, which represents one nucleotide in the sequence. Different classification strategies, including a rule-based approach, are investigated to map the profile signals, augmented with the original two-dimensional image data as necessary, to textual DNA sequence information.
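    The band-location step described above, finding where each band sits in a one-dimensional lane profile, can be illustrated with a very simple local-maximum search. This is only a sketch under assumed inputs (a list of intensities and a fixed height threshold, both hypothetical); the paper's adaptive waveform analysis is not specified in the abstract and is certainly more elaborate.

```python
def find_band_peaks(profile, min_height):
    """Return indices of local maxima at or above min_height in a 1-D profile."""
    peaks = []
    for i in range(1, len(profile) - 1):
        # Strictly rising on the left, not rising on the right:
        # a flat-topped band is reported once, at its left edge.
        is_local_max = profile[i - 1] < profile[i] >= profile[i + 1]
        if is_local_max and profile[i] >= min_height:
            peaks.append(i)
    return peaks


# Toy lane profile: two strong bands and one weak bump below the threshold.
lane = [0, 1, 5, 9, 4, 1, 0, 2, 8, 8, 3, 0, 1, 2, 1]
print(find_band_peaks(lane, min_height=4))  # [3, 8]
```

    A fixed threshold is the weakest part of this sketch; the abstract stresses methods that adapt to local noise and background, which a single global `min_height` does not do.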

  7. Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale

    Directory of Open Access Journals (Sweden)

    Liu He

    2017-10-01

    Full Text Available Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.

  8. DNA Sequencing Sensors: An Overview

    Directory of Open Access Journals (Sweden)

    Jose Antonio Garrido-Cardenas

    2017-03-01

    Full Text Available The first sequencing of a complete genome was published forty years ago by the double Nobel Prize in Chemistry winner Frederick Sanger. That corresponded to the small-sized genome of a bacteriophage, but since then there have been many complex organisms whose DNA has been sequenced. This was possible thanks to continuous advances in the fields of biochemistry and molecular genetics, but also in other areas such as nanotechnology and computing. Nowadays, sequencing sensors based on genetic material have little to do with those used by Sanger. The emergence of mass sequencing sensors, or new generation sequencing (NGS), meant a quantitative leap both in the volume of genetic material that could be sequenced in each trial and in the time per run and its cost. One can envisage that incoming technologies, already known as fourth generation sequencing, will continue to cheapen the trials by increasing DNA reading lengths in each run. All of this would be impossible without sensors and detection systems becoming smaller and more precise. This article provides a comprehensive overview of sensors for DNA sequencing developed within the last 40 years.

  9. Channel plate for DNA sequencing

    Science.gov (United States)

    Douthart, Richard J.; Crowell, Shannon L.

    1998-01-01

    This invention is a channel plate that facilitates data compaction in DNA sequencing. The channel plate has a length, a width and a thickness, and further has a plurality of channels that are parallel. Each channel has a depth partially through the thickness of the channel plate. Additionally an interface edge permits electrical communication across an interface through a buffer to a deposition membrane surface.

  10. Metric representation of DNA sequences.

    Science.gov (United States)

    Wu, Z B

    2000-07-01

    A metric representation of DNA sequences is borrowed from symbolic dynamics. In view of this method, the pattern seen in the chaos game representation of DNA sequences is explained as the suppression of certain nucleotide strings in the DNA sequences. Frequencies of short nucleotide strings and suppression of the shortest ones in the DNA sequences can be determined by using the metric representation.
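    For readers unfamiliar with the chaos game representation (CGR) mentioned above, the sketch below shows the standard construction together with a plain count of short nucleotide strings. It does not implement the metric representation of the paper. The corner assignment is one common convention and is an assumption here, and both function names are hypothetical.

```python
from collections import Counter

# Assumed corner convention for the unit square (not taken from the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Chaos game representation: step halfway toward each base's corner."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_frequencies(sequence, k):
    """Count every overlapping nucleotide string of length k."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


print(cgr_points("AG"))                 # [(0.25, 0.25), (0.625, 0.625)]
print(kmer_frequencies("ACGACG", 2))    # AC and CG twice, GA once
```

    In this picture each string of length k maps to one sub-square of side 2^-k, so a string that never occurs (a zero count in `kmer_frequencies`) leaves its sub-square empty, which is the kind of pattern the abstract attributes to suppressed nucleotide strings.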

  11. Evolution of DNA sequencing.

    Science.gov (United States)

    Tipu, Hamid Nawaz; Shabbir, Ambreen

    2015-03-01

    Sanger and coworkers introduced DNA sequencing in the 1970s for the first time. It principally relied on termination of the growing nucleotide chain when a dideoxythymidine triphosphate (ddTTP) was inserted in it. Detection of terminated sequences was done radiographically on Polyacrylamide Gel Electrophoresis (PAGE). Improvements that have evolved over time in original Sanger sequencing include replacement of radiography with fluorescence, use of separate fluorescent markers for each nucleotide, use of capillary electrophoresis instead of polyacrylamide gel electrophoresis and then introduction of capillary array electrophoresis. However, this technique suffered from a few inherent limitations, such as decreased sensitivity for low-level mutant alleles, complexities in analyzing highly polymorphic regions like the Major Histocompatibility Complex (MHC), and the high DNA concentrations required. Several Next Generation Sequencing (NGS) technologies that tend to overcome the limitations of Sanger sequencing have been introduced by Roche, Illumina and other commercial manufacturers, and are reviewed here. Introduction of NGS in clinical research and medical diagnostics is expected to change the entire diagnostic approach. Applications include the study of cancer variants, detection of minimal residual disease, exome sequencing, detection of Single Nucleotide Polymorphisms (SNPs) and their disease association, epigenetic regulation of gene expression, and sequencing of microorganism genomes.

  12. Complementary DNA-amplified fragment length polymorphism ...

    African Journals Online (AJOL)

    Complementary DNA-amplified fragment length polymorphism (AFLP-cDNA) analysis of differential gene expression from the xerophyte Ammopiptanthus mongolicus in response to cold, drought and cold together with drought.

  13. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.

  14. Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

    Directory of Open Access Journals (Sweden)

    Jun-Feng Li

    2012-10-01

    Full Text Available To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.

  15. Construction of full-length cDNA library and development of EST-derived simple sequence repeat (EST-SSR) markers in Senecio scandens.

    Science.gov (United States)

    Qian, Gang; Ping, Junjiao; Lu, Jian; Zhang, Zhen; Wang, Lei; Xu, Delin

    2014-12-01

    Senecio scandens Buch.-Ham. ex D. Don (Compositae) is a crucial source of Chinese traditional medicine with antibacterial properties. We constructed a cDNA library and obtained expressed sequence tags (ESTs) to show the distribution of gene ontology annotations for mRNAs, using an individual plant with superior antibacterial characteristics. Analysis of comparative genomics indicates that the putative uncharacterized proteins (21.07%) might be derived from "molecular function unknown" clones or rare transcripts. Furthermore, the Compositae had high cross-species transferability of EST-derived simple sequence repeats (EST-SSR), based on valid amplifications of 206 primer pairs developed from the newly assembled expressed sequence tag sequences in Artemisia annua L. Among those EST-SSR markers, 52 primers showed polymorphic amplifications between individuals with contrasting diverse antibacterial traits. Our sequence data and molecular markers will be cost-effective tools for further studies such as genome annotation, molecular breeding, and novel transcript profiles within Compositae species.

  16. Characterization of Erwinia amylovora strains from different host plants using repetitive-sequences PCR analysis, and restriction fragment length polymorphism and short-sequence DNA repeats of plasmid pEA29.

    Science.gov (United States)

    Barionovi, D; Giorgi, S; Stoeger, A R; Ruppitsch, W; Scortichini, M

    2006-05-01

    The three main aims of the study were the assessment of the genetic relationship between a deviating Erwinia amylovora strain isolated from Amelanchier sp. (Maloideae) grown in Canada and other strains from Maloideae and Rosoideae, the investigation of the variability of the PstI fragment of the pEA29 plasmid using restriction fragment length polymorphism (RFLP) analysis and the determination of the number of short-sequence DNA repeats (SSR) by DNA sequence analysis in representative strains. Ninety-three strains obtained from 12 plant genera and different geographical locations were examined by repetitive-sequences PCR using Enterobacterial Repetitive Intergenic Consensus, BOX and Repetitive Extragenic Palindromic primer sets. Following analysis by the unweighted pair group method with arithmetic mean, a deviating strain from Amelanchier sp. was analysed using amplified ribosomal DNA restriction analysis (ARDRA) and the sequencing of the 16S rDNA gene. This strain showed 99% similarity to other E. amylovora strains in the 16S gene and the same banding pattern with ARDRA. The RFLP analysis of pEA29 plasmid using MspI and Sau3A restriction enzymes showed a higher variability than that previously observed and no clear-cut grouping of the strains was possible. The SSR units were reiterated two to 12 times. The strains obtained from pear orchards showing for the first time symptoms of fire blight had a low number of SSR units. The strains from Maloideae exhibit a wider genetic variability than previously thought. The RFLP analysis of a fragment of the pEA29 plasmid would not seem a reliable method for typing E. amylovora strains. A low number of SSR units was observed with first epidemics of fire blight. The current detection techniques are mainly based on the genetic similarities observed within the strains from the cultivated tree-fruit crops. For a more reliable detection of the fire blight pathogen also in wild and ornamental Rosaceous plants the genetic

  17. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera, which gives a decrease in signal.

  18. Cloning, sequencing and expression of cDNA encoding growth ...

    Indian Academy of Sciences (India)

    Using polymerase chain reaction (PCR) primers representing the conserved regions of fish GH sequences the 3′ region of catfish GH cDNA (540 bp) was cloned by random amplification of cDNA ends and the clone was used as a probe to isolate recombinant phages carrying the full-length cDNA sequence. The full-length ...

  19. The Dynamics of DNA Sequencing.

    Science.gov (United States)

    Morvillo, Nancy

    1997-01-01

    Describes a paper-and-pencil activity that helps students understand DNA sequencing and expands student understanding of DNA structure, replication, and gel electrophoresis. Appropriate for advanced biology students who are familiar with the Sanger method. (DDR)

  20. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  1. Characterization of Mycoplasma hyosynoviae strains by amplified fragment length polymorphism analysis, pulsed-field gel electrophoresis and 16S ribosomal DNA sequencing

    DEFF Research Database (Denmark)

    Kokotovic, Branko; Friis, N.F.; Ahrens, Peter

    2002-01-01

    , were investigated by analysis of amplified fragment length polymorphisms of the Bgl II and Mfe I restriction sites and by pulsed-field gel electrophoresis of a Bss HII digest of chromosomal DNA. Both methods allowed unambiguous differentiation of the analysed strains and showed similar discriminatory...... potential for the differentiation of M. hyosynoviae isolates. Concordant results obtained with the two whole-genome fingerprinting techniques evidence the considerable intraspecies genetic heterogeneity of M. hyosynoviae . Sixteen field strains of M. hyosynoviae and the type strain S16(T) were further...

  2. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Full Text Available Abstract Background The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA) is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that generates sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences) offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm generates larger sets of sequences than previous software under equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.
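    Three of the constraints listed above (an adjustable GC content, a forced G or C at both ends, and avoidance of self-complementarity) are easy to illustrate by brute-force enumeration. The sketch below is not the EGNAS algorithm, which is a C++ program whose internals are not described in the abstract; the function names and parameters are hypothetical, and the cross-hybridization and hairpin checks are omitted.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def enumerate_sequences(length, gc_min, gc_max, gc_clamp=True):
    """Exhaustively list sequences of a given length that pass simple filters."""
    accepted = []
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        if not gc_min <= gc_fraction(seq) <= gc_max:
            continue  # GC content outside the requested window
        if gc_clamp and not (seq[0] in "GC" and seq[-1] in "GC"):
            continue  # must start and end with G or C
        if seq == reverse_complement(seq):
            continue  # fully self-complementary sequence
        accepted.append(seq)
    return accepted


print(enumerate_sequences(2, gc_min=0.5, gc_max=1.0))       # ['CC', 'GG']
print(len(enumerate_sequences(4, gc_min=0.5, gc_max=0.5)))  # 12
```

    Enumeration over all 4^n strings is only practical for short sequences, which is why a dedicated tool is needed for realistic lengths and for constraints that couple the sequences in a set to each other.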

  3. Graphene nanodevices for DNA sequencing

    Science.gov (United States)

    Heerema, Stephanie J.; Dekker, Cees

    2016-02-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with nanopores. Owing to its unique structure and properties, graphene provides interesting opportunities for the development of a new sequencing technology. In recent years, a wide range of creative ideas for graphene sequencers have been theoretically proposed and the first experimental demonstrations have begun to appear. Here, we review the different approaches to using graphene nanodevices for DNA sequencing, which involve DNA passing through graphene nanopores, nanogaps, and nanoribbons, and the physisorption of DNA on graphene nanostructures. We discuss the advantages and problems of each of these key techniques, and provide a perspective on the use of graphene in future DNA sequencing technology.

  4. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as nucleotide correlations or the non-uniform distribution of motifs throughout genomic or mature mRNA sequences, exist, and their significance can be checked by means of the Monte Carlo test. The test needs good-quality random sequences in order to work; moreover, they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is, a threshold below which the resulting score is merely due to chance. We have developed RANDNA, free software that produces random DNA or RNA sequences with a user-defined length and percentage nucleotide composition. Sequences having the same nucleotide distribution as exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it possible to easily set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees good randomness, a long cycle length and a high speed. We have checked the quality of sequences generated by the software by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.
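    The core operation described above, drawing a random sequence of a chosen length and nucleotide composition, can be sketched in a few lines. This is not RANDNA: it uses the generator in Python's standard `random` module rather than the Borland Delphi 6 generator named in the abstract, and the function name and parameters are hypothetical.

```python
import random


def random_sequence(length, composition, seed=None):
    """Draw a random sequence whose expected composition follows the weights.

    composition maps each letter to a relative weight, e.g. percentages.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    letters = list(composition)
    weights = [composition[letter] for letter in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


# 60 bases with an expected composition of 30% A, 20% C, 20% G, 30% T.
seq = random_sequence(60, {"A": 30, "C": 20, "G": 20, "T": 30}, seed=1)
print(len(seq))  # 60
```

    Each base is drawn independently, so the realized composition of a short sequence fluctuates around the requested percentages; for a Monte Carlo test many such sequences would be generated and scored in the same way as the real data.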

  5. Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding

    Directory of Open Access Journals (Sweden)

    Douglas Carl J

    2008-01-01

    Full Text Available Abstract Background The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar has been established as a model system for genomics studies of growth, development, and adaptation of woody perennial plants including secondary xylem formation, dormancy, adaptation to local environments, and biotic interactions. Results As part of the poplar genome sequencing project and the development of genomic resources for poplar, we have generated a full-length (FL)cDNA collection using the biotinylated CAP trapper method. We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa × P. deltoides hybrid. Following careful selection of candidate cDNA clones, we used a combined strategy of paired end reads and primer walking to generate a set of 4,664 high-accuracy, sequence-verified FLcDNAs, which clustered into 3,990 putative unique genes. Mapping FLcDNAs to the poplar genome sequence combined with BLAST comparisons to previously predicted protein coding sequences in the poplar genome identified 39 FLcDNAs that likely localize to gaps in the current genome sequence assembly. Another 173 FLcDNAs mapped to the genome sequence but were not included among the previously predicted genes in the poplar genome. Comparative sequence analysis against Arabidopsis thaliana and other species in the non-redundant database of GenBank revealed that 11.5% of the poplar FLcDNAs display no significant sequence similarity to other plant proteins. By mapping the poplar FLcDNAs against transcriptome data previously obtained with a 15.5 K cDNA microarray, we identified 153 FLcDNA clones

  6. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  7. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression, an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression; the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
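    The conventional baseline that coil is compared against, general-purpose Lempel-Ziv compression of a flat file, can be measured with Python's standard `zlib` module, which implements the same DEFLATE method used by gzip. The sketch below shows only that baseline and not edit-tree coding; the toy flat file is synthetic, and defining the compression ratio as original size divided by compressed size is an assumption, since the abstract does not state its definition.

```python
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Original size divided by DEFLATE-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))


# Synthetic FASTA-like flat file: 1000 short records sharing one sequence.
flat_file = b"".join(b">est%d\nACGTTGCAACGTTGCAGGCTA\n" % i for i in range(1000))

print(len(flat_file))
print(round(compression_ratio(flat_file), 1))
```

    Real EST data is far less repetitive than this toy file, so the ratio printed here says nothing about the figures a real database would give.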

  8. Generation and analysis of a large-scale expressed sequence tags from a full-length enriched cDNA library of Siberian tiger (Panthera tigris altaica).

    Science.gov (United States)

    Guo, Yu; Liu, Changqing; Lu, Taofeng; Liu, Dan; Bai, Chunyu; Li, Xiangchen; Ma, Yuehui; Guan, Weijun

    2014-05-15

    In this study, a full-length enriched cDNA library was successfully constructed from Siberian tiger, the world's most endangered species. The titers of primary and amplified libraries were 1.28×10^6 pfu/mL and 1.59×10^10 pfu/mL respectively. The proportion of recombinants from the unamplified library was 91.3% and the average length of exogenous inserts was 1.06 kb. A total of 279 individual ESTs with sizes ranging from 316 to 1258 bp were then analyzed. Furthermore, 204 unigenes were successfully annotated and involved in 49 functions of the GO classification; cell (175, 85.5%), cellular process (165, 80.9%), and binding (152, 74.5%) are the dominant terms. 198 unigenes were assigned to 156 KEGG pathways, and the pathways with the most representation are metabolic pathways (18, 9.1%). The proportion pattern of each COG subcategory was similar among Panthera tigris altaica, P. tigris tigris and Homo sapiens; the general function prediction only cluster (44, 15.8%) represents the largest group, followed by translation, ribosomal structure and biogenesis (33, 11.8%), and replication, recombination and repair (24, 8.6%), and only 7.2% of ESTs were classified as novel genes. Moreover, the recombinant plasmid pET32a-TAT-COL6A2 was constructed, coding for the Trx-TAT-COL6A2 fusion protein with two 6× His-tags at the N- and C-termini. By BCA assay, the concentration of soluble Trx-TAT-COL6A2 recombinant protein was 2.64±0.18 mg/mL. This library will provide a useful platform for the functional genome and transcriptome research of P. tigris and other felid animals in the future. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L.) is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  10. Duplication in DNA Sequences

    Science.gov (United States)

    Ito, Masami; Kari, Lila; Kincaid, Zachary; Seki, Shinnosuke

    The duplication and repeat-deletion operations are the basis of a formal language theoretic model of errors that can occur during DNA replication. During DNA replication, subsequences of a strand of DNA may be copied several times (resulting in duplications) or skipped (resulting in repeat-deletions). As formal language operations, iterated duplication and repeat-deletion of words and languages have been well studied in the literature. However, little is known about single-step duplications and repeat-deletions. In this paper, we investigate several properties of these operations, including closure properties of language families in the Chomsky hierarchy and equations involving these operations. We also make progress toward a characterization of regular languages that are generated by duplicating a regular language.
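
As an illustration of the two single-step operations described in this abstract, the sketch below enumerates them for a single word. It assumes the usual formal-language reading (duplication maps xyz to xyyz for a non-empty factor y, and repeat-deletion is the inverse); the function names are hypothetical and are not taken from the paper.

```python
def duplications(word):
    """All words x+y+y+z such that word == x+y+z and y is non-empty."""
    n = len(word)
    return {
        word[:j] + word[i:j] + word[j:]   # (x y) + y + z, with y = word[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
    }


def repeat_deletions(word):
    """All words x+y+z such that word == x+y+y+z and y is non-empty."""
    n = len(word)
    results = set()
    for i in range(n):
        for length in range(1, (n - i) // 2 + 1):
            first = word[i:i + length]
            second = word[i + length:i + 2 * length]
            if first == second:
                # Drop the second copy of the repeated factor.
                results.add(word[:i + length] + word[i + 2 * length:])
    return results


if __name__ == "__main__":
    print(sorted(duplications("ACG")))
    print(sorted(repeat_deletions("ACACG")))
```

Every word produced by `duplications(w)` has `w` among its repeat-deletions, which is a quick sanity check on the pair of definitions.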

  11. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  12. In situ detection of tandem DNA repeat length

    Energy Technology Data Exchange (ETDEWEB)

    Yaar, R.; Szafranski, P.; Cantor, C.R.; Smith, C.L. [Boston Univ., MA (United States)

    1996-11-01

    A simple method for scoring short tandem DNA repeats is presented. An oligonucleotide target, containing tandem repeats embedded in a unique sequence, was hybridized to a set of complementary probes, containing tandem repeats of known lengths. Single-stranded loop structures formed on duplexes containing a mismatched (different) number of tandem repeats. No loop structure formed on duplexes containing a matched (identical) number of tandem repeats. The matched and mismatched loop structures were enzymatically distinguished and differentially labeled by treatment with S1 nuclease and the Klenow fragment of DNA polymerase. 7 refs., 4 figs.

  13. Genotypic Characterization of Bradyrhizobium Strains Nodulating Endemic Woody Legumes of the Canary Islands by PCR-Restriction Fragment Length Polymorphism Analysis of Genes Encoding 16S rRNA (16S rDNA) and 16S-23S rDNA Intergenic Spacers, Repetitive Extragenic Palindromic PCR Genomic Fingerprinting, and Partial 16S rDNA Sequencing

    Science.gov (United States)

    Vinuesa, Pablo; Rademaker, Jan L. W.; de Bruijn, Frans J.; Werner, Dietrich

    1998-01-01

    We present a phylogenetic analysis of nine strains of symbiotic nitrogen-fixing bacteria isolated from nodules of tagasaste (Chamaecytisus proliferus) and other endemic woody legumes of the Canary Islands, Spain. These and several reference strains were characterized genotypically at different levels of taxonomic resolution by computer-assisted analysis of 16S ribosomal DNA (rDNA) PCR-restriction fragment length polymorphisms (PCR-RFLPs), 16S-23S rDNA intergenic spacer (IGS) RFLPs, and repetitive extragenic palindromic PCR (rep-PCR) genomic fingerprints with BOX, ERIC, and REP primers. Cluster analysis of 16S rDNA restriction patterns with four tetrameric endonucleases grouped the Canarian isolates with the two reference strains, Bradyrhizobium japonicum USDA 110spc4 and Bradyrhizobium sp. strain (Centrosema) CIAT 3101, resolving three genotypes within these bradyrhizobia. In the analysis of IGS RFLPs with three enzymes, six groups were found, whereas rep-PCR fingerprinting revealed an even greater genotypic diversity, with only two of the Canarian strains having similar fingerprints. Furthermore, we show that IGS RFLPs and even very dissimilar rep-PCR fingerprints can be clustered into phylogenetically sound groupings by combining them with 16S rDNA RFLPs in computer-assisted cluster analysis of electrophoretic patterns. The DNA sequence analysis of a highly variable 264-bp segment of the 16S rRNA genes of these strains was found to be consistent with the fingerprint-based classification. Three different DNA sequences were obtained, one of which was not previously described, and all belonged to the B. japonicum/Rhodopseudomonas rDNA cluster. Nodulation assays revealed that none of the Canarian isolates nodulated Glycine max or Leucaena leucocephala, but all nodulated Acacia pendula, C. proliferus, Macroptilium atropurpureum, and Vigna unguiculata. PMID:9603820

  14. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by examining DNA in a laboratory test. This study focuses on local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm, which was invented by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the greatest possible similarity between a pair of sequences by giving a negative value to each unequal base pair (mismatch) and a positive value to each equal base pair (match). The maximum positive value then marks the end of the alignment, and the minimum value marks its start. This study will use sequences of leukemia and 3 sequences of non-leukemia.
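
The scoring scheme described in this abstract can be sketched as follows. This is a minimal Smith-Waterman implementation with a linear gap penalty; the score values (match = 2, mismatch = -1, gap = -1) and the function name are illustrative assumptions, not the parameters used in the study.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # Scoring matrix; the first row and column stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a fresh local alignment
                H[i - 1][j - 1] + s,    # match or mismatch
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the maximum cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```

The traceback starts at the highest-scoring cell (the end of the alignment) and stops at the first zero cell (its start), which is why unrelated flanking bases are excluded from the result.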

  15. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor-intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was made possible only by the advent of modern sequencing technologies, which resulted from step-by-step advances contributed by academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this perspective, we wish to overview the role of capillary electrophoresis in DNA sequencing, based in part on several of our articles in this journal. PMID:19517496

  16. A Demonstration of Automated DNA Sequencing.

    Science.gov (United States)

    Latourelle, Sandra; Seidel-Rogol, Bonnie

    1998-01-01

    Details a simulation that employs a paper-and-pencil model to demonstrate the principles behind automated DNA sequencing. Discusses the advantages of automated sequencing as well as the chemistry of automated DNA sequencing. (DDR)

  17. Enhanced throughput for infrared automated DNA sequencing

    Science.gov (United States)

    Middendorf, Lyle R.; Gartside, Bill O.; Humphrey, Pat G.; Roemer, Stephen C.; Sorensen, David R.; Steffens, David L.; Sutter, Scott L.

    1995-04-01

    Several enhancements have been developed and applied to infrared automated DNA sequencing, resulting in significantly higher throughput. A 41 cm sequencing gel (31 cm well-to-read distance) combines high resolution of DNA sequencing fragments with optimized run times, yielding two runs per day of 500 bases per sample. A 66 cm sequencing gel (56 cm well-to-read distance) produces sequence read lengths of up to 1000 bases for ds and ss templates using either T7 polymerase or cycle-sequencing protocols. Using a multichannel syringe to load 64 lanes allows 16 samples (compatible with 96-well format) to be visualized for each run. The 41 cm gel configuration allows 16,000 bases per day (16 samples × 500 bases/sample × 2 ten-hour runs/day) to be sequenced with the advantages of infrared technology. Enhancements to internal labeling techniques using an infrared-labeled dATP molecule (Boehringer Mannheim GmbH, Penzberg, Germany; Sequenase (U.S. Biochemical)) have also been made. The inclusion of glycerol in the sequencing reactions yields greatly improved results for some primer and template combinations. The inclusion of α-thio-dNTPs in the labeling reaction increases signal intensity two- to three-fold.
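
The throughput figure quoted in this abstract follows from simple arithmetic, reproduced below as a small check. The figure of four lanes per sample is inferred from 64 lanes carrying 16 samples and is not stated explicitly in the abstract; the names are illustrative.

```python
LANES_PER_RUN = 64       # lanes loaded with the multichannel syringe
SAMPLES_PER_RUN = 16     # samples visualized per run (from the abstract)
BASES_PER_SAMPLE = 500   # read length on the 41 cm gel
RUNS_PER_DAY = 2         # two ten-hour runs per day


def bases_per_day(samples_per_run, bases_per_sample, runs_per_day):
    """Daily throughput as samples x bases/sample x runs/day."""
    return samples_per_run * bases_per_sample * runs_per_day


# Inferred, not stated in the abstract: lanes used by each sample.
lanes_per_sample = LANES_PER_RUN // SAMPLES_PER_RUN

if __name__ == "__main__":
    print("lanes per sample:", lanes_per_sample)
    print("bases per day:",
          bases_per_day(SAMPLES_PER_RUN, BASES_PER_SAMPLE, RUNS_PER_DAY))
```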

  18. Apparatus for improved DNA sequencing

    Science.gov (United States)

    Douthart, Richard J.; Crowell, Shannon L.

    1996-01-01

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a new design direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel filled capillary. The system as a whole, however, is in a slab gel like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection.

  19. The sequence of sequencers: The history of sequencing DNA

    Science.gov (United States)

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  20. Length quantization of DNA partially expelled from heads of a bacteriophage T3 mutant

    Energy Technology Data Exchange (ETDEWEB)

    Serwer, Philip, E-mail: serwer@uthscsa.edu [Department of Biochemistry, The University of Texas Health Science Center, 7703 Floyd Curl Drive, San Antonio, TX 78229-3900 (United States); Wright, Elena T. [Department of Biochemistry, The University of Texas Health Science Center, 7703 Floyd Curl Drive, San Antonio, TX 78229-3900 (United States); Liu, Zheng; Jiang, Wen [Markey Center for Structural Biology, Department of Biological Sciences, Purdue University, West Lafayette, IN 47907 (United States)

    2014-05-15

    DNA packaging of phages phi29, T3 and T7 sometimes produces incompletely packaged DNA with quantized lengths, based on gel electrophoretic band formation. We discover here a packaging ATPase-free, in vitro model for packaged DNA length quantization. We use directed evolution to isolate a five-site T3 point mutant that hyper-produces tail-free capsids with mature DNA (heads). Three tail gene mutations, but no head gene mutations, are present. A variable-length DNA segment leaks from some mutant heads, based on DNase I-protection assay and electron microscopy. The protected DNA segment has quantized lengths, based on restriction endonuclease analysis: six sharp bands of DNA missing 3.7–12.3% of the last end packaged. Native gel electrophoresis confirms quantized DNA expulsion and, after removal of external DNA, provides evidence that capsid radius is the quantization-ruler. Capsid-based DNA length quantization possibly evolved via selection for stalling that provides time for feedback control during DNA packaging and injection. Highlights: • We implement directed evolution- and DNA-sequencing-based phage assembly genetics. • We purify stable, mutant phage heads with a partially leaked mature DNA molecule. • Native gels and DNase-protection show leaked DNA segments to have quantized lengths. • Native gels after DNase I-removal of leaked DNA reveal the capsids to vary in radius. • Thus, we hypothesize leaked DNA quantization via variably quantized capsid radius.

  1. Generation of full-length cDNA libraries: focus on plants.

    Science.gov (United States)

    Seki, Motoaki; Kamiya, Asako; Carninci, Piero; Hayashizaki, Yoshihide; Shinozaki, Kazuo

    2009-01-01

    Full-length cDNAs are essential for the correct annotation of transcriptional units and gene products from genomic sequence data and for functional analysis of the genes. Full-length cDNA libraries are very important resources for isolation of the full-length cDNAs. The biotinylated cap trapper method using the trehalose-thermostabilized reverse transcriptase has been developed and has become an efficient method for construction of high-content full-length cDNA libraries. We have constructed full-length cDNA libraries from various plants and animals using this method. The protocol of the method is described in this chapter.

  2. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)

  3. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows meaningful information to be extracted for the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
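
To make the idea of a block-wise entropy concrete, the sketch below computes a Shannon entropy per block and returns one value per block. It assumes single-nucleotide frequencies and non-overlapping blocks; the paper's exact LSE definition (for example the word length used inside each block) may differ, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(block):
    """Shannon entropy (in bits) of the symbol frequencies in one block."""
    n = len(block)
    counts = Counter(block)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def local_shannon_entropy(seq, block_size):
    """Entropy of each consecutive non-overlapping block of length block_size.

    A trailing partial block is ignored (an assumption of this sketch).
    """
    return [
        shannon_entropy(seq[start:start + block_size])
        for start in range(0, len(seq) - block_size + 1, block_size)
    ]


if __name__ == "__main__":
    seq = "AAAAAAAA" + "ACGTACGT" + "ATATATAT"
    print(local_shannon_entropy(seq, 8))
```

With four equally frequent bases the single-nucleotide entropy reaches its maximum of 2 bits, while a homopolymer block gives 0, so low and nearly constant values along a region are consistent with highly repetitive sequence.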

  4. Perspectives in Biochemistry: Methods for DNA Sequencing.

    Science.gov (United States)

    Wood, Anne T.

    1984-01-01

    Describes two frequently used DNA sequencing methods: Sanger's enzymatic dideoxy method and Maxam and Gilbert's chemical sequencing method. Indicates that studying these methods provides students with knowledge of the chemical structure of DNA and how DNA sequence data are obtained. (JN)

  5. Complete nucleotide sequences and construction of full-length infectious cDNA clones of cucumber green mottle mosaic virus (CGMMV) in a versatile newly developed binary vector including both 35S and T7 promoters.

    Science.gov (United States)

    Park, Chan-Hwan; Ju, Hye-Kyoung; Han, Jae-Yeong; Park, Jong-Seo; Kim, Ik-Hyun; Seo, Eun-Young; Kim, Jung-Kyu; Hammond, John; Lim, Hyoun-Sub

    2017-04-01

    Seed-transmitted viruses have caused significant damage to watermelon crops in Korea in recent years, with cucumber green mottle mosaic virus (CGMMV) infection widespread as a result of infected seed lots. To determine the likely origin of CGMMV infection, we collected CGMMV isolates from watermelon and melon fields and generated full-length infectious cDNA clones. The full-length cDNAs were cloned into newly constructed binary vector pJY, which includes both the 35S and T7 promoters for versatile usage (agroinfiltration and in vitro RNA transcription) and a modified hepatitis delta virus ribozyme sequence to precisely cleave RNA transcripts at the 3' end of the tobamovirus genome. Three CGMMV isolates (OMpj, Wpj, and Mpj) were separately evaluated for infectivity in Nicotiana benthamiana, as demonstrated by either agroinfiltration or inoculation with in vitro RNA transcripts. CGMMV nucleotide identities to other tobamoviruses were calculated from pairwise alignments using DNAMAN. CGMMV identities were 49.89% to tobacco mosaic virus; 49.85% to pepper mild mottle virus; 50.47% to tomato mosaic virus; 60.9% to zucchini green mottle mosaic virus; and 60.96% to kyuri green mottle mosaic virus, confirming that CGMMV is a distinct species most similar to other cucurbit-infecting tobamoviruses. We further performed phylogenetic analysis to determine relationships of our new Korean CGMMV isolates to previously characterized isolates from Canada, China, India, Israel, Japan, Korea, Russia, Spain, and Taiwan available from NCBI. Analysis of CGMMV amino acid sequences showed three major clades, broadly typified as 'Russian,' 'Israeli,' and 'Asian' groups. All of our new Korean isolates fell within the 'Asian' clade. Neither the 128 nor 186 kDa RdRps of the three new isolates showed any detectable gene silencing suppressor function.

  6. Photoluminescence Enhancement of Poly(3-methylthiophene) Nanowires upon Length Variable DNA Hybridization

    Directory of Open Access Journals (Sweden)

    Jingyuan Huang

    2018-01-01

    Full Text Available The use of low-dimensional inorganic or organic nanomaterials has advantages for DNA and protein recognition due to their sensitivity, accuracy, and physical size matching. In this research, poly(3-methylthiophene) (P3MT) nanowires (NWs) are electrochemically prepared with dopant followed by functionalization with probe DNA (pDNA) sequence through electrostatic interaction. Various lengths of pDNA sequences (10-, 20- and 30-mer) are conjugated to the P3MT NWs respectively, followed by hybridization with their complementary target DNA (tDNA) sequences. The nanoscale photoluminescence (PL) properties of the P3MT NWs are studied throughout the whole process at solid state. In addition, the correlation between the PL enhancement and the double helix DNA with various lengths is demonstrated.

  7. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Full Text Available Abstract Background: Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results: The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion: We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  8. DNA sequencing technologies: 2006-2016.

    Science.gov (United States)

    Mardis, Elaine R

    2017-02-01

    Recent advances in the field of genomics have largely been due to the ability to sequence DNA at increasing throughput and decreasing cost. DNA sequencing was first introduced in 1977, and next-generation sequencing technologies have been available only during the past decade, but the diverse experiments and corresponding analyses facilitated by these techniques have transformed biological and biomedical research. Here, I review developments in DNA sequencing technologies over the past 10 years and look to the future for further applications.

  9. RESEARCH ARTICLE Full length sequencing and novel ...

    Indian Academy of Sciences (India)

    Navya

    2016-12-16

    Before attempting association analyses between this gene and/or enzyme and phenotypic traits, a study on the genetic variability within this locus is required. The aim of this work was to sequence the entire coding region of the ACACA gene in the Valle del Belice sheep breed in order to identify polymorphic sites.

  10. One-dimensional TRFLP-SSCP is an effective DNA fingerprinting strategy for soil Archaea that is able to simultaneously differentiate broad taxonomic clades based on terminal fragment length polymorphisms and closely related sequences based on single stranded conformation polymorphisms.

    Science.gov (United States)

    Swanson, Colby A; Sliwinski, Marek K

    2013-09-01

    DNA fingerprinting methods provide a means to rapidly compare microbial assemblages from environmental samples without the need to first cultivate species in the laboratory. The profiles generated by these techniques are able to identify statistically significant temporal and spatial patterns, correlations to environmental gradients, and biological variability to estimate the number of replicates for clone libraries or next generation sequencing (NGS) surveys. Here we describe an improved DNA fingerprinting technique that combines terminal restriction fragment length polymorphisms (TRFLP) and single stranded conformation polymorphisms (SSCP) so that both can be used to profile a sample simultaneously rather than requiring two sequential steps as in traditional two-dimensional (2-D) gel electrophoresis. For the purpose of profiling Archaeal 16S rRNA genes from soil, the dynamic range of this combined 1-D TRFLP-SSCP approach was superior to TRFLP and SSCP. 1-D TRFLP-SSCP was able to distinguish broad taxonomic clades with genetic distances greater than 10%, such as Euryarchaeota and the Thaumarchaeal clades g_Ca. Nitrososphaera (formerly 1.1b) and o_NRP-J (formerly 1.1c) better than SSCP. In addition, 1-D TRFLP-SSCP was able to simultaneously distinguish closely related clades within a genus such as s_SCA1145 and s_SCA1170 better than TRFLP. We also tested the utility of 1-D TRFLP-SSCP fingerprinting of environmental assemblages by comparing this method to the generation of a 16S rRNA clone library of soil Archaea from a restored Tallgrass prairie. This study shows 1-D TRFLP-SSCP fingerprinting provides a rapid and phylogenetically informative screen of Archaeal 16S rRNA genes in soil samples. © 2013.

  11. Leukocyte telomere length variation due to DNA extraction method.

    Science.gov (United States)

    Denham, Joshua; Marques, Francine Z; Charchar, Fadi J

    2014-12-04

    Telomere length is indicative of biological age. Shorter telomeres have been associated with several disease and health states. There are inconsistencies throughout the literature amongst relative telomere length measured by quantitative PCR (qPCR) and different extraction methods or kits used. We quantified whole-blood leukocyte telomere length using the telomere to single copy gene (T/S) ratio by qPCR in 20 young (18-25 yrs) men after extracting DNA using three common extraction methods: Lahiri and Nurnberger (high salt) method, PureLink Genomic DNA Mini kit (Life Technologies) and QiaAmp DNA Mini kit (Qiagen). Differences in telomere length between DNA extracted by the three methods were assessed by one-way analysis of variance (ANOVA). DNA purity differed between extraction methods used (P=0.01). Telomere length was impacted by the DNA extraction method used (P=0.01). Telomeres extracted using the Lahiri and Nurnberger method (mean T/S ratio: 2.43, range: 1.57-3.02) and PureLink Genomic DNA Mini Kit (mean T/S ratio: 2.57, range: 2.24-2.80) did not differ (P=0.13). Likewise, QiaAmp and PureLink-extracted telomeres were not statistically different (P=0.14). The Lahiri-extracted telomeres, however, were significantly shorter than those extracted using the QiaAmp DNA Mini Kit (mean T/S ratio: 2.71, range: 2.32-3.02; P=0.003). DNA purity was associated with telomere length. There are discrepancies between the length of leukocyte telomeres extracted from the same individuals according to the DNA extraction method used. DNA purity could be responsible for the discrepancy in telomere length but this will require validation studies. We recommend using the same DNA extraction kit when quantifying leukocyte telomere length by qPCR or when comparing different cohorts to avoid erroneous associations between telomere length and traits of interest.

  12. Characterization of North American Armillaria species: Genetic relationships determined by ribosomal DNA sequences and AFLP markers

    Science.gov (United States)

    M. -S. Kim; N. B. Klopfenstein; J. W. Hanna; G. I. McDonald

    2006-01-01

    Phylogenetic and genetic relationships among 10 North American Armillaria species were analysed using sequence data from ribosomal DNA (rDNA), including intergenic spacer (IGS-1), internal transcribed spacers with associated 5.8S (ITS + 5.8S), and nuclear large subunit rDNA (nLSU), and amplified fragment length polymorphism (AFLP) markers. Based on rDNA sequence data,...

  13. Levenshtein error-correcting barcodes for multiplexed DNA sequencing.

    Science.gov (United States)

    Buschmann, Tilo; Bystrykh, Leonid V

    2013-09-11

    High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fractions of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called multiplexing approach relies on a specific DNA tag or barcode that is attached to the sequencing or amplification primer and hence appears at the beginning of the sequence in every read. After sequencing, each sample read is identified on the basis of the respective barcode sequence. Alterations of DNA barcodes during synthesis, primer ligation, DNA amplification, or sequencing may lead to incorrect sample identification unless the error is revealed and corrected. This can be accomplished by implementing error correcting algorithms and codes. This barcoding strategy increases the total number of correctly identified samples, thus improving overall sequencing efficiency. Two popular sets of error-correcting codes are Hamming codes and Levenshtein codes. Levenshtein codes operate only on words of known length. Since a DNA sequence with an embedded barcode is essentially one continuous long word, application of the classical Levenshtein algorithm is problematic. In this paper we demonstrate the decreased error correction capability of Levenshtein codes in a DNA context and suggest an adaptation of Levenshtein codes that is proven to efficiently correct nucleotide errors in DNA sequences. In our adaptation we take the DNA context into account and redefine the word length whenever an insertion or deletion is revealed. In simulations we show the superior error correction capability of the new method compared to traditional Levenshtein and Hamming based codes in the presence of multiple errors. We present an adaptation of Levenshtein codes to DNA contexts capable of correction of a pre-defined number of insertion, deletion, and substitution mutations.
Our improved method is additionally capable
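
    The word-length problem described in this abstract can be illustrated with a minimal sketch. It is not the authors' code construction: it simply compares each barcode against read prefixes of several lengths, so that an insertion or deletion inside the barcode does not shift the comparison window. The names `levenshtein`, `assign_barcode`, `max_errors` and the example barcodes are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def assign_barcode(read, barcodes, max_errors=1):
    """Return (sample, distance, prefix_length) for the closest barcode, or None.

    Assumption for illustration: the barcode sits at the start of the read and
    at most `max_errors` edits occurred, so the barcode may occupy between
    len(code) - max_errors and len(code) + max_errors read positions.
    """
    best = None
    for sample, code in barcodes.items():
        for length in range(len(code) - max_errors, len(code) + max_errors + 1):
            distance = levenshtein(code, read[:length])
            if distance <= max_errors and (best is None or distance < best[1]):
                best = (sample, distance, length)
    return best


# Hypothetical barcodes; the first read carries barcode s1 with one base deleted.
barcodes = {"s1": "ACGTAC", "s2": "TTGCAA"}
print(assign_barcode("ACTACGGGTTT", barcodes))
```

    A fixed six-base window would read "ACTACG" here, which is two edits away from the barcode; allowing the five-base prefix recovers the single deletion.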

  14. Method for sequencing DNA base pairs

    Science.gov (United States)

    Sessler, Andrew M.; Dawson, John

    1993-01-01

    The base pairs of a DNA structure are sequenced with the use of a scanning tunneling microscope (STM). The DNA structure is scanned by the STM probe tip, and, as it is being scanned, the DNA structure is separately subjected to a sequence of infrared radiation from four different sources, each source being selected to preferentially excite one of the four different bases in the DNA structure. Each particular base being scanned is subjected to such sequence of infrared radiation from the four different sources as that particular base is being scanned. The DNA structure as a whole is separately imaged for each subjection thereof to radiation from one only of each source.

  15. DNA Sequencing in Undergraduate Laboratory Courses.

    Science.gov (United States)

    Hamilton, Robert G.

    1997-01-01

    Discusses strategies to duplicate current research protocols using biochemical methods of analysis. Describes the use of the Silver Sequence kit that provides a technically simple and relatively inexpensive DNA sequencing exercise. (JRH)

  16. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  17. Continuous flow thermal cycler microchip for DNA cycle sequencing.

    Science.gov (United States)

    Wang, Hong; Chen, Jifeng; Zhu, Li; Shadpour, Hamed; Hupert, Mateusz L; Soper, Steven A

    2006-09-01

    We report here on the use of a polymer-based continuous flow thermal cycler (CFTC) microchip for Sanger cycle sequencing using dye terminator chemistry. The CFTC chip consisted of a 20-loop spiral microfluidic channel hot-embossed into polycarbonate (PC) that had three well-defined temperature zones poised at 95, 55, and 60 degrees C for denaturation, renaturation, and DNA extension, respectively. The sequencing cocktail was hydrodynamically pumped through the microreactor channel at different linear velocities ranging from 1 to 12 mm/s. At a linear velocity of 4 mm/s resulting in a 36-s extension time, a read length of >600 bp could be obtained in a total reaction time of 14.6 min. Further increases in the flow rate resulted in a reduction in the total reaction time but also produced a decrease in the sequencing read length. The CFTC chip could be reused for subsequent sequencing runs (>30) with negligible amounts of carryover contamination or degradation in the sequencing read length. The CFTC microchip was subsequently coupled to a solid-phase reversible immobilization (SPRI) microchip made from PC for purification of the DNA sequencing ladders (i.e., removal of excess dye-labeled dideoxynucleotides, DNA template, and salts) prior to gel electrophoresis. Coupling of the CFTC chip to the SPRI microchip showed read lengths similar to those obtained from benchtop instruments but did not require manual manipulation of the cycle sequencing reactions following amplification.
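
    The timing figures quoted in this abstract can be related to each other with a short back-of-the-envelope calculation. This is only a sketch under two assumptions that the abstract does not state: that the whole 14.6 min is spent inside the 20 loops, split evenly, and that the 36-s extension time applies to each loop. All variable names are illustrative.

```python
# Figures quoted in the abstract.
loops = 20                 # spiral loops, i.e. thermal cycles
total_time_min = 14.6      # total reaction time at 4 mm/s
velocity_mm_per_s = 4.0    # linear velocity
extension_time_s = 36.0    # extension time (assumed here to be per loop)

# Assumption: the total time is divided evenly over the loops.
time_per_loop_s = total_time_min * 60 / loops
# Channel path length travelled during one loop at constant velocity.
path_per_loop_mm = velocity_mm_per_s * time_per_loop_s
# Share of each loop spent in the 60 degrees C extension zone.
extension_fraction = extension_time_s / time_per_loop_s

print(f"time per loop: {time_per_loop_s:.1f} s")
print(f"path per loop: {path_per_loop_mm:.1f} mm")
print(f"extension share of each loop: {extension_fraction:.0%}")
```

    Under these assumptions each loop takes about 43.8 s, of which roughly four fifths is extension.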

  18. DNA Length Modulates the Affinity of Fragments of Genomic DNA for the Nuclear Matrix In Vitro.

    Science.gov (United States)

    García-Vilchis, David; Aranda-Anzaldo, Armando

    2017-12-01

    Classical observations have shown that during the interphase the chromosomal DNA of metazoans is organized in supercoiled loops attached to a compartment known as the nuclear matrix (NM). Fragments of chromosomal DNA able to bind the isolated NM in vitro are known as matrix associated/attachment/addressed regions or MARs. No specific consensus sequence or motif has been found that may constitute a universal, defining feature of MARs. On the other hand, high-salt resistant DNA-NM interactions in situ define true DNA loop anchorage regions or LARs, that might correspond to a subset of the potential MARs but are not necessarily identical to MARs characterized in vitro, since there are several examples of MARs able to bind the NM in vitro but which are not actually bound to the NM in situ. In the present work we assayed the capacity of two LARs, as well as of shorter fragments within such LARs, for binding to the NM in vitro. Paradoxically, the isolated (≈2 kb) LARs cannot bind to the NM in vitro, while their shorter (≈300 bp) sub-fragments and other non-related but equally short DNA fragments bind to the NM in a high-salt resistant fashion. Our results suggest that the ability of a given DNA fragment for binding to the NM in vitro primarily depends on the length of the fragment, suggesting that binding to the NM is modulated by the local topology of the DNA fragment in suspension, which is known to depend on the DNA length. J. Cell. Biochem. 118: 4487-4497, 2017. © 2017 Wiley Periodicals, Inc.

  19. Construction and analysis of full-length and normalized cDNA libraries from citrus.

    Science.gov (United States)

    Marques, M Carmen; Perez-Amador, Miguel A

    2012-01-01

    We have developed an integrated method to generate a normalized cDNA collection enriched in full-length and rare transcripts from citrus, using different species and multiple tissues and developmental stages. Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. In this regard, the availability of full-length cDNA clones facilitates functional analysis of the corresponding genes enabling manipulation of their expression and the generation of a variety of tagged versions of the native protein. The development of full-length cDNA sequences has the power to improve the quality of genome annotation, as well as provide tools for functional characterization of genes.

  20. Construction and analysis of full-length cDNA library of Cryptosporidium parvum.

    Science.gov (United States)

    Yamagishi, Junya; Wakaguri, Hiroyuki; Sugano, Sumio; Kawano, Suguru; Fujisaki, Kozo; Sugimoto, Chihiro; Watanabe, Junichi; Suzuki, Yutaka; Kimata, Isao; Xuan, Xuenan

    2011-06-01

    A full-length cDNA library was constructed from the sporozoite of Cryptosporidium parvum. Normalized clones were subjected to Solexa shotgun sequencing, and then complete sequences for 1066 clones were reconfigured. Detailed analyses of the sequences revealed that 13.5% of the transcripts were spliced; the average and median 5' UTR lengths were 213.5 and 122 nucleotides, respectively. There were 148 inconsistencies out of 562 examined genes between the experimentally described cDNA sequence and the predicted sequence from its genome. In addition, we identified 118 sequences that had little homology against annotated genes of C. parvum as prospective candidates for addable genes. These observations should improve the reliability of C. parvum transcriptome and provide a versatile resource for further studies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  1. mtDNA point and length heteroplasmy in high- and low radiation areas of Kerala

    International Nuclear Information System (INIS)

    Forster, L.; Forster, P.; Gurney, S.M.; Spencer, M.; Huang, C.; Röhl, A.; Brinkmann, B.

    2010-01-01

    A coastal peninsula in Kerala (India) contains the world's highest level of natural radioactivity in a densely populated area, offering an opportunity to characterize radiation-associated DNA mutations. Here, we focus on mitochondrial DNA (mtDNA) mutations, which are passed exclusively from the mother to her children. To analyse point mutations, we sampled 248 pedigrees (988 individuals) in the high-radiation peninsula and in nearby low-radiation islands as a control population. Then, in an extended sample of 1,172 mtDNA sequences (containing some non-Indians for comparison), we also analysed length mutations, which in mtDNA can lead to the phenomenon of length heteroplasmy, i.e. the existence of different DNA types in the same cell. We wished to find out how fast mtDNA mutates between generations, and whether the mutation rate is increased in radioactive conditions compared to the low-irradiation sample

  2. Multiple tag labeling method for DNA sequencing

    Science.gov (United States)

    Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.

    1995-01-01

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radio-isotope labels.

  3. Nuclear DNA sequences from late Pleistocene megafauna.

    Science.gov (United States)

    Greenwood, A D; Capelli, C; Possnert, G; Pääbo, S

    1999-11-01

    We report the retrieval and characterization of multi- and single-copy nuclear DNA sequences from Alaskan and Siberian mammoths (Mammuthus primigenius). In addition, a nuclear copy of a mitochondrial gene was recovered. Furthermore, a 13,000-year-old ground sloth and a 33,000-year-old cave bear yielded multicopy nuclear DNA sequences. Thus, multicopy and single-copy genes can be analyzed from Pleistocene faunal remains. The results also show that under some circumstances, nucleotide sequence differences between alleles found within one individual can be distinguished from DNA sequence variation caused by postmortem DNA damage. The nuclear sequences retrieved from the mammoths suggest that mammoths were more similar to Asian elephants than to African elephants.

  4. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Background: Highly repetitive nucleotide sequences are commonly found in nature, e.g. in telomeres, microsatellite DNA, polyadenine (poly(A)) tails of eukaryotic messenger RNA, as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites, resulting in unspecific products. Results: For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusion proteins and provide an example for their applicability in filter retardation assays. Conclusion: Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  5. An integer programming approach to DNA sequence assembly.

    Science.gov (United States)

    Chang, Youngjung; Sahinidis, Nikolaos V

    2011-08-10

    De novo sequence assembly is a ubiquitous combinatorial problem in all DNA sequencing technologies. In the presence of errors in the experimental data, the assembly problem is computationally challenging, and its solution may not lead to a unique reconstruct. The enumeration of all alternative solutions is important in drawing a reliable conclusion on the target sequence, and is often overlooked in the heuristic approaches that are currently available. In this paper, we develop an integer programming formulation and global optimization solution strategy to solve the sequence assembly problem with errors in the data. We also propose an efficient technique to identify all alternative reconstructs. When applied to examples of sequencing-by-hybridization, our approach dramatically increases the length of DNA sequences that can be handled with global optimality certificate to over 10,000, which is more than 10 times longer than previously reported. For some problem instances, alternative solutions exhibited a wide range of different ability in reproducing the target DNA sequence. Therefore, it is important to utilize the methodology proposed in this paper in order to obtain all alternative solutions to reliably infer the true reconstruct. These alternative solutions can be used to refine the obtained results and guide the design of further experiments to correctly reconstruct the target DNA sequence. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    performance bound Estimation algorithm 1 takes the index k of the maximum value of the mean-square correlation sequence ρ(k) as the estimated sequence length Nest. The sequence length will therefore be estimated correctly if the peak of ρ(k) is located at k... the estimated sequence length Nest, and technique 1 can therefore only provide the correct answer as long as k = N is considered within the range of k. The positions of segments within the intercepted signal and the value of L will also influence the performance...
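
    The peak-picking step described in this excerpt can be sketched as follows. The excerpt does not define ρ(k), so the sketch uses the squared mean of lag-k products as a stand-in; that definition, the function names, and the test signal are assumptions for illustration only, not the report's algorithm.

```python
def rho(x, k):
    """Stand-in for the mean-square correlation at lag k (assumed definition):
    the squared average of x[i] * x[i + k] over the overlapping samples."""
    n = len(x) - k
    mean = sum(x[i] * x[i + k] for i in range(n)) / n
    return mean * mean


def estimate_sequence_length(x, k_min, k_max):
    """Return the lag k in [k_min, k_max] that maximises rho(k)."""
    return max(range(k_min, k_max + 1), key=lambda k: rho(x, k))


# Noise-free test signal: a length-7 binary sequence repeated 30 times.
base = [1, 1, 1, -1, -1, 1, -1]
signal = base * 30

# The search range contains k = N = 7 but stops short of 2N = 14.
print(estimate_sequence_length(signal, 1, 13))
```

    As the excerpt notes, the estimate can only be right if k = N lies inside the searched range; with this stand-in ρ(k) the range should also exclude multiples of N, which produce equally high peaks.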

  7. An approach to sequence DNA without tagging

    Science.gov (United States)

    Niu, Sanjun; Saraf, Ravi F.

    2002-10-01

    Microarray technology is playing an increasingly important role in biology and medicine and its application to genomics for gene expression analysis has already reached the market with a variety of commercially available instruments. In these combinatorial analysis methods, known probe single-strand DNA (ssDNA) 'primers' are attached in clusters of typically 100 µm × 100 µm pixels. Each pixel of the array has a slightly different sequence. On exposure to 'unknown' target ssDNA, the pixels with the right complementary probe ssDNA sequence convert to double-stranded DNA (dsDNA) by a hybridization reaction. To transduce the conversion of the pixel to dsDNA, the target ssDNA is labelled with a photoluminescent tag during the polymerase chain reaction (PCR) amplification process. Due to the statistical distribution of the tags in the target ssDNA, it becomes significantly difficult to implement these methods as a diagnostic tool in a pathology laboratory. A method to sequence DNA without tagging the molecule is developed. The fabrication process is compatible with current microelectronics and (emerging) soft-material fabrication technologies, allowing the method to be integrable with micro-electromechanical systems (MEMS) and lab-on-a-chip devices. An estimated sensitivity of 10⁻¹² g on a 1 cm² device area is obtained.

  8. Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    2009-11-01

    Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5' and 3' UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).

  9. [Construction and identification of a full-length cDNA library from Spirometra erinaceieuropaei].

    Science.gov (United States)

    Lv, Gang; Lu, Ya-Jun; Fan, Zhi-Gang; Shi, Da-Zhong; Gan, Xiu-Feng; Zhong, Sai-Feng

    2010-10-30

    The full-length pBluescript II SK cDNA library of adult Spirometra erinaceieuropaei was constructed by using the SMART method. Data showed that 95.5% of the library was recombinant and the titer of the library was 1.06 × 10^6. The average insert size of the library was about 1.4 kb. Forty-eight randomly selected clones were sequenced. A set of 36 effective expressed sequence tags (ESTs) with an average size of 674 bp was obtained after excluding clones shorter than 450 bp. The unigenes occupied 58.3% of the 36 ESTs. The rate of full-length cDNAs was 57.7% (15/26). The high-quality full-length cDNA library could be used for large-scale EST sequencing.

  10. Chromatid interchanges at intrachromosomal telomeric DNA sequences

    International Nuclear Information System (INIS)

    Fernandez, J.L.; Vazquez-Gundin, F.; Bilbao, A.; Gosalvez, J.; Goyanes, V.

    1997-01-01

    Chinese hamster Don cells were exposed to X-rays, mitomycin C and teniposide (VM-26) to induce chromatid exchanges (quadriradials and triradials). After fluorescence in situ hybridization (FISH) of telomere sequences it was found that interstitial telomere-like DNA sequence arrays presented around five times more breakage-rearrangements than the genome overall. This high recombinogenic capacity was independent of the clastogen, suggesting that this susceptibility is not related to the initial mechanisms of DNA damage. (author)

  11. Mitochondrial DNA sequence variation in Hippopotamus amphibius ...

    African Journals Online (AJOL)

    Mitochondrial DNA sequence variation in Hippopotamus amphibius from Kruger National Park, Republic of South Africa. ... A test of the hypothesis that calves are more likely to share a mtDNA haplotype with an adult female in the same herd than an adult female from a different herd was not significant. Keywords: ...

  12. Mitochondrial DNA sequence evolution in shorebird populations

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons

  13. Nonmonotonic DNA-length-dependent mobility in pluronic gels

    Science.gov (United States)

    You, Seungyong; Wei, Ling; Shanbhag, Sachin; Van Winkle, David H.

    2017-04-01

    Two-dimensional electrophoresis was used to analyze the mobility of DNA fragments in micellar gels of pluronic F127 (EO₁₀₀PO₇₀EO₁₀₀) and pluronic P123 (EO₂₀PO₇₀EO₂₀). The 20-3500 base pair DNA fragments were separated by size first in agarose gels, and then in pluronic gels at room temperature. In agarose gels, the DNA mobility decreases monotonically with increasing DNA length. In pluronic gels, however, the mobility varies nonmonotonically according to fragment lengths that are strongly correlated with the diameter of the spherical micelles. Brownian dynamics (BD) simulations with short-ranged intra-DNA hydrodynamic interactions were performed to numerically calculate the length-dependent mobility in pluronic lattices. The rising and falling trends, as well as the oscillations of mobility, were captured by the coarse-grained BD simulations. Molecular dynamics simulations in pluronic F127, with explicitly modeled micelle coronas, justified that the hydrodynamic interactions mediated by the complex fluid of hydrated poly(ethylene oxide) are a possible reason for the initial rise of mobility with DNA length.

  14. Full-length sequencing and identification of novel polymorphisms in ...

    Indian Academy of Sciences (India)

    Journal of Genetics, Volume 96, Issue 4. Full-length sequencing and identification of novel polymorphisms in the ACACA gene of Valle del Belice sheep breed. Rosalia Di Gerlando, Salvatore Mastrangelo, Lina Tortorici, Marco Tolone, Anna Maria Sutera, Maria Teresa Sardina ...

  15. Full-length sequencing and identification of novel polymorphisms in ...

    Indian Academy of Sciences (India)

    Rosalia Di Gerlando

    2017-08-16

    Full-length sequencing and identification of novel polymorphisms in the ACACA gene of Valle del Belice sheep breed. Rosalia Di Gerlando, Salvatore Mastrangelo, Lina Tortorici, Marco Tolone, Anna Maria Sutera, Maria Teresa Sardina* and Baldassare Portolano.

  16. On site DNA barcoding by nanopore sequencing.

    Directory of Open Access Journals (Sweden)

    Michele Menegon

    Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a mimicking tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error-rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time-on-site DNA sequencing thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.

  17. DNA sequencing using fluorescence background electroblotting membrane

    Science.gov (United States)

    Caldwell, K.D.; Chu, T.J.; Pitt, W.G.

    1992-05-12

    A method for the multiplex sequencing of DNA is disclosed which comprises the electroblotting of specific base-terminated DNA fragments, which have been resolved by gel electrophoresis, onto the surface of a neutral non-aromatic polymeric microporous membrane exhibiting low background fluorescence which has been surface modified to contain amino groups. Polypropylene membranes are preferred, and the introduction of amino groups is accomplished by subjecting the membrane to radio or microwave frequency plasma discharge in the presence of an aminating agent, preferably ammonia. The membrane, containing physically adsorbed DNA fragments on its surface after the electroblotting, is then treated with crosslinking means such as UV radiation or a glutaraldehyde spray to chemically bind the DNA fragments to the membrane through amino groups contained on the surface. The DNA fragments chemically bound to the membrane are subjected to hybridization probing with a tagged probe specific to the sequence of the DNA fragments. The tagging may be by either fluorophores or radioisotopes. The tagged probes hybridized to the target DNA fragments are detected and read by laser induced fluorescence detection or autoradiograms. The use of aminated low fluorescent background membranes allows the use of fluorescent detection and reading even when the available amount of DNA to be sequenced is small. The DNA bound to the membranes may be reprobed numerous times.

  18. DNA sequencing using fluorescence background electroblotting membrane

    Science.gov (United States)

    Caldwell, Karin D.; Chu, Tun-Jen; Pitt, William G.

    1992-01-01

    A method for the multiplex sequencing of DNA is disclosed which comprises the electroblotting of specific base-terminated DNA fragments, which have been resolved by gel electrophoresis, onto the surface of a neutral non-aromatic polymeric microporous membrane exhibiting low background fluorescence which has been surface modified to contain amino groups. Polypropylene membranes are preferred, and the introduction of amino groups is accomplished by subjecting the membrane to radio or microwave frequency plasma discharge in the presence of an aminating agent, preferably ammonia. The membrane, containing physically adsorbed DNA fragments on its surface after the electroblotting, is then treated with crosslinking means such as UV radiation or a glutaraldehyde spray to chemically bind the DNA fragments to the membrane through said amino groups contained on the surface thereof. The DNA fragments chemically bound to the membrane are subjected to hybridization probing with a tagged probe specific to the sequence of the DNA fragments. The tagging may be by either fluorophores or radioisotopes. The tagged probes hybridized to said target DNA fragments are detected and read by laser induced fluorescence detection or autoradiograms. The use of aminated low fluorescent background membranes allows the use of fluorescent detection and reading even when the available amount of DNA to be sequenced is small. The DNA bound to the membranes may be reprobed numerous times.

  19. Nanogrid rolling circle DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Church, George M.; Porreca, Gregory J.; Shendure, Jay; Rosenbaum, Abraham Meir

    2017-04-18

    The present invention relates to methods for sequencing a polynucleotide immobilized on an array having a plurality of specific regions each having a defined diameter size, including synthesizing a concatemer of a polynucleotide by rolling circle amplification, wherein the concatemer has a cross-sectional diameter greater than the diameter of a specific region, immobilizing the concatemer to the specific region to make an immobilized concatemer, and sequencing the immobilized concatemer.

  20. Studies of DNA dumbbells VIII. Melting analysis of DNA dumbbells with dinucleotide repeat stem sequences.

    Science.gov (United States)

    Mandell, Kathleen E; Vallone, Peter M; Owczarzy, Richard; Riccelli, Peter V; Benight, Albert S

    2006-06-15

    Melting curves and circular dichroism spectra were measured for a number of DNA dumbbell and linear molecules containing dinucleotide repeat sequences of different lengths. To study effects of different sequences on the melting and spectroscopic properties, six DNA dumbbells whose stems contain the central sequences (AA)₁₀, (AC)₁₀, (AG)₁₀, (AT)₁₀, (GC)₁₀, and (GG)₁₀ were prepared. These represent the minimal set of 10 possible dinucleotide repeats. To study effects of dinucleotide repeat length, dumbbells with the central sequences (AG)ₙ, n = 5 and 20, were prepared. Control molecules, dumbbells with a random central sequence, (RN)ₙ, n = 5, 10, and 20, were also prepared. The central sequence of each dumbbell was flanked on both sides by the same 12 base pairs and T₄ end-loops. Melting curves were measured by optical absorbance and differential scanning calorimetry in solvents containing 25, 55, 85, and 115 mM Na⁺. CD spectra were collected from 20 to 45 degrees C and [Na⁺] from 25 to 115 mM. The spectral database did not reveal any apparent temperature dependence in the pretransition region. Analysis of the melting thermodynamics evaluated as a function of Na⁺ provided a means for quantitatively estimating the counterion release with melting for the different sequences. Results show a very definite sequence dependence, indicating the salt-dependent properties of duplex DNA are also sequence dependent. Linear DNA molecules containing the (AG)ₙ and (RN)ₙ sequences, n = 5, 10, 20, and 30, were also prepared and studied. The linear DNA molecules had the exact sequences of the dumbbell stems. That is, the central repeat sequence in each linear duplex was flanked on both sides by the same 12-bp sequence. Melting and CD studies were also performed on the linear DNA molecules. Comparison of results obtained for the same sequences in dumbbell and linear molecular environments reveals several interesting features of the interplay between
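
    The counts of six stems and 10 dinucleotides mentioned in this abstract can be related by a short enumeration. The sketch below rests on an interpretation that the abstract does not spell out: the 16 dinucleotides reduce to 10 when a doublet and its reverse complement are treated as the same duplex, and to 6 when, in addition, the repeat unit XY is treated as equivalent to its shifted form YX. Function names are illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_equivalents(dinuc):
    """A doublet and its reverse complement describe the same duplex step."""
    return {dinuc, reverse_complement(dinuc)}


def repeat_equivalents(dinuc):
    """In a long repeat (XY)n, the unit YX is the same repeat shifted by one base."""
    rc = reverse_complement(dinuc)
    return {dinuc, dinuc[::-1], rc, rc[::-1]}


def representatives(equivalents):
    """One alphabetically smallest representative per equivalence class."""
    seen, reps = set(), []
    for dinuc in ("".join(pair) for pair in product("ACGT", repeat=2)):
        if dinuc not in seen:
            group = equivalents(dinuc)
            seen |= group
            reps.append(min(group))
    return reps


print(len(representatives(duplex_equivalents)), representatives(duplex_equivalents))
print(len(representatives(repeat_equivalents)), representatives(repeat_equivalents))
```

    Under this interpretation the six repeat classes correspond to the stems listed in the abstract, with GC and GG appearing as their equivalents CG and CC.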

  1. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled "intractable", resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  2. Nanopore-CMOS Interfaces for DNA Sequencing.

    Science.gov (United States)

    Magierowski, Sebastian; Huang, Yiyun; Wang, Chengjie; Ghafar-Zadeh, Ebrahim

    2016-08-06

    DNA sequencers based on nanopore sensors present an opportunity for a significant break from the template-based incumbents of the last forty years. Key advantages ushered by nanopore technology include a simplified chemistry and the ability to interface to CMOS technology. The latter opportunity offers substantial promise for improvement in sequencing speed, size and cost. This paper reviews existing and emerging means of interfacing nanopores to CMOS technology with an emphasis on massively-arrayed structures. It presents this in the context of incumbent DNA sequencing techniques, reviews and quantifies nanopore characteristics and models and presents CMOS circuit methods for the amplification of low-current nanopore signals in such interfaces.

  3. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
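
    The notion of a maximal contiguous frequent pattern can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a plain level-wise search, and the function name, the `min_support` parameter, and the definition of support as "number of sequences containing the substring" are assumptions made for illustration.

```python
from collections import defaultdict

def maximal_contiguous_frequent_patterns(sequences, min_support):
    """Illustrative level-wise search (hypothetical helper, not the paper's method).

    A pattern is 'frequent' if it occurs as a substring in at least
    min_support of the input sequences, and 'maximal' if no longer
    frequent pattern contains it.
    """
    frequent = set()
    previous = set()  # frequent patterns of the previous length
    length = 1
    while True:
        support = defaultdict(set)
        for idx, seq in enumerate(sequences):
            for i in range(len(seq) - length + 1):
                sub = seq[i:i + length]
                # Pruning: a frequent pattern needs a frequent prefix and suffix.
                if length == 1 or (sub[:-1] in previous and sub[1:] in previous):
                    support[sub].add(idx)
        level = {p for p, seen in support.items() if len(seen) >= min_support}
        if not level:
            break
        frequent |= level
        previous = level
        length += 1
    # Keep only patterns that are not contained in a longer frequent pattern.
    return sorted(p for p in frequent
                  if not any(p != q and p in q for q in frequent))

patterns = maximal_contiguous_frequent_patterns(
    ["ACGTAC", "TACGTA", "GGACGT"], min_support=3)
print(patterns)
```

    No pattern length is specified in advance: the search simply stops at the first length with no frequent pattern.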

  4. A novel method for comparative analysis of DNA sequences by Ramanujan-Fourier transform.

    Science.gov (United States)

    Yin, Changchuan; Yin, Xuemeng E; Wang, Jiasong

    2014-12-01

    Alignment-free sequence analysis approaches provide important alternatives to multiple sequence alignment (MSA) in biological sequence analysis because alignment-free approaches have low computational complexity and are not dependent on a high level of sequence identity. However, most of the existing alignment-free methods do not employ the true full information content of sequences and thus cannot accurately reveal similarities and differences among DNA sequences. We present a novel alignment-free computational method for sequence analysis based on the Ramanujan-Fourier transform (RFT), in which complete information of DNA sequences is retained. We represent DNA sequences as four binary indicator sequences and apply RFT on the indicator sequences to convert them into the frequency domain. The Euclidean distance of the complete RFT coefficients of DNA sequences is used as the similarity measure. To address the different lengths of RFT coefficients in Euclidean space, we pad zeros to short DNA binary sequences so that the binary sequences equal the longest length in the comparison sequence data. Thus, the DNA sequences are compared in the same dimensional frequency space without information loss. We demonstrate the usefulness of the proposed method by presenting experimental results on hierarchical clustering of genes and genomes. The proposed method opens a new channel to biological sequence analysis, classification, and structural module identification.
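
    The pipeline described here (four binary indicator sequences, zero padding to a common length, transform, Euclidean distance) can be sketched as follows. This is a minimal illustration under stated assumptions: the finite-length normalization of the RFT coefficient, the `max_q` cutoff, and all function names are choices made for the sketch, not details taken from the paper.

```python
from math import cos, gcd, pi, sqrt

def indicator_sequences(dna, length):
    """Four 0/1 indicator sequences (A, C, G, T), zero-padded to `length`."""
    dna = dna.upper()
    pad = [0] * (length - len(dna))
    return {b: [1 if ch == b else 0 for ch in dna] + pad for b in "ACGT"}

def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) = 1."""
    return sum(cos(2 * pi * k * n / q) for k in range(1, q + 1) if gcd(k, q) == 1)

def totient(q):
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)

def rft(x, max_q):
    """Finite-length RFT coefficients for q = 1..max_q (assumed normalization)."""
    n_total = len(x)
    return [sum(x[n] * ramanujan_sum(q, n + 1) for n in range(n_total))
            / (totient(q) * n_total)
            for q in range(1, max_q + 1)]

def rft_distance(seq_a, seq_b, max_q=8):
    """Euclidean distance between concatenated RFT coefficients of two sequences."""
    length = max(len(seq_a), len(seq_b))  # pad the shorter sequence with zeros
    ind_a = indicator_sequences(seq_a, length)
    ind_b = indicator_sequences(seq_b, length)
    total = 0.0
    for base in "ACGT":
        for ca, cb in zip(rft(ind_a[base], max_q), rft(ind_b[base], max_q)):
            total += (ca - cb) ** 2
    return sqrt(total)

print(rft_distance("ACGTAC", "ACG"))
```

    A matrix of such pairwise distances could then be passed to any standard hierarchical clustering routine.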

  5. Trehalose as a good candidate for enriching full-length cDNAs in cDNA library construction.

    Science.gov (United States)

    Chen, Lei; Cao, Lixue; Zhou, Longhai; Jing, Yudong; Chen, Zuozhou; Deng, Cheng; Shen, Yu; Chen, Liangbiao

    2007-01-10

    It has been reported that the disaccharide trehalose is capable of increasing the thermostability and thermoactivity of reverse transcriptase, and therefore improving the length of cDNA synthesis. However, no test has been done on how trehalose performs in the context of the entire cDNA synthesis process, or whether it can be seamlessly integrated into a commercially available cDNA synthesis kit. In this report, we optimized a protocol to incorporate trehalose into Stratagene's cDNA library construction kit and demonstrated a great improvement in cDNA length (average length of 1.8 kb in the trehalose group versus 1.0 kb in the control). Sequence analysis of the cDNA clones showed that the addition of trehalose did not increase the error rate of the RT products but greatly increased the quantity of full-length cDNAs in the library.

  6. A comprehensive deep sequencing strategy for full-length genomes of influenza A.

    Directory of Open Access Journals (Sweden)

    Dirk Höper

    Driven by the impact of influenza A viruses on human and animal health, much research is conducted on this pathogen. To support this research, we designed an all influenza A-embracing reverse transcription-PCR (RT-PCR) for the generation of DNA from influenza A virus negative strand RNA genome segments for full-length genome deep sequencing on a Genome Sequencer FLX instrument. For high reliability, the RT-PCRs are designed such that every genome segment is divided into two amplicons and for the most variable segments redundancy is included. Moreover, to minimize the risk of contamination of diagnostic real-time PCRs by sequencing amplicons, RT-PCR does not generate amplicons that are amenable to RT-qPCR detection. With the presented protocol we were able to generate virtually all amplicons (99.3% success rate) from isolates representing all so far known 16 hemagglutinin and 9 neuraminidase subtypes and from an additional 2009 pandemic influenza A H1N1 virus. Three isolates were sequenced to analyze the suitability of the DNA for sequencing. Moreover, we provide a short R script that disambiguates the sequences of the primers used. We show that using unambiguous primer sequences for read trimming prior to assembly with the genome sequencer assembler software results in higher quality of the final genome sequences. Using the disambiguated primer sequences, high quality full-length sequences for the three isolates used for sequencing trials could be established from the raw data in de novo assemblies.
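
    Primer disambiguation of the kind mentioned here amounts to expanding each IUPAC ambiguity code into the concrete bases it stands for. The authors' script is in R and is not reproduced in the abstract; the sketch below is an independent Python illustration, and the function name and example primer are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def disambiguate(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical degenerate primer, for illustration only.
for variant in disambiguate("ARY"):
    print(variant)
```

    The number of variants is the product of the option counts per position, so highly degenerate primers expand quickly.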

  7. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of a unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  8. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules.

    Science.gov (United States)

    Mayjonade, Baptiste; Gouzy, Jérôme; Donnadieu, Cécile; Pouilly, Nicolas; Marande, William; Callot, Caroline; Langlade, Nicolas; Muños, Stéphane

    2016-10-01

    De novo sequencing of complex genomes is one of the main challenges for researchers seeking high-quality reference sequences. Many de novo assemblies are based on short reads, producing fragmented genome sequences. Third-generation sequencing, with read lengths >10 kb, will improve the assembly of complex genomes, but these techniques require high-molecular-weight genomic DNA (gDNA), and gDNA extraction protocols used for obtaining smaller fragments for short-read sequencing are not suitable for this purpose. Methods of preparing gDNA for bacterial artificial chromosome (BAC) libraries could be adapted, but these approaches are time-consuming, and commercial kits for these methods are expensive. Here, we present a protocol for rapid, inexpensive extraction of high-molecular-weight gDNA from bacteria, plants, and animals. Our technique was validated using sunflower leaf samples, producing a mean read length of 12.6 kb and a maximum read length of 80 kb.

  9. Anchoring a Defined Sequence to the 5' Ends of mRNAs: The Bolt to Clone Rare Full Length mRNAs and Generate cDNA Libraries from a Few Cells.

    Science.gov (United States)

    Baptiste, J; Milne Edwards, D; Delort, J; Mallet, J

    1993-01-01

    Among numerous applications, the polymerase chain reaction (PCR) (1,2) provides a convenient means to clone 5' ends of rare mRNAs and to generate cDNA libraries from tissue available in amounts too low to be processed by conventional methods. Basically, the amplification of cDNAs by the PCR requires the availability of the sequences of two stretches of the molecule to be amplified. A sequence can easily be imposed at the 5' end of the first-strand cDNAs (corresponding to the 3' end of the mRNAs) by priming the reverse transcription with a specific primer (for cloning the 5' end of rare messenger) or with an oligonucleotide tailored with a poly (dT) stretch (for cDNA library construction), taking advantage of the poly (A) sequence that is located at the 3' end of mRNAs. Several strategies have been devised to tag the 3' end of the ss-cDNAs (corresponding to the 5' end of the mRNAs). We (3) and others have described strategies based on the addition of a homopolymeric dG (4,5) or dA (6,7) tail using terminal deoxyribonucleotide transferase (TdT) ("anchor-PCR" [4]). However, this strategy has important limitations. The TdT reaction is difficult to control and has a low efficiency (unpublished observations). But most importantly, the return primers containing a homopolymeric (dC or dT) tail generate nonspecific amplifications, a phenomenon that prevents the isolation of low abundance mRNA species and/or interferes with the relative abundance of primary clones in the library. To circumvent these drawbacks, we have used two approaches. First, we devised a strategy based on a cRNA enrichment procedure, which has been useful to eliminate nonspecific-PCR products and to allow detection and cloning of cDNAs of low abundance (3). More recently, to avoid the nonspecific amplification resulting from the annealing of the homopolymeric tail oligonucleotide, we have developed a novel anchoring strategy that is based on the ligation of an oligonucleotide to the 3' end of ss

  10. Substrate and target sequence length influence RecTE(Psy) recombineering efficiency in Pseudomonas syringae.

    Directory of Open Access Journals (Sweden)

    Zhongmeng Bao

    We are developing a new recombineering system to assist experimental manipulation of the Pseudomonas syringae genome. P. syringae is a globally dispersed plant pathogen and an important model species used to study the molecular biology of bacteria-plant interactions. We previously identified orthologs of the lambda Red bet/exo and Rac recET genes in P. syringae and confirmed that they function in recombineering using ssDNA and dsDNA substrates. Here we investigate the properties of dsDNA substrates more closely to determine how they influence recombineering efficiency. We find that the length of flanking homologies and length of the sequences being inserted or deleted have a large effect on RecTE(Psy)-mediated recombination efficiency. These results provide information about the design elements that should be considered when using recombineering.

  11. DNA Sequencing in Cultural Heritage.

    Science.gov (United States)

    Vai, Stefania; Lari, Martina; Caramelli, David

    2016-02-01

    During the last three decades, DNA analysis on degraded samples revealed itself as an important research tool in anthropology, archaeozoology, molecular evolution, and population genetics. Application on topics such as determination of species origin of prehistoric and historic objects, individual identification of famous personalities, characterization of particular samples important for historical, archeological, or evolutionary reconstructions, confers to the paleogenetics an important role also for the enhancement of cultural heritage. A really fast improvement in methodologies in recent years led to a revolution that permitted recovering even complete genomes from highly degraded samples with the possibility to go back in time 400,000 years for samples from temperate regions and 700,000 years for permafrozen remains and to analyze even more recent material that has been subjected to hard biochemical treatments. Here we propose a review on the different methodological approaches used so far for the molecular analysis of degraded samples and their application on some case studies.

  12. DNA origami-based nanoribbons: assembly, length distribution, and twist

    International Nuclear Information System (INIS)

    Jungmann, Ralf; Scheible, Max; Kuzyk, Anton; Pardatscher, Guenther; Simmel, Friedrich C; Castro, Carlos E

    2011-01-01

    A variety of polymerization methods for the assembly of elongated nanoribbons from rectangular DNA origami structures are investigated. The most efficient method utilizes single-stranded DNA oligonucleotides to bridge an intermolecular scaffold seam between origami monomers. This approach allows the fabrication of origami ribbons with lengths of several micrometers, which can be used for long-range ordered arrangement of proteins. It is quantitatively shown that the length distribution of origami ribbons obtained with this technique follows the theoretical prediction for a simple linear polymerization reaction. The design of flat single layer origami structures with constant crossover spacing inevitably results in local underwinding of the DNA helix, which leads to a global twist of the origami structures that also translates to the nanoribbons.

  13. DNA origami-based nanoribbons: assembly, length distribution, and twist

    Energy Technology Data Exchange (ETDEWEB)

    Jungmann, Ralf; Scheible, Max; Kuzyk, Anton; Pardatscher, Guenther; Simmel, Friedrich C [Lehrstuhl fuer Bioelektronik, Physik-Department and ZNN/WSI, Technische Universitaet Muenchen, Am Coulombwall 4a, 85748 Garching (Germany); Castro, Carlos E, E-mail: simmel@ph.tum.de [Labor fuer Biomolekulare Nanotechnologie, Physik-Department and ZNN/WSI, Technische Universitaet Muenchen, Am Coulombwall 4a, 85748 Garching (Germany)

    2011-07-08

    A variety of polymerization methods for the assembly of elongated nanoribbons from rectangular DNA origami structures are investigated. The most efficient method utilizes single-stranded DNA oligonucleotides to bridge an intermolecular scaffold seam between origami monomers. This approach allows the fabrication of origami ribbons with lengths of several micrometers, which can be used for long-range ordered arrangement of proteins. It is quantitatively shown that the length distribution of origami ribbons obtained with this technique follows the theoretical prediction for a simple linear polymerization reaction. The design of flat single layer origami structures with constant crossover spacing inevitably results in local underwinding of the DNA helix, which leads to a global twist of the origami structures that also translates to the nanoribbons.
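
    The abstract does not state which formula the "theoretical prediction for a simple linear polymerization reaction" refers to. A common textbook choice is the Flory most-probable distribution, and the sketch below assumes that form; the bond probability `p` and the function names are illustrative, not values or identifiers from the paper.

```python
def flory_number_fraction(n, p):
    """Fraction of chains containing exactly n monomers.

    Assumes the Flory most-probable distribution, P(n) = (1 - p) * p**(n - 1),
    where p is the probability that a given seam between monomers is bridged.
    """
    return (1.0 - p) * p ** (n - 1)

def mean_chain_length(p):
    """Number-average chain length for the same distribution: 1 / (1 - p)."""
    return 1.0 / (1.0 - p)

p = 0.9  # assumed extent of reaction, for illustration only
for n in (1, 5, 10, 20):
    print(n, round(flory_number_fraction(n, p), 4))
print("mean length in monomers:", round(mean_chain_length(p), 2))
```

    Under this assumption the fraction of ribbons decays geometrically with the number of origami monomers, so long ribbons require p close to 1.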

  14. Probing the Conformational Distributions of Sub-Persistence Length DNA

    Energy Technology Data Exchange (ETDEWEB)

    Mastroianni, Alexander; Sivak, David; Geissler, Phillip; Alivisatos, Paul

    2009-06-08

    We have measured the bending elasticity of short double-stranded DNA (dsDNA) chains through small-angle X-ray scattering from solutions of dsDNA-linked dimers of gold nanoparticles. This method, which does not require exertion of external forces or binding to a substrate, reports on the equilibrium distribution of bending fluctuations, not just an average value (as in ensemble FRET) or an extreme value (as in cyclization), and in principle provides a more robust data set for assessing the suitability of theoretical models. Our experimental results for dsDNA comprising 42-94 basepairs (bp) are consistent with a simple worm-like chain model of dsDNA elasticity, whose behavior we have determined from Monte Carlo simulations that explicitly represent nanoparticles and their alkane tethers. A persistence length of 50 nm (150 bp) gave a favorable comparison, consistent with the results of single-molecule force-extension experiments on much longer dsDNA chains, but in contrast to recent suggestions of enhanced flexibility at these length scales.
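
    For orientation, the worm-like chain model gives a closed-form mean-square end-to-end distance, <R^2> = 2PL[1 - (P/L)(1 - exp(-L/P))], for contour length L and persistence length P. The sketch below evaluates it for the 42-94 bp range quoted above with P = 50 nm; the rise of 0.34 nm per base pair is an assumed textbook value, and the calculation ignores the nanoparticles and tethers that the authors' simulations include.

```python
from math import exp, sqrt

RISE_NM_PER_BP = 0.34  # assumed B-DNA rise per base pair

def wlc_mean_square_end_to_end(contour_nm, persistence_nm=50.0):
    """<R^2> of an ideal worm-like chain, in nm^2."""
    x = contour_nm / persistence_nm
    return 2.0 * persistence_nm * contour_nm * (1.0 - (1.0 - exp(-x)) / x)

for n_bp in (42, 94):
    contour = n_bp * RISE_NM_PER_BP
    rms = sqrt(wlc_mean_square_end_to_end(contour))
    print(f"{n_bp} bp: contour {contour:.1f} nm, rms end-to-end {rms:.1f} nm")
```

    Because these chains are shorter than the persistence length, the root-mean-square end-to-end distance stays close to the contour length, which is why small bending fluctuations are the quantity of interest.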

  15. Tandemly repeated sequence in 5'end of mtDNA control region of ...

    African Journals Online (AJOL)

    Extensive length variability was observed in 5' end sequence of the mitochondrial DNA control region of the Japanese Spanish mackerel (Scomberomorus niphonius). This length variability was due to the presence of varying numbers of a 56-bp tandemly repeated sequence and a 46-bp insertion/deletion (indel).

  16. Scintillating optical fiber detectors for DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Bendali, M.; Mastrippolito, R.; Charon, Y.; Leblanc, M.; Tricoire, H.; Valentin, L. (Inst. de Physique Nucleaire, 91 - Orsay (France) Lab. de Physique Nucleaire, Univ. Paris 7, 75 (France)); Martin, B. (Lab. de Neurobiologie Cellulaire et Moleculaire, 91 - Gif-sur-Yvette (France))

    1991-12-01

    We have developed a two-dimensional detector (SOFI) for 32P emitting molecules used in molecular biology by combining scintillating optical fibers (SOFs) and a multianode photomultiplier (MAPM). A good efficiency (15%) was obtained by suppressing the internal cross talk of the MAPM with a new electronic device. Using this improvement we are developing two new detectors using SOFs for DNA sequencing. We shall present the basic principle of these detectors and the results in efficiency and position accuracy obtained with the first prototypes. The advantage of these detectors over currently available DNA sequencers will be discussed.

  17. A Bioluminometric Method of DNA Sequencing

    Science.gov (United States)

    Ronaghi, Mostafa; Pourmand, Nader; Stolc, Viktor; Arnold, Jim (Technical Monitor)

    2001-01-01

    Pyrosequencing is a bioluminometric single-tube DNA sequencing method that takes advantage of co-operativity between four enzymes to monitor DNA synthesis. In this sequencing-by-synthesis method, a cascade of enzymatic reactions yields detectable light, which is proportional to incorporated nucleotides. Pyrosequencing has the advantages of accuracy, flexibility and parallel processing. It can be easily automated. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides and gel-electrophoresis. In this chapter, the use of this technique for different applications is discussed.

  18. Thermodynamics of sequence-specific binding of PNA to DNA

    DEFF Research Database (Denmark)

    Ratilainen, T; Holmén, A; Tuite, E

    2000-01-01

    For further characterization of the hybridization properties of peptide nucleic acids (PNAs), the thermodynamics of hybridization of mixed sequence PNA-DNA duplexes have been studied. We have characterized the binding of PNA to DNA in terms of binding affinity (perfectly matched duplexes) and sequence specificity of binding (singly mismatched duplexes) using mainly absorption hypochromicity melting curves and isothermal titration calorimetry. For perfectly sequence-matched duplexes of varying lengths (6-20 bp), the average free energy of binding (ΔG°) was determined to be -6...... relative to that of the perfectly matched sequence with a corresponding free energy penalty of about 15 kJ mol(-1) bp(-1). The average cost of a single mismatch is therefore estimated to be on the order of or larger than the gain of two matched base pairs, resulting in an apparent binding constant of only...

  19. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a big concern, as the amount of data generated at several locations grows exponentially. Therefore, there is a need to develop techniques to store data using compression algorithms. Here we describe an optimal storage algorithm (OPTSDNA) for storing large amounts of DNA sequences of varying length. This paper provides a performance analysis of OPTSDNA within a distributed bioinformatics computing system for the analysis of DNA sequences. The OPTSDNA algorithm is used for storing DNA sequences of various sizes in a database. The input DNA sequences varied in size from very small to very large. Both the storage size and the response time are calculated in this work. The efficiency and performance of the algorithm are high (in terms of percentage storage size) when compared with other known sequential approaches.
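
    The abstract does not describe how OPTSDNA encodes sequences. As a generic point of comparison only, the sketch below shows the familiar 2-bits-per-base packing and the storage size it yields; it is not the OPTSDNA algorithm, and the function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack(data, length):
    """Recover the original string; `length` removes the padding bases."""
    seq = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            seq.append(BASES[(byte >> shift) & 3])
    return "".join(seq[:length])

dna = "ACGTACGTTTGA"
packed = pack(dna)
print(len(dna), "characters ->", len(packed), "bytes")
```

    The original length has to be stored alongside the packed bytes, since the final byte may contain padding.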

  20. Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.

    Science.gov (United States)

    Gupta, P D

    2016-10-01

    In health care, the importance of DNA sequencing has been fully established. Sanger's capillary electrophoresis DNA sequencing methodology is time-consuming and cumbersome, and hence more expensive. Lately, because of its versatility, DNA sequencing has become a household name, and therefore there is an urgent need for a simple, fast, inexpensive DNA sequencing technology. At the beginning of this century efforts were made, and nanopore DNA sequencing technology was developed; it is still in its infancy, but it is nevertheless the futuristic technology.

  1. The impact of sequence length and number of sequences on promoter prediction performance.

    Science.gov (United States)

    Carvalho, Sávio G; Guerra-Sá, Renata; de C Merschmann, Luiz H

    2015-01-01

    The rapid growth in the capacity to sequence new genomes has highlighted the need for data analysis automation, aimed at speeding up the genomic annotation process and reducing its cost. Given that one important step in functional genomic annotation is promoter identification, several studies have been undertaken to propose computational approaches to predict promoters. Different classifiers and characteristics of the promoter sequences have been used to deal with this prediction problem. However, several works in the literature have addressed the promoter prediction problem using datasets containing sequences of 250 nucleotides or more. As the sequence length defines the number of dataset attributes, even considering a limited number of properties to characterize the sequences, datasets with a high number of attributes are generated for training classifiers. Since high-dimensional datasets can degrade the classifiers' predictive performance or even require an infeasible processing time, predicting promoters by training classifiers on datasets with a reduced number of attributes is essential to obtain good predictive performance with low computational cost. To the best of our knowledge, there is no work in the literature that has verified in a systematic way the relation between sequence length and the predictive performance of classifiers. Thus, in this work, we have evaluated the impact of sequence length variation and training dataset size (number of sequences) on the predictive performance of classifiers. We have built sixteen datasets composed of different sized sequences (ranging in length from 12 to 301 nucleotides) and evaluated them using the SVM, Random Forest and k-NN classifiers. The best predictive performances reached by SVM and Random Forest remained relatively stable for datasets composed of sequences varying in length from 301 to 41 nucleotides, while k-NN achieved its best performance for the dataset composed of 101 nucleotides.
We
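
    The link between sequence length and the number of dataset attributes can be made concrete with a small sketch. It assumes a one-hot encoding (four attributes per nucleotide) and a window trimmed symmetrically around the sequence centre; neither choice is stated in the abstract, and the function names are hypothetical.

```python
def center_window(seq, length):
    """Trim a sequence to `length` nucleotides around its centre (assumed scheme)."""
    start = (len(seq) - length) // 2
    return seq[start:start + length]

def one_hot(seq):
    """Encode each nucleotide as four 0/1 attributes in A, C, G, T order."""
    return [1 if base == b else 0 for base in seq.upper() for b in "ACGT"]

full = "ACGT" * 75 + "A"  # a made-up 301-nucleotide sequence
for length in (301, 101, 41, 12):
    features = one_hot(center_window(full, length))
    print(length, "nucleotides ->", len(features), "attributes")
```

    With this encoding, shortening the sequences from 301 to 41 nucleotides cuts the attribute count from 1204 to 164, which is the kind of reduction the study evaluates against classifier performance.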

  2. Cloning and sequencing of complete τ-crystallin cDNA from ...

    Indian Academy of Sciences (India)

    Unknown

    length τ-crystallin cDNA from crocodilian lens and α-enolase from other tissues. ... human (Acc. No. NM_001428). The sequences were used to construct a phylogenetic tree depicting gene lineage, using the clustering program DNAML.

  3. The DNA sequence specificity of bleomycin cleavage in a systematically altered DNA sequence.

    Science.gov (United States)

    Gautam, Shweta D; Chen, Jon K; Murray, Vincent

    2017-08-01

    Bleomycin is an anti-tumour agent that is clinically used to treat several types of cancers. Bleomycin cleaves DNA at specific DNA sequences and recent genome-wide DNA sequencing specificity data indicated that the sequence 5'-RTGT*AY (where T* is the site of bleomycin cleavage, R is G/A and Y is T/C) is preferentially cleaved by bleomycin in human cells. Based on this DNA sequence, we constructed a plasmid clone to explore this bleomycin cleavage preference. By systematic variation of single nucleotides in the 5'-RTGT*AY sequence, we were able to investigate the effect of nucleotide changes on bleomycin cleavage efficiency. We observed that the preferred consensus DNA sequence for bleomycin cleavage in the plasmid clone was 5'-YYGT*AW (where W is A/T). The most highly cleaved sequence was 5'-TCGT*AT and, in fact, the seven most highly cleaved sequences conformed to the consensus sequence 5'-YYGT*AW. A comparison with genome-wide results was also performed and while the core sequence was similar in both environments, the surrounding nucleotides were different.
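
    The consensus sequences quoted here use IUPAC codes (R = G/A, Y = T/C, W = A/T), so checking a sequence against them is a simple pattern match. The sketch below is an illustration only; the function names and the test sequence are hypothetical, and the asterisk marking the cleavage site is dropped before matching.

```python
import re

# Only the ambiguity codes used in the abstract; other letters match themselves.
IUPAC_CLASSES = {"R": "[AG]", "Y": "[CT]", "W": "[AT]"}

def consensus_to_regex(consensus):
    """Translate a consensus such as 'YYGTAW' into a regular expression string."""
    return "".join(IUPAC_CLASSES.get(base, base) for base in consensus)

def find_sites(seq, consensus):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(seq)]

seq = "GGTCGTATCC"  # made-up sequence containing 5'-TCGTAT
print(find_sites(seq, "YYGTAW"))
print(find_sites(seq, "RTGTAY"))
```

    The lookahead form of the pattern reports overlapping sites, which matters when candidate sites are spaced closer than six nucleotides.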

  4. Concentration and length dependence of DNA looping in transcriptional regulation.

    Directory of Open Access Journals (Sweden)

    Lin Han

    2009-05-01

    In many cases, transcriptional regulation involves the binding of transcription factors at sites on the DNA that are not immediately adjacent to the promoter of interest. This action at a distance is often mediated by the formation of DNA loops: Binding at two or more sites on the DNA results in the formation of a loop, which can bring the transcription factor into the immediate neighborhood of the relevant promoter. These processes are important in settings ranging from the historic bacterial examples (bacterial metabolism and the lytic-lysogeny decision in bacteriophage), to the modern concept of gene regulation to regulatory processes central to pattern formation during development of multicellular organisms. Though there have been a variety of insights into the combinatorial aspects of transcriptional control, the mechanism of DNA looping as an agent of combinatorial control in both prokaryotes and eukaryotes remains unclear. We use single-molecule techniques to dissect DNA looping in the lac operon. In particular, we measure the propensity for DNA looping by the Lac repressor as a function of the concentration of repressor protein and as a function of the distance between repressor binding sites. As with earlier single-molecule studies, we find (at least) two distinct looped states and demonstrate that the presence of these two states depends both upon the concentration of repressor protein and the distance between the two repressor binding sites. We find that loops form even at interoperator spacings considerably shorter than the DNA persistence length, without the intervention of any other proteins to prebend the DNA. The concentration measurements also permit us to use a simple statistical mechanical model of DNA loop formation to determine the free energy of DNA looping, or equivalently, the J-factor for looping.

  5. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis.

    Science.gov (United States)

    Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, J E; Millar, B C

    2006-01-06

    At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. Cloning, sequencing and comparison of the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France demonstrated nucleotide sequence differences at six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted.

  6. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    Directory of Open Access Journals (Sweden)

    Moore JE

    2006-01-01

    Background: At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results: Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. Cloning, sequencing and comparison of the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France demonstrated nucleotide sequence differences at six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion: High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted.

  7. Optimal length of decomposition sequences composed of imperfect gates

    Science.gov (United States)

    Nam, Y. S.; Blümel, R.

    2017-05-01

    Quantum error correcting circuitry is both a resource for correcting errors and a source for generating errors. A balance has to be struck between these two aspects. Perfect quantum gates do not exist in nature. Therefore, it is important to investigate how flaws in the quantum hardware affect quantum computing performance. We do this in two steps. First, in the presence of realistic, faulty quantum hardware, we establish how quantum error correction circuitry achieves reduction in the extent of quantum information corruption. Then, we investigate fault-tolerant gate sequence techniques that result in an approximate phase rotation gate, and establish the existence of an optimal length L_opt of the length L of the decomposition sequence. The existence of L_opt is due to the competition between the increase in gate accuracy with increasing L and the decrease in gate performance due to the diffusive proliferation of gate errors due to faulty basis gates. We present an analytical formula for the gate fidelity as a function of L that is in satisfactory agreement with the results of our simulations and allows the determination of L_opt via the solution of a transcendental equation. Our result is universally applicable since gate sequence approximations also play an important role, e.g., in atomic and molecular physics and in nuclear magnetic resonance.

  8. Simple sequence repeats showing 'length preference' have regulatory functions in humans.

    Science.gov (United States)

    Krishnan, Jaya; Athar, Fathima; Rani, Tirupaati Swaroopa; Mishra, Rakesh Kumar

    2017-09-10

    Simple sequence repeats (SSRs), simple tandem repeats (STRs) or microsatellites are short tandem repeats of 1-6 nucleotide motifs. They are twice as abundant as the protein coding DNA in the human genome and yet little is known about their functional relevance. Analysis of genomes across various taxa shows that despite the instability associated with longer stretches of repeats, a few SSRs with specific longer repeat lengths are enriched in the genomes, indicating positive selection. This conserved feature of length dependent enrichment hints at not only sequence but also length dependent functionality for SSRs. In the present study, we selected 23 SSRs of the human genome that show specific repeat length dependent enrichment and analysed their cis-regulatory potential using promoter modulation, boundary and barrier assays. We find that the 23 SSR sequences, which are mostly intergenic and intronic, possess distinct cis-regulatory potential. They modulate minimal promoter activity in transient luciferase assays and are capable of functioning as enhancer-blockers and barrier elements. The results of our functional assays propose cis-gene regulatory roles for these specific length enriched SSRs and open avenues for further investigations. Copyright © 2017 Elsevier B.V. All rights reserved.
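
    The definition of an SSR given above (tandem repeats of a 1-6 nucleotide motif) can be turned into a small scanner. The sketch below is an illustration under assumptions: the function name and the `min_repeats` threshold are hypothetical, and it is not the enrichment analysis used in the study.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for tandem repeats of 1-6 nt motifs.

    The lazy quantifier makes the regex try the shortest motif first, so a run
    of A's is reported with motif "A" rather than "AA".
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


print(find_ssrs("GGTTCACACACAGGAAAAAATT"))
```

    Matches are non-overlapping and scanned left to right, which is a simplification; dedicated microsatellite tools handle imperfect repeats and overlapping motifs.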

  9. Cloning, sequencing and expression of cDNA encoding growth ...

    Indian Academy of Sciences (India)

    Unknown

    2.4 cDNA sequencing and analysis. The nucleotide sequence of the cloned H. fossilis GH cDNA was determined by Sanger's dideoxy chain termination method, using Perkin Elmer bigdye terminator kit in an ABI Prism 377 automated DNA sequencer. All other computational analysis of the GH cDNA was done using.

  10. Local Renyi entropic profiles of DNA sequences.

    Science.gov (United States)

    Vinga, Susana; Almeida, Jonas S

    2007-10-16

    In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
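
    The Chaos Game Representation on which these entropic profiles are built maps each nucleotide to a point in the unit square by moving halfway toward that nucleotide's corner. The sketch below shows only this iterated map, not the Rényi entropy or kernel density steps; the corner assignment is one common convention and is an assumption here, as is the function name.

```python
# Corner assignment is a convention (assumed here), not taken from the record.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR coordinate reached after each nucleotide of seq."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


print(cgr_points("AG"))
```

    Because each step halves the distance to a corner, sequences sharing a suffix of length n land within the same sub-square of side 2^-n, which is what makes a density estimate over these points informative about motifs.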

  11. Local Renyi entropic profiles of DNA sequences

    Directory of Open Access Journals (Sweden)

    Vinga Susana

    2007-10-01

    Full Text Available Abstract Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.

  12. DNA sequencing versus standard prenatal aneuploidy screening.

    Science.gov (United States)

    Bianchi, Diana W; Parker, R Lamar; Wentworth, Jeffrey; Madankumar, Rajeevi; Saffer, Craig; Das, Anita F; Craig, Joseph A; Chudova, Darya I; Devers, Patricia L; Jones, Keith W; Oliver, Kelly; Rava, Richard P; Sehnert, Amy J

    2014-02-27

    In high-risk pregnant women, noninvasive prenatal testing with the use of massively parallel sequencing of maternal plasma cell-free DNA (cfDNA testing) accurately detects fetal autosomal aneuploidy. Its performance in low-risk women is unclear. At 21 centers in the United States, we collected blood samples from women with singleton pregnancies who were undergoing standard aneuploidy screening (serum biochemical assays with or without nuchal translucency measurement). We performed massively parallel sequencing in a blinded fashion to determine the chromosome dosage for each sample. The primary end point was a comparison of the false positive rates of detection of fetal trisomies 21 and 18 with the use of standard screening and cfDNA testing. Birth outcomes or karyotypes were the reference standard. The primary series included 1914 women (mean age, 29.6 years) with an eligible sample, a singleton fetus without aneuploidy, results from cfDNA testing, and a risk classification based on standard screening. For trisomies 21 and 18, the false positive rates with cfDNA testing were significantly lower than those with standard screening (0.3% vs. 3.6% for trisomy 21, P [...] aneuploidy (5 for trisomy 21, 2 for trisomy 18, and 1 for trisomy 13; negative predictive value, 100% [95% confidence interval, 99.8 to 100]). The positive predictive values for cfDNA testing versus standard screening were 45.5% versus 4.2% for trisomy 21 and 40.0% versus 8.3% for trisomy 18. In a general obstetrical population, prenatal testing with the use of cfDNA had significantly lower false positive rates and higher positive predictive values for detection of trisomies 21 and 18 than standard screening. (Funded by Illumina; ClinicalTrials.gov number, NCT01663350.)

  13. Simulating efficiently the evolution of DNA sequences.

    Science.gov (United States)

    Schöniger, M; von Haeseler, A

    1995-02-01

    Two menu-driven FORTRAN programs are described that simulate the evolution of DNA sequences in accordance with a user-specified model. This general stochastic model allows for an arbitrary stationary nucleotide composition and any transition-transversion bias during the process of base substitution. In addition, the user may define any hypothetical model tree according to which a family of sequences evolves. The programs suggest the computationally most inexpensive approach to generate nucleotide substitutions. Either reproducible or non-repeatable simulations, depending on the method of initializing the pseudo-random number generator, can be performed. The corresponding options are offered by the interface menu.

  14. Genomic signal processing for DNA sequence clustering

    Directory of Open Access Journals (Sweden)

    Gerardo Mendizabal-Ruiz

    2018-01-01

    Full Text Available Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.

  15. Genomic signal processing for DNA sequence clustering.

    Science.gov (United States)

    Mendizabal-Ruiz, Gerardo; Román-Godínez, Israel; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A; Vélez-Pérez, Hugo; Morales, J Alejandro

    2018-01-01

    Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.

  16. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.
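
    The construction described above can be illustrated with a small pure-Python sketch. Several choices here are assumptions not stated in the record: words are non-overlapping k-letter chunks, a transition is counted from each word to the word that immediately follows it, columns with no outgoing transitions are replaced by a uniform column, and the damping factor is the conventional 0.85. The function names are hypothetical.

```python
from itertools import product


def google_matrix(seq, k=2, alpha=0.85):
    """Build a column-stochastic Google matrix over all 4**k DNA words."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    counts = [[0.0] * n for _ in range(n)]  # counts[i][j]: transitions j -> i
    seq = seq.upper()
    chunks = [seq[p:p + k] for p in range(0, len(seq) - k + 1, k)]
    for src, dst in zip(chunks, chunks[1:]):
        if src in index and dst in index:  # skip words with ambiguous letters
            counts[index[dst]][index[src]] += 1.0
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(counts[i][j] for i in range(n))
        for i in range(n):
            # Dangling columns (no observed transitions) become uniform.
            s_ij = counts[i][j] / col_sum if col_sum else 1.0 / n
            G[i][j] = alpha * s_ij + (1.0 - alpha) / n
    return words, G


def pagerank(G, iterations=100):
    """Approximate the leading eigenvector of G by power iteration."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


words, G = google_matrix("ACGTACGTTTGACA", k=1)
ranks = pagerank(G)
print(sorted(zip(words, ranks), key=lambda t: -t[1]))
```

    Every column of G sums to 1, so the power iteration preserves the total probability and the result can be read directly as a PageRank vector over words.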

  17. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  18. Next generation sequencing of DNA-launched Chikungunya vaccine virus

    Energy Technology Data Exchange (ETDEWEB)

    Hidajat, Rachmat; Nickols, Brian [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]; Forrester, Naomi [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States)]; Tretyakova, Irina [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]; Weaver, Scott [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States)]; Pushko, Peter, E-mail: ppushko@medigen-usa.com [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)]

    2016-03-15

    Chikungunya virus (CHIKV) represents a pandemic threat with no approved vaccine available. Recently, we described a novel vaccination strategy based on iDNA® infectious clone designed to launch a live-attenuated CHIKV vaccine from plasmid DNA in vitro or in vivo. As a proof of concept, we prepared iDNA plasmid pCHIKV-7 encoding the full-length cDNA of the 181/25 vaccine. The DNA-launched CHIKV-7 virus was prepared and compared to the 181/25 virus. Illumina HiSeq2000 sequencing revealed that with the exception of the 3′ untranslated region, CHIKV-7 viral RNA consistently showed a lower frequency of single-nucleotide polymorphisms than the 181/25 RNA including at the E2-12 and E2-82 residues previously identified as attenuating mutations. In the CHIKV-7, frequencies of reversions at E2-12 and E2-82 were 0.064% and 0.086%, while in the 181/25, frequencies were 0.179% and 0.133%, respectively. We conclude that the DNA-launched virus has a reduced probability of reversion mutations, thereby enhancing vaccine safety. - Highlights: • Chikungunya virus (CHIKV) is an emerging pandemic threat. • In vivo DNA-launched attenuated CHIKV is a novel vaccine technology. • DNA-launched virus was sequenced using HiSeq2000 and compared to the 181/25 virus. • DNA-launched virus has lower frequency of SNPs at E2-12 and E2-82 attenuation loci.

  19. Next generation sequencing of DNA-launched Chikungunya vaccine virus

    International Nuclear Information System (INIS)

    Hidajat, Rachmat; Nickols, Brian; Forrester, Naomi; Tretyakova, Irina; Weaver, Scott; Pushko, Peter

    2016-01-01

    Chikungunya virus (CHIKV) represents a pandemic threat with no approved vaccine available. Recently, we described a novel vaccination strategy based on iDNA® infectious clone designed to launch a live-attenuated CHIKV vaccine from plasmid DNA in vitro or in vivo. As a proof of concept, we prepared iDNA plasmid pCHIKV-7 encoding the full-length cDNA of the 181/25 vaccine. The DNA-launched CHIKV-7 virus was prepared and compared to the 181/25 virus. Illumina HiSeq2000 sequencing revealed that with the exception of the 3′ untranslated region, CHIKV-7 viral RNA consistently showed a lower frequency of single-nucleotide polymorphisms than the 181/25 RNA including at the E2-12 and E2-82 residues previously identified as attenuating mutations. In the CHIKV-7, frequencies of reversions at E2-12 and E2-82 were 0.064% and 0.086%, while in the 181/25, frequencies were 0.179% and 0.133%, respectively. We conclude that the DNA-launched virus has a reduced probability of reversion mutations, thereby enhancing vaccine safety. - Highlights: • Chikungunya virus (CHIKV) is an emerging pandemic threat. • In vivo DNA-launched attenuated CHIKV is a novel vaccine technology. • DNA-launched virus was sequenced using HiSeq2000 and compared to the 181/25 virus. • DNA-launched virus has lower frequency of SNPs at E2-12 and E2-82 attenuation loci.

  20. What Advances Are Being Made in DNA Sequencing?

    Science.gov (United States)

    ... diagnosis in the future. For more information about DNA sequencing technologies and their use: Genetics Home Reference discusses ... illustration of the decline in the cost of DNA sequencing , including that caused by the introduction of new ...

  1. Role of DNA deletion length in mutation and cell survival

    International Nuclear Information System (INIS)

    Braby, L.A.; Morgan, T.L.

    1992-01-01

    A model is presented which is based on the assumption that malignant transformation, mutation, chromosome aberration, and reproductive death of cells are all manifestations of radiation induced deletions in the DNA of the cell, and that the size of the deletion in relation to the spacing of essential genes determines the consequences of that deletion. It is assumed that two independent types of potentially lethal lesions can result in DNA deletions, and that the relative numbers of these types of damage are dependent on radiation quality. The repair of the damage reduces the length of a deletion, but does not always eliminate it. The predictions of this model are in good agreement with a wide variety of experimental evidence. (author)

  2. The DNA sequence of equine herpesvirus-1.

    Science.gov (United States)

    Telford, E A; Watson, M S; McBride, K; Davison, A J

    1992-07-01

    The complete DNA sequence was determined of a pathogenic British isolate of equine herpesvirus-1, a respiratory virus which can cause abortion and neurological disease. The genome is 150,223 bp in size, has a base composition of 56.7% G + C, and contains 80 open reading frames likely to encode protein. Since four open reading frames are duplicated in the major inverted repeat, two are probably expressed as a spliced mRNA, and one may contain an internal transcriptional promoter, the genome is considered to contain 76 distinct genes. The genes are arranged collinearly with those in the genomes of the two previously sequenced alphaherpesviruses, varicella-zoster virus, and herpes simplex virus type-1, and comparisons of predicted amino acid sequences allowed the functions of many equine herpesvirus 1 proteins to be assigned.
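
    A base composition figure such as the 56.7% G + C quoted above is a simple ratio of G and C counts to all unambiguous bases. A minimal sketch follows, assuming ambiguous symbols such as N are excluded from the denominator; the function name is hypothetical.

```python
def gc_percent(seq):
    """Return the G + C content of seq as a percentage of A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * gc / acgt


print(gc_percent("GGCCAT"))
```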

  3. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    OpenAIRE

    Natanaelsson, Christian; Oskarsson, Mattias CR; Angleby, Helen; Lundeberg, Joakim; Kirkness, Ewen; Savolainen, Peter

    2006-01-01

    Abstract Background Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromo...

  4. Prediction of fine-tuned promoter activity from DNA sequence.

    Science.gov (United States)

    Siwo, Geoffrey; Rider, Andrew; Tan, Asako; Pinapati, Richard; Emrich, Scott; Chawla, Nitesh; Ferdig, Michael

    2016-01-01

    The quantitative prediction of transcriptional activity of genes using promoter sequence is fundamental to the engineering of biological systems for industrial purposes and understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved the best performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method accurately predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) as compared to 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was entirely based on features extracted exclusively from the 100 bp region upstream from the translational start site demonstrating that this region encodes much of the overall promoter activity. 
    The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring

  5. Statistical length of DNA based on AFM image measured by a computer

    International Nuclear Information System (INIS)

    Chen Xinqing; Qiu Xijun; Zhang Yi; Hu Jun; Wu Shiying; Huang Yibo; Ai Xiaobai; Li Minqian

    2001-01-01

    Taking advantage of image processing technology, the contour length of DNA molecules was measured automatically by computer. Based on the AFM image of DNA, the topography of DNA was simulated as a curve. Then the DNA length was measured automatically by inserting mode. It was shown that the experimental length of a naturally deposited DNA (180.4 ± 16.4 nm) was consistent with the theoretical length (185.0 nm). Compared to other methods, the present approach had advantages of precision and automatism. The stretched DNA was also measured. It was shown that the experimental length (343.6 ± 20.7 nm) was much longer than the theoretical length (307.0 nm). This result indicated that the stretching process had a distinct effect on the DNA length. However, the method provided here avoided the DNA-stretching effect.
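
    Once a molecule has been traced as a curve, its contour length is the sum of the distances between consecutive points along the trace. The sketch below shows that final step only; it assumes the trace is already available as ordered (x, y) pixel coordinates, and the function name and the `nm_per_pixel` calibration parameter are hypothetical. It is a plain polyline sum, not the measurement procedure used in the record.

```python
import math


def contour_length(points, nm_per_pixel=1.0):
    """Sum the straight-line segment lengths along an ordered trace."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
    return nm_per_pixel * total


trace = [(0, 0), (3, 4), (3, 10)]  # toy trace in pixels
print(contour_length(trace))
print(contour_length(trace, nm_per_pixel=2.0))
```

    A raw pixel-to-pixel sum tends to overestimate length on jagged traces, so in practice the trace is usually smoothed or resampled before summing.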

  6. DNA Duplex Length and Salt Concentration Dependence of Enthalpy−Entropy Compensation Parameters for DNA Melting

    KAUST Repository

    Starikov, E. B.

    2009-08-20

    Systematical differential calorimetry experiments on DNA oligomers with different lengths and placed in water solutions with various added salt concentrations may, in principle, unravel important information about the structure and dynamics of the DNA and their water-counterion surrounding. With this in mind, to reinterpret the most recent results of calorimetric experiments on DNA oligomers of such a kind, the recent enthalpy-entropy compensation theory has been used. It is demonstrated that the application of the latter could enable direct estimation of thermodynamic parameters of the microphase transitions connected to the changes in DNA dynamical regimes versus the length of the biopolymers and the ionic strengths of their water solutions, and this calls for much more systematical experimental and theoretical studies in this field. © 2009 American Chemical Society.

  7. A colorimetric platform for sensitively differentiating telomere DNA with different lengths, monitoring G-quadruplex and dsDNA based on silver nanoclusters and unmodified gold nanoparticles

    Science.gov (United States)

    Qu, Fei; Chen, Zeqiu; You, Jinmao; Song, Cuihua

    2018-05-01

    Human telomere DNA plays a vital role in genome integrity control and carcinogenesis as an indication for extensive cell proliferation. Herein, silver nanoclusters (Ag NCs) templated by polymer and unmodified gold nanoparticles (Au NPs) are designed as a new colorimetric platform for sensitively differentiating telomere DNA with different lengths, monitoring G-quadruplex and dsDNA. Ag NCs can induce the aggregation of Au NPs, so the color of Au NPs changes to blue and the absorption peak moves to 700 nm. Because telomere DNA can protect Au NPs from aggregation, the color turns red again and the absorption band blue-shifts. Benefiting from the obvious color change, we can differentiate the length of telomere DNA with the naked eye. The longer the telomere DNA, the more noticeable the color variation. The detection limits of telomere DNA containing 10, 22, 40, 64 bases are estimated to be 1.41, 1.21, 0.23 and 0.22 nM, respectively. On the other hand, when telomere DNA forms G-quadruplex in the presence of K+, or dsDNA with complementary sequence, both G-quadruplex and dsDNA can protect Au NPs better than the unfolded telomere DNA. Hence, a new colorimetric platform for monitoring structure conversion of DNA is established by the Ag NCs-Au NPs system, and to prove this type of application, a selective K+ sensor is developed.

  8. cDNA sequence quality data - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available Budding yeast cDNA sequencing project. Data name: cDNA sequence quality data. DOI: 10.18908/lsdba.nbdc00838-003. Description of data contents: Phred's quality score. ...

  9. Predicting DNA hybridization kinetics from sequence

    Science.gov (United States)

    Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu

    2018-01-01

    Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similarity reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
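
    The idea of predicting an unknown rate constant from similar reactions with known rate constants can be sketched as a distance-weighted average. Everything specific below is an assumption for illustration: the feature vectors, the Euclidean distance, the exponential weighting and the `scale` parameter are hypothetical and are not the six-feature model reported in the record.

```python
import math


def wnv_predict(query, known, scale=1.0):
    """Predict log10(rate constant) as a similarity-weighted average.

    known is a list of (feature_vector, log10_rate_constant) pairs.
    """
    num = 0.0
    den = 0.0
    for features, log_k in known:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(query, features)))
        w = math.exp(-d / scale)  # closer neighbours get larger votes
        num += w * log_k
        den += w
    return num / den


# Hypothetical training reactions: (features, log10 k).
known = [((0.0, 0.0), 6.0), ((1.0, 0.0), 7.0)]
print(wnv_predict((0.5, 0.0), known))
print(wnv_predict((0.0, 0.0), known))
```

    A query equidistant from both neighbours receives their plain average, while a query that coincides with one neighbour is pulled toward that neighbour's value without reproducing it exactly.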

  10. Method for priming and DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Mugasimangalam, R.C.; Ulanovsky, L.E.

    1997-12-01

    A method is presented for improving the priming specificity of an oligonucleotide primer that is non-unique in a nucleic acid template which includes selecting a continuous stretch of several nucleotides in the template DNA where one of the four bases does not occur in the stretch. This also includes bringing the template DNA in contact with a non-unique primer partially or fully complementary to the sequence immediately upstream of the selected sequence stretch. This results in polymerase-mediated differential extension of the primer in the presence of a subset of deoxyribonucleotide triphosphates that does not contain the base complementary to the base absent in the selected sequence stretch. These reactions occur at a temperature sufficiently low for allowing the extension of the non-unique primer. The method causes polymerase-mediated extension reactions in the presence of all four natural deoxyribonucleotide triphosphates or modifications. At this high temperature discrimination occurs against priming sites of the non-unique primer where the differential extension has not made the primer sufficiently stable to prime. However, the primer extended at the selected stretch is sufficiently stable to prime.
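
    The first step of the method, selecting a continuous stretch of template in which one of the four bases does not occur, can be sketched as a simple scan. This is an illustration only: the function names are hypothetical, and picking the longest such stretch is an assumption, since the record does not say how a stretch is chosen.

```python
def longest_stretch_without(template, base):
    """Return (start, length) of the longest run of template lacking base."""
    base = base.upper()
    best_start, best_len, run_start = 0, 0, 0
    # Appending the base as a sentinel closes the final run.
    for i, nt in enumerate(template.upper() + base):
        if nt == base:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = i + 1
    return best_start, best_len


def best_stretch(template):
    """Return (absent_base, start, length) for the longest single-base-free run."""
    candidates = [(b,) + longest_stretch_without(template, b) for b in "ACGT"]
    return max(candidates, key=lambda t: t[2])


print(longest_stretch_without("ACGTTTCCAGG", "A"))
print(best_stretch("ACGTTTCCAGG"))
```

    For the toy template, the longest stretch lacking A starts at position 1 (0-based) and spans 7 nucleotides, so extension over it would proceed with the three deoxyribonucleotide triphosphates other than the one complementary to A.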

  11. Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting

    Energy Technology Data Exchange (ETDEWEB)

    Winston Chen, C.H.; Taranenko, N.I.; Zhu, Y.F.; Chung, C.N.; Allman, S.L.

    1997-03-01

    Since laser mass spectrometry has the potential for achieving very fast DNA analysis, the authors recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced from Sanger's enzymatic method. The other is to do direct sequencing without DNA ladders. The need for quick DNA typing for identification purposes is critical for forensic application. The preliminary results indicate laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic diseases can be a very efficient step to reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, the authors applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.

  12. Elucidating population histories using genomic DNA sequences.

    Science.gov (United States)

    Vigilant, Linda

    2009-04-01

    In 1993, Cliff Jolly suggested that rather than debating species definitions and classifications, energy would be better spent investigating multidimensional patterns of variation and gene flow among populations. Until now, however, genetic studies of wild primate populations have been limited to very small portions of the genome. Access to complete genome sequences of humans, chimpanzees, macaques, and other primates makes it possible to design studies surveying substantial amounts of DNA sequence variation at multiple genetic loci in representatives of closely related but distinct wild primate populations. Such data can be analyzed with new approaches that estimate not only when populations diverged but also the relative amounts and directions of subsequent gene flow. These analyses will reemphasize the difficulty of achieving consistent species and subspecies definitions by revealing the extent of variation in the amount and duration of gene flow accompanying population divergences.

  13. DNA sequencing using biotinylated dideoxynucleotides and mass spectrometry

    Science.gov (United States)

    Edwards, John R.; Itagaki, Yasuhiro; Ju, Jingyue

    2001-01-01

    Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MS) has been explored widely for DNA sequencing. The major requirement for this method is that the DNA sequencing fragments must be free from alkaline and alkaline earth salts as well as other contaminants for accurately measuring the masses of the DNA fragments. We report here the development of a novel MS DNA sequencing method that generates Sanger-sequencing fragments in one tube using biotinylated dideoxynucleotides. The DNA sequencing fragments that carry a biotin at the 3′-end are made free from salts and other components in the sequencing reaction by capture with streptavidin-coated magnetic beads. Only correctly terminated biotinylated DNA fragments are subsequently released and loaded onto a mass spectrometer to obtain accurate DNA sequencing data. Compared with gel electrophoresis-based sequencing systems, MS produces a very high resolution of DNA-sequencing fragments, fast separation on microsecond time scales, and completely eliminates the compressions associated with gel electrophoresis. The high resolution of MS allows accurate mutation and heterozygote detection. This optimized solid-phase DNA-sequencing chemistry plus future improvements in detector sensitivity for large DNA fragments in MS instrumentation will further improve MS for DNA sequencing. PMID:11691941

  14. Mitochondrial DNA sequence of Onychostoma rara.

    Science.gov (United States)

    Zeng, Chun-Fang; Li, Xiao-Ling; Li, Chuan-Wu; Huang, Xiang-Rong; Wan, Yi-Wen

    2015-01-01

    The complete mitochondrial genome sequence of Onychostoma rara was determined to be 16,590 bp in length and contains 13 protein-coding genes (PCGs), 22 tRNA genes, large (rrnL) and small (rrnS) rRNA and the non-coding control region. Its total A + T content is 55.65%. We also analyzed the structure of the control region, in which 6 CSBs (CSB-1, CSB-2, CSB-3, CSB-D, CSB-E and CSB-F) and a 2 bp tandem repeat were detected.

  15. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  16. Silicene nanoribbon as a new DNA sequencing device

    Science.gov (United States)

    Alesheikh, Sara; Shahtahmassebi, Nasser; Roknabadi, Mahmood Rezaee; Pilevar Shahri, Raheleh

    2018-02-01

    The importance of DNA sequencing in different fields has motivated the search for fast and cheap methods. Nanotechnology helps this development by introducing nanostructures that can be used for DNA sequencing. In this work we study the interaction between a zigzag silicene nanoribbon and DNA nucleobases using DFT and the non-equilibrium Green's function approach, to investigate the possibility of using zigzag silicene nanoribbons as biosensors for DNA sequencing.

  17. Full-length sequencing and identification of novel polymorphisms in ...

    Indian Academy of Sciences (India)

    The aim of this work was to sequence the entire coding region of the ACACA gene in the Valle del Belice sheep breed to identify polymorphic sites. A total of 51 coding exons of the ACACA gene were sequenced in 32 individuals of the Valle del Belice sheep breed. Sequencing analysis and alignment of obtained sequences showed the ...

  18. Identification of Meconopsis species by a DNA barcode sequence ...

    African Journals Online (AJOL)

    Deoxyribonucleic acid (DNA) barcoding is a novel technology that uses a standard DNA sequence to facilitate species identification. Species identification is necessary for the authentication of traditional plant based medicines. Although a consensus has not been agreed regarding which DNA sequences can be used as ...

  19. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypotheses, PISMA aims at identifying motifs in DNA sequences, counting them, and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code-like scheme, a gene-map-like scheme, and a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it can be run on any type of hardware and on diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .
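
    The motif counting and bar-code-like display described above can be illustrated with a minimal Python sketch. This is not PISMA's code (PISMA is a Java tool); the function names, the text rendering, and the toy sequence are assumptions made for illustration only.

```python
def find_motif_positions(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so that overlapping occurrences are also found.
        start = sequence.find(motif, start + 1)
    return positions


def barcode_track(sequence: str, positions: list[int]) -> str:
    """Draw a one-line text 'bar code': '|' where a motif starts, '.' elsewhere."""
    marks = set(positions)
    return "".join("|" if i in marks else "." for i in range(len(sequence)))


# Hypothetical toy sequence, not taken from the paper.
toy_sequence = "ATCGCGTTACGGACGT"
cpg_sites = find_motif_positions(toy_sequence, "CG")
print(len(cpg_sites), cpg_sites)
print(barcode_track(toy_sequence, cpg_sites))
```

    Advancing the search by a single base keeps overlapping hits, which matters for short motifs such as dinucleotides.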

  20. Characterization of the complete nuclear ribosomal DNA sequences of Eurytrema pancreaticum.

    Science.gov (United States)

    Su, X; Zhang, Y; Zheng, X; Wang, X X; Li, Y; Li, Q; Wang, C R

    2017-06-27

    Eurytrema pancreaticum is one of the most common trematodes of cattle and sheep, and also infects humans occasionally, causing great economic losses and medical costs. In this study, the sequences of the complete nuclear ribosomal DNA (rDNA) repeat units of five E. pancreaticum individuals were determined for the first time. They were 8306-8310 bp in length, including the small subunit (18S) rDNA, internal transcribed spacer 1 (ITS1), 5.8S rDNA, internal transcribed spacer 2 (ITS2), large subunit (28S) rDNA and intergenic spacer (IGS). There were no length variations in any of the investigated 18S (1996 bp), ITS1 (1103 bp), 5.8S (160 bp), ITS2 (231 bp) or 28S (3669 bp) rDNA sequences, whereas the IGS rDNA sequences of E. pancreaticum had a 4-bp length variation, ranging from 1147 to 1151 bp. The intraspecific variations within E. pancreaticum were 0-0.2% for 18S rDNA, 0-0.5% for ITS1, 0% for 5.8S rDNA and ITS2, 0-0.2% for 28S rDNA and 2.9-20.2% for IGS. There were nine types of repeat sequences in ITS1, two types in 28S rDNA, but none in IGS. A phylogenetic analysis based on the 18S rDNA sequences classified E. pancreaticum in the family Dicrocoeliidae of Plagiorchiata, closely related to the suborder Opisthorchiata. These results provide useful information for the further study of Dicrocoeliidae trematodes.
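
    As a quick consistency check, the region lengths reported above can be summed and compared with the stated repeat-unit length of 8306-8310 bp. The short Python sketch below uses only numbers from the abstract; the variable names are arbitrary.

```python
# Region lengths in bp, as reported in the abstract above.
fixed_regions = {"18S": 1996, "ITS1": 1103, "5.8S": 160, "ITS2": 231, "28S": 3669}
igs_range = (1147, 1151)  # the only region reported with length variation

fixed_total = sum(fixed_regions.values())
unit_min = fixed_total + igs_range[0]
unit_max = fixed_total + igs_range[1]

# The sums reproduce the reported 8306-8310 bp range for the full repeat unit.
print(fixed_total, unit_min, unit_max)
```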

  1. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT), the bionic wavelet transform (BWT), for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.
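
    The separation of a symbolic sequence into four indicator channels can be sketched in Python as follows. The weighting step is only a stand-in: the weights below are hypothetical fixed numbers, whereas the paper's adaptive symbol-to-number mapping is not specified in this abstract and is not reproduced here.

```python
def indicator_channels(sequence: str) -> dict[str, list[int]]:
    """Split a DNA string into four binary channels, one per nucleotide."""
    sequence = sequence.upper()
    return {base: [1 if s == base else 0 for s in sequence] for base in "ACGT"}


def weighted_signal(channels: dict[str, list[int]],
                    weights: dict[str, float]) -> list[float]:
    """Combine the channels into a single numeric signal using per-channel weights."""
    length = len(next(iter(channels.values())))
    return [sum(weights[b] * channels[b][i] for b in channels) for i in range(length)]


channels = indicator_channels("ACGTA")
# Hypothetical weights chosen only to show the mechanics of the mapping.
example_weights = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
print(channels["A"])
print(weighted_signal(channels, example_weights))
```

    A numeric signal of this kind is what a wavelet transform would then operate on.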

  2. SWORDS: A statistical tool for analysing large DNA sequences

    Indian Academy of Sciences (India)

    In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in public domain ...
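
    A frequency distribution of DNA words of a fixed length can be computed along the following lines. This Python sketch is not the SWORDS software; the function names and the use of a sum of absolute differences to compare two distributions are assumptions, since the truncated abstract does not state which statistics SWORDS applies.

```python
from collections import Counter


def word_frequencies(sequence: str, k: int) -> dict[str, float]:
    """Relative frequency of every overlapping word of length k in the sequence."""
    sequence = sequence.upper()
    n_words = len(sequence) - k + 1
    if k <= 0 or n_words <= 0:
        raise ValueError("k must be positive and no longer than the sequence")
    counts = Counter(sequence[i:i + k] for i in range(n_words))
    return {word: count / n_words for word, count in counts.items()}


def frequency_distance(freq_a: dict[str, float], freq_b: dict[str, float]) -> float:
    """Sum of absolute frequency differences over all words seen in either sequence."""
    words = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) for w in words)


# Hypothetical toy sequences for illustration.
freq_1 = word_frequencies("ACGTACGTAC", 2)
freq_2 = word_frequencies("AAAAACCCCC", 2)
print(freq_1)
print(frequency_distance(freq_1, freq_2))
```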

  4. Analysis of mutation/rearrangement frequencies and methylation patterns at a given DNA locus using restriction fragment length polymorphism.

    Science.gov (United States)

    Boyko, Alex; Kovalchuk, Igor

    2010-01-01

    Restriction fragment length polymorphism (RFLP) is a difference in DNA sequences of organisms belonging to the same species. RFLPs are typically detected as DNA fragments of different lengths after digestion with various restriction endonucleases. The comparison of RFLPs allows investigators to analyze the frequency of occurrence of mutations, such as point mutations, deletions, insertions, and gross chromosomal rearrangements, in the progeny of stressed plants. The assay involves restriction enzyme digestion of DNA followed by hybridization of digested DNA using a radioactively or enzymatically labeled probe. Since DNA can be digested with methylation sensitive enzymes, the assay can also be used to analyze a methylation pattern of a particular locus. Here, we describe RFLP analysis using methylation-insensitive and methylation-sensitive enzymes.
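
    The way a sequence difference turns into a fragment-length difference can be shown with a small in-silico digest. The Python sketch below is illustrative only: the two toy alleles are invented, and EcoRI (recognition site GAATTC, cut after the first G) is used simply as a familiar example enzyme, not one named in the abstract.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every recognition site.

    cut_offset is the number of bases of the site that stay on the left fragment.
    """
    sequence, site = sequence.upper(), site.upper()
    cut_points = []
    start = sequence.find(site)
    while start != -1:
        cut_points.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cut_points + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:]) if right > left]


# Hypothetical alleles: allele_b carries a point mutation (A -> G) in the second site.
allele_a = "AAA" + "GAATTC" + "TTTT" + "GAATTC" + "CC"
allele_b = "AAA" + "GAATTC" + "TTTT" + "GAGTTC" + "CC"

print(sorted(digest(allele_a, "GAATTC", 1)))  # three fragments
print(sorted(digest(allele_b, "GAATTC", 1)))  # two fragments: one site was lost
```

    The point mutation destroys one recognition site, so two short fragments are replaced by one longer fragment, which is the kind of length difference that hybridization with a labeled probe would reveal.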

  5. High-Coverage Long Read DNA Sequencing with the Oxford Nanopore MinION

    OpenAIRE

    Jain, Miten

    2017-01-01

    Nanopore sequencing was conceived in 1989 by Dave Deamer (UCSC). Over two decades of development from research laboratories and, later on, Oxford Nanopore Technologies resulted in the MinION nanopore sequencer. This work describes the developments in MinION nanopore sequencing and software, and technical milestones achieved since the MinION’s release in 2014. These developments include establishing DNA reads that exceed 200 kb+ lengths and direct, simultaneous detection of nucleotide modifica...

  6. Feature Extraction From DNA Sequences by Multifractal Analysis

    National Research Council Canada - National Science Library

    Zhang, H

    2001-01-01

    This paper presents feature extraction and estimation of multifractal measures of DNA sequences using a multifractal methodology and demonstrates a new scheme for identifying biological functionality...

  7. Sequencing the hypervariable regions of human mitochondrial DNA using massively parallel sequencing: Enhanced data acquisition for DNA samples encountered in forensic testing.

    Science.gov (United States)

    Davis, Carey; Peters, Dixie; Warshauer, David; King, Jonathan; Budowle, Bruce

    2015-03-01

    Mitochondrial DNA testing is a useful tool in the analysis of forensic biological evidence. In cases where nuclear DNA is damaged or limited in quantity, the higher copy number of mitochondrial genomes available in a sample can provide information about the source of a sample. Currently, Sanger-type sequencing (STS) is the primary method to develop mitochondrial DNA profiles. This method is laborious and time consuming. Massively parallel sequencing (MPS) can increase the amount of information obtained from mitochondrial DNA samples while improving turnaround time by decreasing the numbers of manipulations and more so by exploiting high-throughput analyses to obtain interpretable results. In this study 18 buccal swabs, three different tissue samples from five individuals, and four bone samples from casework were sequenced at hypervariable regions I and II using STS and MPS. Sample enrichment for STS and MPS was PCR-based. Library preparation for MPS was performed using Nextera® XT DNA Sample Preparation Kit and sequencing was performed on the MiSeq™ (Illumina, Inc.). MPS yielded full concordance of base calls with STS results, and the newer methodology was able to resolve length heteroplasmy in homopolymeric regions. This study demonstrates short amplicon MPS of mitochondrial DNA is feasible, can provide information not possible with STS, and lays the groundwork for development of a whole genome sequencing strategy for degraded samples. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  8. An automated annotation tool for genomic DNA sequences using ...

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  10. Hypervariable minisatellite DNA sequences in the Indian peafowl Pavo cristatus.

    Science.gov (United States)

    Hanotte, O; Burke, T; Armour, J A; Jeffreys, A J

    1991-04-01

    We report here for the first time the large-scale isolation of hypervariable minisatellite DNA sequences from a non-human species, the Indian peafowl (Pavo cristatus). A size-selected genomic DNA fraction, rich in hypervariable minisatellites, was cloned into Charomid 9-36. This library was screened using two multilocus hypervariable probes, 33.6 and 33.15 and also, in a "probe-walking" approach, with five of the peafowl minisatellites initially isolated. Forty-eight positively hybridizing clones were characterized and found to originate from 30 different loci, 18 of which were polymorphic. Five of these variable minisatellite loci were studied further. They all showed Mendelian inheritance. The heterozygosities of these loci were relatively low (range 22-78%) in comparison with those of previously cloned human loci, as expected in view of inbreeding in our semicaptive study population. No new length allele mutations were observed in families and the mean mutation rate per locus is low (less than 0.004, 95% confidence maximum). These loci were also investigated by cross-species hybridization in related taxa. The ability of the probes to detect hypervariable sequences in other species within the same avian family was found to vary, from those probes that are species-specific to those that are apparently general to the family. We also illustrate the potential usefulness of these probes for paternity analysis in a study of sexual selection, and discuss the general application of specific hypervariable probes in behavioral and evolutionary studies.

  11. Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences

    Science.gov (United States)

    2008-07-01

    Dates covered: 6 Jul 08 – 11 Jul 08. Title and subtitle: Random Coding Bounds for DNA Codes Based on Fibonacci Ensembles of DNA Sequences. Abstract (truncated): ...sequences which are generalizations of the Fibonacci sequences. Subject terms: DNA Codes, Fibonacci Ensembles, DNA Computing, Code Optimization.

  12. 18S rDNA sequences and the holometabolous insects.

    Science.gov (United States)

    Carmean, D; Kimsey, L S; Berbee, M L

    1992-12-01

    The Holometabola (insects with complete metamorphosis: beetles, wasps, flies, fleas, butterflies, lacewings, and others) is a monophyletic group that includes the majority of the world's animal species. Holometabolous orders are well defined by morphological characters, but relationships among orders are unclear. In a search for a region of DNA that will clarify the interordinal relationships we sequenced approximately 1080 nucleotides of the 5' end of the 18S ribosomal RNA gene from representatives of 14 families of insects in the orders Hymenoptera (sawflies and wasps), Neuroptera (lacewing and antlion), Siphonaptera (flea), and Mecoptera (scorpionfly). We aligned the sequences with the published sequences of insects from the orders Coleoptera (beetle) and Diptera (mosquito and Drosophila), and the outgroups aphid, shrimp, and spider. Unlike the other insects examined in this study, the neuropterans have A-T rich insertions or expansion regions: one in the antlion was approximately 260 bp long. The dipteran 18S rDNA evolved rapidly, with over 3 times as many substitutions among the aligned sequences, and 2-3 times more unalignable nucleotides than other Holometabola, in violation of an insect-wide molecular clock. When we excluded the long-branched taxa (Diptera, shrimp, and spider) from the analysis, the most parsimonious (minimum-length) trees placed the beetle basal to other holometabolous orders, and supported a morphologically monophyletic clade including the fleas+scorpionflies (96% bootstrap support). However, most interordinal relationships were not significantly supported when tested by maximum likelihood or bootstrapping and were sensitive to the taxa included in the analysis. The most parsimonious and maximum-likelihood trees both separated the Coleoptera and Neuroptera, but this separation was not statistically significant.(ABSTRACT TRUNCATED AT 250 WORDS)

  13. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76......(s) of tetranectin. The sequence analysis revealed a difference in both sequence and size of the noncoding regions between mouse and human cDNAs. Northern analysis of the various tissues from mouse, rat, and cow showed the major transcript(s) to be approximately 1 kb, which is similar in size to that observed...

  14. DNA sequencing by denaturation: principle and thermodynamic simulations.

    Science.gov (United States)

    Chen, Ying-Ja; Huang, Xiaohua

    2009-01-01

    We describe a new DNA sequencing method called sequencing by denaturation (SBD). A Sanger dideoxy sequencing reaction is performed on the templates on a solid surface to generate a ladder of DNA fragments randomly terminated by fluorescently labeled dideoxyribonucleotides. The labeled DNA fragments are sequentially denatured from the templates and the process is monitored by measuring the change in fluorescence intensities from the surface. By analyzing the denaturation profiles, the base sequence of the template can be determined. Using thermodynamic principles, we simulated the denaturation profiles of a series of oligonucleotides ranging from 12 to 32 bases and developed a base-calling algorithm to decode the sequences. These simulations demonstrate that DNA molecules up to 20 bases can be sequenced by SBD. Experimental measurements of the melting profiles of DNA fragments in solution confirm that DNA sequences can be determined by SBD. The potential limitations and advantages of SBD are discussed. With SBD, millions of sequencing reactions can be performed on a small area on a surface in parallel with a very small amount of sequencing reagents. Therefore, DNA sequencing by SBD could potentially result in a significant increase in speed and reduction in cost in large-scale genome resequencing.
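
    The core idea, reading a ladder of terminated fragments in order of increasing thermal stability, can be illustrated with a toy Python sketch. It does not reproduce the authors' thermodynamic simulation or base-calling algorithm, which are not detailed in this abstract. As a stated simplification, melting temperature is estimated with the Wallace rule (2 °C per A/T, 4 °C per G/C), a rough approximation for short oligonucleotides; the primer and extension sequences are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Wallace-rule melting temperature estimate in deg C: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def sanger_ladder(primer: str, extension: str) -> list[str]:
    """All extension products terminated after 1..n added bases."""
    return [primer + extension[:n] for n in range(1, len(extension) + 1)]


def read_by_denaturation(fragments: list[str]) -> str:
    """Order fragments by estimated Tm and read each terminal (labeled) base."""
    ordered = sorted(fragments, key=wallace_tm)
    return "".join(fragment[-1] for fragment in ordered)


# Hypothetical primer and newly synthesized bases.
ladder = sanger_ladder("ACGT", "GATTACA")
# Present the fragments in scrambled (here reversed) order, as on a surface.
print(read_by_denaturation(list(reversed(ladder))))
```

    Under the Wallace rule every added base raises the estimate by 2 or 4 °C, so the melting order of the ladder matches fragment length and the terminal bases spell out the extension. Real duplex stability depends on sequence context, which is presumably why the abstract reports a practical limit of about 20 bases.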

  15. Complete genome sequence of the mitochondrial DNA of the river lamprey, Lethenteron japonicum.

    Science.gov (United States)

    Kawai, Yuri L; Yura, Kei; Shindo, Miyuki; Kusakabe, Rie; Hayashi, Keiko; Hata, Kenichiro; Nakabayashi, Kazuhiko; Okamura, Kohji

    2015-01-01

    Lampreys are eel-like jawless fishes evolutionarily positioned between invertebrates and vertebrates, and have been used as model organisms to explore vertebrate evolution. In this study we determined the complete genome sequence of the mitochondrial DNA of the Japanese river lamprey, Lethenteron japonicum, using next-generation sequencers. The sequence was 16,272 bp in length. The gene content and order were identical to those of the sea lamprey, Petromyzon marinus, which has been the reference among lamprey species. However, the sequence similarity was less than 90%, suggesting the need for the whole-genome sequencing of L. japonicum.

  16. Nanopores: A journey towards DNA sequencing

    Science.gov (United States)

    Wanunu, Meni

    2013-01-01

    Much more than ever, nucleic acids are recognized as key building blocks in many of life's processes, and the science of studying these molecular wonders at the single-molecule level is thriving. A new method of doing so was introduced in the mid-1990s. This method is exceedingly simple: a nanoscale pore that spans across an impermeable thin membrane is placed between two chambers that contain an electrolyte, and voltage is applied across the membrane using two electrodes. These conditions lead to a steady stream of ion flow across the pore. Nucleic acid molecules in solution can be driven through the pore, and structural features of the biomolecules are observed as measurable changes in the trans-membrane ion current. In essence, a nanopore is a high-throughput ion microscope and a single-molecule force apparatus. Nanopores are taking center stage as a tool that promises to read a DNA sequence, and this promise has resulted in overwhelming academic, industrial, and national interest. Regardless of the fate of future nanopore applications, in the process of this 16-year-long exploration, many studies have validated the indispensability of nanopores in the toolkit of single-molecule biophysics. This review surveys past and current studies related to nucleic acid biophysics, and will hopefully provoke a discussion of immediate and future prospects for the field. PMID:22658507

  17. Multifractal analysis of DNA sequences using a novel chaos-game representation

    Science.gov (United States)

    Gutiérrez, J. M.; Rodríguez, M. A.; Abramson, G.

    2001-11-01

    We present a generalization of the standard chaos-game representation method introduced by Jeffrey. To this aim, a DNA symbolic sequence is mapped onto a singular measure on the attractor of a particular IFS model, which is a perfect statistical representation of the sequence. A multifractal analysis of the resulting measure is introduced and an interpretation of singularities in terms of mutual information and redundancy (statistical dependence) among subsequence symbols within the DNA sequence is provided. The multifractal spectrum is also shown to be more sensitive for detecting dependence structures within the DNA sequence than the averaged contribution given by redundancy. This method presents several advantages with respect to other representations such as walks or interfaces, which may introduce spurious effects. In contrast with the results obtained by other standard methods, here we note that no general statement can be made on the influence of coding and non-coding content on the correlation length of a given sequence.
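
    For orientation, the standard chaos-game representation that this work generalizes can be written in a few lines of Python. The sketch covers only the standard method (each nucleotide moves the current point halfway toward its assigned corner of the unit square); the generalized IFS model and the multifractal analysis of the paper are not reproduced, and the corner assignment below is the commonly used convention, taken here as an assumption.

```python
# Conventional corner assignment for the standard chaos-game representation.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(sequence: str) -> list[tuple[float, float]]:
    """Map a DNA sequence to a list of points in the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        corner_x, corner_y = CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + corner_x) / 2, (y + corner_y) / 2
        points.append((x, y))
    return points


print(chaos_game_points("AG"))
```

    Because each step halves the distance to a corner, the last k bases of a subsequence determine which sub-square of side 2^-k the point falls in, so the density of points reflects word frequencies.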

  18. Levenshtein error-correcting barcodes for multiplexed DNA sequencing

    NARCIS (Netherlands)

    Buschmann, Tilo; Bystrykh, Leonid V.

    2013-01-01

    Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fraction of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called
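
    The Levenshtein distance named in the title counts the minimum number of substitutions, insertions, and deletions needed to turn one string into another. The Python sketch below computes it for barcode strings and reports the smallest pairwise distance in a set. The barcode set and function names are hypothetical, and the paper's own code construction is not shown in this truncated abstract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost substitutions, insertions, and deletions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                         # deletion from a
                current[j - 1] + 1,                      # insertion into a
                previous[j - 1] + (char_a != char_b),    # substitution or match
            ))
        previous = current
    return previous[-1]


def min_pairwise_distance(barcodes: list[str]) -> int:
    """Smallest Levenshtein distance between any two distinct barcodes in the set."""
    return min(
        levenshtein(a, b)
        for i, a in enumerate(barcodes)
        for b in barcodes[i + 1:]
    )


# Hypothetical barcode set for illustration.
example_barcodes = ["ACGTAC", "TGCATG", "GATCGA"]
print(min_pairwise_distance(example_barcodes))
```

    A larger minimum distance means more sequencing errors are needed before one barcode can be mistaken for another.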

  19. Sequence-specific packaging of DNA in human sperm chromatin

    Energy Technology Data Exchange (ETDEWEB)

    Gatewood, J.M.; Cook, G.R.; Balhorn, R.; Bradbury, E.M.; Schmid, C.W.

    1987-05-22

    The DNA in human sperm chromatin is packaged into nucleoprotamine (approx. 85%) and nucleohistone (approx. 15%). Whether these two chromatin fractions are sequence-specific subsets of the spermatozoon genome is the question addressed in this report. Sequence-specific packaging would suggest distinct structural and functional roles for nucleohistone and nucleoprotamine in late spermatogenesis or early development or both. After removal of histones with 0.65 M NaCl, exposed DNA was cleaved with Bam HI restriction endonuclease and separated by centrifugation from insoluble nucleoprotamine. The DNA sequence distribution of nucleohistone DNA in the supernatant and nucleoprotamine DNA in the pellet was compared by cloning size-selected single-copy sequences and by using the derived clones as probes of nucleohistone DNA and nucleoprotamine DNA. Two clones derived from nucleohistone DNA preferentially hybridized to nucleohistone DNA, and two clones derived from nucleoprotamine DNA preferentially hybridized to nucleoprotamine DNA, which demonstrated the existence of sequence-specific nucleohistone and nucleoprotamine components within the human spermatozoon.

  20. Cloning of human purine-nucleoside phosphorylase cDNA sequences by complementation in Escherichia coli.

    OpenAIRE

    Goddard, J M; Caput, D; Williams, S R; Martin, D W

    1983-01-01

    We have obtained cDNA clones that contain the entire coding region of the human purine-nucleoside phosphorylase (PNP; EC 2.4.2.1) mRNA. The cDNA sequences were generated by reverse transcription of PNP-enriched mRNA obtained by immunoadsorption of HeLa cell polyribosomes with monospecific antibody to human PNP. cDNA molecules that were close in length to PNP mRNA were separated by agarose gel electrophoresis and inserted into the Pst I site of the plasmid pBR322. Plasmid DNA from the pooled c...

  1. Molecular design of sequence specific DNA alkylating agents.

    Science.gov (United States)

    Minoshima, Masafumi; Bando, Toshikazu; Shinohara, Ken-ichi; Sugiyama, Hiroshi

    2009-01-01

    Sequence-specific DNA alkylating agents are of great interest as a novel approach to cancer chemotherapy. We designed conjugates between pyrrole (Py)-imidazole (Im) polyamides and a DNA alkylating chlorambucil moiety attached at different positions. The sequence-specific DNA alkylation by the conjugates was investigated by using high-resolution denaturing polyacrylamide gel electrophoresis (PAGE). The results showed that polyamide chlorambucil conjugates alkylate DNA at flanking adenines in recognition sequences of Py-Im polyamides; however, the reactivities and alkylation sites were influenced by the positions of conjugation. In addition, we synthesized a conjugate between a Py-Im polyamide and another alkylating agent, 1-(chloromethyl)-5-hydroxy-1,2-dihydro-3H-benz[e]indole (seco-CBI). DNA alkylation reactivities of both alkylating polyamides were almost comparable. In contrast, cytotoxicities against cell lines differed greatly. These comparative studies would promote development of appropriate sequence-specific DNA alkylating polyamides against specific cancer cells.

  2. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    Science.gov (United States)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis, such as cystic fibrosis caused by base deletion and point mutation, have also been demonstrated. Experimental details will be presented at the meeting.

  3. Isolation of full-length putative rat lysophospholipase cDNA using improved methods for mRNA isolation and cDNA cloning

    International Nuclear Information System (INIS)

    Han, J.H.; Stratowa, C.; Rutter, W.J.

    1987-01-01

    The authors have cloned a full-length putative rat pancreatic lysophospholipase cDNA by an improved mRNA isolation method and cDNA cloning strategy using [32P]-labelled nucleotides. These new methods allow the construction of a cDNA library from the adult rat pancreas in which the majority of recombinant clones contained complete sequences for the corresponding mRNAs. A previously recognized but unidentified long and relatively rare cDNA clone containing the entire sequence from the cap site at the 5' end to the poly(A) tail at the 3' end of the mRNA was isolated by single-step screening of the library. The size, amino acid composition, and the activity of the protein expressed in heterologous cells strongly suggest this mRNA codes for lysophospholipase.

  4. Affordable Hands-On DNA Sequencing and Genotyping: An Exercise for Teaching DNA Analysis to Undergraduates

    Science.gov (United States)

    Shah, Kushani; Thomas, Shelby; Stein, Arnold

    2013-01-01

    In this report, we describe a 5-week laboratory exercise for undergraduate biology and biochemistry students in which students learn to sequence DNA and to genotype their DNA for selected single nucleotide polymorphisms (SNPs). Students use miniaturized DNA sequencing gels that require approximately 8 min to run. The students perform G, A, T, C…

  5. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  6. DNA polymerases drive DNA sequencing-by-synthesis technologies: both past and present

    Science.gov (United States)

    Chen, Cheng-Yao

    2014-01-01

    Next-generation sequencing (NGS) technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. Escherichia coli DNA polymerase I proteolytic (Klenow) fragment was originally utilized in Sanger’s dideoxy chain-terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific, genome sequence information accessible via public database. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today’s standard capillary electrophoresis (CE) and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides respectively. Third generation, real-time single molecule sequencing platform requires slightly different enzyme properties. Enterobacterial phage ϕ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, ϕ29 enzyme has also been utilized in emerging DNA sequencing technologies including nanopore-, and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies. PMID:25009536

  7. DNA Polymerases Drive DNA Sequencing-by-Synthesis Technologies: Both Past and Present

    Directory of Open Access Journals (Sweden)

    Cheng-Yao Chen

    2014-06-01

    Next-generation sequencing (NGS) technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. E. coli DNA polymerase I proteolytic (Klenow) fragment was originally utilized in Sanger's dideoxy chain terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific, genome sequence information accessible via public database. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today's standard capillary electrophoresis (CE) and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides respectively. Third generation, real-time single molecule sequencing platform requires slightly different enzyme properties. Enterobacterial phage ϕ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, ϕ29 enzyme has also been utilized in emerging DNA sequencing technologies including nanopore-, and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies.

  8. DNA polymerase having modified nucleotide binding site for DNA sequencing

    Science.gov (United States)

    Tabor, Stanley; Richardson, Charles

    1997-01-01

    A modified gene encoding a modified DNA polymerase, wherein the modified polymerase incorporates dideoxynucleotides, relative to the corresponding deoxynucleotides, at least 20-fold better than the corresponding naturally occurring DNA polymerase does.

  9. Low-Energy Electron-Induced Strand Breaks in Telomere-Derived DNA Sequences-Influence of DNA Sequence and Topology.

    Science.gov (United States)

    Rackwitz, Jenny; Bald, Ilko

    2018-03-26

    During cancer radiation therapy high-energy radiation is used to reduce tumour tissue. The irradiation produces a shower of secondary low-energy electrons, which damage DNA very efficiently by dissociative electron attachment. Recently, it was suggested that low-energy electron-induced DNA strand breaks strongly depend on the specific DNA sequence with a high sensitivity of G-rich sequences. Here, we use DNA origami platforms to expose G-rich telomere sequences to low-energy (8.8 eV) electrons to determine absolute cross sections for strand breakage and to study the influence of sequence modifications and topology of telomeric DNA on the strand breakage. We find that the telomeric DNA 5'-(TTA GGG)₂ is more sensitive to low-energy electrons than an intermixed sequence 5'-(TGT GTG A)₂, confirming the unique electronic properties resulting from G-stacking. With increasing length of the oligonucleotide (i.e., going from 5'-(GGG ATT)₂ to 5'-(GGG ATT)₄), both the variety of topology and the electron-induced strand break cross sections increase. Addition of K⁺ ions decreases the strand break cross section for all sequences that are able to fold G-quadruplexes or G-intermediates, whereas the strand break cross section for the intermixed sequence remains unchanged. These results indicate that telomeric DNA is rather sensitive towards low-energy electron-induced strand breakage, suggesting that significant telomere shortening can also occur during cancer radiation therapy. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. [Construction and preliminary analysis of a full-length cDNA library for Paris polyphylla var. yunnanensis].

    Science.gov (United States)

    Zhao, Shuang; Dong, Xu; Ma, Teng

    2014-01-01

    A full-length cDNA library of Paris polyphylla var. yunnanensis was constructed in order to study the genes related to growth and development and the genes regulating its secondary metabolite biosynthesis. Total RNA was extracted from Paris polyphylla var. yunnanensis using a modified Trizol method. SMART (switching mechanism at 5' end of RNA transcript) technology was applied to construct the full-length cDNA library. The library titer, recombinant rate and length of insert fragments were determined, and the sequences of the library were analyzed by Blastx and compared against the GenBank database. The capacity of the library was 2.5 × 10⁷ cfu/mL, the recombinant rate was 98.5% and the average size of the inserted fragments was 1.5 kb. Among 149 ESTs (expressed sequence tags) obtained from 192 sequenced clones, 9 ESTs were related to growth and development and 5 ESTs were related to the regulation of secondary metabolite biosynthesis. A full-length cDNA library of Paris polyphylla var. yunnanensis was successfully constructed by SMART technology, and the library has sufficient capacity, a high recombinant rate and long insert fragments for further research to screen and identify the functional genes of Paris polyphylla var. yunnanensis.

  11. Primary structure of a lipoxygenase from barley grain as deduced from its cDNA sequence

    NARCIS (Netherlands)

    Mechelen, J.R. van; Smits, M.; Douma, A.C.; Rouster, J.; Cameron-Mills, V.; Heidekamp, F.; Valk, B.E.

    1995-01-01

    A full length cDNA sequence for a barley grain lipoxygenase was obtained. It includes a 5' untranslated region of 69 nucleotides, an open reading frame of 2586 nucleotides encoding a protein of 862 amino acid residues and a 3' untranslated region of 142 nucleotides. The molecular mass of the encoded

  12. Isolation and characterization of full-length cDNA clones coding for cholinesterase from fetal human tissues

    International Nuclear Information System (INIS)

    Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.; Goldberg, O.; Soreq, H.

    1987-01-01

    To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments with a total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.

  13. [Study on factors influencing DNA sequencing by automatic genetic analyzer].

    Science.gov (United States)

    Yan, Shaofei; Wang, Wei; Xu, Jin; Bai, Li; Gan, Xin; Li, Fengqin

    2015-05-01

    The aim was to obtain accurate and successful DNA sequencing in a cost-effective way on the ABI3500xl automatic genetic analyzer. BigDye was diluted 8-, 16- and 32-fold for PCR product sequencing. Three different methods, the CENTRI-SEP kit, BigDye cleaning beads and ethanol-NaAc-EDTA, were used to purify the sequencing PCR products. The DNA sequencing results were correct when BigDye was diluted up to 16-fold. Misread bases were found when BigDye was diluted 32-fold. All three purification methods provided acceptable DNA sequencing results. Among the purification methods, the CENTRI-SEP kit was the most expensive but the fastest (0.5 h), while the ethanol-NaAc-EDTA method was the most economical but the slowest (2 h). The BigDye cleaning beads method had an intermediate purification time (1 h) but was not suited to high-throughput DNA sequencing. BigDye can be diluted up to 16-fold for DNA sequencing on the ABI3500xl DNA analyzer. Although all three purification methods can give DNA sequencing results of good quality, an appropriate one should be chosen to balance time and cost according to the conditions of the laboratory.

  14. Structural biology of disease-associated repetitive DNA sequences and protein-DNA complexes involved in DNA damage and repair

    Energy Technology Data Exchange (ETDEWEB)

    Gupta, G.; Santhana Mariappan, S.V.; Chen, X.; Catasti, P.; Silks, L.A. III; Moyzis, R.K.; Bradbury, E.M.; Garcia, A.E.

    1997-07-01

    This project is aimed at formulating the sequence-structure-function correlations of various microsatellites in the human (and other eukaryotic) genomes. Here the authors have been able to develop and apply structure biology tools to understand the following: the molecular mechanism of length polymorphism microsatellites; the molecular mechanism by which the microsatellites in the noncoding regions alter the regulation of the associated gene; and finally, the molecular mechanism by which the expansion of these microsatellites impairs gene expression and causes the disease. Their multidisciplinary structural biology approach is quantitative and can be applied to all coding and noncoding DNA sequences associated with any gene. Both NIH and DOE are interested in developing quantitative tools for understanding the function of various human genes for prevention against diseases caused by genetic and environmental effects.

  15. Adenoviral DNA replication: DNA sequences and enzymes required for initiation in vitro

    International Nuclear Information System (INIS)

    Stillman, B.W.; Tamanoi, F.

    1983-01-01

    In this paper evidence is provided that the 140,000-dalton DNA polymerase is encoded by the adenoviral genome and is required for the initiation of DNA replication in vitro. The DNA sequences in the template DNA that are required for the initiation of replication have also been identified, using both plasmid DNAs and synthetic oligodeoxyribonucleotides. 48 references, 7 figures, 1 table

  16. Advanced microinstrumentation for rapid DNA sequencing and large DNA fragment separation

    Energy Technology Data Exchange (ETDEWEB)

    Balch, J.; Davidson, J.; Brewer, L.; Gingrich, J.; Koo, J.; Mariella, R.; Carrano, A.

    1995-01-25

    Our efforts to develop novel technology for a rapid DNA sequencer and large fragment analysis system based upon gel electrophoresis are described. We are using microfabrication technology to build dense arrays of high speed micro electrophoresis lanes that will ultimately increase the sequencing rate of DNA by at least 100 times the rate of current sequencers. We have demonstrated high resolution DNA fragment separation needed for sequencing in polyacrylamide microgels formed in glass microchannels. We have built prototype arrays of microchannels having up to 48 channels. Significant progress has also been made in developing a sensitive fluorescence detection system based upon a confocal microscope design that will enable the diagnostics and detection of DNA fragments in ultrathin microchannel gels. Development of a rapid DNA sequencer and fragment analysis system will have a major impact on future DNA instrumentation used in clinical, molecular and forensic analysis of DNA fragments.

  17. An auditory display tool for DNA sequence analysis.

    Science.gov (United States)

    Temple, Mark D

    2017-04-24

    DNA Sonification refers to the use of an auditory display to convey the information content of DNA sequence data. Six sonification algorithms are presented that each produce an auditory display. These algorithms are logically designed from the simple through to the more complex. Three of these parse individual nucleotides, nucleotide pairs or codons into musical notes to give rise to 4, 16 or 64 notes, respectively. Codons may also be parsed degenerately into 20 notes with respect to the genetic code. Lastly, nucleotide pairs can be parsed as two separate frames or codons can be parsed as three reading frames, giving rise to multiple streams of audio. The most informative sonification algorithm reads the DNA sequence as codons in three reading frames to produce three concurrent streams of audio in an auditory display. This approach is advantageous since start and stop codons in any one frame have a direct effect, starting or stopping the audio in that frame while leaving the other frames unaffected. Using these methods, DNA sequences such as open reading frames or repetitive DNA sequences can be distinguished from one another. These sonification tools are available through a webpage interface in which an input DNA sequence can be processed in real time to produce an auditory display playable directly within the browser. The potential of this approach as an analytical tool is discussed with reference to auditory displays derived from test sequences including simple nucleotide sequences, repetitive DNA sequences and coding or non-coding genes. This study presents a proof-of-concept that some properties of a DNA sequence can be identified through sonification alone and argues for their inclusion within the toolkit of DNA sequence browsers as an adjunct to existing visual and analytical tools.
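
    The three-reading-frame scheme described in this abstract can be illustrated with a short Python sketch. The codon-to-note numbering below (an index from 0 to 63 in alphabetical codon order) and the function names are assumptions made for illustration; the abstract does not state which note each codon is assigned, so this is not the published mapping. Silence is represented by None.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical mapping: each of the 64 codons gets a note index 0..63.
CODON_TO_NOTE = {"".join(c): i for i, c in enumerate(product(BASES, repeat=3))}


def gated_frame(seq, frame):
    """Read codons from offset `frame`; emit a note index while an ORF is
    'playing' (after ATG, before a stop codon) and None for silence."""
    seq = seq.upper()
    playing = False
    stream = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "ATG":
            playing = True          # a start codon switches this frame on
        elif codon in STOP_CODONS:
            playing = False         # a stop codon switches this frame off
        # Codons with ambiguous bases (e.g. N) have no note and stay silent.
        stream.append(CODON_TO_NOTE.get(codon) if playing else None)
    return stream


def three_frame_streams(seq):
    """Return one stream per reading frame (offsets 0, 1 and 2)."""
    return [gated_frame(seq, f) for f in range(3)]


if __name__ == "__main__":
    print(three_frame_streams("ATGAAATAG"))
```

    Because each frame keeps its own on/off state, a stop codon in one frame silences only that stream, which is the property the abstract highlights.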

  18. Lack of association of colonic epithelium telomere length and oxidative DNA damage in Type 2 diabetes under good metabolic control

    Directory of Open Access Journals (Sweden)

    Kennedy Hugh

    2008-10-01

    Background: Telomeres are DNA repeat sequences necessary for DNA replication which shorten at cell division at a rate directly related to levels of oxidative stress. Critical telomere shortening predisposes to cell senescence and to epithelial malignancies. Type 2 diabetes is characterised by increased oxidative DNA damage, telomere attrition, and an increased risk of colonic malignancy. We hypothesised that the colonic mucosa in Type 2 diabetes would be characterised by increased DNA damage and telomere shortening. Methods: We examined telomere length (by flow fluorescent in situ hybridization) and oxidative DNA damage (flow cytometry of 8-oxoguanosine) in the colonic mucosal cells of subjects with type 2 diabetes (n = 10; mean age 62.2 years, mean HbA1c 6.9%) and 22 matched control subjects. No colonic pathology was apparent in these subjects at routine gastrointestinal investigations. Results: Mean colonic epithelial telomere length in the diabetes group was not significantly different from controls (10.6 [3.6] vs. 12.1 [3.4] Molecular Equivalent of Soluble Fluorochrome Units [MESF]; P = 0.5). Levels of oxidative DNA damage were similar in both T2DM and control groups (2.6 [0.6] vs. 2.5 [0.6] Mean Fluorescent Intensity [MFI]; P = 0.7). There was no significant relationship between oxidative DNA damage and telomere length in either group (both P > 0.1). Conclusion: Colonic epithelium in Type 2 diabetes does not differ significantly from control colonic epithelium in oxidative DNA damage or telomere length. There is no evidence in this study for increased oxidative DNA damage or significant telomere attrition in colonic mucosa as a carcinogenic mechanism.

  19. An automated annotation tool for genomic DNA sequences using ...

    Indian Academy of Sciences (India)

    Unknown

    Introduction. DNA sequencing has evolved from a complicated laboratory process to an automated technique using high-throughput sequencers with fluorescent-dye-based chemistry. This technological advance coupled with the replacement of the traditional mapping and sequencing of clones in series to an integrated ...

  20. Illumina Sequencing of Bisulfite-Converted DNA Libraries.

    Science.gov (United States)

    Lizardi, Paul M; Yan, Qin; Wajapeyee, Narendra

    2017-11-01

    Here we describe a standard MethylC-seq protocol using single-read sequencing on an Illumina Genome Analyzer II platform. The protocol involves ligation of methylated sequencing adaptors to sonicated genomic DNA, gel purification, sodium bisulfite conversion, polymerase chain reaction (PCR) amplification, and sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  1. Simulations Using Random-Generated DNA and RNA Sequences

    Science.gov (United States)

    Bryce, C. F. A.

    1977-01-01

    Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
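
    The classroom exercises listed in this abstract can be reproduced with a few lines of code. The sketch below is written in Python rather than the BASIC used in the original program, and every function name in it is an assumption made for illustration. It generates a uniformly random DNA sequence, derives complementary strands, and tabulates base composition and triplet frequencies.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_dna(n, seed=None):
    """Return a random DNA sequence of length n (each base equally likely)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def complement(seq):
    """Base-by-base complement, written in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complementary strand read 5' to 3'."""
    return complement(seq)[::-1]


def base_composition(seq):
    """Fraction of each base in the sequence."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in "ACGT"}


def codon_counts(seq):
    """Count non-overlapping triplets read from the first base."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


if __name__ == "__main__":
    s = random_dna(30, seed=1)
    print(s, reverse_complement(s), base_composition(s), codon_counts(s))
```

    Replacing T with U in the alphabet and the complement table would give the corresponding RNA exercise.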

  2. Multiplexed Sequence Encoding: A Framework for DNA Communication

    Science.gov (United States)

    Zakeri, Bijan; Carr, Peter A.; Lu, Timothy K.

    2016-01-01

    Synthetic DNA has great propensity for efficiently and stably storing non-biological information. With DNA writing and reading technologies rapidly advancing, new applications for synthetic DNA are emerging in data storage and communication. Traditionally, DNA communication has focused on the encoding and transfer of complete sets of information. Here, we explore the use of DNA for the communication of short messages that are fragmented across multiple distinct DNA molecules. We identified three pivotal points in a communication—data encoding, data transfer & data extraction—and developed novel tools to enable communication via molecules of DNA. To address data encoding, we designed DNA-based individualized keyboards (iKeys) to convert plaintext into DNA, while reducing the occurrence of DNA homopolymers to improve synthesis and sequencing processes. To address data transfer, we implemented a secret-sharing system—Multiplexed Sequence Encoding (MuSE)—that conceals messages between multiple distinct DNA molecules, requiring a combination key to reveal messages. To address data extraction, we achieved the first instance of chromatogram patterning through multiplexed sequencing, thereby enabling a new method for data extraction. We envision these approaches will enable more widespread communication of information via DNA. PMID:27050646
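
    To make the homopolymer constraint mentioned in this abstract concrete, the following Python sketch encodes text so that no two adjacent bases are ever identical. It is not the authors' iKeys scheme, whose key layout is not given here; it is a generic rotating base-3 code chosen only for illustration, and the function names and the starting reference base "A" are assumptions.

```python
BASES = "ACGT"


def text_to_dna(text, prev="A"):
    """Encode UTF-8 text as DNA, six bases per byte, with no repeated bases.

    Each byte is written as six base-3 digits; each digit selects one of the
    three bases that differ from the previously emitted base.
    """
    dna = []
    for byte in text.encode("utf-8"):
        digits = []
        for _ in range(6):              # 3**6 = 729 covers 0..255
            digits.append(byte % 3)
            byte //= 3
        for d in reversed(digits):      # most significant digit first
            choices = [b for b in BASES if b != prev]
            prev = choices[d]
            dna.append(prev)
    return "".join(dna)


def dna_to_text(dna, prev="A"):
    """Invert text_to_dna; expects a sequence produced by that function."""
    data = bytearray()
    for i in range(0, len(dna), 6):
        value = 0
        for base in dna[i:i + 6]:
            choices = [b for b in BASES if b != prev]
            value = value * 3 + choices.index(base)
            prev = base
        data.append(value)
    return data.decode("utf-8")


if __name__ == "__main__":
    encoded = text_to_dna("Hello")
    print(encoded, dna_to_text(encoded))
```

    The cost of this guarantee is density: six bases carry one byte, whereas an unconstrained two-bits-per-base code would need only four.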

  3. A mathematical model and numerical method for thermoelectric DNA sequencing

    Science.gov (United States)

    Shi, Liwei; Guilbeau, Eric J.; Nestorova, Gergana; Dai, Weizhong

    2014-05-01

    Single nucleotide polymorphisms (SNPs) are single base pair variations within the genome that are important indicators of genetic predisposition towards specific diseases. This study explores the feasibility of SNP detection using a thermoelectric sequencing method that measures the heat released when DNA polymerase inserts a deoxyribonucleoside triphosphate into a DNA strand. We propose a three-dimensional mathematical model that governs the DNA sequencing device with a reaction zone that contains DNA template/primer complex immobilized to the surface of the lower channel wall. The model is then solved numerically. Concentrations of reactants and the temperature distribution are obtained. Results indicate that when the nucleoside is complementary to the next base in the DNA template, polymerization occurs lengthening the complementary polymer and releasing thermal energy with a measurable temperature change, implying that the thermoelectric conceptual device for sequencing DNA may be feasible for identifying specific genes in individuals.

  4. Full-length cDNA cloning of Toll-like receptor 4 in dogs and cats.

    Science.gov (United States)

    Asahina, Yuka; Yoshioka, Noriyuki; Kano, Rui; Moritomo, Tadaaki; Hasegawa, Atsuhiko

    2003-12-15

    In the present study, the full-length canine and feline Toll-like receptor 4 (TLR4) cDNAs were sequenced, and the expression of canine and feline TLR4 mRNAs in dog and cat tissues was investigated. The full-length TLR4 cDNAs of dog and cat were 2709 bp encoding 637 amino acids and 3113 bp encoding 833 amino acids, respectively. The similarity between canine and feline TLR4 was 83.6% at the nucleotide sequence level and 77.6% at the amino acid sequence level. At the amino acid sequence level, canine and feline TLR4 showed sequence similarities of approximately 62-78% with those of Homo sapiens, Mus musculus, Bos taurus and Equus caballus. Southern hybridization analyses with TLR4 cDNA probes gave one distinct band in BamHI, EcoRI and HindIII digests of genomic DNA from both dogs and cats, indicating the likely presence of a single TLR4 gene in each species. By RT-PCR analysis, mRNA of canine TLR4 was expressed highly in peripheral blood leukocytes (PBL), moderately in spleen, stomach and small intestine, and at low levels in liver, with no expression in kidney, large intestine and skin. On the other hand, mRNA of feline TLR4 was expressed highly in lung, bladder and PBL, moderately in kidney, liver, spleen and large intestine and at low levels in pancreas and small intestine.

  5. Simple Elastic Network Models for Exhaustive Analysis of Long Double-Stranded DNA Dynamics with Sequence Geometry Dependence.

    Directory of Open Access Journals (Sweden)

    Shuhei Isami

    Simple elastic network models of DNA were developed to reveal the structure-dynamics relationships for several nucleotide sequences. First, we propose a simple all-atom elastic network model of DNA that can explain the profiles of temperature factors for several crystal structures of DNA. Second, we propose a coarse-grained elastic network model of DNA, where each nucleotide is described by only one node. This model could effectively reproduce the detailed dynamics obtained with the all-atom elastic network model according to the sequence-dependent geometry. Through normal-mode analysis for the coarse-grained elastic network model, we exhaustively analyzed the dynamic features of a large number of long DNA sequences, approximately 150 bp in length. These analyses revealed positive correlations between the nucleosome-forming abilities and the inter-strand fluctuation strength of double-stranded DNA for several DNA sequences.

  6. Full-length genome sequence of Mossman virus, a novel paramyxovirus isolated from rodents in Australia

    International Nuclear Information System (INIS)

    Miller, Philippa J.; Boyle, David B.; Eaton, Bryan T.; Wang Linfa

    2003-01-01

    Mossman virus (MoV) was isolated on two occasions from wild rats trapped in Queensland, Australia, during the early 1970s. Together with Nariva virus and J-virus MoV belongs to a group of novel paramyxoviruses isolated from rodents during the last 40 years, none of which had been characterized at the molecular level until now. cDNA subtraction strategies used to isolate virus-specific cDNA derived from both MoV-infected cells and crude MoV pellets were pivotal steps in rapid characterization of the complete genome sequence. Analysis of the full-length genome and its encoded proteins confirmed that MoV is a novel member of the subfamily Paramyxovirinae which cannot be assigned to an existing genus. MoV appears to be more closely related to another unclassified paramyxovirus Tupaia paramyxovirus (TPMV), isolated from the tree shrew Tupaia belangeri. Together with Salem virus (SalV), a further unclassified paramyxovirus that was isolated from a horse, MoV and TPMV make up a new collection of paramyxoviruses situated evolutionally between the genus Morbillivirus and the newly established genus Henipavirus

  7. Laser desorption mass spectrometry for DNA analysis and sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Chen, C.H.; Taranenko, N.I.; Tang, K.; Allman, S.L.

    1995-03-01

    Laser desorption mass spectrometry has been considered as a potential new method for fast DNA sequencing. Our approach is to use matrix-assisted laser desorption to produce parent ions of DNA segments and a time-of-flight mass spectrometer to identify the sizes of DNA segments. Thus, the approach is similar to gel electrophoresis sequencing using Sanger's enzymatic method. However, gel, radioactive tagging, and dye labeling are not required. In addition, the sequencing process can possibly be finished within a few hundred microseconds instead of hours and days. In order to use mass spectrometry for fast DNA sequencing, the following three criteria need to be satisfied: (1) detection of large DNA segments, (2) sensitivity reaching the femtomole region, and (3) mass resolution good enough to separate DNA segments that differ by a single nucleotide. Until now, it has been very difficult to detect large DNA segments by mass spectrometry because of the fragile chemical properties of DNA and the low detection sensitivity for DNA ions. We discovered several new matrices that increase the production of DNA ions. By innovative design of a mass spectrometer, we can increase the ion energy up to 45 keV to enhance the detection sensitivity. Recently, we succeeded in detecting a DNA segment with 500 nucleotides. The sensitivity was 100 femtomoles. Thus, we have fulfilled two key criteria for using mass spectrometry for fast DNA sequencing. The major effort in the near future is to improve the resolution. Different approaches are being pursued. When high mass resolution can be achieved and automated sample preparation is developed, a sequencing speed of 500 megabases per year may become feasible.

  8. Backbone assignment of the binary complex of the full length Sulfolobus solfataricus DNA polymerase IV and DNA.

    Science.gov (United States)

    Lee, Eunjeong; Fowler, Jason D; Suo, Zucai; Wu, Zhengrong

    2017-04-01

    Sulfolobus solfataricus DNA polymerase IV (Dpo4), a model Y-family DNA polymerase, bypasses a wide range of DNA lesions in vitro and in vivo. In this paper, we report the backbone chemical shift assignments of the full length Dpo4 in its binary complex with a 14/14-mer DNA substrate. Upon DNA binding, several β-stranded regions in the isolated catalytic core and little finger/linker fragments of Dpo4 become more structured. This work serves as a foundation for our ongoing investigation of conformational dynamics of Dpo4 and future determination of the first solution structures of a DNA polymerase and its binary and ternary complexes.

  9. Cloning, sequencing, and expression of cDNA for human β-glucuronidase

    International Nuclear Information System (INIS)

    Oshima, A.; Kyle, J.W.; Miller, R.D.

    1987-01-01

    The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.

  10. An Optimal Seed Based Compression Algorithm for DNA Sequences

    Directory of Open Access Journals (Sweden)

    Pamela Vinitha Eric

    2016-01-01

    This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures that are inherent in DNA sequences by creating an offline dictionary which contains all such repeats along with the details of mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than the existing lossless DNA sequence compression algorithms.
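
    The substitution idea behind this abstract can be sketched in Python as follows. This is a deliberately simplified, Lempel-Ziv-style illustration and not the published algorithm: it finds exact repeats only, builds its seed index online rather than as an offline dictionary, and ignores mismatches altogether. The seed length k, the minimum repeat length and the token format are all assumptions.

```python
def compress(seq, k=4, min_len=6):
    """Encode seq as tokens: a single base (str) or a (distance, length) copy.

    A k-mer 'seed' index points to the most recent earlier occurrence of each
    k-mer; a seed hit is extended while the two copies agree exactly.
    """
    seeds = {}                      # k-mer -> most recent start position
    tokens = []
    i, n = 0, len(seq)
    while i < n:
        kmer = seq[i:i + k]
        j = seeds.get(kmer) if len(kmer) == k else None
        length = 0
        if j is not None:
            while i + length < n and seq[j + length] == seq[i + length]:
                length += 1
        if length >= min_len:
            tokens.append((i - j, length))   # substitute the repeat
            step = length
        else:
            tokens.append(seq[i])            # emit a literal base
            step = 1
        # Index every complete k-mer that starts in the span just consumed.
        for p in range(i, i + step):
            if p + k <= n:
                seeds[seq[p:p + k]] = p
        i += step
    return tokens


def decompress(tokens):
    """Rebuild the sequence; copies may overlap the text being written."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            start = len(out) - dist
            for q in range(length):
                out.append(out[start + q])
        else:
            out.append(t)
    return "".join(out)


if __name__ == "__main__":
    s = "ACGTACGTACGTTTGACGTACGT"
    toks = compress(s)
    print(toks, decompress(toks) == s)
```

    Allowing a bounded number of mismatches during seed extension, as the abstract describes, would let longer approximate repeats be replaced at the cost of also storing the mismatch positions.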

  11. Statistical assignment of DNA sequences using Bayesian phylogenetics

    DEFF Research Database (Denmark)

    Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Huelsenbeck, John P.

    2008-01-01

    We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data......-analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are comprised of a combination of both Neanderthal and modern human DNA....

  12. An Optimal Seed Based Compression Algorithm for DNA Sequences.

    Science.gov (United States)

    Eric, Pamela Vinitha; Gopalakrishnan, Gopakumar; Karunakaran, Muralikrishnan

    2016-01-01

    This paper proposes a seed-based lossless compression algorithm for DNA sequences that uses a substitution method similar to the Lempel-Ziv compression scheme. The proposed method exploits the repetition structures that are inherent in DNA sequences by creating an offline dictionary which contains all such repeats along with the details of mismatches. By ensuring that only promising mismatches are allowed, the method achieves a compression ratio that is on par with or better than the existing lossless DNA sequence compression algorithms.

  13. Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia.

    Science.gov (United States)

    Carninci, Piero; Waki, Kazunori; Shiraki, Toshiyuki; Konno, Hideaki; Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Arakawa, Takahiro; Ishii, Yoshiyuki; Sasaki, Daisuke; Bono, Hidemasa; Kondo, Shinji; Sugahara, Yuichi; Saito, Rintaro; Osato, Naoki; Fukuda, Shiro; Sato, Kenjiro; Watahiki, Akira; Hirozane-Kishikawa, Tomoko; Nakamura, Mari; Shibata, Yuko; Yasunishi, Ayako; Kikuchi, Noriko; Yoshiki, Atsushi; Kusakabe, Moriaki; Gustincich, Stefano; Beisel, Kirk; Pavan, William; Aidinis, Vassilis; Nakagawara, Akira; Held, William A; Iwata, Hiroo; Kono, Tomohiro; Nakauchi, Hiromitsu; Lyons, Paul; Wells, Christine; Hume, David A; Fagiolini, Michela; Hensch, Takao K; Brinkmeier, Michelle; Camper, Sally; Hirota, Junji; Mombaerts, Peter; Muramatsu, Masami; Okazaki, Yasushi; Kawai, Jun; Hayashizaki, Yoshihide

    2003-06-01

    We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched in full-length by Cap-Trapper, and in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3'-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5'-end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project, which also include non-protein-coding RNAs, and the lower gene number estimation of genome annotations. Altogether, 5'-end clusters identify regions that are potential promoters for 8637 known genes and 5'-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete.

  14. Transcriptional blockages in a cell-free system by sequence-selective DNA alkylating agents.

    Science.gov (United States)

    Ferguson, L R; Liu, A P; Denny, W A; Cullinane, C; Talarico, T; Phillips, D R

    2000-04-14

    There is considerable interest in DNA sequence-selective DNA-binding drugs as potential inhibitors of gene expression. Five compounds with distinctly different base pair specificities were compared in their effects on the formation and elongation of the transcription complex from the lac UV5 promoter in a cell-free system. All were tested at drug levels which killed 90% of cells in a clonogenic survival assay. Cisplatin, a selective alkylator at purine residues, inhibited transcription, decreasing the full-length transcript, and causing blockage at a number of GG or AG sequences, making it probable that intrastrand crosslinks are the blocking lesions. A cyclopropylindoline known to be an A-specific alkylator also inhibited transcription, with blocks at adenines. The aniline mustard chlorambucil, that targets primarily G but also A sequences, was also effective in blocking the formation of full-length transcripts. It produced transcription blocks either at, or one base prior to, AA or GG sequences, suggesting that intrastrand crosslinks could again be involved. The non-alkylating DNA minor groove binder Hoechst 33342 (a bisbenzimidazole) blocked formation of the full-length transcript, but without creating specific blockage sites. A bisbenzimidazole-linked aniline mustard analogue was a more effective transcription inhibitor than either chlorambucil or Hoechst 33342, with different blockage sites occurring immediately as compared with 2 h after incubation. The blockages were either immediately prior to AA or GG residues, or four to five base pairs prior to such sites, a pattern not predicted from in vitro DNA-binding studies. Minor groove DNA-binding ligands are of particular interest as inhibitors of gene expression, since they have the potential ability to bind selectively to long sequences of DNA. The results suggest that the bisbenzimidazole-linked mustard does cause alkylation and transcription blockage at novel DNA sites. 
in addition to sites characteristic of

  15. Characteristics of alternating current hopping conductivity in DNA sequences

    International Nuclear Information System (INIS)

    Song-Shan, Ma; Hui, Xu; Huan-You, Wang; Rui, Guo

    2009-01-01

    This paper presents a model to describe the alternating current (AC) conductivity of DNA sequences, in which DNA is considered as a one-dimensional (1D) disordered system and electrons are transported via hopping between localized states. It finds that the AC conductivity of DNA sequences increases as the frequency of the external electric field rises, taking the form $\sigma_{ac}(\omega) \sim \omega^{2}\ln^{2}(1/\omega)$. The AC conductivity of DNA sequences also increases with temperature, but only weakly. Meanwhile, the AC conductivity in an off-diagonally correlated case is much larger than that in the uncorrelated case of the Anderson limit at low temperatures, which indicates that the off-diagonal correlations in DNA sequences have a great effect on the AC conductivity, while at high temperature the off-diagonal correlations no longer play a vital role in electric transport. In addition, the proportion of nucleotide pairs p also plays an important role in AC electron transport of DNA sequences. For p < 0.5, the conductivity of a DNA sequence decreases with the increase of p, while for p ≥ 0.5, the conductivity increases with the increase of p. (cross-disciplinary physics and related areas of science and technology)
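    As a quick consistency check on the frequency dependence quoted in this abstract, the scaling form can be differentiated directly. This is only a sketch: it assumes ω is a dimensionless frequency, since the abstract gives neither units nor a prefactor.

```latex
\frac{d}{d\omega}\left[\omega^{2}\ln^{2}(1/\omega)\right]
  = 2\omega\ln^{2}(1/\omega) - 2\omega\ln(1/\omega)
  = 2\omega\ln(1/\omega)\left[\ln(1/\omega) - 1\right]
```

    Under that assumption the derivative is positive for 0 < ω < 1/e, so the scaling form rises with frequency throughout the low-frequency regime, in line with the stated trend.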

  16. Characterization of six rat strains (Rattus norvegicus) by mitochondrial DNA restriction fragment length polymorphism

    Directory of Open Access Journals (Sweden)

    Hilsdorf A.W.

    1999-01-01

    Restriction fragment length polymorphism (RFLP) was used to examine the extent of mtDNA polymorphism among six strains of rats (Rattus norvegicus): Wistar, Wistar Munich, Brown Norway, Wistar Kyoto, SHR and SHR-SP. A survey of 26 restriction enzymes revealed a low level of genetic divergence among strains. The sites of cleavage by EcoRI, NcoI and XmnI were shown to be polymorphic. The use of these three enzymes allows the 6 strains to be classified into 4 haplotypes and identifies specific markers for each one. The percentage of sequence divergence among all pairs of haplotypes ranged from 0.035 to 0.33%, which is the result of a severe population constriction undergone by the strains. These haplotypes are easily demonstrable and therefore RFLP analysis can be employed for genetic monitoring of rats within animal facilities or among different laboratories.

  17. Plasmonic Nanopores for Trapping, Controlling Displacement, and Sequencing of DNA.

    Science.gov (United States)

    Belkin, Maxim; Chao, Shu-Han; Jonsson, Magnus P; Dekker, Cees; Aksimentiev, Aleksei

    2015-11-24

    With the aim of developing a DNA sequencing methodology, we theoretically examine the feasibility of using nanoplasmonics to control the translocation of a DNA molecule through a solid-state nanopore and to read off sequence information using surface-enhanced Raman spectroscopy. Using molecular dynamics simulations, we show that high-intensity optical hot spots produced by a metallic nanostructure can arrest DNA translocation through a solid-state nanopore, thus providing a physical knob for controlling the DNA speed. Switching the plasmonic field on and off can displace the DNA molecule in discrete steps, sequentially exposing neighboring fragments of a DNA molecule to the pore as well as to the plasmonic hot spot. Surface-enhanced Raman scattering from the exposed DNA fragments contains information about their nucleotide composition, possibly allowing the identification of the nucleotide sequence of a DNA molecule transported through the hot spot. The principles of plasmonic nanopore sequencing can be extended to detection of DNA modifications and RNA characterization.

  18. Nucleotide sequence analysis of regions of adenovirus 5 DNA containing the origins of DNA replication

    International Nuclear Information System (INIS)

    Steenbergh, P.H.

    1979-01-01

    The purpose of the investigations described is the determination of nucleotide sequences at the molecular ends of the linear adenovirus type 5 DNA. Knowledge of the primary structure at the termini of this DNA molecule is of particular interest in the study of the mechanism of replication of adenovirus DNA. The initiation- and termination sites of adenovirus DNA replication are located at the ends of the DNA molecule. (Auth.)

  19. An Algorithm for Sequencing by Hybridization Based on an Alternating DNA Chip.

    Science.gov (United States)

    Radom, Marcin; Formanowicz, Piotr

    2017-02-28

    Sequencing by hybridization (SBH) allows the reconstruction of a DNA string of a given length from smaller fragments. These fragments are obtained in a hybridization experiment in which the DNA hybridizes to a DNA chip. In the classical approach, the chip consists of all oligonucleotides of a given length, with only one type of oligonucleotide for each probe of the chip. In this paper, we propose an algorithm solving the non-classical case of SBH, where the chip probes consist of sets of oligonucleotides described by some specific pattern. We present the definition of such a non-classical DNA chip and the algorithm solving a sequencing problem related to such a chip. Unlike recent metaheuristic approaches to the classical SBH problem, the proposed algorithm tries to find an exact sequence, and even in the presence of all the hybridization errors in the spectrum it is very often able to do so in a short time. If only negative errors from repetitions are allowed, then the algorithm is able to reconstruct sequences with lengths of thousands of nucleotides.
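    To make the classical setting described in this abstract concrete, the following is a minimal sketch of error-free SBH reconstruction by backtracking. It is not the authors' algorithm for alternating chips; the function names `spectrum` and `reconstruct`, the known starting probe, and the assumption of an ideal spectrum with k ≥ 2 are all illustrative choices.

```python
from collections import Counter


def spectrum(sequence, k):
    """Return the multiset of all k-mers (the ideal, error-free spectrum)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def reconstruct(spec, n, start):
    """Depth-first search for a length-n sequence that begins with `start`
    and whose k-mers are all drawn from the spectrum `spec` (assumes k >= 2)."""
    k = len(start)
    remaining = Counter(spec)
    remaining[start] -= 1  # the starting probe is already used

    def extend(current):
        if len(current) == n:
            return current
        suffix = current[-(k - 1):]
        for base in "ACGT":
            probe = suffix + base
            if remaining[probe] > 0:
                remaining[probe] -= 1
                result = extend(current + base)
                if result is not None:
                    return result
                remaining[probe] += 1  # backtrack and try the next base
        return None

    return extend(start)


if __name__ == "__main__":
    target = "ACGTTGCA"
    print(reconstruct(spectrum(target, 3), len(target), "ACG"))
```

    When the target contains repeated (k-1)-mers, several sequences can share one spectrum, so a search like this may return a valid reconstruction that differs from the original.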

  20. ATRF Houses the Latest DNA Sequencing Technologies | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer By the end of October, the Advanced Technology Research Facility (ATRF) will be one of the few facilities in the world to house all of the latest DNA sequencing technologies.

  1. Mouse brain full-length cDNA library construction by negative selection of intact mRNAs.

    Science.gov (United States)

    Wu, Ning; Wu, Huijuan; Li, Yandong; Matand, Kanyand

    2010-06-01

    Synthesis of full-length cDNA libraries is an essential step for the study of gene function. The method for selecting the intact mRNA directly affects the number of full-length transcripts. We have developed a novel method for intact mRNA selection based on the elimination of uncapped mRNAs. A negative-selection strategy that removes both uncapped mRNA and other non-mRNA molecules that present a phosphate at the 5'-end has been applied in the mRNA purification procedures. Briefly, after performing a standard mRNA purification, a biotinylated oligoribonucleotide is ligated to the 5'-end phosphate of uncapped mRNAs. Streptavidin extraction is then performed to remove truncated and non-mRNAs from the intact mRNAs. By comparing random sequencing results of mouse brain full-length and standard cDNA libraries, there was a significant increase of full-length clones with the modified procedure. The results showed that the full-length library contained more than 68% full-length clones with the 5'-end positions ranging between -485 and +100, compared to the standard library with 33% of full-length clones and 5'-end positions ranging between -233 and +100. The data were analyzed using the t-test with the significance level set at p<0.05; the libraries differed significantly in both 5'-end position and mRNA size (p<0.05).

  2. DNA sequencing using polymerase substrate-binding kinetics.

    Science.gov (United States)

    Previte, Michael John Robert; Zhou, Chunhong; Kellinger, Matthew; Pantoja, Rigo; Chen, Cheng-Yao; Shi, Jin; Wang, BeiBei; Kia, Amirali; Etchin, Sergey; Vieceli, John; Nikoomanzar, Ali; Bomati, Erin; Gloeckner, Christian; Ronaghi, Mostafa; He, Molly Min

    2015-01-23

    Next-generation sequencing (NGS) has transformed genomic research by decreasing the cost of sequencing. However, whole-genome sequencing is still costly and complex for diagnostics purposes. In the clinical space, targeted sequencing has the advantage of allowing researchers to focus on specific genes of interest. Routine clinical use of targeted NGS mandates inexpensive instruments, fast turnaround time and an integrated and robust workflow. Here we demonstrate a version of the Sequencing by Synthesis (SBS) chemistry that potentially can become a preferred targeted sequencing method in the clinical space. This sequencing chemistry uses natural nucleotides and is based on real-time recording of the differential polymerase/DNA-binding kinetics in the presence of correct or mismatch nucleotides. This ensemble SBS chemistry has been implemented on an existing Illumina sequencing platform with integrated cluster amplification. We discuss the advantages of this sequencing chemistry for targeted sequencing as well as its limitations for other applications.

  3. Length polymorphism scanning is an efficient approach for revealing chloroplast DNA variation.

    Science.gov (United States)

    Matthew E. Horning; Richard C. Cronn

    2006-01-01

    Phylogeographic and population genetic screens of chloroplast DNA (cpDNA) provide insights into seed-based gene flow in angiosperms, yet studies are frequently hampered by the low mutation rate of this genome. Detection methods for intraspecific variation can be either direct (DNA sequencing) or indirect (PCR-RFLP), although no single method incorporates the best...

  4. Preparation of full-length cDNA libraries: focus on metazoans.

    Science.gov (United States)

    Harada, Masako; Hayashizaki, Yoshihide

    2009-01-01

    Critical steps in cDNA library preparation include efficient cDNA synthesis, selection of full-length cDNAs, normalization of their abundance, and subtraction of redundant transcripts. The use of trehalose and sorbitol stabilizes the activity of the reverse transcriptase, leading to efficient cDNA synthesis, and the cap-trapping method is used for efficient full-length cDNA selection. Through the incorporation of additional normalization and subtraction steps that eliminate the size bias and expressed gene frequency, it is possible to attain cDNA libraries that include larger or rarely expressed genes. This chapter describes an efficient method to construct a full-length cDNA library, with a focus on metazoan samples.

  5. Capillary gel electrophoresis for rapid, high resolution DNA sequencing.

    OpenAIRE

    Swerdlow, H; Gesteland, R

    1990-01-01

    Capillary gel electrophoresis has been demonstrated for the separation and detection of DNA sequencing samples. Enzymatic dideoxy nucleotide chain termination was employed, using fluorescently tagged oligonucleotide primers and laser based on-column detection (limit of detection is 6,000 molecules per peak). Capillary gel separations were shown to be three times faster, with better resolution (2.4 x), and higher separation efficiency (5.4 x) than a conventional automated slab gel DNA sequenci...

  6. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments.

    Science.gov (United States)

    Dabney, Jesse; Knapp, Michael; Glocke, Isabelle; Gansauge, Marie-Theres; Weihmann, Antje; Nickel, Birgit; Valdiosera, Cristina; García, Nuria; Pääbo, Svante; Arsuaga, Juan-Luis; Meyer, Matthias

    2013-09-24

    Although an inverse relationship is expected in ancient DNA samples between the number of surviving DNA fragments and their length, ancient DNA sequencing libraries are strikingly deficient in molecules shorter than 40 bp. We find that a loss of short molecules can occur during DNA extraction and present an improved silica-based extraction protocol that enables their efficient retrieval. In combination with single-stranded DNA library preparation, this method enabled us to reconstruct the mitochondrial genome sequence from a Middle Pleistocene cave bear (Ursus deningeri) bone excavated at Sima de los Huesos in the Sierra de Atapuerca, Spain. Phylogenetic reconstructions indicate that the U. deningeri sequence forms an early diverging sister lineage to all Western European Late Pleistocene cave bears. Our results prove that authentic ancient DNA can be preserved for hundreds of thousands of years outside of permafrost. Moreover, the techniques presented enable the retrieval of phylogenetically informative sequences from samples in which virtually all DNA is diminished to fragments shorter than 50 bp.

  7. An extended sequence specificity for UV-induced DNA damage.

    Science.gov (United States)

    Chung, Long H; Murray, Vincent

    2018-01-01

    The sequence specificity of UV-induced DNA damage was determined with a higher precision and accuracy than previously reported. UV light induces two major damage adducts: cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). Employing capillary electrophoresis with laser-induced fluorescence and taking advantage of the distinct properties of the CPDs and 6-4PPs, we studied the sequence specificity of UV-induced DNA damage in a purified DNA sequence using two approaches: end-labelling and a polymerase stop/linear amplification assay. A mitochondrial DNA sequence that contained a random nucleotide composition was employed as the target DNA sequence. With previous methodology, the UV sequence specificity was determined at a dinucleotide or trinucleotide level; however, in this paper, we have extended the UV sequence specificity to a hexanucleotide level. With the end-labelling technique (for 6-4PPs), the consensus sequence was found to be 5'-GCTC*AC (where C* is the breakage site); while with the linear amplification procedure, it was 5'-TCTT*AC. With end-labelling, the dinucleotide frequency of occurrence was highest for 5'-TC*, 5'-TT* and 5'-CC*; whereas it was 5'-TT* for linear amplification. The influence of neighbouring nucleotides on the degree of UV-induced DNA damage was also examined. The core sequences consisted of pyrimidine nucleotides 5'-CTC* and 5'-CTT* while an A at position "1" and C at position "2" enhanced UV-induced DNA damage.

  8. Novel Graphical Representation and Numerical Characterization of DNA Sequences

    Directory of Open Access Journals (Sweden)

    Chun Li

    2016-02-01

    Modern sequencing techniques have provided a wealth of data on DNA sequences, which has made the analysis and comparison of sequences a very important but difficult task. In this paper, by regarding the dinucleotide as a 2-combination of the multiset {∞·A, ∞·G, ∞·C, ∞·T}, a novel 3-D graphical representation of a DNA sequence is proposed, and its projections on the planes (x,y), (y,z) and (x,z) are also discussed. In addition, based on the idea of a "piecewise function", a cell-based descriptor vector is constructed to numerically characterize the DNA sequence. The utility of our approach is illustrated by phylogenetic analysis of four datasets.

  9. PREDICTION OF CHROMATIN STATES USING DNA SEQUENCE PROPERTIES

    KAUST Repository

    Bahabri, Rihab R.

    2013-06-01

    Activities of DNA are to a great extent controlled epigenetically through the internal structure of chromatin. This structure is dynamic and is influenced by different modifications of histone proteins. Various combinations of epigenetic modifications of histones point to different functional regions of the DNA, determining the so-called chromatin states. However, the characterization of chromatin states by DNA sequence properties remains largely unknown. In this study we aim to explore whether DNA sequence patterns in the human genome can characterize different chromatin states. Using DNA sequence motifs we built binary classifiers for each chromatin state to evaluate whether a given genomic sequence is a good candidate for belonging to a particular chromatin state. Of four classification algorithms (C4.5, Naive Bayes, Random Forest, and SVM) used for this purpose, the decision tree based classifiers (C4.5 and Random Forest) yielded the best results among those we evaluated. Our results suggest that in general these models lack sufficient predictive power, although for four chromatin states (insulators, heterochromatin, and two types of copy number variation) we found that the presence of certain motifs in DNA sequences does imply an increased probability that such a sequence is one of these chromatin states.

  10. Maternal Plasma DNA and RNA Sequencing for Prenatal Testing

    NARCIS (Netherlands)

    Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B. M.; Cornel, Martina C.; Sistermans, Erik A.

    2016-01-01

    Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide

  11. DNA Sequences of RAPD Fragments in the Egyptian cotton ...

    African Journals Online (AJOL)

    Random Amplified Polymorphic DNAs (RAPDs) is a DNA polymorphism assay based on the amplification of random DNA segments with single primers of arbitrary nucleotide sequence. Despite the fact that the RAPD technique has become a very powerful tool and has found use in numerous applications, yet, the nature of ...

  12. Effects of sequence on DNA wrapping around histones

    Science.gov (United States)

    Ortiz, Vanessa

    2011-03-01

    A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).

  13. Efficient and specific internal cleavage of a retroviral palindromic DNA sequence by tetrameric HIV-1 integrase.

    Directory of Open Access Journals (Sweden)

    Olivier Delelis

    BACKGROUND: HIV-1 integrase (IN) catalyses the retroviral integration process, removing two nucleotides from each long terminal repeat and inserting the processed viral DNA into the target DNA. It is widely assumed that the strand transfer step has no sequence specificity. However, several groups have recently reported that integration sites display a preference for palindromic sequences, suggesting that symmetry in the target DNA may stabilise the tetrameric organisation of IN in the synaptic complex. METHODOLOGY/PRINCIPAL FINDINGS: We assessed the ability of several palindrome-containing sequences to organise tetrameric IN and investigated the ability of IN to catalyse DNA cleavage at internal positions. Only one palindromic sequence was successfully cleaved by IN. Interestingly, this symmetrical sequence corresponded to the 2-LTR junction of retroviral DNA circles, a palindrome similar but not identical to the consensus sequence found at integration sites. This reaction depended strictly on the cognate retroviral sequence of IN and required a full-length wild-type IN. Furthermore, the oligomeric state of IN responsible for this cleavage differed from that involved in the 3'-processing reaction. Palindromic cleavage strictly required the tetrameric form, whereas 3'-processing was efficiently catalysed by a dimer. CONCLUSIONS/SIGNIFICANCE: Our findings suggest that the restriction-like cleavage of palindromic sequences may be a general physiological activity of retroviral INs and that IN tetramerisation is strongly favoured by DNA symmetry, either at the target site for the concerted integration or when the DNA contains the 2-LTR junction in the case of the palindromic internal cleavage.

  14. Length heterogeneity of amplified circular rDNA molecules in oocytes of the house cricket Acheta domesticus (Orthoptera: Gryllidae).

    Science.gov (United States)

    Cave, M D

    1979-02-13

    Amplification of the genes coding for rRNA occurs in the oocytes of a wide variety of organisms. The amplification process appears to be mediated through a rolling-circle mechanism. The approximate molecular weight of the smallest rDNA circles is equivalent to the estimated combined molecular weight of DNA which codes for a single ribosomal RNA precursor molecule and an associated non-transcribed spacer DNA sequence. RNA-DNA hybridization studies carried out on oocytes of the house cricket, Acheta domesticus, suggest that DNA coding for rRNA accounts for only a small fraction of the rDNA satellite, all of which is amplified in the oocyte. In order to test the possibility that the remainder of the amplified rDNA represents spacer and to determine whether a rolling-circle mechanism might also be involved in amplification in A. domesticus oocytes, rDNA was isolated from ovaries of A. domesticus and spread for electron microscopy. A large proportion of the rDNA isolated from ovaries is circular, while main-band DNA and rDNA prepared from other tissues demonstrate few if any circles. The mean size of the smallest rDNA circles is approximately 8 times longer than the length estimated for DNA which codes for 18S and 28S rRNA. Denaturation mapping shows the rDNA circles to contain two major readily denaturing regions located about equidistant from one another on the circle. Each readily denaturing region accounts for 4-6% of the total DNA in the circle. The fact that only 12% of the average molecule is required to code for A. domesticus 18S and 28S rRNA is consistent with the hybridization data. Considerable size heterogeneity exists in the length of the smallest class of rDNA molecules. In the rDNA of other species such heterogeneity has been shown to reside in the non-transcribed spacer.

  15. Googling DNA sequences on the World Wide Web.

    Science.gov (United States)

    Hajibabaei, Mehrdad; Singer, Gregory A C

    2009-11-10

    New web-based technologies provide an excellent opportunity for sharing and accessing information and for using the web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment-independent, character-based algorithm based on dividing a sequence library (DNA barcodes) and query sequence into words. The actual search is conducted by conventional search tools such as the freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre- and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
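    The word-based, alignment-independent search described in this abstract can be illustrated with a small inverted index. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not say how words are formed, so overlapping fixed-length words are assumed here, and the names `words`, `build_index` and `search` and the toy library are hypothetical.

```python
from collections import defaultdict


def words(sequence, size):
    """Split a sequence into its set of overlapping fixed-length words
    (assumption: the abstract does not specify the word scheme)."""
    return {sequence[i:i + size] for i in range(len(sequence) - size + 1)}


def build_index(library, size):
    """Map each word to the names of the library sequences containing it."""
    index = defaultdict(set)
    for name, sequence in library.items():
        for word in words(sequence, size):
            index[word].add(name)
    return index


def search(index, query, size):
    """Rank library sequences by the number of query words they share."""
    hits = defaultdict(int)
    for word in words(query, size):
        for name in index.get(word, ()):
            hits[name] += 1
    # Highest word overlap first; ties broken alphabetically by name.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical toy library standing in for a DNA barcode collection.
    library = {"species_a": "ACGTACGTGG", "species_b": "TTTTGGGGCC"}
    index = build_index(library, 4)
    print(search(index, "ACGTACG", 4))
```

    Because matching reduces to exact word lookups, the same idea can be delegated to a general-purpose text search engine, which is the role the abstract assigns to conventional search tools.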

  16. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    Science.gov (United States)

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  17. Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data

    NARCIS (Netherlands)

    Farmery, James H. R.; Smith, Mike L.; Lynch, Andy G.; Huissoon, Aarnoud; Furnell, Abigail; Mead, Adam; Levine, Adam P.; Manzur, Adnan; Thrasher, Adrian; Greenhalgh, Alan; Parker, Alasdair; Sanchis-Juan, Alba; Richter, Alex; Gardham, Alice; Lawrie, Allan; Sohal, Aman; Creaser-Myers, Amanda; Frary, Amy; Greinacher, Andreas; Themistocleous, Andreas; Peacock, Andrew J.; Marshall, Andrew; Mumford, Andrew; Rice, Andrew; Webster, Andrew; Brady, Angie; Koziell, Ania; Manson, Ania; Chandra, Anita; Hensiek, Anke; Veld, Anna Huis In't; Maw, Anna; Kelly, Anne M.; Moore, Anthony; Vonk Noordegraaf, Anton; Attwood, Antony; Herwadkar, Archana; Ghofrani, Ardi; Houweling, Arjan C.; Girerd, Barbara; Furie, Bruce; Treacy, Carmen M.; Millar, Carolyn M.; Sewell, Carrock; Roughley, Catherine; Titterton, Catherine; Williamson, Catherine; Hadinnapola, Charaka; Deshpande, Charu; Toh, Cheng-Hock; Bacchelli, Chiara; Patch, Chris; Geet, Chris Van; Babbs, Christian; Bryson, Christine; Penkett, Christopher J.; Rhodes, Christopher J.; Watt, Christopher; Bethune, Claire; Booth, Claire; Lentaigne, Claire; McJannet, Coleen; Church, Colin; French, Courtney; Samarghitean, Crina; Halmagyi, Csaba; Gale, Daniel; Greene, Daniel; Hart, Daniel; Allsup, David; Bennett, David; Edgar, David; Kiely, David G.; Gosal, David; Perry, David J.; Keeling, David; Montani, David; Shipley, Debbie; Whitehorn, Deborah; Fletcher, Debra; Krishnakumar, Deepa; Grozeva, Detelina; Kumararatne, Dinakantha; Thompson, Dorothy; Josifova, Dragana; Maher, Eamonn; Wong, Edwin K. S.; Murphy, Elaine; Dewhurst, Eleanor; Louka, Eleni; Rosser, Elisabeth; Chalmers, Elizabeth; Colby, Elizabeth; Drewe, Elizabeth; McDermott, Elizabeth; Thomas, Ellen; Staples, Emily; Clement, Emma; Matthews, Emma; Wakeling, Emma; Oksenhendler, Eric; Turro, Ernest; Reid, Evan; Wassmer, Evangeline; Raymond, F. 
Lucy; Hu, Fengyuan; Kennedy, Fiona; Soubrier, Florent; Flinter, Frances; Kovacs, Gabor; Polwarth, Gary; Ambegaonkar, Gautum; Arno, Gavin; Hudson, Gavin; Woods, Geoff; Coghlan, Gerry; Hayman, Grant; Arumugakani, Gururaj; Schotte, Gwen; Cook, H. Terry; Alachkar, Hana; Lango Allen, Hana; Lango-Allen, Hana; Stark, Hannah; Stauss, Hans; Schulze, Harald; Boggard, Harm J.; Baxendale, Helen; Dolling, Helen; Firth, Helen; Gall, Henning; Watson, Henry; Longhurst, Hilary; Markus, Hugh S.; Watkins, Hugh; Simeoni, Ilenia; Emmerson, Ingrid; Roberts, Irene; Quinti, Isabella; Wanjiku, Ivy; Gibbs, J. Simon R.; Thaventhiran, James; Whitworth, James; Hurst, Jane; Collins, Janine; Suntharalingam, Jay; Payne, Jeanette; Thachil, Jecko; Martin, Jennifer M.; Martin, Jennifer; Carmichael, Jenny; Maimaris, Jesmeen; Paterson, Joan; Pepke-Zaba, Joanna; Heemskerk, Johan W. M.; Gebhart, Johanna; Davis, John; Pasi, John; Bradley, John R.; Wharton, John; Stephens, Jonathan; Rankin, Julia; Anderson, Julie; Vogt, Julie; von Ziegenweldt, Julie; Rehnstrom, Karola; Megy, Karyn; Talks, Kate; Peerlinck, Kathelijne; Yates, Katherine; Freson, Kathleen; Stirrups, Kathleen; Gomez, Keith; Smith, Kenneth G. 
C.; Carss, Keren; Rue-Albrecht, Kevin; Gilmour, Kimberley; Masati, Larahmie; Scelsi, Laura; Southgate, Laura; Ranganathan, Lavanya; Ginsberg, Lionel; Devlin, Lisa; Willcocks, Lisa; Ormondroyd, Liz; Lorenzo, Lorena; Harper, Lorraine; Allen, Louise; Daugherty, Louise; Chitre, Manali; Kurian, Manju; Humbert, Marc; Tischkowitz, Marc; Bitner-Glindzicz, Maria; Erwood, Marie; Scully, Marie; Veltman, Marijke; Caulfield, Mark; Layton, Mark; McCarthy, Mark; Ponsford, Mark; Toshner, Mark; Bleda, Marta; Wilkins, Martin; Mathias, Mary; Reilly, Mary; Afzal, Maryam; Brown, Matthew; Rondina, Matthew; Stubbs, Matthew; Haimel, Matthias; Lees, Melissa; Laffan, Michael A.; Browning, Michael; Gattens, Michael; Richards, Michael; Michaelides, Michel; Lambert, Michele P.; Makris, Mike; de Vries, Minka; Mahdi-Rogers, Mohamed; Saleem, Moin; Thomas, Moira; Holder, Muriel; Eyries, Mélanie; Clements-Brod, Naomi; Canham, Natalie; Dormand, Natalie; Zuydam, Natalie Van; Kingston, Nathalie; Ghali, Neeti; Cooper, Nichola; Morrell, Nicholas W.; Yeatman, Nigel; Roy, Noémi; Shamardina, Olga; Alavijeh, Omid S.; Gresele, Paolo; Nurden, Paquita; Chinnery, Patrick; Deegan, Patrick; Yong, Patrick; Man, Patrick Yu Wai; Corris, Paul A.; Calleja, Paul; Gissen, Paul; Bolton-Maggs, Paula; Rayner-Matthews, Paula; Ghataorhe, Pavandeep K.; Gordins, Pavel; Stein, Penelope; Collins, Peter; Dixon, Peter; Kelleher, Peter; Ancliff, Phil; Yu, Ping; Tait, R. 
Campbell; Linger, Rachel; Doffinger, Rainer; Machado, Rajiv; Kazmi, Rashid; Sargur, Ravishankar; Favier, Remi; Tan, Rhea; Liesner, Ri; Antrobus, Richard; Sandford, Richard; Scott, Richard; Trembath, Richard; Horvath, Rita; Hadden, Rob; MackenzieRoss, Rob V.; Henderson, Robert; MacLaren, Robert; James, Roger; Ghurye, Rohit; DaCosta, Rosa; Hague, Rosie; Mapeta, Rutendo; Armstrong, Ruth; Noorani, Sadia; Murng, Sai; Santra, Saikat; Tuna, Salih; Johnson, Sally; Chong, Sam; Lear, Sara; Walker, Sara; Goddard, Sarah; Mangles, Sarah; Westbury, Sarah; Mehta, Sarju; Hackett, Scott; Nejentsev, Sergey; Moledina, Shahin; Bibi, Shahnaz; Meehan, Sharon; Othman, Shokri; Revel-Vilk, Shoshana; Holden, Simon; McGowan, Simon; Staines, Simon; Savic, Sinisa; Burns, Siobhan; Grigoriadou, Sofia; Papadia, Sofia; Ashford, Sofie; Schulman, Sol; Ali, Sonia; Park, Soo-Mi; Davies, Sophie; Stock, Sophie; Ali, Souad; Deevi, Sri V. V.; Gräf, Stefan; Ghio, Stefano; Wort, Stephen J.; Jolles, Stephen; Austin, Steve; Welch, Steve; Meacham, Stuart; Rankin, Stuart; Walker, Suellen; Seneviratne, Suranjith; Holder, Susan; Sivapalaratnam, Suthesh; Richardson, Sylvia; Kuijpers, Taco; Bariana, Tadbir K.; Bakchoul, Tamam; Everington, Tamara; Renton, Tara; Young, Tim; Aitman, Timothy; Warner, Timothy Q.; Vale, Tom; Hammerton, Tracey; Pollock, Val; Matser, Vera; Cookson, Victoria; Clowes, Virginia; Qasim, Waseem; Wei, Wei; Erber, Wendy N.; Ouwehand, Willem H.; Astle, William; Egner, William; Turek, Wojciech; Henskens, Yvonne; Tan, Yvonne

    2018-01-01

    Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously

  18. Sequence dependence of electron-induced DNA strand breakage revealed by DNA nanoarrays

    DEFF Research Database (Denmark)

    Keller, Adrian; Rackwitz, Jenny; Cauët, Emilie

    2014-01-01

    sections for electron induced single strand breaks in specific 13 mer oligonucleotides we used atomic force microscopy analysis of DNA origami based DNA nanoarrays. We investigated the DNA sequences 5'-TT(XYX)3TT with X = A, G, C and Y = T, BrU 5-bromouracil and found absolute strand break cross sections...

  19. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76...... in human. Although additional minor bands of 1.5 and 3.3 kb were found in Northern blots, RT-PCR (reverse transcription polymerase chain reaction) analysis failed to provide evidence that these minor bands are products of the tetranectin gene. Finally, the genetic map location for this gene, Tna...

  20. An automated annotation tool for genomic DNA sequences using ...

    Indian Academy of Sciences (India)

    Unknown

    , New Delhi 110 067, India. Abstract ... analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by .... genes for the TCA cycle, while in mitochondria only a subset of the ...

  1. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  2. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...

  3. An artificial intelligence approach to DNA sequence feature recognition.

    Science.gov (United States)

    Mural, R J; Einstein, J R; Guan, X; Mann, R C; Uberbacher, E C

    1992-01-01

    The ultimate goal of the Human Genome project is to extract the biologically relevant information recorded in the estimated 100,000 genes encoded by the 3 x 10(9) bases of the human genome. This necessitates development of reliable computer-based methods capable of analysing and correctly identifying genes in the vast amounts of DNA-sequence data generated. Such tools may save time and labour by simplifying, for example, screening of cDNA libraries. They may also facilitate the localization of human disease genes by identifying candidate genes in promising regions of anonymous DNA sequence.

  4. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    The phylogenetic relationships among flesh flies of the family Sarcophagidae has been based mainly on the morphology of male genitalia. However, the male genitalic character-based relationships are far from satisfactory. Therefore, in the present study mitochondrial DNA has been used as marker to unravel genetic ...

  5. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    2007 Population structure of the malaria vector Anopheles dar- lingi in Rondonia, Brazilian Amazon, based on mitochondrial. DNA. Mem. Inst. Oswaldo Cruz 102, 953–958. Avise J. C. 2004 Molecular markers, natural history, and evolution,. 2nd edition. Sinauer, Sunderland, USA. Cameron S. L., Lambkin C. L., Barker S. C. ...

  6. The algorithm of random length sequences synthesis for frame synchronization of digital television systems

    Directory of Open Access Journals (Sweden)

    Andriy V. Sadchenko

    2015-12-01

    Digital television systems need to ensure that all digital signal processing operations are performed simultaneously and consistently. Frame synchronization is dictated by the need to match the phases of transmitter and receiver so that the start of a frame can be identified. Long binary sequences with a good aperiodic autocorrelation function are often used as frame synchronization signals. Aim: This work is dedicated to the development of an algorithm for the synthesis of random-length sequences. Materials and Methods: The paper provides a comparative analysis of the known sequences that can currently be used for synchronization and identifies their advantages and disadvantages. It proposes an algorithm for the synthesis of binary synchronization sequences of random length with good autocorrelation properties, based on a noise generator with a uniform probability distribution. A "white noise" semiconductor generator is proposed as the source material for synthesizing binary sequences with the desired properties. Results: A statistical analysis of the initial "white noise" realizations and of the synthesized sequences for frame synchronization of digital television is conducted. The synthesized sequences are compared with known ones, and the results show the benefits of the obtained sequences over the known ones. The performed simulations confirm the obtained results. Conclusions: A search algorithm for binary synchronization sequences with desired autocorrelation properties has been obtained. With this algorithm, sequences can be made longer, without length limitations. The resulting sync sequences can be used for frame synchronization in modern digital communication systems, which will increase their efficiency and noise immunity.
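
    The idea of searching random binary sequences for good aperiodic autocorrelation can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the peak-sidelobe criterion, and the use of Python's software pseudo-random generator in place of a hardware white-noise source are all assumptions made for illustration.

```python
import random

def aperiodic_autocorrelation(bits):
    """Aperiodic autocorrelation R[k], k = 0..n-1, of a 0/1 sequence mapped to +1/-1."""
    s = [1 if b else -1 for b in bits]
    n = len(s)
    return [sum(s[i] * s[i + k] for i in range(n - k)) for k in range(n)]

def peak_sidelobe(bits):
    """Largest |R[k]| for k >= 1; smaller values mean a sharper sync peak."""
    r = aperiodic_autocorrelation(bits)
    return max((abs(v) for v in r[1:]), default=0)

def search_sync_sequence(length, trials=2000, seed=0):
    """Draw random candidates (hypothetical stand-in for a noise source)
    and keep the one with the lowest peak sidelobe."""
    rng = random.Random(seed)
    best, best_psl = None, None
    for _ in range(trials):
        candidate = [rng.getrandbits(1) for _ in range(length)]
        psl = peak_sidelobe(candidate)
        if best_psl is None or psl < best_psl:
            best, best_psl = candidate, psl
    return best, best_psl

# The length-13 Barker code is a known sequence with peak sidelobe 1.
BARKER_13 = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
```

    Calling `search_sync_sequence(31)` returns the best candidate found and its peak sidelobe; a known sequence such as `BARKER_13` gives a reference point for judging the result.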

  7. The isolation and amplification of full length cDNA of oleosins from ...

    African Journals Online (AJOL)

    STORAGESEVER

    2010-03-29

    Mar 29, 2010 ... Subcloning of DNA by inserting into pCR® 4-TOPO vector was performed using TOPO TA Cloning® Kit for Sequencing (Invitrogen, USA). The sequencing of plasmid clones was performed using ABI PRISM™ Dye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems Inc., USA) and an ...

  8. Plant viral intergenic DNA sequence repeats with transcription enhancing activity

    Directory of Open Access Journals (Sweden)

    Cazzonelli Christopher I

    2005-02-01

    Background: The geminivirus and nanovirus families of DNA plant viruses have proved to be a fertile source of viral genomic sequences, clearly demonstrated by the large number of sequence entries within public DNA sequence databases. Due to considerable conservation in genome organization, these viruses contain easily identifiable intergenic regions that have been found to contain multiple DNA sequence elements important to viral replication and gene regulation. As a first step in a broad screen of geminivirus and nanovirus intergenic sequences for DNA segments important in controlling viral gene expression, we have 'mined' a large set of viral intergenic regions for transcriptional enhancers. Viral sequences that are found to act as enhancers of transcription in plants are likely to contribute to viral gene activity during infection. Results: DNA sequences from the intergenic regions of 29 geminiviruses or nanoviruses were scanned for repeated sequence elements to be tested for transcription enhancing activity. 105 elements were identified and placed immediately upstream from a minimal plant-functional promoter fused to an intron-containing luciferase reporter gene. Transient luciferase activity was measured within Agrobacteria-infused Nicotiana tabacum leaf tissue. Of the 105 elements tested, 14 were found to reproducibly elevate reporter gene activity (>25% increase over that from the minimal promoter-reporter construct, p ... Conclusion: Biological significance for the active DNA elements identified is supported by repeated isolation of a previously defined viral element (CLE), and the finding that two of three viral enhancer elements examined were markedly enriched within both geminivirus sequences and within Arabidopsis promoter regions. These data provide a useful starting point for virologists interested in undertaking more detailed analysis of geminiviral promoter function.

  9. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.

  10. Complete sequence and characterization of mitochondrial DNA genome of Channa asiatica (Perciformes: Channidae).

    Science.gov (United States)

    Meng, Yan; Zhang, Yan

    2016-01-01

    The complete nucleotide sequence of Channa asiatica mitochondrial (mtDNA) genome was determined in this study. The genome sequence (GenBank accession number KJ930190) was 16,550 base pairs in length, and the gene content and organization on the mitochondrial genome were similar to the other Channa fishes. The overall base composition of C. asiatica mitogenome is 29.4% A, 26.3% T, 15.3% G, 29.0% C, with a high A + T content of 55.7%. The mitochondrial sequence could provide useful genetic information for studying the molecular identification, population genetics, phylogenetic analysis and conservation genetics.
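
    The reported composition is internally consistent: 29.4% A plus 26.3% T gives the stated A + T content of 55.7%. A minimal sketch of how such figures could be computed from a sequence string follows; the function names are hypothetical, and ignoring non-ACGT symbols is an assumption rather than anything stated in the record.

```python
from collections import Counter

def base_composition(seq):
    """Percentage of A, C, G and T in a DNA string (other symbols are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}

def at_content(seq):
    """Combined A + T percentage."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]
```

    For a real mitogenome the input would be the sequence downloaded from GenBank, for example read from a FASTA file with the header line removed.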

  11. Cloning and sequencing of Indian Water buffalo (Bubalus bubalis) interleukin-3 cDNA

    KAUST Repository

    Sugumar, Thennarasu

    2011-12-12

    Full-length cDNA (435 bp) of the interleukin-3 (IL-3) gene of the Indian water buffalo was amplified by reverse transcriptase-polymerase chain reaction and sequenced. This sequence had 96% nucleotide identity and 92% amino acid identity with bovine IL-3. There are 10 amino acid substitutions in buffalo IL-3 compared with bovine IL-3. The amino acid sequence of buffalo IL-3 also showed very high identity with that of other ruminants, indicating functional cross-reactivity. Structural homology modelling of buffalo IL-3 protein with human IL-3 showed the presence of five helical structures.

  12. Sequencing of adenine in DNA by scanning tunneling microscopy

    Science.gov (United States)

    Tanaka, Hiroyuki; Taniguchi, Masateru

    2017-08-01

    The development of DNA sequencing technology utilizing the detection of a tunnel current is important for next-generation sequencer technologies based on single-molecule analysis technology. Using a scanning tunneling microscope, we previously reported that dI/dV measurements and dI/dV mapping revealed that the guanine base (purine base) of DNA adsorbed onto the Cu(111) surface has a characteristic peak at V_s = -1.6 V. If, in addition to guanine, the other purine base of DNA, namely, adenine, can be distinguished, then by reading all the purine bases of each single strand of a DNA double helix, the entire base sequence of the original double helix can be determined due to the complementarity of the DNA base pair. Therefore, the ability to read adenine is important from the viewpoint of sequencing. Here, we report on the identification of adenine by STM topographic and spectroscopic measurements using a synthetic DNA oligomer and viral DNA.
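
    The complementarity argument can be made concrete with a small sketch. It assumes an idealized, error-free reading in which each strand yields only its purines (A or G) and both readings are indexed by base-pair position; the data layout and function names are hypothetical and not taken from the paper.

```python
# Watson-Crick partners: a purine seen on the bottom strand fixes the top base.
COMPLEMENT = {"A": "T", "G": "C", "C": "G", "T": "A"}

def purine_read(strand):
    """Idealized reading that reports purines and leaves pyrimidines blank (None)."""
    return [b if b in "AG" else None for b in strand]

def reconstruct_top_strand(top_purines, bottom_purines):
    """Rebuild the top strand from purine-only readings of both strands.

    Both lists are indexed by base-pair position, so bottom_purines[i]
    is the base paired with position i of the top strand.
    """
    if len(top_purines) != len(bottom_purines):
        raise ValueError("readings must cover the same base pairs")
    bases = []
    for top, bottom in zip(top_purines, bottom_purines):
        if top is not None and bottom is None:
            bases.append(top)                 # purine read directly
        elif bottom is not None and top is None:
            bases.append(COMPLEMENT[bottom])  # pyrimidine inferred from its partner
        else:
            raise ValueError("each base pair must show exactly one purine")
    return "".join(bases)
```

    Every Watson-Crick pair contains exactly one purine, which is why the two purine-only readings together determine every position.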

  13. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    Science.gov (United States)

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  14. Next-generation sequencing offers new insights into DNA degradation

    DEFF Research Database (Denmark)

    Overballe-Petersen, Søren; Orlando, Ludovic Antoine Alexandre; Willerslev, Eske

    2012-01-01

    The processes underlying DNA degradation are central to various disciplines, including cancer research, forensics and archaeology. The sequencing of ancient DNA molecules on next-generation sequencing platforms provides direct measurements of cytosine deamination, depurination and fragmentation...... rates that previously were obtained only from extrapolations of results from in vitro kinetic experiments performed over short timescales. For example, recent next-generation sequencing of ancient DNA reveals purine bases as one of the main targets of postmortem hydrolytic damage, through base...... elimination and strand breakage. It also shows substantially increased rates of DNA base-loss at guanosine. In this review, we argue that the latter results from an electron resonance structure unique to guanosine rather than adenosine having an extra resonance structure over guanosine as previously suggested....

  15. [Length and structure of telomeric DNA in three species of Baikal gastropods (Caenogastropoda: Hydrobioidea: Benedictiidae)].

    Science.gov (United States)

    Koroleva, A G; Evtushenko, E V; Maximova, N V; Vershinin, A V; Sintnikova, T Y; Kirilchik, S V

    2015-03-01

    The structure of telomeric repeat (TTAGGG)n was determined and the length of telomeric DNA (tDNA) was measured in three species of gastropods from the family Benedictiidae that are endemic to Lake Baikal. Fluorescence in situ hybridization (FISH) confirmed the localization of a telomeric repeat at the chromosome ends. The sizes of tDNA in "giant" eurybathic, psammo-pelobiontic species Benedictia fragilis and shallow water litho-psammobiontic species B. baicalensis with medium shell sizes were similar (16 ± 2.9 and 15 ± 2.1 kb, respectively), but they had a greater length than that of the shallow water spongio-litobiontic species Kobeltocochlea martensiana with small shells (10.5 ± 1.5 kb). We discuss tendencies in age-related changes in tDNA length in snails and a possible mechanism for maintaining tDNA size in ontogeny.

  16. DNA Sequence Analysis in Clinical Medicine, Proceeding Cautiously

    Directory of Open Access Journals (Sweden)

    Moyra Smith

    2017-05-01

    Delineation of underlying genomic and genetic factors in a specific disease may be valuable in establishing a definitive diagnosis and may guide patient management and counseling. In addition, genetic information may be useful in identification of at-risk family members. Gene mapping and initial genome sequencing data enabled the development of microarrays to analyze genomic variants. The goal of this review is to consider different generations of sequencing techniques and their application to exome sequencing and whole genome sequencing and their clinical applications. In recent decades, exome sequencing has primarily been used in patient studies. Important measures that have been developed to standardize variant calling and to assess pathogenicity of variants are discussed in some detail. Examples of cases where exome sequencing has facilitated diagnosis and led to improved medical management are presented. Whole genome sequencing and its clinical relevance are presented particularly in the context of analysis of nucleotide and structural genomic variants in large population studies and in certain patient cohorts. Applications involving analysis of cell free DNA in maternal blood for prenatal diagnosis of specific autosomal trisomies are reviewed. Applications of DNA sequencing to diagnosis and therapeutics of cancer are presented. Also discussed are important recent diagnostic applications of DNA sequencing in cancer, including analysis of tumor derived cell free DNA and exosomes that are present in body fluids. Insights gained into underlying pathogenetic mechanisms of certain complex common diseases, including schizophrenia, macular degeneration, and neurodegenerative disease, are presented. The relevance of different types of variants (rare, uncommon, and common) to disease pathogenesis, and the continuum of causality, are addressed. Pharmacogenetic variants detected by DNA sequence analysis are gaining in importance and are particularly relevant

  17. Hibiscus latent Fort Pierce virus in Brazil and synthesis of its biologically active full-length cDNA clone.

    Science.gov (United States)

    Gao, Ruimin; Niu, Shengniao; Dai, Weifang; Kitajima, Elliot; Wong, Sek-Man

    2016-10-01

    A Brazilian isolate of Hibiscus latent Fort Pierce virus (HLFPV-BR) was first found in a hibiscus plant in Limeira, SP, Brazil. RACE PCR was carried out to obtain the full-length sequence of HLFPV-BR, which is 6453 nucleotides long and shares more than 99.15% nucleotide sequence identity over the complete genomic RNA with the HLFPV Japanese isolate. The genomic structure of HLFPV-BR is similar to that of other tobamoviruses. It includes a 5' untranslated region (UTR), followed by open reading frames encoding a 128-kDa protein and a 188-kDa readthrough protein, a 38-kDa movement protein, an 18-kDa coat protein, and a 3' UTR. Interestingly, the unique feature of a poly(A) tract is also found within its 3' UTR. Furthermore, from the total RNA extracted from the local lesions of HLFPV-BR-infected Chenopodium quinoa leaves, a biologically active, full-length cDNA clone encompassing the genome of HLFPV-BR was amplified and placed adjacent to a T7 RNA polymerase promoter. The capped in vitro transcripts from the cloned cDNA were infectious when mechanically inoculated into C. quinoa and Nicotiana benthamiana plants. This is the first report of the presence of an isolate of HLFPV in Brazil and the successful synthesis of a biologically active HLFPV-BR full-length cDNA clone.

  18. Insert sequence length determines transfection efficiency and gene expression levels in bicistronic mammalian expression vectors

    OpenAIRE

    Payne, Andrew J; Gerdes, Bryan C; Kaja, Simon; Koulen, Peter

    2013-01-01

    Bicistronic expression vectors have been widely used for co-expression studies since the initial discovery of the internal ribosome entry site (IRES) about 25 years ago. IRES sequences allow the 5’ cap-independent initiation of translation of multiple genes on a single messenger RNA strand. Using a commercially available mammalian expression vector containing an IRES sequence with a 3’ green fluorescent protein fluorescent marker, we found that sequence length of the gene of interest expresse...

  19. Bioinformatics analysis of circulating cell-free DNA sequencing data.

    Science.gov (United States)

    Chan, Landon L; Jiang, Peiyong

    2015-10-01

    The discovery of cell-free DNA molecules in plasma has opened up numerous opportunities in noninvasive diagnosis. Cell-free DNA molecules have become increasingly recognized as promising biomarkers for detection and management of many diseases. The advent of next generation sequencing has provided unprecedented opportunities to scrutinize the characteristics of cell-free DNA molecules in plasma in a genome-wide fashion and at single-base resolution. Consequently, clinical applications of circulating cell-free DNA analysis have not only revolutionized noninvasive prenatal diagnosis but also facilitated cancer detection and monitoring toward an era of blood-based personalized medicine. With the remarkably increasing throughput and lowering cost of next generation sequencing, bioinformatics analysis becomes increasingly demanding to understand the large amount of data generated by these sequencing platforms. In this Review, we highlight the major bioinformatics algorithms involved in the analysis of cell-free DNA sequencing data. Firstly, we briefly describe the biological properties of these molecules and provide an overview of the general bioinformatics approach for the analysis of cell-free DNA. Then, we discuss the specific upstream bioinformatics considerations concerning the analysis of sequencing data of circulating cell-free DNA, followed by further detailed elaboration on each key clinical situation in noninvasive prenatal diagnosis and cancer management where downstream bioinformatics analysis is heavily involved. We also discuss bioinformatics analysis as well as clinical applications of the newly developed massively parallel bisulfite sequencing of cell-free DNA. Finally, we offer our perspectives on the future development of bioinformatics in noninvasive diagnosis. Copyright © 2015 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  20. On Sequence Lengths of Some Special External Exclusive OR Type LFSR Structures – Study and Analysis

    Directory of Open Access Journals (Sweden)

    A Ahmad

    2014-12-01

    The study of the length of pseudo-random binary sequences generated by Linear-Feedback Shift Registers (LFSRs) plays an important role in the design approaches of built-in self-test, cryptosystems, and other applications. However, certain LFSR structures might not be appropriate in some situations. Because determining the length of a generated pseudo-random binary sequence is a complex task, it is essential to investigate the length and the properties of the sequence before using an LFSR structure. This paper investigates some conditions and LFSR structures that restrict the generation of pseudo-random binary sequences to a certain fixed length. The outcomes of this paper are presented in the form of theorems, simulations, and analyses. We believe that these outcomes are of great importance to the designers of built-in self-test equipment, cryptosystems, and other applications such as radar, CDMA, error correction, and Monte Carlo simulation.
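
    For small registers, the sequence length of an external-XOR LFSR can be found by direct simulation. The sketch below is only an illustration of that idea, not the paper's method; the function name, the 1-indexed tap convention, and the shift direction are assumptions.

```python
def lfsr_sequence_length(taps, nbits, seed=1):
    """Cycle length of an external-XOR (Fibonacci) LFSR started from `seed`.

    taps  : 1-indexed stage numbers whose outputs are XORed to form the feedback
    nbits : number of register stages
    The register shifts toward higher stages and the feedback bit enters stage 1.
    """
    mask = (1 << nbits) - 1
    if not 0 < seed <= mask:
        raise ValueError("seed must be a non-zero state that fits the register")
    seen = {}
    state, steps = seed, 0
    while state not in seen:
        seen[state] = steps
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
        steps += 1
    # Length of the cycle the register ends up in.
    return steps - seen[state]
```

    With taps `(4, 3)` on a 4-stage register the sequence reaches the maximal length of 15, whereas taps `(4, 2)` restrict it to 6 when started from seed 1. This is the kind of structure-dependent limit the abstract refers to.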

  1. Isolation and enrichment of Cryptosporidium DNA and verification of DNA purity for whole-genome sequencing.

    Science.gov (United States)

    Guo, Yaqiong; Li, Na; Lysén, Colleen; Frace, Michael; Tang, Kevin; Sammons, Scott; Roellig, Dawn M; Feng, Yaoyu; Xiao, Lihua

    2015-02-01

    Whole-genome sequencing of Cryptosporidium spp. is hampered by difficulties in obtaining sufficient, highly pure genomic DNA from clinical specimens. In this study, we developed procedures for the isolation and enrichment of Cryptosporidium genomic DNA from fecal specimens and verification of DNA purity for whole-genome sequencing. The isolation and enrichment of genomic DNA were achieved by a combination of three oocyst purification steps and whole-genome amplification (WGA) of DNA from purified oocysts. Quantitative PCR (qPCR) analysis of WGA products was used as an initial quality assessment of amplified genomic DNA. The purity of WGA products was assessed by Sanger sequencing of cloned products. Next-generation sequencing tools were used in final evaluations of genome coverage and of the extent of contamination. Altogether, 24 fecal specimens of Cryptosporidium parvum, C. hominis, C. andersoni, C. ubiquitum, C. tyzzeri, and Cryptosporidium chipmunk genotype I were processed with the procedures. As expected, WGA products with low (sequences in Sanger sequencing. The cloning-sequencing analysis, however, showed significant contamination in 5 WGA products (proportion of positive colonies derived from Cryptosporidium genomic DNA, ≤25%). Following this strategy, 20 WGA products from six Cryptosporidium species or genotypes with low (mostly sequencing, generating sequence data covering 94.5% to 99.7% of Cryptosporidium genomes, with mostly minor contamination from bacterial, fungal, and host DNA. These results suggest that the described strategy can be used effectively for the isolation and enrichment of Cryptosporidium DNA from fecal specimens for whole-genome sequencing. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  2. Robust long-read native DNA sequencing using the ONT CsgG Nanopore system [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Jean-Michel Carter

    2017-04-01

    Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements. Methods: We made use of such components on ONT's miniature 'MinION' device and sequenced native genomic DNA obtained from the near-haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings. Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to the reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads. Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications.
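
    The mean read length and the N50 quoted above summarize the same length distribution in different ways, and N50 is typically the larger of the two when a few long reads carry much of the data. A minimal sketch of both statistics follows, using the common definition of N50 (the length L such that reads of length at least L contain half of all bases); the function names are hypothetical and the record does not say which tool was used.

```python
def mean_read_length(read_lengths):
    """Arithmetic mean of the read lengths."""
    if not read_lengths:
        raise ValueError("no reads given")
    return sum(read_lengths) / len(read_lengths)

def n50(read_lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    if not read_lengths:
        raise ValueError("no reads given")
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

    For the toy input `[2, 3, 4, 5, 6]` the mean is 4.0 and the N50 is 5, because the two longest reads (6 + 5 = 11) already cover more than half of the 20 bases.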

  3. Full-length transcriptome analysis using a bias-free cDNA library prepared with the vector-capping method.

    Science.gov (United States)

    Kato, Seishi; Oshikawa, Mio; Ohtoko, Kuniyo

    2011-01-01

    Full-length complementary DNAs (cDNAs) are an essential resource for functional genomics. Recently, we have developed a simple and efficient method for preparing a full-length cDNA library from a small amount of total RNA, named the "vector-capping" method. The biggest advantage of this method is that the intactness of the cDNA can be assured by the presence of dG at the 5' end of the full-length cDNA. Furthermore, the cDNA library represents the mRNA population in the cell owing to a bias-free procedure. In this chapter, we describe not only the protocol for preparing the library but also the points for analyzing the 5'-end sequence of the obtained cDNA.

  4. PCR primers for metazoan mitochondrial 12S ribosomal DNA sequences.

    Directory of Open Access Journals (Sweden)

    Ryuji J Machida

    BACKGROUND: Assessment of the biodiversity of communities of small organisms is most readily done using PCR-based analysis of environmental samples consisting of mixtures of individuals. Known as metagenetics, this approach has transformed understanding of microbial communities and is beginning to be applied to metazoans as well. Unlike microbial studies, where analysis of the 16S ribosomal DNA sequence is standard, the best gene for metazoan metagenetics is less clear. In this study we designed a set of PCR primers for the mitochondrial 12S ribosomal DNA sequence based on 64 complete mitochondrial genomes and then tested their efficacy. METHODOLOGY/PRINCIPAL FINDINGS: A total of 64 complete mitochondrial genome sequences representing all metazoan classes available in GenBank were downloaded using the NCBI Taxonomy Browser. Alignment of sequences was performed for the excised mitochondrial 12S ribosomal DNA sequences, and conserved regions were identified for all 64 mitochondrial genomes. These regions were used to design a primer pair that flanks a more variable region in the gene. Then all of the complete metazoan mitochondrial genomes available in NCBI's Organelle Genome Resources database were used to determine the percentage of taxa that would likely be amplified using these primers. Results suggest that these primers will amplify target sequences for many metazoans. CONCLUSIONS/SIGNIFICANCE: Newly designed 12S ribosomal DNA primers have considerable potential for metazoan metagenetic analysis because of their ability to amplify sequences from many metazoans.

  5. Mitochondrial DNA sequence variation in Drosophilid species ...

    Indian Academy of Sciences (India)

    Here, we assessed genetic variations in three mitochondrial genes, namely, 16S rRNA, cytochrome c oxidase subunit I and cytochrome c oxidase subunit II (COI and COII) in 26 drosophilid species collected along altitudinal transect from 550 to 2700 m above mean sea level. In the present study, overall 543 sequences ...

  6. A new program for DNA sequence mining

    Indian Academy of Sciences (India)

    Unknown

    activity of proteins by altering their structure (Klintschar and Wiegand 2003). Expressed Sequence Tags ..... among the organisms (for instance; animal versus plant, trees versus annual crops), among the organs (for instance; ... Int. 3rd Balkan Symposium on vegetables and potatoes. Bursa, Turkey, Acta Horticulturae (in ...

  7. Studies of DNA dumbbells VII: evaluation of the next-nearest-neighbor sequence-dependent interactions in duplex DNA.

    Science.gov (United States)

    Owczarzy, R; Vallone, P M; Goldstein, R F; Benight, A S

    1999-01-01

    Melting experiments were conducted on 22 DNA dumbbells as a function of solvent ionic strength from 25-115 mM Na(+). The dumbbell molecules have short duplex regions comprised of 16-20 base pairs linked on both ends by T(4) single-strand loops. Only the 4-8 central base pairs of the dumbbell stems differ for different molecules, and the six base pairs on both sides of the central sequence and adjoining loops on both ends are the same in every molecule. Results of melting analysis on the 22 new DNA dumbbells are combined with our previous results on 17 other DNA dumbbells, with stem lengths containing from 14-18 base pairs, reported in the first article of this series (Doktycz, Goldstein, Paner, Gallo, and Benight, Biopoly 32, 1992, 849-864). The combination of results comprises a database of optical melting parameters for 39 DNA dumbbells in ionic strengths from 25-115 mM Na(+). This database is employed to evaluate the thermodynamics of singlet, doublet, and triplet sequence-dependent interactions in duplex DNA. Analysis of the 25 mM Na(+) data reveals the existence of significant sequence-dependent triplet or next-nearest-neighbor interactions. The enthalpy of these interactions is evaluated for all possible triplets. Some of the triplet enthalpy values are less than the uncertainty in their evaluation, indicating no measurable interaction for that particular sequence. This finding suggests that the thermodynamic stability of duplex DNA depends on solvent ionic strength in a sequence-dependent manner. As a part of the analysis, the nearest-neighbor (base pair doublet) interactions in 55, 85, and 115 mM Na(+) are also reevaluated from the larger database. Copyright 2000 John Wiley & Sons, Inc.

  8. Coccidioides species determination: does sequence analysis agree with restriction fragment length polymorphism?

    Science.gov (United States)

    Johnson, Suzanne M; Carlson, Erin L; Pappagianis, Demosthenes

    2015-06-01

    Fifteen Coccidioides isolates were previously examined for genetic diversity using restriction fragment length polymorphism (RFLP); two fragment patterns were observed. Two isolates demonstrated one banding pattern (designated RFLP group I), while the remaining 13 isolates demonstrated a second pattern (designated RFLP group II). Recently, molecular studies supported the division of the genus Coccidioides into two species: Coccidioides posadasii and Coccidioides immitis. It has been assumed that the species division corresponds to the RFLP grouping. We tested this hypothesis by amplifying the ribosomal DNA internal transcribed spacer region as well as the dioxygenase, serine proteinase, and urease genes from 13 isolates previously examined by RFLP and then sequencing the PCR products. The appropriate species for each isolate was assigned using phylogenetically informative sites. The RFLP grouping agreed with the Coccidioides species assignment for all but one isolate, which may represent a hybrid. In addition, polymorphic sites among the four genes examined were in agreement for species assignment such that analysis of a single gene may be sufficient for species assignment.

  9. A novel DNA restriction technology based on laser pulse energy conversion on sequence-specific bound metal nanoparticles

    Science.gov (United States)

    Csaki, Andrea; Maubach, Gunter; Garwe, Frank; Steinbrueck, Andrea; Koenig, Karsten; Fritzsche, Wolfgang

    2005-03-01

    DNA restriction is a basic method in today's molecular biology. Besides application for DNA manipulation, this method is used in DNA analytics for 'restriction analysis'. In this analysis, DNA is digested by sequence-specific restriction enzymes, and the length distribution of the resulting fragments is detected by gel electrophoresis. Differences in the sequence lead to different restriction patterns. A disadvantage of this standard method is the limitation to a small set of fixed sequences, so that the assay cannot be adapted to any sequence of interest (e.g. SNP). We designed a scheme for DNA restriction in order to provide access to any desired sequence, based on laser light conversion on sequence-specific positioned metal nanoparticles. Gold nanoparticles in particular are known for their interesting optical properties caused by plasmon resonance. The resulting absorption can be used to convert laser light pulses into heat, resulting in nanoparticle destruction. We work on the combination of this principle with DNA-modification of nanoparticles and the sequence-specific binding (hybridization) of these DNA-nanoparticle complexes along DNA molecules. Different mechanisms of light-conversion were studied, and the destructive effect of laser light on the nanoparticles and DNA is demonstrated.

  10. Cloning and expression of full-length cDNA encoding human vitamin D receptor

    Energy Technology Data Exchange (ETDEWEB)

    Baker, A.R.; McDonnell, D.P.; Hughes, M.; Crisp, T.M.; Mangelsdorf, D.J.; Haussler, M.R.; Pike, J.W.; Shine, J.; O'Malley, B.W. (California Biotechnology Inc., Mountain View (USA))

    1988-05-01

    Complementary DNA clones encoding the human vitamin D receptor have been isolated from human intestine and T47D cell cDNA libraries. The nucleotide sequence of the 4605-base pair (bp) cDNA includes a noncoding leader sequence of 115 bp, a 1281-bp open reading frame, and 3209 bp of 3′ noncoding sequence. Two polyadenylylation signals, AATAAA, are present 25 and 70 bp upstream of the poly(A) tail, respectively. RNA blot hybridization indicates a single mRNA species of ≈4600 bp. Transfection of the cloned sequences into COS-1 cells results in the production of a single receptor species indistinguishable from the native receptor. Sequence comparisons demonstrate that the vitamin D receptor belongs to the steroid-receptor gene family and is closest in size and sequence to another member of this family, the thyroid hormone receptor.
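
    As a quick consistency check on the figures quoted in the abstract above, the three reported segments can be summed and the open reading frame divided into codons. This is only an arithmetic sketch: the function name `cdna_summary` is hypothetical, and the abstract does not say whether the stop codon is counted within the 1281-bp open reading frame.

```python
# Segment lengths in base pairs, as reported in the abstract above.
LEADER_BP = 115      # 5' noncoding leader
ORF_BP = 1281        # open reading frame
TRAILER_BP = 3209    # 3' noncoding sequence


def cdna_summary(leader: int, orf: int, trailer: int) -> dict:
    """Sum the segments and check that the ORF is a whole number of codons."""
    total = leader + orf + trailer
    codons, remainder = divmod(orf, 3)
    return {"total_bp": total, "codons": codons, "in_frame": remainder == 0}


summary = cdna_summary(LEADER_BP, ORF_BP, TRAILER_BP)
print(summary)
```

    The segments add up to the stated 4605 bp, and the open reading frame corresponds to 427 whole codons.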

  11. Usefulness of telomere length in DNA from human teeth for age estimation.

    Science.gov (United States)

    Márquez-Ruiz, Ana Belén; González-Herrera, Lucas; Valenzuela, Aurora

    2018-03-01

    Age estimation is widely used to identify individuals in forensic medicine. However, the accuracy of the most commonly used procedures is markedly reduced in adulthood, and these methods cannot be applied in practice when morphological information is limited. Molecular methods for age estimation have been extensively developed in the last few years. The fact that telomeres shorten at each round of cell division has led to the hypothesis that telomere length can be used as a tool to predict age. The present study thus aimed to assess the correlation between telomere length measured in dental DNA and age, and the effect of sex and tooth type on telomere length; a further aim was to propose a statistical regression model to estimate the biological age based on telomere length. DNA was extracted from 91 tooth samples belonging to 77 individuals of both sexes and 15 to 85 years old and was used to determine telomere length by quantitative real-time PCR. Our results suggested that telomere length was not affected by sex and was greater in molar teeth. We found a significant correlation between age and telomere length measured in DNA from teeth. However, the equation proposed to predict age was not accurate enough for forensic age estimation on its own. Age estimation based on telomere length in DNA from tooth samples may be useful as a complementary method which provides an approximate estimate of age, especially when human skeletal remains are the only forensic sample available.

  12. Management of High-Throughput DNA Sequencing Projects: Alpheus.

    Science.gov (United States)

    Miller, Neil A; Kingsmore, Stephen F; Farmer, Andrew; Langley, Raymond J; Mudge, Joann; Crow, John A; Gonzalez, Alvaro J; Schilkey, Faye D; Kim, Ryan J; van Velkinburgh, Jennifer; May, Gregory D; Black, C Forrest; Myers, M Kathy; Utsey, John P; Frost, Nicholas S; Sugarbaker, David J; Bueno, Raphael; Gullans, Stephen R; Baxter, Susan M; Day, Steve W; Retzel, Ernest F

    2008-12-26

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystem's SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.

  13. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.

    Science.gov (United States)

    Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J

    2011-03-07

    Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  14. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Background: Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results: The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions: PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  15. DNA-PK dependent targeting of DNA-ends to a protein complex assembled on matrix attachment region DNA sequences

    International Nuclear Information System (INIS)

    Mauldin, S.K.; Getts, R.C.; Perez, M.L.; DiRienzo, S.; Stamato, T.D.

    2003-01-01

    Full text: We find that nuclear protein extracts from mammalian cells contain an activity that allows DNA ends to associate with circular pUC18 plasmid DNA. This activity requires the catalytic subunit of DNA-PK (DNA-PKcs) and Ku since it was not observed in mutants lacking Ku or DNA-PKcs but was observed when purified Ku/DNA-PKcs was added to these mutant extracts. Competition experiments between pUC18 and pUC18 plasmids containing various nuclear matrix attachment region (MAR) sequences suggest that DNA ends preferentially associate with plasmids containing MAR DNA sequences. At a 1:5 mass ratio of MAR to pUC18, approximately equal amounts of DNA end binding to the two plasmids were observed, while at a 1:1 ratio no pUC18 end-binding was observed. Calculation of relative binding activities indicates that DNA-end binding activity toward MAR sequences was 7- to 21-fold higher than toward pUC18. Western analysis of proteins bound to pUC18 and MAR plasmids indicates that XRCC4, DNA ligase IV, scaffold attachment factor A, topoisomerase II, and poly(ADP-ribose) polymerase preferentially associate with the MAR plasmid in the absence or presence of DNA ends. In contrast, Ku and DNA-PKcs were found on the MAR plasmid only in the presence of DNA ends. After electroporation of a 32P-labeled DNA probe into human cells and cell fractionation, 87% of the total intercellular radioactivity remained in nuclei after a 0.5M NaCl extraction, suggesting the probe was strongly bound in the nucleus. The above observations raise the possibility that DNA-PK targets DNA-ends to a repair and/or DNA damage signaling complex which is assembled on MAR sites in the nucleus.

  16. Noninvasive prenatal paternity testing (NIPAT) through maternal plasma DNA sequencing

    DEFF Research Database (Denmark)

    Jiang, Haojun; Xie, Yifan; Li, Xuchao

    2016-01-01

    developed a noninvasive prenatal paternity testing (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influence factors (minor allele frequency (MAF), the number of total SNP, fetal fraction and effective sequencing depth) and designed three different selective SNP panels...... in order to verify the performance in clinical cases. Combining targeted deep sequencing of selective SNP and informative bioinformatics pipeline, we calculated the combined paternity index (CPI) of 17 cases to determine paternity. Sequencing-based NIPAT results fully agreed with invasive prenatal...

  17. Dialects of the DNA Uptake Sequence in Neisseriaceae

    Science.gov (United States)

    Frye, Stephan A.; Nilsen, Mariann; Tønjum, Tone; Ambur, Ole Herman

    2013-01-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS–dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5′-CTG-3′ is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS–dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic transformation

  18. Dialects of the DNA uptake sequence in Neisseriaceae.

    Directory of Open Access Journals (Sweden)

    Stephan A Frye

    2013-04-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS-dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5'-CTG-3' is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS-dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic

  19. Sigma: multiple alignment of weakly-conserved non-coding DNA sequence

    Directory of Open Access Journals (Sweden)

    Siddharthan Rahul

    2006-03-01

    Background: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. Results: Comparative tests of Sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. Conclusion: By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.
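
    To illustrate what a "gapless local alignment" between two sequences means, the sketch below scans every diagonal of the pairwise comparison and keeps the highest-scoring ungapped segment. It is not Sigma's algorithm or scoring scheme: the abstract says Sigma scores significance from fragment lengths and a background model, whereas the simple match/mismatch scores and the function name `best_gapless_local` here are assumptions made for illustration only.

```python
def best_gapless_local(a: str, b: str, match: int = 1, mismatch: int = -1):
    """Return (score, start_a, start_b, length) of the best ungapped local alignment.

    Each diagonal (fixed offset between positions in a and b) is scanned with a
    running sum that restarts whenever it drops to zero or below.
    """
    best = (0, 0, 0, 0)
    for offset in range(-(len(b) - 1), len(a)):
        i, j = max(offset, 0), max(-offset, 0)
        run_score, run_start = 0, (i, j)
        while i < len(a) and j < len(b):
            run_score += match if a[i] == b[j] else mismatch
            if run_score <= 0:
                # Restart the candidate segment just after this position.
                run_score, run_start = 0, (i + 1, j + 1)
            elif run_score > best[0]:
                best = (run_score, run_start[0], run_start[1], i - run_start[0] + 1)
            i += 1
            j += 1
    return best


# Toy sequences sharing the ungapped core "ACGGA" at index 2 in both.
result = best_gapless_local("TTACGGATT", "CCACGGACC")
print(result)
```

    With these toy inputs the best segment is the shared five-base core, reported as score 5 starting at position 2 in both sequences.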

  20. SMRT sequencing of full-length transcriptome of flea beetle Agasicles hygrophila (Selman and Vogt).

    Science.gov (United States)

    Jia, Dong; Wang, Yuanxin; Liu, Yanhong; Hu, Jun; Guo, Yanqiong; Gao, Lingling; Ma, Ruiyan

    2018-02-02

    This study aimed to generate the full-length transcriptome of the flea beetle Agasicles hygrophila (Selman and Vogt) using single-molecule real-time (SMRT) sequencing. Four developmental stages of A. hygrophila (eggs, larvae, pupae, and adults) were harvested for isolating total RNA. The mixed samples were used for SMRT sequencing to generate the full-length transcriptome. Based on the obtained transcriptome data, alternative splicing event analysis, simple sequence repeat (SSR) analysis, coding sequence prediction, transcript functional annotation, and lncRNA prediction were performed. A total of 9.45 Gb of clean reads were generated, including 335,045 reads of insert (ROI) and 158,085 full-length non-chimeric (FLNC) reads. Transcript clustering analysis of FLNC reads identified 40,004 consensus isoforms, including 31,015 high-quality ones. After removing redundant reads, 28,982 transcripts were obtained. A total of 145 alternative splicing events were predicted. Additionally, 12,753 SSRs and 16,205 coding sequences were identified by SSR analysis and coding sequence prediction, respectively. Furthermore, 24,031 transcripts were annotated in eight functional databases, and 4,198 lncRNAs were predicted. This is the first study to perform SMRT sequencing of the full-length transcriptome of A. hygrophila. The obtained transcriptome may facilitate further exploration of the genetic data of A. hygrophila and uncover the interactions between this insect and the ecosystem.
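
    The abstract above does not name the tool or thresholds used for SSR analysis. As a rough sketch of how simple sequence repeats can be located in a transcript, the example below uses a regular expression with a back-reference; the function name `find_ssrs`, the motif lengths (2 to 6 bases), and the minimum of four tandem copies are all assumed values chosen for illustration, not the study's settings.

```python
import re


def find_ssrs(seq: str, min_motif: int = 2, max_motif: int = 6, min_repeats: int = 4):
    """Return (motif, copies, start) for each tandem repeat found in seq.

    The lazy quantifier makes the regex try the shortest motif first, and the
    back-reference \\1 requires the same motif to repeat immediately after itself.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((motif, len(m.group(0)) // len(motif), m.start()))
    return hits


print(find_ssrs("GGACACACACACTT"))    # dinucleotide repeat
print(find_ssrs("TTAGGAGGAGGAGGCC"))  # trinucleotide repeat
```

    Production SSR finders typically apply different minimum copy numbers per motif length and also handle mononucleotide and compound repeats, which this sketch ignores.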

  1. Inference of high resolution HLA types using genome-wide RNA or DNA sequencing reads.

    Science.gov (United States)

    Bai, Yu; Ni, Min; Cooper, Blerta; Wei, Yi; Fury, Wen

    2014-05-01

    Accurate HLA typing at amino acid level (four-digit resolution) is critical in hematopoietic and organ transplantations, pathogenesis studies of autoimmune and infectious diseases, as well as the development of immuno-oncology therapies. With the rapid adoption of genome-wide sequencing in biomedical research, HLA typing based on transcriptome and whole exome/genome sequencing data becomes increasingly attractive due to its high throughput and convenience. However, unlike targeted amplicon sequencing, genome-wide sequencing often employs a reduced read length and coverage that impose great challenges in resolving the highly homologous HLA alleles. Though several algorithms exist and have been applied to four-digit typing, some deliver low to moderate accuracies and others output ambiguous predictions. Moreover, few methods suit diverse read lengths and depths, and both RNA and DNA sequencing inputs. New algorithms are therefore needed to leverage the accuracy and flexibility of HLA typing at high resolution using genome-wide sequencing data. We have developed a new algorithm named PHLAT to discover the most probable pair of HLA alleles at four-digit resolution or higher, via a unique integration of a candidate allele selection and a likelihood scoring. Over a comprehensive set of benchmarking data (a total of 768 HLA alleles) from both RNA and DNA sequencing and with a broad range of read lengths and coverage, PHLAT consistently achieves a high accuracy at four-digit (92%-95%) and two-digit resolutions (96%-99%), outcompeting most of the existing methods. It also supports targeted amplicon sequencing data from Illumina MiSeq. PHLAT significantly leverages the accuracy and flexibility of high resolution HLA typing based on genome-wide sequencing data. It may benefit both basic and applied research in immunology and related fields as well as numerous clinical applications.

  2. Accelerating Computation of DNA Sequence Alignment in Distributed Environment

    Science.gov (United States)

    Guo, Tao; Li, Guiyang; Deaton, Russel

    Sequence similarity and alignment are among the most important operations in computational biology. However, analyzing large sets of DNA sequences is impractical on a regular PC. Using multiple threads with the JavaParty mechanism, this project extended the capabilities of regular Java to a distributed environment for the simulation of DNA computation. With JavaParty and a multithreaded design, the modified Java program could perform parallel computing without using RMI or socket communication. This paper proposes an efficient method for modeling and comparing DNA sequences with dynamic programming and JavaParty, and discusses the results of this method in a distributed environment.
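
    The abstract above mentions comparing DNA sequences with dynamic programming but gives no details of the recurrence or scoring. As one common instance of that idea, the sketch below computes a Needleman-Wunsch global alignment score using two rolling rows. It is a single-threaded Python illustration, not the authors' JavaParty implementation, and the function name and scoring values (match +1, mismatch -1, gap -2) are assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

    Each cell depends only on its left, upper, and upper-left neighbours, so cells along an anti-diagonal are independent of one another; that independence is what makes this kind of table a natural target for multithreaded or distributed evaluation.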

  3. Update on Acanthamoeba jacobsi genotype T15, including full-length 18S rDNA molecular phylogeny.

    Science.gov (United States)

    Corsaro, Daniele; Köhsler, Martina; Montalbano Di Filippo, Margherita; Venditti, Danielle; Monno, Rosa; Di Cave, David; Berrilli, Federica; Walochnik, Julia

    2017-04-01

    Free-living amoebae of the genus Acanthamoeba are present worldwide in natural and artificial environments, and are also clinically important as causative agents of diseases in humans and other animals. Acanthamoeba comprises several species, historically assigned to one of three groups based on their cyst morphology, but presently recognized as at least 20 genotypes (T1-T20) on the basis of their nuclear 18S ribosomal RNA (rRNA) gene (18S rDNA) sequences. While strain identification may usually be achieved by targeting short partial sequences, a full-length sequence (>2200 bp) is necessary for correct genotype description and reliable molecular phylogenetic inference. The genotype T15, corresponding to Acanthamoeba jacobsi, is the only genotype described on the basis of partial sequences (~1500 bp). While this feature does not prevent the correct identification of the strains, having only partial sequences renders the genotype T15 not completely defined and may furthermore affect its position in the Acanthamoeba molecular tree. Here, we fill this gap by obtaining full-length 18S rDNA sequences from eight A. jacobsi strains, genotype T15. Morphologies and physiological features of isolated strains are reported. Molecular phylogeny based on full 18S rDNA confirms some previous suggestions for a genetic link between T15 and T13, T16, and T19, with T19 as sister-group to T15.

  4. Anaplasma phagocytophilum in Danish sheep: confirmation by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Thamsborg Stig M

    2009-12-01

    Background: The presence of Anaplasma phagocytophilum, an Ixodes ricinus transmitted bacterium, was investigated in two flocks of Danish grazing lambs. Direct PCR detection was performed on DNA extracted from blood and serum with subsequent confirmation by DNA sequencing. Methods: 31 samples obtained from clinically normal lambs in 2000 from Fussingø, Jutland and 12 samples from ten lambs and two ewes from a clinical outbreak at Feddet, Zealand in 2006 were included in the study. Some of the animals from Feddet had shown clinical signs of polyarthritis and general unthriftiness prior to sampling. DNA extraction was optimized from blood and serum and detection achieved by a 16S rRNA targeted PCR with verification of the product by DNA sequencing. Results: Five DNA extracts were found positive by PCR, including two samples from 2000 and three from 2006. For both series of samples the product was verified as A. phagocytophilum by DNA sequencing. Conclusions: A. phagocytophilum was detected by molecular methods for the first time in Danish grazing lambs during the two seasons investigated (2000 and 2006).

  5. DNA qualification workflow for next generation sequencing of histopathological samples.

    Directory of Open Access Journals (Sweden)

    Michele Simbolo

    Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard

  6. DNA Qualification Workflow for Next Generation Sequencing of Histopathological Samples

    Science.gov (United States)

    Simbolo, Michele; Gottardi, Marisa; Corbo, Vincenzo; Fassan, Matteo; Mafficini, Andrea; Malpeli, Giorgio; Lawlor, Rita T.; Scarpa, Aldo

    2013-01-01

    Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for

  7. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney

    2012-10-08

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for

  8. Training Sequence Length Optimization for a Turbo-Detector Using Decision-Directed Channel Estimation

    Directory of Open Access Journals (Sweden)

    Imed Hadj Kacem

    2008-01-01

    We consider the problem of optimization of the training sequence length when a turbo-detector composed of a maximum a posteriori (MAP) equalizer and a MAP decoder is used. At each iteration of the receiver, the channel is estimated using the hard decisions on the transmitted symbols at the output of the decoder. The optimal length of the training sequence is found by maximizing an effective signal-to-noise ratio (SNR) taking into account the data throughput loss due to the use of pilot symbols.

  9. Non-radioactive chemical sequencing of biotin labelled DNA.

    OpenAIRE

    Richterich, P

    1989-01-01

    Methods for the nonradioactive chemical sequencing of DNA are described. A biotin marker molecule, attached chemically to an oligonucleotide primer or enzymatically in an endfilling reaction of restriction enzyme sites, is stable during the base-specific chemical modification and strand scission reactions. Following fragment separation by direct blotting electrophoresis, the membrane bound sequence pattern can be visualized by a streptavidin-bridged enzymatic color reaction. The biotin labeli...

  10. Restriction and sequence alterations affect DNA uptake sequence-dependent transformation in Neisseria meningitidis.

    Directory of Open Access Journals (Sweden)

    Ole Herman Ambur

    Transformation is a complex process that involves several interactions from the binding and uptake of naked DNA to homologous recombination. Some actions affect transformation favourably whereas others act to limit it. Here, meticulous manipulation of a single type of transforming DNA allowed for quantifying the impact of three different mediators of meningococcal transformation: NlaIV restriction, homologous recombination and the DNA Uptake Sequence (DUS). In the wildtype, an inverse relationship between the transformation frequency and the number of NlaIV restriction sites in DNA was observed when the transforming DNA harboured a heterologous region for selection (ermC) but not when the transforming DNA was homologous with only a single nucleotide heterology. The influence of homologous sequence in transforming DNA was further studied using plasmids with a small interruption or larger deletions in the recombinogenic region and these alterations were found to impair transformation frequency. In contrast, a particularly potent positive driver of DNA uptake in Neisseria sp. are short DUS in the transforming DNA. However, the molecular mechanism(s) responsible for DUS specificity remains unknown. Increasing the number of DUS in the transforming DNA was here shown to exert a positive effect on transformation. Furthermore, an influence of variable placement of DUS relative to the homologous region in the donor DNA was documented for the first time. No effect of altering the orientation of DUS was observed. These observations suggest that DUS is important at an early stage in the recognition of DNA, but does not exclude the existence of more than one level of DUS specificity in the sequence of events that constitute transformation. New knowledge on the positive and negative drivers of transformation may in a larger perspective illuminate both the mechanisms and the evolutionary role(s) of one of the most conserved mechanisms in nature: homologous

  11. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of CR sequence varied among M. cephalus populations due to the presence of indels and variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star shaped tree inferred from the CR polymorphism stresses a rapid radiation worldwide, in this species. The CR still appears as a good marker for phylogeographic investigations and additional worldwide samples are warranted to further investigate the genetic structure and evolution in M. cephalus.

  12. Length heteroplasmy of the polyC-polyT-polyC stretch in the dog mtDNA control region.

    Science.gov (United States)

    Verscheure, Sophie; Backeljau, Thierry; Desmyter, Stijn

    2015-09-01

    Previously, the mitochondrial control region of 214 Belgian dogs was sequenced. Analysis of this data indicated length heteroplasmy of the polyT stretch in the polyC-polyT-polyC stretch from positions 16661 to 16674. Nine polyC-polyT-polyC haplotype combinations were observed, consisting of seven major haplotypes (highest signal intensity) combined with minor haplotypes (lower signal intensity) one T shorter than the major haplotype in all but three dogs. The longer the polyT stretch, the smaller was the difference in signal intensity between the major and minor haplotype peaks. Additional sequencing, cloning, and PCR trap experiments were performed to further study the intra-individual variation of this mitochondrial DNA (mtDNA) region. Cloning experiments demonstrated that the proportion of clones displaying the minor haplotypes also increased with the length of the polyT stretch. Clone amplification showed that in vitro polymerase errors might contribute to the length heteroplasmy of polyT stretches with at least 10 Ts. Although major and minor polyC-polyT-polyC haplotypes did not differ intra-individually within and between tissues in this study, interpretation of polyT stretch variation should be handled with care in forensic casework.

  13. Cloning and sequencing of full-length cDNAs of RNA1 and RNA2 of a Tomato black ring virus isolate from Poland.

    Science.gov (United States)

    Jończyk, M; Le Gall, O; Pałucha, A; Borodynko, N; Pospieszny, H

    2004-04-01

    Full-length cDNA clones corresponding to the RNA1 and RNA2 of the Polish isolate MJ of Tomato black ring virus (TBRV, genus Nepovirus) were obtained using a direct recombination strategy in yeast, and their complete nucleotide sequences were established. RNA1 is 7358 nucleotides and RNA2 is 4633 nucleotides in length, excluding the poly(A) tails. Both RNAs contain a single open reading frame encoding polyproteins of 254 kDa and 149 kDa for RNA1 and RNA2 respectively. Putative cleavage sites were identified, and the relationships between TBRV and related nepoviruses were studied by sequence comparison.

  14. Dual redundant sequencing strategy: Full-length gene characterisation of 1056 novel and confirmatory HLA alleles.

    Science.gov (United States)

    Albrecht, V; Zweiniger, C; Surendranath, V; Lang, K; Schöfl, G; Dahl, A; Winkler, S; Lange, V; Böhme, I; Schmidt, A H

    2017-08-01

    The high-throughput department of DKMS Life Science Lab encounters novel human leukocyte antigen (HLA) alleles on a daily basis. To characterise these alleles, we have developed a system to sequence the whole gene from 5'- to 3'-UTR for the HLA loci A, B, C, DQB1 and DPB1 for submission to the European Molecular Biology Laboratory - European Nucleotide Archive (EMBL-ENA) and the IPD-IMGT/HLA Database. Our workflow is based on a dual redundant sequencing strategy. Using shotgun sequencing on an Illumina MiSeq instrument and single molecule real-time (SMRT) sequencing on a PacBio RS II instrument, we are able to achieve highly accurate HLA full-length consensus sequences. Remaining conflicts are resolved using the R package DR2S (Dual Redundant Reference Sequencing). Given the relatively high throughput of this strategy, we have developed the semi-automated web service TypeLoader, to aid in the submission of sequences to the EMBL-ENA and the IPD-IMGT/HLA Database. In the IPD-IMGT/HLA Database release 3.24.0 (April 2016; prior to the submission of the sequences described here), only 5.2% of all known HLA alleles have been fully characterised together with intronic and UTR sequences. So far, we have applied our strategy to characterise and submit 1056 HLA alleles, thereby more than doubling the number of fully characterised alleles. Given the increasing application of next generation sequencing (NGS) for full gene characterisation in clinical practice, extending the HLA database concomitantly is highly desirable. Therefore, we propose this dual redundant sequencing strategy as a workflow for submission of novel full-length alleles and characterisation of sequences that are as yet incomplete. This would help to mitigate the predominance of partially known alleles in the database. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  15. Noise Attenuation Estimation for Maximum Length Sequences in Deconvolution Process of Auditory Evoked Potentials

    Directory of Open Access Journals (Sweden)

    Xian Peng

    2017-01-01

    The use of maximum length sequence (m-sequence) has been found beneficial for recovering both linear and nonlinear components at rapid stimulation. Since m-sequence is fully characterized by a primitive polynomial of different orders, the selection of polynomial order can be problematic in practice. Usually, the m-sequence is repetitively delivered in a looped fashion. Ensemble averaging is carried out as the first step and followed by the cross-correlation analysis to deconvolve linear/nonlinear responses. According to the classical noise reduction property based on additive noise model, theoretical equations have been derived in measuring noise attenuation ratios (NARs) after the averaging and correlation processes in the present study. A computer simulation experiment was conducted to test the derived equations, and a nonlinear deconvolution experiment was also conducted using order 7 and 9 m-sequences to address this issue with real data. Both theoretical and experimental results show that the NAR is essentially independent of the m-sequence order and is decided by the total length of valid data, as well as stimulation rate. The present study offers a guideline for m-sequence selections, which can be used to estimate required recording time and signal-to-noise ratio in designing m-sequence experiments.
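    The statement that an m-sequence is fully characterized by a primitive polynomial can be illustrated with a linear feedback shift register (LFSR). The following is a minimal sketch, not the procedure used in the study: the function names and the tap sets (7, 6) and (9, 5) are assumptions chosen for illustration, taken from commonly tabulated maximal-length LFSR taps for orders 7 and 9.

```python
def m_sequence(order, taps, seed=1):
    """Return one period (2**order - 1 bits) of a binary m-sequence.

    Illustrative Fibonacci LFSR. `taps` are 1-based register positions
    whose XOR is fed back; they must correspond to a primitive polynomial
    for the output to have maximal length (assumed, not checked here).
    """
    state = [(seed >> i) & 1 for i in range(order)]
    if not any(state):
        raise ValueError("seed must give a non-zero register state")
    bits = []
    for _ in range(2 ** order - 1):
        bits.append(state[-1])            # output the last register cell
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]      # XOR of the tapped cells
        state = [feedback] + state[:-1]   # shift and insert the feedback bit
    return bits


def circular_autocorrelation(bits, shift):
    """Circular autocorrelation of the +1/-1 version of a 0/1 sequence."""
    n = len(bits)
    bipolar = [1 - 2 * b for b in bits]
    return sum(bipolar[i] * bipolar[(i + shift) % n] for i in range(n))


if __name__ == "__main__":
    seq7 = m_sequence(7, (7, 6))   # hypothetical order-7 tap choice
    seq9 = m_sequence(9, (9, 5))   # hypothetical order-9 tap choice
    print(len(seq7), len(seq9))
    print(circular_autocorrelation(seq7, 0), circular_autocorrelation(seq7, 1))
```

    One period of an order-n m-sequence has 2^n - 1 elements, so moving from order 7 to order 9 lengthens a single period from 127 to 511 stimuli. The near-flat off-peak autocorrelation is the property that makes cross-correlation deconvolution possible.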

  16. RNA-DNA sequence differences spell genetic code ambiguities

    DEFF Research Database (Denmark)

    Bentin, Thomas; Nielsen, Michael L

    2013-01-01

    A recent paper in Science by Li et al. 2011(1) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression but the study has been criticized. ...

  17. (Brassicaceae) based on nuclear ribosomal ITS DNA sequences

    Indian Academy of Sciences (India)

    Phylogeny and biogeography of Alyssum (Brassicaceae) based on nuclear ribosomal ITS DNA sequences. Yan Li, Yan Kong, Zhe Zhang, Yanqiang Yin, Bin Liu, Guanghui Lv, Xiyong Wang. Journal of Genetics, Research Article, Volume 93, Issue 2, August 2014, pp 313-323 ...

  18. POSA: perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, J.A.; Jungerius, B.J.; Groenen, M.A.M.

    2004-01-01

    Background - Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  19. POSA : Perl objects for DNA sequencing data analysis

    NARCIS (Netherlands)

    Aerts, JA; Jungerius, BJ; Groenen, MA

    2004-01-01

    Background: Capillary DNA sequencing machines allow the generation of vast amounts of data with little hands-on time. With this expansion of data generation, there is a growing need for automated data processing. Most available software solutions, however, still require user intervention or provide

  20. What DNA sequence tells us about gene regulation - The ...

    Indian Academy of Sciences (India)

    Rahul Siddharthan

    2007-11-03

    Predicting cis-regulatory modules: eve enhancers. Performance, even without prior WMs, comparable to dedicated CRM prediction programs like Stubb. Rahul Siddharthan (The Institute of Mathematical Sciences, Chennai 600 113). ...

  1. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    RPS16 of eukaryote is a component of the 40S small ribosomal subunit encoded by RPS16 gene and is also a homolog of prokaryotic RPS9. The cDNA and genomic sequence of RPS16 was cloned successfully for the first time from the Giant Panda (Ailuropoda melanoleuca) using reverse transcription-polymerase chain ...

  2. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  3. Mitochondrial DNA sequence variation in the Anatolian Peninsula ...

    Indian Academy of Sciences (India)

    Unknown

    A few studies have previously reported mtDNA sequences in Turks. We attempted to extend these results by analysing a cohort that is not only larger, but also more representative of the. Turkish population living in Anatolia. In order to obtain a descriptive picture for the phylogenetic distribution of the mitochondrial genome ...

  4. Random amplified polymorphic DNA (RAPD) and simple sequence ...

    African Journals Online (AJOL)

    Knowledge as to genetic diversity and relationships among maize hybrids is important for breeding strategies. The main aims of this study were to (1) estimate molecular genetic diversity among 30 maize hybrids by random amplified polymorphic DNA (RAPD) and simple sequence repeat (SSR) markers; and (2) compare ...

  5. The white spot syndrome virus DNA genome sequence

    NARCIS (Netherlands)

    Hulten, van M.C.W.; Witteveldt, J.; Peters, S.; Kloosterboer, N.; Tarchini, R.; Fiers, M.; Sandbrink, H.; Klein Lankhorst, R.; Vlak, J.M.

    2001-01-01

    White spot syndrome virus (WSSV) is at present a major scourge to worldwide shrimp cultivation. We have determined the entire sequence of the double-stranded, circular DNA genome of WSSV, which contains 292,967 nucleotides encompassing 184 major open reading frames (ORFs). Only 6% of the WSSV ORFs

  6. Solid-State Nanopore-Based DNA Sequencing Technology

    Directory of Open Access Journals (Sweden)

    Zewen Liu

    2016-01-01

    Solid-state nanopore-based DNA sequencing technology is becoming increasingly attractive because of its promise in the field of gene detection. The challenges that need to be addressed are diverse: effective methods to detect base-specific signatures, control of the nanopore's size and surface properties, and modulation of the translocation velocity and behavior of the DNA molecules. Among these challenges, the realization of high-quality nanopores with the help of modern micro/nanofabrication technologies is a crucial one. In this paper, typical technologies applied in the field of solid-state nanopore-based DNA sequencing are reviewed.

  7. Sequence heterogeneity accelerates protein search for targets on DNA

    International Nuclear Information System (INIS)

    Shvets, Alexey A.; Kolomeisky, Anatoly B.

    2015-01-01

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome

  8. DNA watermarks in non-coding regulatory sequences

    Directory of Open Access Journals (Sweden)

    Pyka Martin

    2009-07-01

    Background: DNA watermarks can be applied to identify the unauthorized use of genetically modified organisms. It has been shown that coding regions can be used to encrypt information into living organisms by using the DNA-Crypt algorithm. Yet, if the sequence of interest presents a non-coding DNA sequence, either the function of a resulting functional RNA molecule or a regulatory sequence, such as a promoter, could be affected. For our studies we used the small cytoplasmic RNA 1 in yeast and the lac promoter region of Escherichia coli. Findings: The lac promoter was deactivated by the integrated watermark. In addition, the RNA molecules displayed altered configurations after introducing a watermark, but surprisingly were functionally intact, which has been verified by analyzing the growth characteristics of both wild type and watermarked scR1 transformed yeast cells. In a third approach we introduced a second overlapping watermark into the lac promoter, which did not affect the promoter activity. Conclusion: Even though the watermarked RNA and one of the watermarked promoters did not show any significant differences compared to the wild type RNA and wild type promoter region, respectively, it cannot be generalized that other RNA molecules or regulatory sequences behave accordingly. Therefore, we do not recommend integrating watermark sequences into regulatory regions.

  9. Multifractal properties of Hao's geometric representations of DNA sequences

    Science.gov (United States)

    Tiňo, Peter

    2002-02-01

    Hao proposed a graphic representation of subsequence structure in DNA sequences and computed fractal dimensions of such representations for factorizable languages. In this study, we extend Hao's work in several directions: (1) We generalize Hao's scheme to accommodate sequences over an arbitrary finite number of symbols. (2) We establish a direct correspondence between the statistical characterization of symbolic sequences via Rényi entropy spectra and the multifractal characteristics (Rényi generalized dimensions) of the sequences’ spatial representations. (3) We show that for general symbolic dynamical systems, the multifractal fH-spectra in the sequence space endowed with commonly used metrics, coincide with the fH-spectra on Hao's sequence representations. (4) So far the connection between the Hao's scheme and another well-known subsequence visualization scheme-Jeffrey's chaos game representation (CGR)-has been characterized only in very vague terms. We show that the fractal dimension results for Hao's visualization frames directly translate to Jeffrey's CGR scheme.
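    Jeffrey's chaos game representation (CGR), mentioned in point (4), can be sketched in a few lines. This is a minimal illustration rather than the construction analyzed in the paper: the corner assignment for A, C, G and T is one common convention and is an assumption here, as are the function name `cgr_points` and the choice to skip non-ACGT symbols.

```python
# Assumed corner convention for the unit square (other layouts are in use).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence):
    """Return the CGR trajectory of a DNA sequence as a list of (x, y) points.

    Starting from the centre of the unit square, each base moves the
    current point halfway toward the corner assigned to that base.
    Symbols outside A/C/G/T are skipped (an illustrative choice).
    """
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGTTG"):
        print(point)
```

    Because every step halves the distance to a corner, the last k bases of a prefix determine which cell of a 2^k by 2^k grid the point falls in, which is the link to subsequence-counting grids of the kind Hao's scheme displays.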

  10. Nanopore-based fourth-generation DNA sequencing technology.

    Science.gov (United States)

    Feng, Yanxiao; Zhang, Yuechuan; Ying, Cuifeng; Wang, Deqiang; Du, Chunlei

    2015-02-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.

  11. Nanopore-based Fourth-generation DNA Sequencing Technology

    Directory of Open Access Journals (Sweden)

    Yanxiao Feng

    2015-02-01

    Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis opens a new door to molecular biology investigation at the single-molecule scale. In this article, we have reviewed academic achievements in nanopore technology from the past as well as the latest advances, including both biological and solid-state nanopores, and discussed their recent and potential applications.

  12. A two-locus DNA sequence database for typing plant and human pathogens within the Fusarium oxysporum species complex

    DEFF Research Database (Denmark)

    O'Donnell, Kerry; Gueidan, C; Sink, S

    2009-01-01

    We constructed a two-locus database, comprising partial translation elongation factor (EF-1alpha) gene sequences and nearly full-length sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA) for 850 isolates spanning the phylogenetic breadth of the Fusarium oxysporum species complex...... (FOSC). Of the 850 isolates typed, 101 EF-1alpha, 203 IGS rDNA, and 256 two-locus sequence types (STs) were differentiated. Analysis of the combined dataset suggests that two-thirds of the STs might be associated with a single host plant. This analysis also revealed that the 26 STs associated with human...

  13. VoSeq: a voucher and DNA sequence web application.

    Directory of Open Access Journals (Sweden)

    Carlos Peña

    There is an ever growing number of molecular phylogenetic studies published, due to, in part, the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the amassing DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report a new software facilitating easy management of voucher and sequence data, consisting of a relational database as back-end for a graphic user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has inbuilt BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL) and GBIF (by generating data-dumps that can be processed with GBIF's Integrated Publishing Toolkit).

  14. C-terminal low-complexity sequence repeats of Mycobacterium smegmatis Ku modulate DNA binding.

    Science.gov (United States)

    Kushwaha, Ambuj K; Grove, Anne

    2013-01-24

    Ku protein is an integral component of the NHEJ (non-homologous end-joining) pathway of DSB (double-strand break) repair. Both eukaryotic and prokaryotic Ku homologues have been characterized and shown to bind DNA ends. A unique feature of Mycobacterium smegmatis Ku is its basic C-terminal tail that contains several lysine-rich low-complexity PAKKA repeats that are absent from homologues encoded by obligate parasitic mycobacteria. Such PAKKA repeats are also characteristic of mycobacterial Hlp (histone-like protein) for which they have been shown to confer the ability to appose DNA ends. Unexpectedly, removal of the lysine-rich extension enhances DNA-binding affinity, but an interaction between DNA and the PAKKA repeats is indicated by the observation that only full-length Ku forms multiple complexes with a short stem-loop-containing DNA previously designed to accommodate only one Ku dimer. The C-terminal extension promotes DNA end-joining by T4 DNA ligase, suggesting that the PAKKA repeats also contribute to efficient end-joining. We suggest that low-complexity lysine-rich sequences have evolved repeatedly to modulate the function of unrelated DNA-binding proteins.

  15. Characterization of two Arabidopsis thaliana myb-like proteins showing affinity to telomeric DNA sequence.

    Science.gov (United States)

    Schrumpfová, Petra; Kuchar, Milan; Miková, Gabriela; Skrísovská, Lenka; Kubicárová, Tatiana; Fajkus, Jirí

    2004-04-01

    Telomere-binding proteins participate in forming a functional nucleoprotein structure at chromosome ends. Using a genomic approach, two Arabidopsis thaliana genes coding for candidate Myb-like telomere binding proteins were cloned and expressed in E. coli. Both proteins, termed AtTBP2 (accession Nos. T46051 (protein database) and GI:638639 (nucleotide database); 295 amino acids, 32 kDa, pI 9.53) and AtTBP3 (BAB08466, GI:9757879; 299 amino acids, 33 kDa, pI 9.88), contain a single Myb-like DNA-binding domain at the N-terminus, and a histone H1/H5-like DNA-binding domain in the middle of the protein sequence. Both proteins are expressed in various A. thaliana tissues. Using the two-hybrid system, an interaction between the proteins AtTBP2 and AtTBP3 and self-interactions of each of the proteins were detected. Gel-retardation assays revealed that each of the two proteins is able to bind the G-rich strand and double-stranded DNA of plant telomeric sequence with an affinity proportional to the number of telomeric repeats. Substrates bearing a non-telomeric DNA sequence positioned between two telomeric repeats were bound with an efficiency depending on the length of the interrupting sequence. The ability to bind variant telomere sequences decreased with sequence divergence from the A. thaliana telomeric DNA. Neither protein alone nor their mixture affects telomerase activity in vitro. Correspondingly, no interaction was observed between either of the two proteins and the Arabidopsis telomerase reverse transcriptase catalytic subunit TERT (accession No. AF172097) using the two-hybrid assay.

  16. Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon.

    Science.gov (United States)

    Fraenkel, Y M; Mandel, Y; Friedberg, D; Margalit, H

    1995-08-01

    We describe a relatively simple method for the identification of common motifs in DNA sequences that are known to share a common function. The input sequences are unaligned and there is no information regarding the position or orientation of the motif. Often such data exists for protein-binding regions, where genetic or molecular information that defines the binding region is available, but the specific recognition site within it is unknown. The method is based on the principle of 'divide and conquer'; we first search for dominant submotifs and then build full-length motifs around them. This method has several useful features: (i) it screens all submotifs so that the results are independent of the sequence order in the data; (ii) it allows the submotifs to contain spacers; (iii) it identifies an existing motif even if the data contains 'noise'; (iv) its running time depends linearly on the total length of the input. The method is demonstrated on two groups of protein-binding sequences: a well-studied group of known CRP-binding sequences, and a relatively newly identified group of genes known to be regulated by Lrp. The Lrp motif that we identify, based on 23 gene sequences, is similar to a previously identified motif based on a smaller data set, and to a consensus sequence of experimentally defined binding sites. Individual Lrp sites are evaluated and compared in regard to their regulation mode.
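
    The first stage described above, the search for dominant submotifs that may contain spacers, can be sketched roughly as follows. This is an illustration only, not the authors' algorithm: the definition of a submotif as two short words separated by a fixed gap, the function name `submotif_support`, and the toy sequences are all assumptions. For fixed word and gap sizes the scan is linear in the total input length, and because support is tracked per sequence the result does not depend on the order of the input.

```python
from collections import defaultdict

def submotif_support(seqs, word=2, max_gap=2):
    """Count how many sequences contain each gapped submotif.

    A submotif is modelled as (left word, gap length, right word); this
    definition is an assumption made for illustration.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for gap in range(max_gap + 1):
            span = 2 * word + gap
            for i in range(len(seq) - span + 1):
                left = seq[i:i + word]
                right = seq[i + word + gap:i + span]
                # Record the sequence index, so repeats within one
                # sequence are not counted twice.
                support[(left, gap, right)].add(idx)
    return {key: len(ids) for key, ids in support.items()}

# Toy unaligned sequences that all share "TGTGA" (hypothetical data).
seqs = ["AATGTGACCTA", "CCTGTGATTGA", "GGTGTGAAACT"]
counts = submotif_support(seqs)
# Submotifs present in every sequence are candidates to extend into full motifs.
dominant = [key for key, n in counts.items() if n == len(seqs)]
```

    A full-length motif would then be built around the best-supported submotifs; that second stage is not shown here.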

  17. [Subcloning and sequencing of DNA fragment related to salt tolerance in Sinorhizobium fredii RT19].

    Science.gov (United States)

    Bian, X L; Ge, S C; Yang, S S

    2000-01-01

    A 23 kb DNA fragment related to salt tolerance was obtained from the gene library of S. fredii strain RT19. In this study, BamH I was used to digest the 23 kb DNA fragment into fragments of different lengths. The resulting fragments were ligated with plasmid pML122, the recombinant plasmids were transformed into competent cells of E. coli S17-1 on selective medium, and three transformants (TR) were obtained. Two-parental mating experiments were carried out with these transformants as donors and the salt-sensitive S. fredii strain RC3-3 as recipient, and the transconjugant BR2 was selected on FY plates containing gentamycin and 0.4 mol/L NaCl. Thus, a 4.4 kb DNA fragment related to salt tolerance was obtained. Based on its physical map, six restriction fragments were subcloned into plasmid pUC18 for DNA sequencing. Subsequent sequencing and analysis of the 4.4 kb DNA fragment identified the fixO and fixN genes and three ORFs.

  18. Human β satellite DNA: Genomic organization and sequence definition of a class of highly repetitive tandem DNA

    International Nuclear Information System (INIS)

    Waye, J.S.; Willard, H.F.

    1989-01-01

    The authors describe a class of human repetitive DNA, called β satellite, that, at a most fundamental level, exists as tandem arrays of diverged ∼68-base-pair monomer repeat units. The monomer units are organized as distinct subsets, each characterized by a multimeric higher-order repeat unit that is tandemly reiterated and represents a recent unit of amplification. They have cloned, characterized, and determined the sequence of two β satellite higher-order repeat units: one located on chromosome 9, the other on the acrocentric chromosomes (13, 14, 15, 21, and 22) and perhaps other sites in the genome. Analysis by pulsed-field gel electrophoresis reveals that these tandem arrays are localized in large domains that are marked by restriction fragment length polymorphisms. In total, β-satellite sequences comprise several million base pairs of DNA in the human genome. Analysis of this DNA family should permit insights into the nature of chromosome-specific and nonspecific modes of satellite DNA evolution and provide useful tools for probing the molecular organization and concerted evolution of the acrocentric chromosomes

  19. Detection of inter-spread repeat sequence in genomic DNA sequence.

    Science.gov (United States)

    Murakami, Hiroo; Sugaya, Nobuyoshi; Sato, Makihiko; Imaizumi, Akira; Aburatani, Sachiyo; Horimoto, Katsuhisa

    2004-01-01

    Various types of periodic patterns in nucleotide sequences are known to be very abundant in a genomic DNA sequence, and to play important biological roles such as gene expression, genome structural stabilization, and recombination. We present a new method, named "STEPSTONE", to find a specific periodic pattern of repeat sequence, inter-spread repeat, in which the tandem repeats of the conserved and the not-conserved regions appear periodically. In our method, at first, the data on periods of short repeat sequences found in a target sequence are stored as a hash data, and then are selected by application of an auto-correlation test in time series analysis. Among the statistically selected sequences, the inter-spread repeats are obtained by usual alignment procedures through two steps. To test the performance of our method, we examined the inter-spread repeats in Mycobacterium tuberculosis and Zamia paucijuga genomic sequences. As a result, our method exactly detected the repeats in the two sequences, being useful for identifying systematically the inter-spread repeats in DNA sequence.
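
    The two ideas named in this abstract, storing the periods of short repeats in a hash and then screening them with a correlation-style test, can be illustrated with a small sketch. It is not the STEPSTONE implementation: the k-mer size, the function names, the toy sequence, and the use of a plain match fraction in place of the auto-correlation test (whose exact form the abstract does not give) are all assumptions.

```python
from collections import Counter, defaultdict

def candidate_periods(seq, k=3):
    """Hash each k-mer to its positions and tally gaps between successive hits."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    gaps = Counter()
    for hits in positions.values():
        for a, b in zip(hits, hits[1:]):
            gaps[b - a] += 1
    return gaps

def match_fraction(seq, lag):
    """Fraction of positions whose base recurs `lag` positions later."""
    pairs = len(seq) - lag
    return sum(seq[i] == seq[i + lag] for i in range(pairs)) / pairs

# Toy inter-spread repeat: conserved "ACGT" alternating with
# non-conserved 4-base spacers, giving a period of 8.
seq = "ACGTTTAGACGTCCGAACGTGATCACGTAGGC"
gaps = candidate_periods(seq)
period = gaps.most_common(1)[0][0]
score = match_fraction(seq, period)
```

    In the method described above, the statistically selected candidates would then be refined by alignment; that step is omitted here.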

  20. Optimization of primer specific filter metrics for the assessment of mitochondrial DNA sequence data

    Science.gov (United States)

    CURTIS, PAMELA C.; THOMAS, JENNIFER L.; PHILLIPS, NICOLE R.; ROBY, RHONDA K.

    2011-01-01

    Filter metrics are used as a quick assessment of sequence trace files in order to sort data into different categories, i.e. High Quality, Review, and Low Quality, without human intervention. The filter metrics consist of two numerical parameters for sequence quality assessment: trace score (TS) and contiguous read length (CRL). Primer specific settings for the TS and CRL were established using a calibration dataset of 2817 traces and validated using a concordance dataset of 5617 traces. Prior to optimization, 57% of the traces required manual review before import into a sequence analysis program, whereas after optimization only 28% of the traces required manual review. After optimization of primer specific filter metrics for mitochondrial DNA sequence data, an overall reduction of review of trace files translates into increased throughput of data analysis and decreased time required for manual review. PMID:21171863
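
    A two-parameter filter of this kind can be sketched as below. The threshold values and the function name `classify_trace` are placeholders chosen for illustration; they are not the primer-specific settings established in the study, and the exact decision rule used there is not given in the abstract.

```python
def classify_trace(trace_score, crl, ts_pass=35, crl_pass=400,
                   ts_fail=20, crl_fail=100):
    """Sort a trace into High Quality, Review, or Low Quality.

    trace_score: trace score (TS); crl: contiguous read length (CRL).
    All thresholds are hypothetical placeholders.
    """
    if trace_score >= ts_pass and crl >= crl_pass:
        return "High Quality"
    if trace_score < ts_fail or crl < crl_fail:
        return "Low Quality"
    # Anything between the pass and fail thresholds goes to manual review.
    return "Review"

labels = [classify_trace(40, 450), classify_trace(30, 300), classify_trace(10, 450)]
```

    Primer-specific settings could be modelled by keeping one set of thresholds per primer and passing them as keyword arguments.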

  1. High-Throughput DNA sequencing of ancient wood.

    Science.gov (United States)

    Wagner, Stefanie; Lagane, Frédéric; Seguin-Orlando, Andaine; Schubert, Mikkel; Leroy, Thibault; Guichoux, Erwan; Chancerel, Emilie; Bech-Hebelstrup, Inger; Bernard, Vincent; Billard, Cyrille; Billaud, Yves; Bolliger, Matthias; Croutsch, Christophe; Čufar, Katarina; Eynaud, Frédérique; Heussner, Karl Uwe; Köninger, Joachim; Langenegger, Fabien; Leroy, Frédéric; Lima, Christine; Martinelli, Nicoletta; Momber, Garry; Billamboz, André; Nelle, Oliver; Palomo, Antoni; Piqué, Raquel; Ramstein, Marianne; Schweichel, Roswitha; Stäuble, Harald; Tegel, Willy; Terradas, Xavier; Verdin, Florence; Plomion, Christophe; Kremer, Antoine; Orlando, Ludovic

    2018-03-01

    Reconstructing the colonization and demographic dynamics that gave rise to extant forests is essential to forecasts of forest responses to environmental changes. Classical approaches to map how population of trees changed through space and time largely rely on pollen distribution patterns, with only a limited number of studies exploiting DNA molecules preserved in wooden tree archaeological and subfossil remains. Here, we advance such analyses by applying high-throughput (HTS) DNA sequencing to wood archaeological and subfossil material for the first time, using a comprehensive sample of 167 European white oak waterlogged remains spanning a large temporal (from 550 to 9,800 years) and geographical range across Europe. The successful characterization of the endogenous DNA and exogenous microbial DNA of 140 (~83%) samples helped the identification of environmental conditions favouring long-term DNA preservation in wood remains, and started to unveil the first trends in the DNA decay process in wood material. Additionally, the maternally inherited chloroplast haplotypes of 21 samples from three periods of forest human-induced use (Neolithic, Bronze Age and Middle Ages) were found to be consistent with those of modern populations growing in the same geographic areas. Our work paves the way for further studies aiming at using ancient DNA preserved in wood to reconstruct the micro-evolutionary response of trees to climate change and human forest management. © 2018 John Wiley & Sons Ltd.

  2. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and the methyltransferase enzyme DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur.
Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing

  3. Design of sequence-specific DNA binding ligands that use a two-stranded peptide motif for DNA sequence recognition.

    Science.gov (United States)

    Nikolaev, V A; Grokhovsky, S L; Surovaya, A N; Leinsoo, T A; Sidorova NYu; Zasedatelev, A S; Zhuze, A L; Strahan, G A; Shafer, R H; Gursky, G V

    1996-08-01

    The design and DNA binding activity of beta-structure-forming peptides and netropsin-peptide conjugates are reported. It is found that a pair of peptides-S,S'-bis(Lys-Gly-Val-Cys-Val-NH-NH-Dns)-bridged by an S-S bond binds at least 10 times more strongly to poly(dG).poly(dC) than to poly(dA).poly(dT). This peptide can also discriminate between 5'-GpG-3' and 5'-GpC-3' steps in the DNA minor groove. Based on these observations, new synthetic ligands, bis-netropsins, were constructed in which two netropsin-like fragments were attached by means of short linkers to a pair of peptides-Gly-Cys-Gly- or Val-Cys-Val-bridged by S-S bonds. These compounds possess a composite binding specificity: the peptide chains recognize 5'-GpG-3' steps on DNA, whereas the netropsin-like fragments bind preferentially to runs of 4 AT base pairs. Our data indicate that combining the AT-base-pair specific properties of the netropsin-type structure with the 5'-GpG-3'-specific properties of certain oligopeptides offers a new approach to the synthesis of ligands capable of recognizing mixed sequences of AT- and GC-base pairs in the DNA minor groove. These compounds are potential models for DNA-binding domains in proteins which specifically recognize base pair sequences in the minor groove of DNA.

  4. Micropatterning stretched and aligned DNA for sequence-specific nanolithography

    Science.gov (United States)

    Petit, Cecilia Anna Paulette

    Techniques for fabricating nanostructured materials can be categorized as either "top-down" or "bottom-up". Top-down techniques use lithography and contact printing to create patterned surfaces and microfluidic channels that can corral and organize nanoscale structures, such as molecules and nanorods; in contrast, bottom-up techniques use self-assembly or molecular recognition to direct the organization of materials. A central goal in nanotechnology is the integration of bottom-up and top-down assembly strategies for materials development, device design, and process integration. With this goal in mind, we have developed strategies that will allow this integration by using DNA as a template for nanofabrication; two top-down approaches allow the placement of these templates, while the bottom-up technique uses the specific sequence of bases to pattern materials along each strand of DNA. Our first top-down approach, termed combing of molecules in microchannels (COMMIC), produces microscopic patterns of stretched and aligned molecules of DNA on surfaces. This process consists of passing an air-water interface over end-adsorbed molecules inside microfabricated channels. The geometry of the microchannel directs the placement of the DNA molecules, while the geometry of the air-water interface directs the local orientation and curvature of the molecules. We developed another top-down strategy for creating micropatterns of stretched and aligned DNA using surface chemistry. Because DNA stretching occurs on hydrophobic surfaces, this technique uses photolithography to pattern vinyl-terminated silanes on glass. When these surfaces are immersed in DNA solution, molecules adhere preferentially to the silanized areas. This approach has also proven useful in patterning protein for cell adhesion studies. Finally, we describe the use of these stretched and aligned molecules of DNA as templates for the subsequent bottom-up construction of hetero-structures through hybridization

  5. [Cloning and sequence analysis of Eg95 cDNA from different stages of Echinococcus granulosus in Xinjiang].

    Science.gov (United States)

    Lin, Ren-yong; Ding, Jian-bing; Wen, Hao; Zhang, Wen-bao; Li, Jun; Lu, Xiao-mei

    2003-01-01

    To study expression and sequence differences of Echinococcus granulosus 95(Eg95) antigen cDNA from different stages of protoscolex, oncosphere and adult worm of E. granulosus from Xinjiang Uighur Aut. Reg. In accordance with the sequence of Eg95 antigen cDNA, the primers of Eg95 were designed. Eg95 antigen cDNAs were amplified by PCR from protoscolex, oncosphere and adult worm cDNA libraries of E. granulosus, respectively and were cloned into pUCm-T plasmid, and sequenced. The sequences were analyzed by DNAman and GenBank/BLAST biosoftware. PCR results showed that Eg95 antigen cDNA was amplified from three stages of E. granulosus cDNA libraries. Sequencing analysis indicated that the Eg95 cDNA length was 402 bp, same as the reported data in GenBank. The Eg95 antigen cDNA was expressed in the different life-cycle stages of E. granulosus in Xinjiang and there was no nucleic acid sequence difference of Eg95 antigen among the protoscolex, oncosphere and adult worm of E. granulosus.

  6. The influence of DNA sequence on epigenome-induced pathologies

    Directory of Open Access Journals (Sweden)

    Meagher Richard B

    2012-07-01

    Full Text Available Abstract Clear cause-and-effect relationships are commonly established between genotype and the inherited risk of acquiring human and plant diseases and aberrant phenotypes. By contrast, few such cause-and-effect relationships are established linking a chromatin structure (that is, the epitype) with the transgenerational risk of acquiring a disease or abnormal phenotype. It is not entirely clear how epitypes are inherited from parent to offspring as populations evolve, even though epigenetics is proposed to be fundamental to evolution and the likelihood of acquiring many diseases. This article explores the hypothesis that, for transgenerationally inherited chromatin structures, “genotype predisposes epitype”, and that epitype functions as a modifier of gene expression within the classical central dogma of molecular biology. Evidence for the causal contribution of genotype to inherited epitypes and epigenetic risk comes primarily from two different kinds of studies discussed herein. The first and direct method of research proceeds by the examination of the transgenerational inheritance of epitype and the penetrance of phenotype among genetically related individuals. The second approach identifies epitypes that are duplicated (as DNA sequences are duplicated) and evolutionarily conserved among repeated patterns in the DNA sequence. The body of this article summarizes particularly robust examples of these studies from humans, mice, Arabidopsis, and other organisms. The bulk of the data from both areas of research support the hypothesis that genotypes predispose the likelihood of displaying various epitypes, but for only a few classes of epitype. This analysis suggests that renewed efforts are needed in identifying polymorphic DNA sequences that determine variable nucleosome positioning and DNA methylation as the primary cause of inherited epigenome-induced pathologies. By contrast, there is very little evidence that DNA sequence directly

  7. Pericentric satellite DNA sequences in Pipistrellus pipistrellus (Vespertilionidae; Chiroptera).

    Science.gov (United States)

    Barragán, M J L; Martínez, S; Marchal, J A; Fernández, R; Bullejos, M; Díaz de la Guardia, R; Sánchez, A

    2003-09-01

    This paper reports the molecular and cytogenetic characterization of a HindIII family of satellite DNA in the bat species Pipistrellus pipistrellus. This satellite is organized in tandem repeats of 418 bp monomer units, and represents approximately 3% of the whole genome. The consensus sequence from five cloned monomer units has an A-T content of 62.20%. We have found differences in the ladder pattern of bands between two populations of the same species. These differences are probably because of the absence of the target sites for the HindIII enzyme in most monomer units of one population, but not in the other. Fluorescent in situ hybridization (FISH) localized the satellite DNA in the pericentromeric regions of all autosomes and the X chromosome, but it was absent from the Y chromosome. Digestion of genomic DNAs with HpaII and its isoschizomer MspI demonstrated that these repetitive DNA sequences are not methylated. Other bat species were tested for the presence of this repetitive DNA. It was absent in five Vespertilionidae and one Rhinolophidae species, indicating that it could be a species/genus specific, repetitive DNA family.

  8. Early Lyme disease with spirochetemia - diagnosed by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Jones William

    2010-11-01

    Full Text Available Abstract Background: A sensitive and analytically specific nucleic acid amplification test (NAAT) is valuable in confirming the diagnosis of early Lyme disease at the stage of spirochetemia. Findings: Venous blood drawn from patients with clinical presentations of Lyme disease was tested for the standard 2-tier screen and Western Blot serology assay for Lyme disease, and also by a nested polymerase chain reaction (PCR) for B. burgdorferi sensu lato 16S ribosomal DNA. The PCR amplicon was sequenced for B. burgdorferi genomic DNA validation. A total of 130 patients visiting emergency room (ER) or Walk-in clinic (WALKIN), and 333 patients referred through the private physicians' offices were studied. While 5.4% of the ER/WALKIN patients showed DNA evidence of spirochetemia, none (0%) of the patients referred from private physicians' offices were DNA-positive. In contrast, while 8.4% of the patients referred from private physicians' offices were positive for the 2-tier Lyme serology assay, only 1.5% of the ER/WALKIN patients were positive for this antibody test. The 2-tier serology assay missed 85.7% of the cases of early Lyme disease with spirochetemia. The latter diagnosis was confirmed by DNA sequencing. Conclusion: Nested PCR followed by automated DNA sequencing is a valuable supplement to the standard 2-tier antibody assay in the diagnosis of early Lyme disease with spirochetemia. The best time to test for Lyme spirochetemia is when the patients living in the Lyme disease endemic areas develop unexplained symptoms or clinical manifestations that are consistent with Lyme disease early in the course of their illness.

  9. Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Matt J Cahill

    Full Text Available BACKGROUND: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. METHODOLOGY/PRINCIPAL FINDINGS: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. CONCLUSIONS: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.
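
    The link between read length and unresolvable repeats can be illustrated with a deliberately crude model, which is not the authors' technique: a repeat is treated as unresolvable when an identical stretch at least as long as a read occurs more than once, so that no single read can span it. The function name `repeated_windows` and the toy genome are assumptions, and the sketch ignores the reverse strand, paired reads, and circular chromosomes.

```python
from collections import Counter

def repeated_windows(genome, read_len):
    """Count distinct read-length windows that occur more than once."""
    counts = Counter(genome[i:i + read_len]
                     for i in range(len(genome) - read_len + 1))
    return sum(1 for n in counts.values() if n > 1)

# Toy genome containing the 5-base repeat "CGTAC" at two positions.
genome = "AAACGTACGGGGCGTACTTT"
by_read_len = {n: repeated_windows(genome, n) for n in (4, 5, 6)}
```

    In this toy case the count falls to zero once reads are longer than the repeat, which mirrors the qualitative point of the abstract: longer reads leave fewer repeat-induced gaps.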

  10. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.

    2010-07-12

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.

  11. Phylogenetic relationships of the Gomphales based on nuc-25S-rDNA, mit-12S-rDNA, and mit-atp6-DNA combined sequences

    Science.gov (United States)

    Admir J. Giachini; Kentaro Hosaka; Eduardo Nouhra; Joseph Spatafora; James M. Trappe

    2010-01-01

    Phylogenetic relationships among Geastrales, Gomphales, Hysterangiales, and Phallales were estimated via combined sequences: nuclear large subunit ribosomal DNA (nuc-25S-rDNA), mitochondrial small subunit ribosomal DNA (mit-12S-rDNA), and mitochondrial atp6 DNA (mit-atp6-DNA). Eighty-one taxa comprising 19 genera and 58 species...

  12. Differential diagnosis of genetic disease by DNA restriction fragment length polymorphisms

    NARCIS (Netherlands)

    Bolhuis, P. A.; Defesche, J. C.; van der Helm, H. J.

    1987-01-01

    DNA restriction fragment length polymorphisms (RFLPs) are used for diagnosis of genetic disease in families known to be affected by specific disorders, but RFLPs can be also useful for the differential diagnosis of hereditary disease. An RFLP pattern represents the inheritance of chromosomal markers

  13. Rational Design of High-Number dsDNA Fragments Based on Thermodynamics for the Construction of Full-Length Genes in a Single Reaction.

    Directory of Open Access Journals (Sweden)

    Bhagyashree S Birla

    Full Text Available Gene synthesis is frequently used in modern molecular biology research either to create novel genes or to obtain natural genes when the synthesis approach is more flexible and reliable than cloning. DNA chemical synthesis has limits on both its length and yield, thus full-length genes have to be hierarchically constructed from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods to assemble multiple double-stranded DNA fragments. Currently, up to 12 dsDNA fragments can be assembled at once with Gibson Assembly according to its vendor. In practice, the number of dsDNA fragments that can be assembled in a single reaction is much lower. We have developed a rational design method for gene construction that allows high-number dsDNA fragments to be assembled into full-length genes in a single reaction. Using this new design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky that identifies all unique junctions in a gene where consecutive DNA fragments are specifically made to connect to each other. Our novel method is generally applicable to most gene sequences, and can improve both the efficiency and cost of gene assembly.

  14. Rational Design of High-Number dsDNA Fragments Based on Thermodynamics for the Construction of Full-Length Genes in a Single Reaction.

    Science.gov (United States)

    Birla, Bhagyashree S; Chou, Hui-Hsien

    2015-01-01

    Gene synthesis is frequently used in modern molecular biology research either to create novel genes or to obtain natural genes when the synthesis approach is more flexible and reliable than cloning. DNA chemical synthesis has limits on both its length and yield, thus full-length genes have to be hierarchically constructed from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods to assemble multiple double-stranded DNA fragments. Currently, up to 12 dsDNA fragments can be assembled at once with Gibson Assembly according to its vendor. In practice, the number of dsDNA fragments that can be assembled in a single reaction is much lower. We have developed a rational design method for gene construction that allows high-number dsDNA fragments to be assembled into full-length genes in a single reaction. Using this new design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky that identifies all unique junctions in a gene where consecutive DNA fragments are specifically made to connect to each other. Our novel method is generally applicable to most gene sequences, and can improve both the efficiency and cost of gene assembly.
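
    The notion of a unique junction can be illustrated with a much simpler stand-in for the thermodynamic analysis that Picky performs: a junction window is accepted only if its exact sequence occurs once across both strands of the gene. This is an assumption-laden sketch, not the published design method; the function names, the 4-base window, and the toy gene are hypothetical, and real designs would use longer overlaps and melting-temperature criteria.

```python
from collections import Counter

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def unique_junctions(gene, overlap=4):
    """Start positions whose overlap-length window is unique on both strands."""
    counts = Counter()
    for strand in (gene, reverse_complement(gene)):
        for i in range(len(strand) - overlap + 1):
            counts[strand[i:i + overlap]] += 1
    # Palindromic windows are counted on both strands and are therefore excluded.
    return [i for i in range(len(gene) - overlap + 1)
            if counts[gene[i:i + overlap]] == 1]

# Toy gene: "ATGA"/"TGAC" occur twice and "TTAA" is palindromic.
gene = "ATGACCGTTTAATGAC"
starts = unique_junctions(gene)
```

    Fragment boundaries would then be chosen only from the accepted positions, so that each overlap can pair with one partner.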

  15. Obtaining long 16S rDNA sequences using multiple primers and its application on dioxin-containing samples.

    Science.gov (United States)

    Chen, Yi-Lin; Lee, Chuan-Chun; Lin, Ya-Lan; Yin, Kai-Min; Ho, Chung-Liang; Liu, Tsunglin

    2015-01-01

    Next-generation sequencing (NGS) technology has transformed metagenomics because the high-throughput data allow an in-depth exploration of a complex microbial community. However, accurate species identification with NGS data is challenging because NGS sequences are relatively short. Assembling 16S rDNA segments into longer sequences has been proposed for improving species identification. Current approaches, however, either suffer from amplification bias due to one single primer or insufficient 16S rDNA reads in whole genome sequencing data. Multiple primers were used to amplify different 16S rDNA segments for 454 sequencing, followed by 454 read classification and assembly. This permitted targeted sequencing while reducing primer bias. For test samples containing four known bacteria, accurate and near full-length 16S rDNAs of three known bacteria were obtained. For real soil and sediment samples containing dioxins in various concentrations, 16S rDNA sequences were lengthened by 50% for about half of the non-rare microbes, and 16S rDNAs of several microbes reached more than 1000 bp. In addition, reduced primer bias using multiple primers was illustrated. A new experimental and computational pipeline for obtaining long 16S rDNA sequences was proposed. The capability of the pipeline was validated on test samples and illustrated on real samples. For dioxin-containing samples, the pipeline revealed several microbes suitable for future studies of dioxin chemistry.

  16. Construction and characterization of a full-length cDNA library for the wheat stripe rust pathogen (Puccinia striiformis f. sp. tritici)

    Directory of Open Access Journals (Sweden)

    Chen Xianming

    2007-06-01

    Background: Puccinia striiformis is a plant pathogenic fungus causing stripe rust, one of the most important diseases on cereal crops and grasses worldwide. However, little is known about its genome and genes involved in the biology and pathogenicity of the pathogen. We initiated the functional genomic research of the fungus by constructing a full-length cDNA library and determined functions of the first group of genes by sequence comparison of cDNA clones to genes reported in other fungi. Results: A full-length cDNA library, consisting of 42,240 clones with an average cDNA insert of 1.9 kb, was constructed using urediniospores of race PST-78 of P. striiformis f. sp. tritici. From 196 sequenced cDNA clones, we determined functions of 73 clones (37.2%). In addition, 36 clones (18.4%) had significant homology to hypothetical proteins, 37 clones (18.9%) had some homology to genes in other fungi, and the remaining 50 clones (25.5%) did not produce any hits. From the 73 clones with functions, we identified 51 different genes encoding protein products that are involved in amino acid metabolism, cell defense, cell cycle, cell signaling, cell structure and growth, energy cycle, lipid and nucleotide metabolism, protein modification, ribosomal protein complex, sugar metabolism, transcription factor, transport metabolism, and virulence/infection. Conclusion: The full-length cDNA library is useful in identifying functional genes of P. striiformis.

  17. Construction and characterization of a full-length cDNA library for the wheat stripe rust pathogen (Puccinia striiformis f. sp. tritici).

    Science.gov (United States)

    Ling, Peng; Wang, Meinan; Chen, Xianming; Campbell, Kimberly Garland

    2007-06-04

    Puccinia striiformis is a plant pathogenic fungus causing stripe rust, one of the most important diseases on cereal crops and grasses worldwide. However, little is known about its genome and genes involved in the biology and pathogenicity of the pathogen. We initiated the functional genomic research of the fungus by constructing a full-length cDNA library and determined functions of the first group of genes by sequence comparison of cDNA clones to genes reported in other fungi. A full-length cDNA library, consisting of 42,240 clones with an average cDNA insert of 1.9 kb, was constructed using urediniospores of race PST-78 of P. striiformis f. sp. tritici. From 196 sequenced cDNA clones, we determined functions of 73 clones (37.2%). In addition, 36 clones (18.4%) had significant homology to hypothetical proteins, 37 clones (18.9%) had some homology to genes in other fungi, and the remaining 50 clones (25.5%) did not produce any hits. From the 73 clones with functions, we identified 51 different genes encoding protein products that are involved in amino acid metabolism, cell defense, cell cycle, cell signaling, cell structure and growth, energy cycle, lipid and nucleotide metabolism, protein modification, ribosomal protein complex, sugar metabolism, transcription factor, transport metabolism, and virulence/infection. The full-length cDNA library is useful in identifying functional genes of P. striiformis.
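    The clone breakdown quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts given above; the category labels are shortened paraphrases introduced here for readability.

```python
# Clone counts as reported in the abstract (labels paraphrased).
clone_counts = {
    "assigned function": 73,
    "homology to hypothetical proteins": 36,
    "some homology to genes in other fungi": 37,
    "no hits": 50,
}

# The four categories should account for all sequenced clones.
total_clones = sum(clone_counts.values())

# Percentage of sequenced clones in each category, rounded to one decimal.
clone_percentages = {
    label: round(100 * n / total_clones, 1) for label, n in clone_counts.items()
}

for label, pct in clone_percentages.items():
    print(f"{label}: {clone_counts[label]} clones ({pct}%)")
```

    The four categories sum to the 196 sequenced clones, and the rounded percentages reproduce the 37.2%, 18.4%, 18.9% and 25.5% stated in the text.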

  18. cDNA sequences of two inducible T-cell genes

    Energy Technology Data Exchange (ETDEWEB)

    Kwon, B.S. (Indiana Univ. School of Medicine, Indianapolis (USA) Guthrie Research Institute, Sayre, PA (USA)); Weissman, S.M. (Yale Univ., New Haven, CT (USA))

    1989-03-01

    The authors have previously described a set of human T-lymphocyte-specific cDNA clones isolated by a modified differential screening procedure. Apparent full-length cDNAs containing the sequences of 14 of the 16 initial isolates were sequenced and were found to represent five different species of mRNA; three of the five species were identical to previously reported cDNA sequences of preproenkephalin, T-cell-replacing factor, and a serine esterase, respectively. The other two species, 4-1BB and L2G25B, were inducible sequences found in mRNA from both a cytolytic T-lymphocyte and a helper T-lymphocyte clone and were not previously described in T-cell mRNA; these mRNA sequences encode peptides of 256 and 92 amino acids, respectively. Both peptides contain putative leader sequences. The protein encoded by 4-1BB also has a potential membrane anchor segment and other features also seen in known receptor proteins.

  19. Taxonomy and phylogeny of the genus citrus based on the nuclear ribosomal dna its region sequence

    International Nuclear Information System (INIS)

    Sun, Y.L.

    2015-01-01

    The genus Citrus (Aurantioideae, Rutaceae) is the sole source of the citrus fruits of commerce, which are of high economic value. In this study, the taxonomy and phylogeny of Citrus species are evaluated using sequence analysis of the ITS region of nrDNA. This study is based on 26 plant materials belonging to 22 Citrus species, including wild, domesticated, and cultivated species. Through DNA alignment of the ITS sequence, the ITS1 and ITS2 regions showed relatively high variation in sequence length and nucleotide sequence among these Citrus species. With respect to the earlier six-tribe classification of Swingle and Reece, the grouping in our phylogenetic tree reconstructed from ITS sequences reflected species discrimination rather than tribe discrimination. However, the molecular analysis could provide more information on citrus taxonomy. Combined with ITS sequences of other subgenera in the true citrus fruit tree group, the ITS phylogenetic tree indicated that subgenus Citrus was monophyletic and nearer to Fortunella, Poncirus, and Clymenia than to Microcitrus and Eremocitrus. The abundant sequence variation of the ITS region shown in this study would help species identification and tribe differentiation of the genus Citrus. (author)

  20. The DNA sequence, annotation and analysis of human chromosome 3

    DEFF Research Database (Denmark)

    Muzny, D.M.; Bolund, Lars; As part of the Chinese Human Genome Sequencing Consortium, E.T.A.L.

    2006-01-01

    chromosomes. Chromosome 3 comprises just four contigs, one of which currently represents the longest unbroken stretch of finished DNA sequence known so far. The chromosome is remarkable in having the lowest rate of segmental duplication in the genome. It also includes a chemokine receptor gene cluster as well...... as numerous loci involved in multiple human cancers such as the gene encoding FHIT, which contains the most common constitutive fragile site in the genome, FRA3B. Using genomic sequence from chimpanzee and rhesus macaque, we were able to characterize the breakpoints defining a large pericentric inversion...

  1. Human fetal globin DNA sequences suggest novel conversion event.

    OpenAIRE

    Stoeckert, C J; Collins, F S; Weissman, S M

    1984-01-01

    DNA sequencing studies of two recently cloned human A gamma globin alleles have revealed a number of base differences which are clustered in the large intron (IVS-2). One allele has a previously undescribed IVS-2 sequence. Most of the allelic differences can be explained as resulting from a gene conversion event involving G gamma as a donor. A novel feature of this event is that three G gamma-like regions occur interspersed among unconverted areas of the A gamma gene. We propose that an altern...

  2. Digital Droplet Multiple Displacement Amplification (ddMDA) for Whole Genome Sequencing of Limited DNA Samples.

    Directory of Open Access Journals (Sweden)

    Minsoung Rhee

    Multiple displacement amplification (MDA) is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples) before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram amounts of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA) technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase, thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent to ~3/1000 of an E. coli genome per droplet), ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli) compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology.
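    The figure of roughly 3/1000 of a genome per droplet can be related to the stated concentration with a back-of-the-envelope calculation. The sketch below is not from the paper: it assumes an E. coli genome of about 4.6 Mbp and an average mass of about 650 g/mol per base pair (both common textbook approximations), and from these derives the droplet volume implied by the abstract's numbers.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
GENOME_BP = 4.6e6             # assumed E. coli genome size in base pairs
BP_MASS_G_PER_MOL = 650.0     # assumed average mass of one base pair


def genome_mass_grams(genome_bp: float) -> float:
    """Approximate mass of one double-stranded genome copy in grams."""
    return genome_bp * BP_MASS_G_PER_MOL / AVOGADRO


def genomes_per_microliter(conc_pg_per_ul: float, genome_bp: float) -> float:
    """Genome equivalents per microliter at a given DNA concentration."""
    return conc_pg_per_ul * 1e-12 / genome_mass_grams(genome_bp)


# Values taken from the abstract: 0.1 pg/uL and ~3/1000 genome per droplet.
copies_per_ul = genomes_per_microliter(0.1, GENOME_BP)
genomes_per_droplet = 3 / 1000

# Droplet volume (in nanoliters) implied by those two figures.
implied_droplet_nl = genomes_per_droplet / copies_per_ul * 1000

print(f"one genome is about {genome_mass_grams(GENOME_BP) * 1e15:.1f} fg")
print(f"about {copies_per_ul:.0f} genome equivalents per microliter")
print(f"implied droplet volume is about {implied_droplet_nl:.2f} nL")
```

    Under these assumptions the implied droplet volume comes out below one nanoliter, which is consistent with the "sub-nanoliter droplets" described in the abstract.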

  3. Effect of dephasing on DNA sequencing via transverse electronic transport

    Energy Technology Data Exchange (ETDEWEB)

    Zwolak, Michael [Los Alamos National Laboratory; Krems, Matt [NON LANL; Pershin, Yuriy V [NON LANL; Di Ventra, Massimiliano [NON LANL

    2009-01-01

    We study theoretically the effects of dephasing on DNA sequencing in a nanopore via transverse electronic transport. To do this, we couple classical molecular dynamics simulations with transport calculations using scattering theory. Previous studies, which did not include dephasing, have shown that by measuring the transverse current of a particular base multiple times, one can get distributions of currents for each base that are distinguishable. We introduce a dephasing parameter into transport calculations to simulate the effects of the ions and other fluctuations. These effects lower the overall magnitude of the current, but have little effect on the current distributions themselves. The results of this work further indicate that distinguishing DNA bases via transverse electronic transport has potential as a sequencing tool.

  4. Formation and analysis of topographical domains between lipid membranes tethered by DNA hybrids of different lengths.

    Science.gov (United States)

    Chung, Minsub; Koo, Bon Jun; Boxer, Steven G

    2013-01-01

    We recently described a strategy to prepare DNA-tethered lipid membranes either to fixed DNA on a surface or to DNA displayed on a supported bilayer [Boxer et al., J. Struct. Biol., 2009, 168, 190; Boxer et al., Langmuir, 2011, 27, 5492]. With the latter system, the DNA hybrids are laterally mobile; when orthogonal sense-antisense pairs of different lengths are used, the DNA hybrids segregate by height and the tethered membrane deforms to accommodate the height difference. This architecture is particularly useful for modelling interactions between membranes mediated by molecular recognition and resembles cell-to-cell junctions. The length, affinity and population of the DNA hybrids between the membranes are completely controllable. Interesting patterns of height segregation are observed by fluorescence interference contrast microscopy. Diverse behavior is observed in the segregation and pattern forming process and possible mechanisms are discussed. This model system captures some of the essential physics of synapse formation and is a step towards understanding lipid membrane behaviour in cell-to-cell junctions.

  5. Influence of DNA extraction methods on relative telomere length measurements and its impact on epidemiological studies.

    Science.gov (United States)

    Raschenberger, Julia; Lamina, Claudia; Haun, Margot; Kollerits, Barbara; Coassin, Stefan; Boes, Eva; Kedenko, Ludmilla; Köttgen, Anna; Kronenberg, Florian

    2016-05-03

    Measurement of telomere length is widely used in epidemiologic studies. Insufficient standardization of the measurement processes has, however, complicated the comparison of results between studies. We aimed to investigate whether DNA extraction methods have an influence on measured values of relative telomere length (RTL) and whether this has consequences for epidemiological studies. We performed four experiments with RTL measurement in quadruplicate by qPCR using DNA extracted with different methods: 1) a standardized validation experiment including three extraction methods (magnetic-particle-method EZ1, salting-out-method INV, phenol-chloroform-isoamyl-alcohol PCI) each in the same 20 samples demonstrated pronounced differences in RTL with lowest values with EZ1 followed by INV and PCI-isolated DNA; 2) a comparison of 307 samples from an epidemiological study showing EZ1-measurements 40% lower than INV-measurements; 3) a matching-approach of two similar non-diseased control groups including 143 pairs of subjects revealed significantly shorter RTL in EZ1 than INV-extracted DNA (0.844 ± 0.157 vs. 1.357 ± 0.242); 4) an association analysis of RTL with prevalent cardiovascular disease detected a stronger association with INV than with EZ1-extracted DNA. In summary, DNA extraction methods have a pronounced influence on the measured RTL-values. This might result in spurious or lost associations in epidemiological studies under certain circumstances.

  6. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure tha...

  7. Rapid DNA sequencing by horizontal ultrathin gel electrophoresis.

    OpenAIRE

    Brumley, R L; Smith, L M

    1991-01-01

    A horizontal polyacrylamide gel electrophoresis apparatus has been developed that decreases the time required to separate the DNA fragments produced in enzymatic sequencing reactions. The configuration of this apparatus and the use of circulating coolant directly under the glass plates result in heat exchange that is approximately nine times more efficient than passive thermal transfer methods commonly used. Bubble-free gels as thin as 25 microns can be routinely cast on this device. The appl...

  8. Development of a defined-sequence DNA system for use in DNA misrepair studies

    International Nuclear Information System (INIS)

    Sutton, S.; Tobias, C.A.

    1984-01-01

    The authors have developed a system that allows them to study cellular DNA repair processes at the molecular level. In particular, the authors are using this system to examine the consequences of a misrepair of radiation-induced DNA damage, as a function of dose. The cells being used are specially engineered haploid yeast cells. Maintained in the cells, at one copy per cell, is a cen plasmid, a plasmid that behaves like a functional chromosome. This plasmid carries a small defined sequence of DNA from the E. coli lac z gene. It is this lac z region (called the alpha region) that serves as the target for radiation damage. Two copies of the complementary portion of the lac z gene are integrated into the yeast genome. Irradiated cells are screened for possible mutation in the alpha region by testing the cells' ability to hydrolyze xgal, a lactose substrate. The DNA of interest is then extracted from the cells, sequenced, and the sequence is compared to that of the control. Unlike the usual defined-sequence DNA systems, theirs is an in vivo system. A disadvantage is the relatively high background mutation rate. Results achieved with this system, as well as future applications, are discussed.

  9. Gene length and detection bias in single cell RNA sequencing protocols [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Belinda Phipson

    2017-04-01

    Background: Single cell RNA sequencing (scRNA-seq) has rapidly gained popularity for profiling transcriptomes of hundreds to thousands of single cells. This technology has led to the discovery of novel cell types and revealed insights into the development of complex tissues. However, many technical challenges need to be overcome during data generation. Due to minute amounts of starting material, samples undergo extensive amplification, increasing technical variability. A solution for mitigating amplification biases is to include unique molecular identifiers (UMIs), which tag individual molecules. Transcript abundances are then estimated from the number of unique UMIs aligning to a specific gene, with PCR duplicates resulting in copies of the UMI not included in expression estimates. Methods: Here we investigate the effect of gene length bias in scRNA-Seq across a variety of datasets that differ in terms of capture technology, library preparation, cell types and species. Results: We find that scRNA-seq datasets that have been sequenced using a full-length transcript protocol exhibit gene length bias akin to bulk RNA-seq data. Specifically, shorter genes tend to have lower counts and a higher rate of dropout. In contrast, protocols that include UMIs do not exhibit gene length bias, with a mostly uniform rate of dropout across genes of varying length. Across four different scRNA-Seq datasets profiling mouse embryonic stem cells (mESCs), we found the subset of genes that are only detected in the UMI datasets tended to be shorter, while the subset of genes detected only in the full-length datasets tended to be longer. Conclusions: We find that the choice of scRNA-seq protocol influences the detection rate of genes, and that full-length datasets exhibit gene-length bias. In addition, despite clear differences between UMI and full-length transcript data, we illustrate that full-length and UMI data can be combined to reveal the underlying biology.
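    The UMI-based abundance estimate described above (count distinct UMIs per gene, so that PCR duplicates of the same tagged molecule are counted once) can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: the input format and the function name are assumptions, and UMI sequencing errors, which real tools usually correct for, are ignored.

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs per (cell, gene) pair.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    Reads sharing cell, gene and UMI are treated as PCR duplicates of a
    single original molecule and contribute one count.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy data: the first two reads are duplicates of the same molecule.
example_reads = [
    ("cell1", "GeneA", "AAAT"),
    ("cell1", "GeneA", "AAAT"),
    ("cell1", "GeneA", "CCGT"),
    ("cell1", "GeneB", "AAAT"),
    ("cell2", "GeneA", "AAAT"),
]
example_counts = umi_counts(example_reads)
```

    Because each molecule is tagged once before amplification, the count does not grow with transcript length in the way read counts from full-length protocols do, which is the contrast the abstract draws.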

  10. Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications.

    Science.gov (United States)

    Kim, Young-Kyu; Park, Chong-wook; Kim, Ki-Joong

    2009-03-31

    The chloroplast DNA sequences of Megaleranthis saniculifolia, an endemic and monotypic endangered plant species, were completed in this study (GenBank FJ597983). The genome is 159,924 bp in length. It harbors a pair of IR regions consisting of 26,608 bp each. The lengths of the LSC and SSC regions are 88,326 bp and 18,382 bp, respectively. The structural organizations, gene and intron contents, gene orders, AT contents, codon usages, and transcription units of the Megaleranthis chloroplast genome are similar to those of typical land plant cp DNAs. However, the detailed features of the Megaleranthis chloroplast genome are substantially different from those of Ranunculus, which belongs to the same family, the Ranunculaceae. First, the Megaleranthis cp DNA was 4,797 bp longer than that of Ranunculus due to an expanded IR region into the SSC region and duplicated sequence elements in several spacer regions of the Megaleranthis cp genome. Second, the chloroplast genomes of Megaleranthis and Ranunculus evidence 5.6% sequence divergence in the coding regions, 8.9% sequence divergence in the intron regions, and 18.7% sequence divergence in the intergenic spacer regions, respectively. In both the coding and noncoding regions, average nucleotide substitution rates differed markedly, depending on the genome position. Our data strongly implicate the positional effects of the evolutionary modes of chloroplast genes. The genes evidencing higher levels of base substitutions also have higher incidences of indel mutations and low Ka/Ks ratios. A total of 54 simple sequence repeat loci were identified from the Megaleranthis cp genome. The existence of rich cp SSR loci in the Megaleranthis cp genome provides a rare opportunity to study the population genetic structures of this endangered species. Our phylogenetic trees based on the two independent markers, the nuclear ITS and chloroplast matK sequences, strongly support the inclusion of Megaleranthis in Trollius.
Therefore, our
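    The region lengths reported for the Megaleranthis chloroplast genome can be checked against the stated total, since a typical chloroplast genome consists of one large single-copy region, one small single-copy region and two copies of the inverted repeat. The sketch below uses only numbers from the abstract; the Ranunculus length is not stated there and is merely the value implied by the reported 4,797 bp difference.

```python
# Region lengths in base pairs, as reported in the abstract.
LSC_BP = 88_326     # large single-copy region
SSC_BP = 18_382     # small single-copy region
IR_BP = 26_608      # one inverted repeat; the genome carries two copies
REPORTED_TOTAL_BP = 159_924

# LSC + SSC + two inverted repeats should give the full genome length.
computed_total_bp = LSC_BP + SSC_BP + 2 * IR_BP

# Length implied for the Ranunculus genome (derived here, not quoted).
implied_ranunculus_bp = REPORTED_TOTAL_BP - 4_797

print(f"computed total: {computed_total_bp:,} bp")
print(f"implied Ranunculus cp genome: {implied_ranunculus_bp:,} bp")
```

    The computed total matches the reported 159,924 bp, so the four region lengths are internally consistent.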

  11. ThreaDNA: predicting DNA mechanics' contribution to sequence selectivity of proteins along whole genomes.

    Science.gov (United States)

    Cevost, Jasmin; Vaillant, Cédric; Meyer, Sam; Rost, Burkhard

    2018-02-15

    Many DNA-binding proteins recognize their target sequences indirectly, by sensing DNA's response to mechanical distortion. ThreaDNA estimates this response based on high-resolution structures of the protein-DNA complex of interest. Implementing an efficient nanoscale modeling of DNA deformations involving essentially no adjustable parameters, it returns the profile of deformation energy along whole genomes, at base-pair resolution, within minutes on usual laptop/desktop computers. Our predictions can also be easily combined with estimations of direct selectivity through a generalized form of position-weight-matrices. The formalism of ThreaDNA is accessible to a wide audience. We demonstrate the importance of indirect readout for the nucleosome as well as the bacterial regulators Fis and CRP. Combined with the direct contribution provided by usual sequence motifs, it significantly improves the prediction of sequence selectivity, and allows quantifying the two distinct physical mechanisms underlying it. Availability: Python software available at bioinfo.insa-lyon.fr, natively executable on Linux/MacOS systems with a user-friendly graphical interface. Galaxy webserver version available. Contact: sam.meyer@insa-lyon.fr. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  12. Sequence Dependent Electrophoretic Separations of DNA in Pluronic F127 Gels

    Science.gov (United States)

    You, Seungyong; van Winkle, David H.

    2010-03-01

    Two-dimensional (2-D) electrophoresis has successfully been used to visualize the separation of DNA fragments of the same length. We electrophorese a double-stranded DNA ladder in an agarose gel for the first dimension and in gels of Pluronic F127 for the second dimension at room temperature. The 1000 bp band that travels as a single band in an agarose gel is split into two bands in Pluronic gels. The slower band follows the exponential decay trend that the other ladder constituents do. Sequencing of the DNA fragments shows that the faster band has an apparently random sequence, while the slower band and the others have two A-tracts in each 250 bp segment. The A-tracts consist of a series of at least five adenine bases pairing with thymine bases. This result leads to the conclusion that the migration of DNA molecules bent by A-tracts is more retarded in Pluronic gels than that of wild-type DNA molecules.

  13. Comparison of DNA Quantification Methods for Next Generation Sequencing.

    Science.gov (United States)

    Robin, Jérôme D; Ludlow, Andrew T; LaRanger, Ryan; Wright, Woodring E; Shay, Jerry W

    2016-04-06

    Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C; 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChiA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. DdPCR-Tail is comparable to qPCR and fluorometry (QuBit) and allows sensitive quantification by analysis of barcode repartition after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality.

  14. Inferring relative proportions of DNA variants from sequencing electropherograms.

    Science.gov (United States)

    Carr, I M; Robinson, J I; Dimitriou, R; Markham, A F; Morgan, A W; Bonthron, D T

    2009-12-15

    Determination of the relative copy number of single-nucleotide sequence variants (SNVs) within a DNA sample is a frequent experimental goal. Various methods can be applied to this problem, although hybridization-based approaches tend to suffer from high-setup cost and poor adaptability, while others (such as pyrosequencing) may not be accessible to all laboratories. The potential to extract relative copy number information from standard dye-terminator electropherograms has been little explored, yet this technology is cheap and widely accessible. Since several biologically important loci have paralogous copies that interfere with genotyping, and which may also display copy number variation (CNV), there are many situations in which determination of the relative copy number of SNVs is desirable. We have developed a desktop application, QSVanalyzer, which allows high-throughput quantification of the proportions of DNA sequences containing SNVs. In reconstruction experiments, QSVanalyzer accurately estimated the known relative proportions of SNVs. By analyzing a large panel of genomic DNA samples, we demonstrate the ability of the software to analyze not only common biallelic SNVs, but also SNVs within a locus at which gene conversion between four genomic paralogs operates, and within another that is subject to CNV. QSVanalyzer is freely available at http://dna.leeds.ac.uk/qsv/. It requires the Microsoft .NET framework version 2.0, which can be installed on all Microsoft operating systems from Windows 98 onwards. Contact: msjimc@leeds.ac.uk. Supplementary data are available at Bioinformatics online.

  15. Length variation in the internal transcribed spacers of ribosomal DNA in Picea abies and related species.

    Science.gov (United States)

    Karvonen, P; Szmidt, A E; Savolainen, O

    1994-12-01

    The structure and variation of nuclear ribosomal DNA (rDNA) units of Picea abies (L.) Karst. was studied by restriction mapping and Southern hybridization. Conspicuous length variation was found in the internal transcribed spacer (ITS) region of P. abies, although the length of this region is highly conserved both within and among most of the plant species. Two types of ITS variants (A and B), displaying a size difference of 0.5 kb in the ITS2 region, were present within individuals of P. abies from Sweden, Central Europe and Siberia. A preliminary survey of 14 additional Eurasian and North American species of Picea suggested that length variation in the ITS region is widespread in this genus. Altogether three length variants (A, B and C) were identified. Within individuals of eight Picea species, two length variants were present within the genome (combinations of A and B variants in P. glehnii, P. maximowiczii, P. omorika, P. polita and P. sitchensis and variants B and C in P. jezoensis, P. likiangensis and P. spinulosa). Within individuals from five species, however, only one rDNA variant was present in their genome (variant A in P. aurantiaca, P. engelmannii, P. glauca, P. koraiensis and P. koyamai; variant B in P. bicolor). The ITS length variation will be useful as a molecular marker in evolutionary studies of the Picea species complex, whose phylogeny is controversial. The presence of intraindividual variation in, and shared polymorphism of, the ITS length variants raises questions about the occurrence of interspecific hybridization during the evolutionary history of Picea.

  16. Targeted DNA methylation analysis by next-generation sequencing.

    Science.gov (United States)

    Masser, Dustin R; Stanford, David R; Freeman, Willard M

    2015-02-24

    The role of epigenetic processes in the control of gene expression has been known for a number of years. DNA methylation at cytosine residues is of particular interest for epigenetic studies as it has been demonstrated to be both a long lasting and a dynamic regulator of gene expression. Efforts to examine epigenetic changes in health and disease have been hindered by the lack of high-throughput, quantitatively accurate methods. With the advent and popularization of next-generation sequencing (NGS) technologies, these tools are now being applied to epigenomics in addition to existing genomic and transcriptomic methodologies. For epigenetic investigations of cytosine methylation where regions of interest, such as specific gene promoters or CpG islands, have been identified and there is a need to examine significant numbers of samples with high quantitative accuracy, we have developed a method called Bisulfite Amplicon Sequencing (BSAS). This method combines bisulfite conversion with targeted amplification of regions of interest, transposome-mediated library construction and benchtop NGS. BSAS offers a rapid and efficient method for analysis of up to 10 kb of targeted regions in up to 96 samples at a time that can be performed by most research groups with basic molecular biology skills. The results provide absolute quantitation of cytosine methylation with base specificity. BSAS can be applied to any genomic region from any DNA source. This method is useful for hypothesis testing studies of target regions of interest as well as confirmation of regions identified in genome-wide methylation analyses such as whole genome bisulfite sequencing, reduced representation bisulfite sequencing, and methylated DNA immunoprecipitation sequencing.
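    The base-specific quantitation mentioned above rests on a standard property of bisulfite sequencing: unmethylated cytosines are converted and read as T, while methylated cytosines remain C, so the methylated fraction at a site is C / (C + T) among the reads covering it. The sketch below illustrates that ratio only; it is not the BSAS analysis workflow, and the function name, site labels and read counts are invented for the example.

```python
def methylation_fraction(c_reads: int, t_reads: int) -> float:
    """Fraction of reads supporting methylation at one cytosine position.

    c_reads: reads showing C (protected from conversion, i.e. methylated)
    t_reads: reads showing T (converted, i.e. unmethylated)
    """
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("site has no read coverage")
    return c_reads / total


# Hypothetical per-site read counts as (C reads, T reads).
site_counts = {
    "CpG_1": (30, 70),
    "CpG_2": (95, 5),
    "CpG_3": (0, 120),
}

site_methylation = {
    site: methylation_fraction(c, t) for site, (c, t) in site_counts.items()
}

for site, frac in site_methylation.items():
    print(f"{site}: {100 * frac:.1f}% methylated")
```

    The precision of each estimate depends on read depth at the site, which is one reason a targeted approach with deep coverage suits quantitative work.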

  17. DNA-directed alkylating ligands as potential antitumor agents: sequence specificity of alkylation by intercalating aniline mustards.

    Science.gov (United States)

    Prakash, A S; Denny, W A; Gourdie, T A; Valu, K K; Woodgate, P D; Wakelin, L P

    1990-10-23

    The sequence preferences for alkylation of a series of novel para-substituted aniline mustards linked to the DNA-intercalating chromophore 9-aminoacridine by an alkyl chain of variable length were studied by using procedures analogous to Maxam-Gilbert reactions. The compounds alkylate DNA at both guanine and adenine sites. For mustards linked to the acridine by a short alkyl chain through a para O- or S-link group, 5'-GT sequences are the most preferred sites at which N7-guanine alkylation occurs. For analogues with longer chain lengths, the preference for 5'-GT sequences diminishes in favor of N7-adenine alkylation at the complementary 5'-AC sequence. Magnesium ions are shown to selectively inhibit alkylation at the N7 of adenine (in the major groove) by these compounds but not the alkylation at the N3 of adenine (in the minor groove) by the antitumor antibiotic CC-1065. Effects of chromophore variation were also studied by using aniline mustards linked to quinazoline and sterically hindered tert-butyl-9-aminoacridine chromophores. The results demonstrate that in this series of DNA-directed mustards the noncovalent interactions of the carrier chromophores with DNA significantly modify the sequence selectivity of alkylation by the mustard. Relationships between the DNA alkylation patterns of these compounds and their biological activities are discussed.

  18. A unique DNA repair and recombination gene (recN) sequence for ...

    Indian Academy of Sciences (India)

    2013-04-23

    A unique DNA repair and recombination gene (recN) sequence for identification and intraspecific molecular typing of bacterial wilt pathogen Ralstonia solanacearum and its comparative analysis with ribosomal DNA sequences. AUNDY KUMAR, THEKKAN PUTHIYAVEEDU PRAMEELA.

  19. From the chromosome to DNA: Restriction fragment length polymorphism analysis and its clinical application.

    Science.gov (United States)

    Todd, R; Donoff, R B; Kim, Y; Wong, D T

    2001-06-01

    Understanding how chromosomal alterations contribute to acquired and inherited human disease requires the ability to manage the enormous physical and informational complexity of the deoxyribonucleic acid (DNA) packaged within. Important concepts and techniques involved in the analysis of DNA include restriction enzymes, Southern blotting, and restriction fragment length polymorphism/linkage analysis. These techniques have been essential in the understanding and diagnosis of several syndromes associated with the head and neck. The purpose of this article is to introduce DNA structure, describe some techniques fundamental to DNA analysis, and provide a brief overview of the clinical applications of this technology with respect to dentinogenesis imperfecta and oral field cancerization. Copyright 2001 American Association of Oral and Maxillofacial Surgeons.

  20. Bisulfite sequencing of chromatin immunoprecipitated DNA (BisChIP-seq) directly informs methylation status of histone-modified DNA

    NARCIS (Netherlands)

    Statham, A.L.; Robinson, M.D.; Song, J.Z.; Coolen, M.W.; Stirzaker, C.; Clark, S. J.

    2012-01-01

    The complex relationship between DNA methylation, chromatin modification, and underlying DNA sequence is often difficult to unravel with existing technologies. Here, we describe a novel technique based on high-throughput sequencing of bisulfite-treated chromatin immunoprecipitated DNA (BisChIP-seq),

  1. Fidelity and mutational spectrum of Pfu DNA polymerase on a human mitochondrial DNA sequence.

    Science.gov (United States)

    André, P; Kim, A; Khrapko, K; Thilly, W G

    1997-08-01

    The study of rare genetic changes in human tissues requires specialized techniques. Point mutations at fractions at or below 10^-6 must be observed to discover even the most prominent features of the point mutational spectrum. PCR permits the increase in number of mutant copies but does so at the expense of creating many additional mutations or "PCR noise". Thus, each DNA sequence studied must be characterized with regard to the DNA polymerase and conditions used to avoid interpreting a PCR-generated mutation as one arising in human tissue. The thermostable DNA polymerase derived from Pyrococcus furiosus designated Pfu has the highest fidelity of any DNA thermostable polymerase studied to date, and this property recommends it for analyses of tissue mutational spectra. Here, we apply constant denaturant capillary electrophoresis (CDCE) to separate and isolate the products of DNA amplification. This new strategy permitted direct enumeration and identification of point mutations created by Pfu DNA polymerase in a 96-bp low melting domain of a human mitochondrial sequence despite the very low mutant fractions generated in the PCR process. This sequence, containing part of the tRNA glycine and NADH dehydrogenase subunit 3 genes, is the target of our studies of mitochondrial mutagenesis in human cells and tissues. Incorrectly synthesized sequences were separated from the wild type as mutant/wild-type heteroduplexes by sequential enrichment on CDCE. An artificially constructed mutant was used as an internal standard to permit calculation of the mutant fraction. Our study found that the average error rate (mutations per base pair duplication) of Pfu was 6.5 × 10^-7, and five of its more frequent mutations (hot spots) consisted of three transversions (GC→TA, AT→TA, and AT→CG), one transition (AT→GC), and one 1-bp deletion (in an AAAAAA sequence). To achieve an even higher sensitivity, the amount of Pfu-induced mutants must be reduced.
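
    The unit quoted above, mutations per base pair per duplication, suggests the usual way a polymerase error rate is derived from a measured mutant fraction: divide the mutant fraction by the target length and by the number of template doublings. The Python sketch below assumes that relationship. Only the 96-bp target length comes from the abstract; the mutant fraction and copy numbers in the usage lines are invented placeholders, not the study's data.

```python
import math


def doublings(initial_copies, final_copies):
    """Number of template doublings implied by the overall amplification."""
    return math.log2(final_copies / initial_copies)


def error_rate(mutant_fraction, target_bp, n_doublings):
    """Errors per base pair per duplication (assumed formula: mf / (b * d))."""
    return mutant_fraction / (target_bp * n_doublings)


if __name__ == "__main__":
    d = doublings(1e6, 1.024e9)          # hypothetical copy numbers, ~10 doublings
    print(error_rate(6.24e-4, 96, d))    # hypothetical mutant fraction
```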

  2. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
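
    The two tests named in this abstract can be sketched in a few lines of Python. The sketch assumes overlapping n-tuples, Shannon entropy in bits, and redundancy defined as one minus the ratio of observed entropy to the 2n-bit maximum for a four-letter alphabet; the paper's exact definitions may differ, and all function names here are illustrative.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count overlapping n-grams (n-tuples) in a nucleotide string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_entropy(seq, n):
    """Shannon entropy, in bits, of the observed n-gram distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_redundancy(seq, n):
    """1 - H_n / (2n), where 2n bits is the maximum for a 4-letter alphabet."""
    return 1.0 - ngram_entropy(seq, n) / (2.0 * n)


def zipf_ranking(seq, n):
    """(rank, n-gram, count) tuples, most frequent first, for a Zipf plot."""
    ranked = ngram_counts(seq, n).most_common()
    return [(rank, gram, count) for rank, (gram, count) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    toy = "ACGT" * 25  # toy sequence, not real genomic data
    print(ngram_entropy(toy, 1), ngram_redundancy(toy, 1))
    print(zipf_ranking(toy, 2)[:3])
```

    Plotting log(count) against log(rank) from `zipf_ranking` is the step where a power-law regime would appear as a straight line.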

  3. Nucleotide sequencing and analysis of 16S rDNA and 16S-23S rDNA internal spacer region (ISR) of Taylorella equigenitalis, as an important pathogen for contagious equine metritis (CEM).

    Science.gov (United States)

    Kagawa, S; Nagano, Y; Tazumi, A; Murayama, O; Millar, B C; Moore, J E; Matsuda, M

    2006-05-01

    The primer set for 16S rDNA amplified an amplicon of about 1500 bp in length for three strains of Taylorella equigenitalis (NCTC11184(T), Kentucky188 and EQ59). Sequence differences of the 16S rDNA among the six sequences, including three reference sequences, occurred at only a few nucleotide positions and thus, an extremely high sequence similarity of the 16S rDNA was first demonstrated among the six sequences. In addition, the primer set for 16S-23S rDNA internal spacer region (ISR) amplified two amplicons about 1300 bp and 1200 bp in length for the three strains. The ISRs were estimated to be about 920 bp in length for large ISR-A and about 830 bp for small ISR-B. Sequence alignment of the ISR-A and ISR-B demonstrated about 10 base differences between NCTC11184(T) and EQ59 and between Kentucky188 and EQ59. However, only minor sequence differences were demonstrated between the ISR-A and ISR-B from NCTC11184(T) and Kentucky188, respectively. A typical order of the intercistronic tRNAs with the 29 nucleotide spacer of 5'-16S rDNA-tRNA(Ile)-tRNA(Ala)-23S rDNA-3' was demonstrated in all the ISRs. The ISRs may be useful for the discrimination amongst isolates of T. equigenitalis if sequencing is employed.

  4. DNA Targeting Sequence Improves Magnetic Nanoparticle-Based Plasmid DNA Transfection Efficiency in Model Neurons.

    Science.gov (United States)

    Vernon, Matthew M; Dean, David A; Dobson, Jon

    2015-08-17

    Efficient non-viral plasmid DNA transfection of most stem cells, progenitor cells and primary cell lines currently presents an obstacle for many applications within gene therapy research. From a standpoint of efficiency and cell viability, magnetic nanoparticle-based DNA transfection is a promising gene vectoring technique because it has demonstrated rapid and improved transfection outcomes when compared to alternative non-viral methods. Recently, our research group introduced oscillating magnet arrays that resulted in further improvements to this novel plasmid DNA (pDNA) vectoring technology. Continued improvements to nanomagnetic transfection techniques have focused primarily on magnetic nanoparticle (MNP) functionalization and transfection parameter optimization: cell confluence, growth media, serum starvation, magnet oscillation parameters, etc. Noting that none of these parameters can assist in the nuclear translocation of delivered pDNA following MNP-pDNA complex dissociation in the cell's cytoplasm, inclusion of a cassette feature for pDNA nuclear translocation is theoretically justified. In this study incorporation of a DNA targeting sequence (DTS) feature in the transfecting plasmid improved transfection efficiency in model neurons, presumably from increased nuclear translocation. This observation became most apparent when comparing the response of the dividing SH-SY5Y precursor cell to the non-dividing and differentiated SH-SY5Y neuroblastoma cells.

  5. Using Synthetic Nanopores for Single-Molecule Analyses: Detecting SNPs, Trapping DNA Molecules, and the Prospects for Sequencing DNA

    Science.gov (United States)

    Dimitrov, Valentin V.

    2009-01-01

    This work focuses on studying properties of DNA molecules and DNA-protein interactions using synthetic nanopores, and it examines the prospects of sequencing DNA using synthetic nanopores. We have developed a method for discriminating between alleles that uses a synthetic nanopore to measure the binding of a restriction enzyme to DNA. There exists…

  6. Synthesis of DNA

    Science.gov (United States)

    Mariella, Jr., Raymond P.

    2008-11-18

    A method of synthesizing a desired double-stranded DNA of a predetermined length and of a predetermined sequence. Preselected sequence segments that will complete the desired double-stranded DNA are determined. Preselected segment sequences of DNA that will be used to complete the desired double-stranded DNA are provided. The preselected segment sequences of DNA are assembled to produce the desired double-stranded DNA.

  7. Complete mitochondrial DNA sequence and phylogenic analysis of Oxyeleotris lineolatus (Perciformes, Eleotridae).

    Science.gov (United States)

    Zang, Xue; Yin, Danqing; Wang, Ruoran; Yin, Shaowu; Tao, Panfeng; Chen, Jiawei; Zhang, Guosong

    2016-07-01

    In this study, the mitochondrial genome of Oxyeleotris lineolatus was first determined. The length of entire mtDNA sequence was 16,522 bp with (A + T) content of 53.81%, and it contained 13 protein-coding genes, two rRNAs, 22 tRNAs, and a control region. The gene order and the orientation are similar to some typical fish species. The data will provide useful molecular information for phylogenetic studies concerning O. lineolatus and its related species.

  8. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  9. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Directory of Open Access Journals (Sweden)

    Zdepski Anna

    2011-05-01

    Background: High throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results: We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions: Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.
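
    The abstract does not state how organellar copy number is computed from the qPCR data. One common approach, assumed here purely for illustration, is the delta-Ct method: with amplification efficiency E (2.0 for perfect doubling), the organellar-to-nuclear copy ratio is E raised to the difference between the nuclear and organellar threshold cycles. The function name, the Ct values, and the assumption of equal efficiency for both amplicons are hypothetical.

```python
def relative_copies(ct_target, ct_reference, efficiency=2.0):
    """Copies of the target amplicon per copy of the reference amplicon.

    Assumes both amplicons amplify with the same efficiency (an idealization).
    A lower Ct for the target means more starting copies.
    """
    return efficiency ** (ct_reference - ct_target)


if __name__ == "__main__":
    # Hypothetical Ct values: chloroplast amplicon at 20, nuclear amplicon at 25.
    print(relative_copies(20.0, 25.0))   # chloroplast copies per nuclear copy
```

    Comparing this ratio for a nuclei-derived preparation and a CTAB preparation of the same tissue would express the reduction in organellar DNA as a single fold change.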

  10. Generating Exome Enriched Sequencing Libraries from Formalin-Fixed, Paraffin-Embedded Tissue DNA for Next Generation Sequencing

    Science.gov (United States)

    Marosy, Beth A.; Craig, Brian D.; Hetrick, Kurt N.; Witmer, P. Dane; Ling, Hua; Griffith, Sean M.; Myers, Ben; Ostrander, Elaine A.; Stanford, Janet L.; Brody, Lawrence C.; Doheny, Kimberly F.

    2016-01-01

    This unit describes a protocol for generating exome enriched sequencing libraries using DNA extracted from Formalin Fixed Paraffin Embedded (FFPE) samples. Utilizing commercially available kits, we present a low input FFPE workflow starting with 50 ng of DNA. This procedure includes a repair step to address damage caused by FFPE preservation that improves sequence quality. Subsequently, libraries undergo an in-solution targeted selection for exons, followed by sequencing using the Illumina next generation short read sequencing platform. PMID:28075488

  11. A method for high precision sequencing of near full-length 16S rRNA genes on an Illumina MiSeq

    Directory of Open Access Journals (Sweden)

    Catherine M. Burke

    2016-09-01

    Background: The bacterial 16S rRNA gene has historically been used in defining bacterial taxonomy and phylogeny. However, there are currently no high-throughput methods to sequence full-length 16S rRNA genes present in a sample with precision. Results: We describe a method for sequencing near full-length 16S rRNA gene amplicons using the high throughput Illumina MiSeq platform and test it using DNA from human skin swab samples. Proof of principle of the approach is demonstrated, with the generation of 1,604 sequences greater than 1,300 nt from a single Nano MiSeq run, with accuracy estimated to be 100-fold higher than standard Illumina reads. The reads were chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. Conclusions: This method could be scaled up to generate many thousands of sequences per MiSeq run and could be applied to other sequencing platforms. This has great potential for populating databases with high quality, near full-length 16S rRNA gene sequences from under-represented taxa and environments and facilitates analyses of microbial communities at higher resolution.

  12. DNA Sequencing as a Tool to Monitor Marine Ecological Status

    Directory of Open Access Journals (Sweden)

    Kelly D. Goodwin

    2017-05-01

    Many ocean policies mandate integrated, ecosystem-based approaches to marine monitoring, driving a global need for efficient, low-cost bioindicators of marine ecological quality. Most traditional methods to assess biological quality rely on specialized expertise to provide visual identification of a limited set of specific taxonomic groups, a time-consuming process that can provide a narrow view of ecological status. In addition, microbial assemblages drive food webs but are not amenable to visual inspection and thus are largely excluded from detailed inventory. Molecular-based assessments of biodiversity and ecosystem function offer advantages over traditional methods and are increasingly being generated for a suite of taxa using a “microbes to mammals” or “barcodes to biomes” approach. Progress in these efforts coupled with continued improvements in high-throughput sequencing and bioinformatics paves the way for sequence data to be employed in formal integrated ecosystem evaluation, including food web assessments, as called for in the European Union Marine Strategy Framework Directive. DNA sequencing of bioindicators, both traditional (e.g., benthic macroinvertebrates, ichthyoplankton) and emerging (e.g., microbial assemblages, fish via eDNA), promises to improve assessment of marine biological quality by increasing the breadth, depth, and throughput of information and by reducing costs and reliance on specialized taxonomic expertise.

  13. cgDNA: a software package for the prediction of sequence-dependent coarse-grain free energies of B-form DNA.

    Science.gov (United States)

    Petkevičiūtė, D; Pasi, M; Gonzalez, O; Maddocks, J H

    2014-11-10

    cgDNA is a package for the prediction of sequence-dependent configuration-space free energies for B-form DNA at the coarse-grain level of rigid bases. For a fragment of any given length and sequence, cgDNA calculates the configuration of the associated free energy minimizer, i.e. the relative positions and orientations of each base, along with a stiffness matrix, which together govern differences in free energies. The model predicts non-local (i.e. beyond base-pair step) sequence dependence of the free energy minimizer. Configurations can be input or output in either the Curves+ definition of the usual helical DNA structural variables, or as a PDB file of coordinates of base atoms. We illustrate the cgDNA package by comparing predictions of free energy minimizers from (a) the cgDNA model, (b) time-averaged atomistic molecular dynamics (or MD) simulations, and (c) NMR or X-ray experimental observation, for (i) the Dickerson-Drew dodecamer and (ii) three oligomers containing A-tracts. The cgDNA predictions are rather close to those of the MD simulations, but many orders of magnitude faster to compute. Both the cgDNA and MD predictions are in reasonable agreement with the available experimental data. Our conclusion is that cgDNA can serve as a highly efficient tool for studying structural variations in B-form DNA over a wide range of sequences. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
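
    The statement that the minimizer and the stiffness matrix "together govern differences in free energies" can be written out under the assumption that the model energy is quadratic (Gaussian) about the minimizer. The symbols below are illustrative and are not taken from the abstract: w is a configuration vector of rigid-base coordinates, \hat{w}(S) the minimizer for sequence S, and K(S) the stiffness matrix.

```latex
% Assumed quadratic form of the coarse-grain free energy for sequence S
U(w; S) = \tfrac{1}{2}\,\bigl(w - \hat{w}(S)\bigr)^{\mathsf{T}} K(S)\,\bigl(w - \hat{w}(S)\bigr)

% Free energy difference between two configurations w_1 and w_2 of the same sequence
\Delta U = U(w_2; S) - U(w_1; S)
```

    Under this assumption the minimizer itself has U = 0, so only differences between configurations carry meaning, which matches the abstract's wording.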

  14. Complete nuclear ribosomal DNA sequence amplification and molecular analyses of Bangia (Bangiales, Rhodophyta) from China

    Science.gov (United States)

    Xu, Jiajie; Jiang, Bo; Chai, Sanming; He, Yuan; Zhu, Jianyi; Shen, Zonggen; Shen, Songdong

    2016-09-01

    Filamentous Bangia, which are distributed extensively throughout the world, have simple and similar morphological characteristics. Scientists can classify these organisms using molecular markers in combination with morphology. We successfully sequenced the complete nuclear ribosomal DNA, approximately 13 kb in length, from a marine Bangia population. We further analyzed the small subunit ribosomal DNA gene (nrSSU) and the internal transcribed spacer (ITS) sequence regions along with nine other marine, and two freshwater Bangia samples from China. Pairwise distances of the nrSSU and 5.8S ribosomal DNA gene sequences show the marine samples grouping together with low divergences (0-0.003; 0-0.006, respectively) from each other, but high divergences (0.123-0.126; 0.198, respectively) from freshwater samples. An exception is the marine sample collected from Weihai, which shows high divergence from both other marine samples (0.063-0.065; 0.129, respectively) and the freshwater samples (0.097; 0.120, respectively). A maximum likelihood phylogenetic tree based on a combined SSU-ITS dataset shows the samples divided into three clades, with the two marine sample clades containing Bangia spp. from North America, Europe, Asia, and Australia; and one freshwater clade, containing Bangia atropurpurea from North America and China.

  15. A MapReduce Framework for DNA Sequencing Data Processing

    Directory of Open Access Journals (Sweden)

    Samy Ghoneimy

    2016-12-01

    Genomics and Next Generation Sequencers (NGS) like Illumina HiSeq produce data in the order of 200 billion base pairs in a single one-week run for a 60x human genome coverage, which requires modern high-throughput experimental technologies that can only be tackled with high performance computing (HPC) and specialized software algorithms called “short read aligners”. This paper focuses on the implementation of the DNA sequencing as a set of MapReduce programs that will accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format) file, which has variants for a given DNA data set. In this paper MapReduce/Hadoop along with Burrows-Wheeler Aligner (BWA) and Sequence Alignment/Map (SAM) tools are fully utilized to provide various utilities for manipulating alignments, including sorting, merging, indexing, and generating alignments. The Map-Sort-Reduce process is designed to be suited for a Hadoop framework in which each cluster is a traditional N-node Hadoop cluster to utilize all of the Hadoop features like HDFS, program management and fault tolerance. The Map step performs multiple instances of the short read alignment algorithm (Bowtie) that run in parallel in Hadoop. The ordered list of the sequence reads is used as input tuples and the output tuples are the alignments of the short reads. In the Reduce step many parallel instances of the Short Oligonucleotide Analysis Package for SNP (SOAPsnp) algorithm run in the cluster. Input tuples are sorted alignments for a partition and the output tuples are SNP calls. Results are stored via HDFS, and then archived in SOAPsnp format. The proposed framework enables extremely fast discovery of somatic mutations, inference of population genetic parameters, and association tests based directly on sequencing data without explicit genotyping or linkage-based imputation. It also demonstrates that this method achieves comparable
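
    The Map-Sort-Reduce flow described above can be mimicked on a single machine with a toy Python sketch. It is not the paper's implementation: the aligner is replaced by a brute-force fewest-mismatch placement, the SNP caller by a simple majority vote, and Hadoop by in-memory lists. The reference string, partition width, and all function names are hypothetical.

```python
from collections import Counter
from itertools import groupby

REFERENCE = "ACGTTGCAAGCT"   # hypothetical toy reference sequence
PARTITION_SIZE = 6           # hypothetical partition width, in bases


def map_read(read):
    """Map: place the read at the offset with the fewest mismatches and
    emit one (partition, position, base) tuple per aligned base."""
    offsets = range(len(REFERENCE) - len(read) + 1)
    best = min(offsets,
               key=lambda o: sum(a != b for a, b in zip(read, REFERENCE[o:])))
    return [((best + i) // PARTITION_SIZE, best + i, base)
            for i, base in enumerate(read)]


def reduce_partition(records):
    """Reduce: pile up bases per position and report majority bases that
    differ from the reference as (position, ref_base, alt_base)."""
    pileup = {}
    for _, pos, base in records:
        pileup.setdefault(pos, Counter())[base] += 1
    calls = []
    for pos in sorted(pileup):
        alt, _ = pileup[pos].most_common(1)[0]
        if alt != REFERENCE[pos]:
            calls.append((pos, REFERENCE[pos], alt))
    return calls


def run(reads):
    """Chain the three stages: map every read, sort, reduce per partition."""
    mapped = [rec for read in reads for rec in map_read(read)]
    mapped.sort()  # the Sort stage: orders tuples by partition, then position
    calls = []
    for _, group in groupby(mapped, key=lambda rec: rec[0]):
        calls.extend(reduce_partition(group))
    return calls


if __name__ == "__main__":
    # Three toy reads that all carry an A where the reference has T at index 4.
    print(run(["GTAGCA", "GTAGCA", "TAGCAA"]))
```

    Because each partition is reduced independently, the loop over `groupby` groups is the part a cluster would run in parallel.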

  16. Exploration of methods to localize DNA sequences missing from c-locus deletions

    International Nuclear Information System (INIS)

    Albritton, L.M.; Russell, L.B.; Montgomery, C.S.

    1987-01-01

    The authors have earlier characterized a large number of radiation-induced mutations at the c locus (on Chromosome 7) through genetic analysis, including extensive complementation tests. Based on this work, they have postulated that many of these mutations are deletions of various lengths, overlapping at c (the marker used in the mutation-rate experiments that generated the mutants). It was possible to apportion these deletions among 13 complementation groups and to fit them to a linear map of 8 functional units. Collectively, the deletions extend from a point between tp and c to one between sh-1 and Hbb, i.e., a genetic distance of from 6 to 10 cM, corresponding to at least 10^4 kb of DNA. This year, the authors completed a pilot study designed to explore methods for finding DNA sequences that map to the region covered by the various c-deletions. The general plan was to probe DNA with clones derived from Chromosome-7-enriched libraries or with sequences known (or suspected) to reside in Chromosome 7. Three methods were explored for deriving the c-region-deficient DNA: (a) from mouse-hamster somatic-cell hybrids retaining a deleted mouse Chromosome 7, but no homologue; (b) from F1 hybrids of M. musculus domesticus (carrying a c-locus deletion) by M. spretus; and (c) from F1 hybrids of M. domesticus stocks carrying complementing deletions

  17. Peptide Synthesis on a Next-Generation DNA Sequencing Platform.

    Science.gov (United States)

    Svensen, Nina; Peersen, Olve B; Jaffrey, Samie R

    2016-09-02

    Methods for displaying large numbers of peptides on solid surfaces are essential for high-throughput characterization of peptide function and binding properties. Here we describe a method for converting the >10^7 flow cell-bound clusters of identical DNA strands generated by the Illumina DNA sequencing technology into clusters of complementary RNA, and subsequently peptide clusters. We modified the flow-cell-bound primers with ribonucleotides thus enabling them to be used by poliovirus polymerase 3D(pol). The primers hybridize to the clustered DNA thus leading to RNA clusters. The RNAs fold into functional protein- or small molecule-binding aptamers. We used the mRNA-display approach to synthesize flow-cell-tethered peptides from these RNA clusters. The peptides showed selective binding to cognate antibodies. The methods described here provide an approach for using DNA clusters to template peptide synthesis on an Illumina flow cell, thus providing new opportunities for massively parallel peptide-based assays. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Programmable in vivo selection of arbitrary DNA sequences.

    Directory of Open Access Journals (Sweden)

    Tuval Ben Yehezkel

    The extraordinary fidelity, sensory and regulatory capacity of natural intracellular machinery is generally confined to their endogenous environment. Nevertheless, synthetic bio-molecular components have been engineered to interface with the cellular transcription, splicing and translation machinery in vivo by embedding functional features such as promoters, introns and ribosome binding sites, respectively, into their design. Tapping and directing the power of intracellular molecular processing towards synthetic bio-molecular inputs is potentially a powerful approach, albeit limited by our ability to streamline the interface of synthetic components with the intracellular machinery in vivo. Here we show how a library of synthetic DNA devices, each bearing an input DNA sequence and a logical selection module, can be designed to direct its own probing and processing by interfacing with the bacterial DNA mismatch repair (MMR) system in vivo and selecting for the most abundant variant, regardless of its function. The device provides proof of concept for programmable, function-independent DNA selection in vivo and provides a unique example of a logical-functional interface of an engineered synthetic component with a complex endogenous cellular system. Further research into the design, construction and operation of synthetic devices in vivo may lead to other functional devices that interface with other complex cellular processes for both research and applied purposes.

  19. Retroviral DNA Sequences as a Means for Determining Ancient Diets.

    Directory of Open Access Journals (Sweden)

    Jessica I Rivera-Perez

    For ages, specialists from varying fields have studied the diets of the primeval inhabitants of our planet, detecting diet remains in archaeological specimens using a range of morphological and biochemical methods. Recently, metagenomic ancient DNA studies have allowed for the comparison of the fecal and gut microbiomes associated with archaeological specimens from various regions of the world; however, the complex dynamics represented in those microbial communities still remain unclear. Theoretically, as with eukaryote DNA, the presence of genes from key microbes or enzymes, as well as the presence of DNA from viruses specific to key organisms, may suggest the ingestion of specific diet components. In this study we demonstrate that ancient virus DNA obtained from coprolites also provides information for reconstructing the host's diet, as inferred from sequences obtained from pre-Columbian coprolites. This depicts a novel and reliable approach to determine new components as well as validate the previously suggested diets of extinct cultures and animals. Furthermore, to our knowledge this represents the first description of the eukaryotic viral diversity found in paleofaeces belonging to pre-Columbian cultures.

  20. Mitochondrial DNA sequencing of cat hair: an informative forensic tool.

    Science.gov (United States)

    Tarditi, Christy R; Grahn, Robert A; Evans, Jeffrey J; Kurushima, Jennifer D; Lyons, Leslie A

    2011-01-01

    Approximately 81.7 million cats are in 37.5 million U.S. households. Shed fur can be criminal evidence because of transfer to victims, suspects, and/or their belongings. To improve cat hairs as forensic evidence, the mtDNA control region from single hairs, with and without root tags, was sequenced. A dataset of a 402-bp control region segment from 174 random-bred cats representing four U.S. geographic areas was generated to determine the informativeness of the mtDNA region. Thirty-two mtDNA mitotypes were observed ranging in frequencies from 0.6-27%. Four common types occurred in all populations. Low heteroplasmy, 1.7%, was determined. Unique mitotypes were found in 18 individuals, 10.3% of the population studied. The calculated discrimination power implied that 8.3 of 10 randomly selected individuals can be excluded by this region. The genetic characteristics of the region and the generated dataset support the use of this cat mtDNA region in forensic applications. 2010 American Academy of Forensic Sciences. Published 2010. This article is a U.S. Government work and is in the public domain in the U.S.A.
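
    The abstract reports a discrimination power without giving the formula. A common forensic definition, assumed here, is one minus the random match probability, where the random match probability is the sum of squared haplotype (mitotype) frequencies. The Python sketch below uses that definition; the counts in the usage line are invented and are not the study's 174-cat dataset.

```python
def random_match_probability(counts):
    """Chance that two randomly drawn individuals share a mitotype,
    estimated as the sum of squared observed frequencies."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def discrimination_power(counts):
    """Chance that two randomly drawn individuals differ (1 - match probability)."""
    return 1.0 - random_match_probability(counts)


if __name__ == "__main__":
    # Hypothetical mitotype counts: two common types and several singletons.
    print(discrimination_power([40, 30, 10, 5, 5, 1, 1, 1, 1]))
```

    A few very common mitotypes dominate the sum of squares, which is why the four types shared by all populations limit how many individuals the region can exclude.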

  1. Targeted deep DNA methylation analysis of circulating cell-free DNA in plasma using massively parallel semiconductor sequencing.

    Science.gov (United States)

    Vaca-Paniagua, Felipe; Oliver, Javier; Nogueira da Costa, Andre; Merle, Philippe; McKay, James; Herceg, Zdenko; Holmila, Reetta

    2015-01-01

    To set up a targeted methylation analysis using semiconductor sequencing and evaluate the potential for studying methylation in circulating cell-free DNA (cfDNA). Methylation of VIM, FBLN1, LTBP2, HINT2, h19 and IGF2 was analyzed in plasma cfDNA and white blood cell DNA obtained from eight hepatocellular carcinoma patients and eight controls using Ion Torrent™ PGM sequencer. h19 and IGF2 showed consistent methylation levels and methylation was detected for VIM and FBLN1, whereas LTBP2 and HINT2 did not show methylation for target regions. VIM gene promoter methylation was higher in HCC cfDNA than in cfDNA of controls or white blood cell DNA. Semiconductor sequencing is a suitable method for analyzing methylation profiles in cfDNA. Furthermore, differences in cfDNA methylation can be detected between controls and hepatocellular carcinoma cases, even though due to the small sample set these results need further validation.

  2. Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

    Directory of Open Access Journals (Sweden)

    Sugano Sumio

    2009-07-01

    Background: Apicomplexan parasites are causative agents of various diseases including malaria and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexa parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance for physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexa parasites. Results: In this study, we used a total of 61,056 5'-end-single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of transcripts that had been previously unidentified. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the complete full-length transcript level. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified and confirmed by RT-PCR 140 previously unidentified transcripts found in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. Conclusion: Our data indicates that the current gene models are likely to still be incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of

  3. A Simulation of DNA Sequencing Utilizing 3M Post-It[R] Notes

    Science.gov (United States)

    Christensen, Doug

    2009-01-01

    An inexpensive and equipment-free approach to teaching the technical aspects of DNA sequencing. The activity described requires an instructor familiar with DNA sequencing technology but provides a straightforward method of teaching the technical aspects of sequencing in the absence of expensive sequencing equipment. The final sequence…

  4. Entropy and long-range correlations in DNA sequences.

    Science.gov (United States)

    Melnik, S S; Usatenko, O V

    2014-12-01

    We analyze the structure of DNA molecules of different organisms by using the additive Markov chain approach. Transforming nucleotide sequences into binary strings, we perform statistical analysis of the corresponding "texts". We develop the theory of N-step additive binary stationary ergodic Markov chains and analyze their differential entropy. Supposing that the correlations are weak, we express the conditional probability function of the chain by means of the pair correlation function and represent the entropy as a functional of the pair correlator. Since the model uses two-point correlators instead of the probability of block occurrence, it makes it possible to calculate the entropy of subsequences at much longer distances than with the standard methods. We utilize the obtained analytical result for numerical evaluation of the entropy of coarse-grained DNA texts. We believe that the entropy study can be used for biological classification of living species. Copyright © 2014. Published by Elsevier Ltd.
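
    The first two steps named in this abstract (mapping nucleotides to a binary string and estimating a pair correlation function) can be illustrated with a minimal sketch. The purine/pyrimidine mapping and the function names `to_binary` and `pair_correlation` are assumptions chosen for illustration; the abstract does not say which binary mapping the authors use, and the entropy functional itself is not reproduced here.

```python
def to_binary(seq, ones="AG"):
    """Map a nucleotide string to 0/1 values.

    Assumed mapping (not stated in the abstract): purines A, G -> 1,
    pyrimidines C, T -> 0. Characters other than A, C, G, T are skipped.
    """
    return [1 if base in ones else 0 for base in seq.upper() if base in "ACGT"]


def pair_correlation(bits, r):
    """Estimate C(r) = <a_i * a_(i+r)> - <a>^2 for a binary sequence."""
    n = len(bits) - r
    if r < 0 or n <= 0:
        raise ValueError("r must satisfy 0 <= r < len(bits)")
    mean = sum(bits) / len(bits)
    product_mean = sum(bits[i] * bits[i + r] for i in range(n)) / n
    return product_mean - mean * mean


bits = to_binary("ACACACAC")   # alternating purine/pyrimidine
c1 = pair_correlation(bits, 1)  # neighbours always differ, so C(1) < 0
c2 = pair_correlation(bits, 2)  # positions two apart always agree, so C(2) > 0
```

    For a strictly alternating sequence the correlator changes sign with the distance r, which is the kind of two-point statistic the abstract describes using in place of block probabilities.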

  5. The role of nucleotide sequence in the immune-active structure photochemically induced in double-stranded DNA by ultraviolet irradiation

    International Nuclear Information System (INIS)

    Wakizaka, Akira; Okuhara, Eiji

    1982-01-01

    Pyrimidine, purine, and mixed sequence oligonucleotides from ultraviolet-irradiated DNA were tested for their inhibitory activities on the interaction of [³H]ultraviolet-irradiated DNA with its antibody raised in rabbit. Thymine dimer containing pyrimidine oligonucleotides from irradiated DNA failed to inhibit the interaction, while mixed sequence oligonucleotides, especially those with 8 or more nucleotides, exhibited potent inhibition. Purine clusters from irradiated DNA and mixed sequence oligomers from unirradiated DNA showed no inhibition. Dimerized thymine, which appears to be a critical part of the antigenic determinant, did not inhibit the interaction by itself. The same observations were made for ultraviolet-irradiated thymidine and thymidylic acid. The results suggest that a structure composed of a mixed pyrimidine and purine sequence with a certain chain length seems to be essential for the antigenicity induced in the irradiated DNA. On this nucleotide chain backbone, photochemically modified bases (mostly thymine dimer) can form an immune-active structure. (author)

  6. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Background: Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology, with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict) known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on a scoring function and the prediction is done by selecting the best score. Most of the available tools for known binding site prediction are designed by assuming no dependency between binding site base positions. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions. Results: We propose a new scoring function based on the dependency between all positions in binding site base positions. This scoring function uses joint information content and mutual information as a measure of dependency between positions in a transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on the real data sets extracted from the JASPAR and TRANSFAC databases, and compare the obtained results with two other well known scoring functions. Conclusion: The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple
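
    As a rough illustration of the dependency measure mentioned in this abstract, the sketch below computes the textbook mutual information (in bits) between two columns of a set of aligned binding sites. It is not the authors' scoring function, which is not given in the abstract; the function name `mutual_information` and the toy site lists are hypothetical.

```python
import math
from collections import Counter


def mutual_information(sites, i, j):
    """Mutual information in bits between columns i and j of aligned sites.

    `sites` is a list of equal-length strings (aligned binding sites).
    Probabilities are plain frequency estimates with no pseudocounts.
    """
    n = len(sites)
    col_i = Counter(s[i] for s in sites)
    col_j = Counter(s[j] for s in sites)
    joint = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


dependent = ["AT", "AT", "CG", "CG"]     # column 1 is determined by column 0
independent = ["AA", "AC", "CA", "CC"]   # every combination equally frequent
mi_dep = mutual_information(dependent, 0, 1)
mi_ind = mutual_information(independent, 0, 1)
```

    A position-independent model would treat both toy alignments the same way column by column; the mutual information separates them, which is the motivation for dependency-aware scoring described above.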

  7. DNA interaction with platinum-based cytostatics revealed by DNA sequencing.

    Science.gov (United States)

    Smerkova, Kristyna; Vaculovic, Tomas; Vaculovicova, Marketa; Kynicky, Jindrich; Brtnicky, Martin; Eckschlager, Tomas; Stiborova, Marie; Hubalek, Jaromir; Adam, Vojtech

    2017-12-15

    The main mechanism of action of platinum-based cytostatic drugs - cisplatin, oxaliplatin and carboplatin - is the formation of DNA cross-links, which restricts transcription because the cross-linked DNA cannot enter the active site of the polymerase. The polymerase chain reaction (PCR) was employed as a simplified model of the amplification process in the cell nucleus. PCR with fluorescently labelled dideoxynucleotides, commonly employed for DNA sequencing, was used to monitor the effect of platinum-based cytostatics on DNA in terms of the decrease in labelling efficiency caused by the presence of a DNA-drug cross-link. It was found that significantly different amounts of the drugs - cisplatin (0.21 μg/mL), oxaliplatin (5.23 μg/mL), and carboplatin (71.11 μg/mL) - were required to cause the same quenching effect (50%) on the fluorescent labelling of 50 μg/mL of DNA. Moreover, it was found that even though the amounts of the drugs applied to the reaction mixture differed by several orders of magnitude, the amount of incorporated platinum, quantified by inductively coupled plasma mass spectrometry, was in all cases at the level of tenths of μg per 5 μg of DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Restriction fragment length polymorphisms of the DNA of selected Naegleria and Acanthamoeba amebae.

    Science.gov (United States)

    McLaughlin, G L; Brandt, F H; Visvesvara, G S

    1988-09-01

    Fourteen strains of Naegleria fowleri, two strains of N. gruberi, and one strain each of N. australiensis, N. jadini, N. lovaniensis, Acanthamoeba sp., A. castellanii, A. polyphaga, and A. comandoni isolated from patients, soil, or water were characterized by restriction fragment length polymorphisms. Total cellular DNA (1 microgram) was digested with either HindIII, BglII, or EcoRI; separated on agarose gels; and stained with ethidium bromide. From 2 to 15 unusually prominent repetitive restriction fragment bands, totaling 15 to 50 kilobases in length and constituting probably more than 30% of the total DNA, were detected for all ameba strains. Each species displayed a characteristic pattern of repetitive restriction fragments. Digests of the four Acanthamoeba spp. displayed fewer, less intensely staining repetitive fragments than those of the Naegleria spp. All N. fowleri strains, whether isolated from the cerebrospinal fluid of patients from different parts of the world or from hot springs, had repetitive restriction fragment bands of similar total lengths (ca. 45 kilobases), and most repetitive bands displayed identical mobilities. However, polymorphic bands were useful in identifying particular isolates. Restriction fragment length polymorphism analysis generally was consistent with taxonomy based on studies of infectivity, morphology, isoenzyme patterns, and antibody reactivity and suggests that this technique may help classify amebae isolated from clinical specimens or from the environment.
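
    The digestion step described in this abstract can be mimicked in silico. The sketch below cuts a linear sequence at the recognition sites of the three enzymes named above (HindIII A^AGCTT, BglII A^GATCT, EcoRI G^AATTC) and returns fragment lengths; the function name `digest`, the `SITES` table layout and the toy sequence are illustrative assumptions, and the sketch ignores methylation, star activity and circular molecules.

```python
# Recognition site and top-strand cut offset for each enzyme.
SITES = {
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BglII": ("AGATCT", 1),    # A^GATCT
    "EcoRI": ("GAATTC", 1),    # G^AATTC
}


def digest(seq, enzyme):
    """Return fragment lengths (in bp) from digesting a linear sequence."""
    site, offset = SITES[enzyme]
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "CCGAATTCGGGAATTCAA"        # two EcoRI sites
fragments = digest(toy, "EcoRI")  # fragment lengths in 5'->3' order
```

    Comparing the sorted fragment-length lists of two isolates is a crude stand-in for comparing band patterns on a gel; all three sites are palindromic, so scanning one strand is sufficient.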

  9. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  10. Microelectrophoresis devices with integrated fluorescence detectors and reactors for high-throughput DNA sequencing

    Science.gov (United States)

    Soper, Steven A.; Ford, Sean M.; Davies, Jack; Williams, Daryl C.; Cheng, Benxu; Klopf, J. Michael; Calderon, Gina M.; Saile, Volker

    1997-05-01

    This work describes the development of micro-devices for high-throughput DNA sequencing applications. Two research efforts will be discussed: (1) fabrication and characterization of micro-reactors to prepare Sanger chain-terminated DNA sequencing fragments on a nanoliter scale, and (2) x-ray photolithography of PMMA substrates for the high aspect ratio preparation of electrophoresis devices. The micro-reactor consisted of a 5'-biotinylated catfish olfactory gene, which was amplified by PCR and attached to the interior wall of an aminoalkylsilane-derivatized fused-silica capillary tube via a streptavidin/biotin linkage. Coverage of the interior capillary wall with biotinylated DNA averaged 77 percent. Stability of the anchored template under pressure and electroosmotic rinsing was favorable, requiring approximately 150 h of continuous rinsing to reduce the coverage by only 50 percent. The capillary micro-reactor was placed inside an air thermocycler to control temperature during Sanger ddNTP chain extension and directly coupled to a capillary separation column filled with an LPA solution via low dead volume capillary interlocks. The complementary DNA fragments generated in the reactor were heat denatured from the immobilized template and directly injected onto a gel-filled capillary using electropumping for size fractionation and detection using NIR-LIF analysis. The total amount of termination fragments in the 31 nL reactor volume was estimated to be 5.2 × 10⁻¹³ moles and sequencing was shown to produce read lengths on the order of 400 bases. Work will also be described concerning the development of micro-electrophoresis devices in x-ray sensitive photoresists using LIGA techniques. An electrophoresis device with an integrated fluorescence detector was constructed for the high resolution separation of DNA oligonucleotides. The choice of substrate for the electrophoresis was PMMA, due to its intrinsic low electroosmotic flow. Using x-ray lithography in

  11. Automated hybridization/imaging device for fluorescent multiplex DNA sequencing

    Science.gov (United States)

    Weiss, Robert B.; Kimball, Alvin W.; Gesteland, Raymond F.; Ferguson, F. Mark; Dunn, Diane M.; Di Sera, Leonard J.; Cherry, Joshua L.

    1995-01-01

    A method is disclosed for automated multiplex sequencing of DNA with an integrated automated imaging hybridization chamber system. This system comprises a hybridization chamber device for mounting a membrane containing size-fractionated multiplex sequencing reaction products, apparatus for fluid delivery to the chamber device, imaging apparatus for light delivery to the membrane and image recording of fluorescence emanating from the membrane while in the chamber device, and programmable controller apparatus for controlling operation of the system. The multiplex reaction products are hybridized with a probe, then an enzyme (such as alkaline phosphatase) is bound to a binding moiety on the probe, and a fluorogenic substrate (such as a benzothiazole derivative) is introduced into the chamber device by the fluid delivery apparatus. The enzyme converts the fluorogenic substrate into a fluorescent product which, when illuminated in the chamber device with a beam of light from the imaging apparatus, fluoresces to produce a pattern of hybridization. The pattern of hybridization is imaged by a CCD camera component of the imaging apparatus to obtain a series of digital signals. These signals are converted by the controller apparatus into a string of nucleotides corresponding to the nucleotide sequence by an automated sequence reader. The method and apparatus are also applicable to other membrane-based applications such as colony and plaque hybridization and Southern, Northern, and Western blots.

  12. Maternal Plasma DNA and RNA Sequencing for Prenatal Testing.

    Science.gov (United States)

    Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B M; Cornel, Martina C; Sistermans, Erik A

    2016-01-01

    Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide polymorphism-based approaches, fetal cfDNA in maternal plasma can be analyzed to screen for rhesus D genotype, common chromosomal aneuploidies, and increasingly for testing other conditions, including monogenic disorders. With regard to screening for common aneuploidies, challenges arise when implementing NIPT in current prenatal settings. Depending on the method used (targeted or nontargeted), chromosomal anomalies other than trisomy 21, 18, or 13 can be detected, either of fetal or maternal origin, also referred to as unsolicited or incidental findings. For various biological reasons, there is a small chance of having either a false-positive or false-negative NIPT result, or no result, also referred to as a "no-call." Both pre- and posttest counseling for NIPT should include discussing potential discrepancies. Since NIPT remains a screening test, a positive NIPT result should be confirmed by invasive diagnostic testing (either by chorionic villus biopsy or by amniocentesis). As the scope of NIPT is widening, professional guidelines need to discuss the ethics of what to offer and how to offer. In this review, we discuss the current biochemical, clinical, and ethical challenges of cfDNA testing in the prenatal setting and its future perspectives including novel applications that target RNA instead of DNA. © 2016 Elsevier Inc. All rights reserved.

  13. Introducing a model of pairing based on base pair specific interactions between identical DNA sequences

    Science.gov (United States)

    (O') Lee, Dominic J.

    2018-02-01

    At present, two types of physical mechanism have been suggested that may facilitate preferential pairing between DNA molecules with identical or similar base pair texts, without separation of base pairs. One mechanism, discussed extensively in the past, relies solely on base pair specific patterns of helix distortion being the same on the two molecules. The other mechanism proposes that there are preferential interactions between base pairs of the same composition. We introduce a model, built on this second mechanism, where both thermal stretching and twisting fluctuations are included, as well as the base pair specific helix distortions. Firstly, we consider an approximation for weak pairing interactions, or short molecules. This yields a dependence of the energy on the square root of the molecular length, which could explain recent experimental data. However, analysis suggests that this approximation is no longer valid at large DNA lengths. In a second approximation, for long molecules, we define two adaptation lengths for twisting and stretching, over which the pairing interaction can limit the accumulation of helix disorder. When the pairing interaction is sufficiently strong, both adaptation lengths are finite; however, as we reduce pairing strength, the stretching adaptation length remains finite but the torsional one becomes infinite. This second state persists to arbitrarily weak values of the pairing strength, suggesting that, if the molecules are long enough, the pairing energy scales as length. To probe differences between the two pairing mechanisms, we also construct a model of similar form in which pairing between identical sequences relies solely on the intrinsic helix distortion patterns. Between the two models, we see interesting qualitative differences. We discuss our findings, and suggest new work to distinguish between the two mechanisms.

  14. Magnetic bead purification of labeled DNA fragments for high-throughput capillary electrophoresis sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Elkin, Christopher; Kapur, Hitesh; Smith, Troy; Humphries, David; Pollard, Martin; Hammon, Nancy; Hawkins, Trevor

    2001-09-15

    We have developed an automated purification method for terminator sequencing products based on a magnetic bead technology. This 384-well protocol generates labeled DNA fragments that are essentially free of contaminants for less than $0.005 per reaction. In comparison to laborious ethanol precipitation protocols, this method increases the phred20 read length by forty bases with various DNA templates such as PCR fragments, plasmids, cosmids and RCA products. Our method eliminates centrifugation and is compatible with both the MegaBACE 1000 and ABI Prism 3700 capillary instruments. As of September 2001, this method has produced over 1.6 million samples, with 93 percent averaging 620 phred20 bases, as part of the Joint Genome Institute's production process.
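
    For readers unfamiliar with the "phred20" metric quoted above, a minimal sketch follows. It assumes that the phred20 read length is the number of bases in a read whose Phred quality score is at least 20, and it uses the standard relation Q = -10 log10(P) between quality score and error probability. The function names and the example scores are hypothetical; the abstract does not describe how the authors computed the metric.

```python
def error_probability(q):
    """Base-call error probability implied by Phred quality q: P = 10^(-q/10)."""
    return 10 ** (-q / 10)


def phred20_length(quals, threshold=20):
    """Count bases whose quality score meets the threshold (assumed definition)."""
    return sum(1 for q in quals if q >= threshold)


read_quals = [10, 20, 35, 19, 40]      # made-up per-base quality scores
q20_bases = phred20_length(read_quals)  # bases with Q >= 20
p_at_q20 = error_probability(20)        # about a 1-in-100 error rate
```

    Under this definition, a gain of forty phred20 bases means forty more positions per read called with an estimated error rate of 1% or better.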

  15. Genomic integration of the full-length dystrophin coding sequence in Duchenne muscular dystrophy induced pluripotent stem cells.

    Science.gov (United States)

    Farruggio, Alfonso P; Bhakta, Mital S; du Bois, Haley; Ma, Julia; Calos, Michele P

    2017-04-01

    Plasmid vectors that express the full-length human dystrophin coding sequence in human cells were developed. Dystrophin, the protein mutated in Duchenne muscular dystrophy, is extraordinarily large, providing challenges for cloning and plasmid production in Escherichia coli. The authors expressed dystrophin from the strong, widely expressed CAG promoter, along with co-transcribed luciferase and mCherry marker genes useful for tracking plasmid expression. Introns were added at the 3' and 5' ends of the dystrophin sequence to prevent translation in E. coli, resulting in improved plasmid yield. Stability and yield were further improved by employing a lower-copy number plasmid origin of replication. The dystrophin plasmids also carried an attB site recognized by phage phiC31 integrase, enabling the plasmids to be integrated into the human genome at preferred locations by phiC31 integrase. The authors demonstrated single-copy integration of plasmid DNA into the genome and production of human dystrophin in the human 293 cell line, as well as in induced pluripotent stem cells derived from a patient with Duchenne muscular dystrophy. Plasmid-mediated dystrophin expression was also demonstrated in mouse muscle. The dystrophin expression plasmids described here will be useful in cell and gene therapy studies aimed at ameliorating Duchenne muscular dystrophy. Copyright © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Isolation and sequence analysis of the wheat B genome subtelomeric DNA

    Directory of Open Access Journals (Sweden)

    Huneau Cecile

    2009-09-01

    Background: Telomeric and subtelomeric regions are essential for genome stability and regular chromosome replication. In this work, we have characterized the wheat BAC (bacterial artificial chromosome) clones containing Spelt1 and Spelt52 sequences, which belong to the subtelomeric repeats of the B/G genomes of wheats and Aegilops species from the section Sitopsis. Results: The BAC library from Triticum aestivum cv. Renan was screened using Spelt1 and Spelt52 as probes. Nine positive clones were isolated; of them, clone 2050O8 was localized mainly to the distal parts of wheat chromosomes by in situ hybridization. The distribution of the other clones indicated the presence of different types of repetitive sequences in BACs. Use of different approaches allowed us to prove that seven of the nine isolated clones belonged to the subtelomeric chromosomal regions. Clone 2050O8 was sequenced and its sequence of 119 737 bp was annotated. It is composed of 33% transposable elements (TEs), 8.2% Spelt52 (namely, the subfamily Spelt52.2) and five non-TE-related genes. DNA transposons are predominant, making up 24.6% of the entire BAC clone, whereas retroelements account for 8.4% of the clone length. The full-length CACTA transposon Caspar covers 11 666 bp, encoding a transposase and CTG-2 proteins, and this transposon accounts for 40% of the DNA transposons. The in situ hybridization data for 2050O8 derived subclones in combination with the BLAST search against wheat mapped ESTs (expressed sequence tags) suggest that clone 2050O8 is located in the terminal bin 4BL-10 (0.95-1.0). Additionally, four of the predicted 2050O8 genes showed significant homology to four putative orthologous rice genes in the distal part of rice chromosome 3S and confirm the synteny to wheat 4BL. Conclusion: Satellite DNA sequences from the subtelomeric regions of diploid wheat progenitor can be used for selecting the BAC clones from the corresponding regions of hexaploid wheat

  17. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Science.gov (United States)

    Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

    2009-01-01

    Background: Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results: We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion: The new EST collection denotes an

  18. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Directory of Open Access Journals (Sweden)

    Alamar Santiago

    2009-09-01

    Background: Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results: We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion: The new

  19. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus.

    Science.gov (United States)

    Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

    2009-09-11

    Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. The new EST collection denotes an important step towards the

  20. Recent progress in atomistic simulation of electrical current DNA sequencing.

    Science.gov (United States)

    Kim, Han Seul; Kim, Yong-Hoon

    2015-07-15

    We review recent advances in the DNA sequencing method based on measurements of transverse electrical currents. Device configurations proposed in the literature are classified according to whether the molecular fingerprints appear as the major (Mode I) or perturbing (Mode II) current signals. Scanning tunneling microscope and tunneling electrode gap configurations belong to the former category, while the nanochannels with or without an embedded nanopore belong to the latter. The molecular sensing mechanisms of Modes I and II roughly correspond to electron tunneling and electrochemical gating, respectively. Special emphasis is given to computer simulation studies, which have played a critical role in the initiation and development of the field. We also highlight low-dimensional nanomaterials such as carbon nanotubes, graphene, and graphene nanoribbons that allow the novel Mode II approach. Finally, several issues in previous computational studies are discussed, which point to future research directions toward more reliable simulation of electrical current DNA sequencing devices. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Targeted bisulfite sequencing of the dynamic DNA methylome.

    Science.gov (United States)

    Ziller, Michael J; Stamenova, Elena K; Gu, Hongcang; Gnirke, Andreas; Meissner, Alexander

    2016-01-01

    The ability to measure DNA methylation precisely and efficiently continues to drive our understanding of this modification in development and disease. Whole genome bisulfite sequencing has the advantage of theoretically capturing all cytosines in the genome at single-nucleotide resolution, but it has a number of significant practical drawbacks that become amplified with increasing sample numbers. All other technologies capture only a fraction of the cytosines that show dynamic regulation across cell and tissue types. Here, we present a novel hybrid selection design focusing on loci with dynamic methylation that captures a large number of differentially methylated gene-regulatory elements. We benchmarked this assay against matched whole genome data and profiled 25 human tissue samples to explore its ability to detect differentially methylated regions. Our target capture design fills a major gap left by all other assays that exist to map DNA methylation. It maintains the ability to link cytosine methylation to genetic differences, the single-base resolution and the analysis of neighboring cytosines while notably reducing the cost per sample by focusing the sequencing effort on the most informative and relevant regions of the genome.

  2. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
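
    To make the third step (finding microsatellites that conform to user-specified parameters) concrete, the following is a minimal sketch of a regular-expression search for perfect tandem repeats. It is not the SSR_pipeline code: the function name `find_microsatellites` and the parameters `min_motif`, `max_motif` and `min_repeats` are hypothetical, and the sketch handles only simple (not compound) repeats.

```python
import re


def find_microsatellites(seq, min_motif=2, max_motif=25, min_repeats=4):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Illustrative only: names and defaults are assumptions, not SSR_pipeline's API.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    read = "GGG" + "AT" * 5 + "CCGT" + "CAG" * 4 + "T"
    for start, motif, count in find_microsatellites(read):
        print(start, motif, count)
```

    Scanning one motif length at a time keeps the pattern simple; a production tool would also need to merge overlapping hits and handle compound repeats.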

  3. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    Science.gov (United States)

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  4. Characterization of cDNA clones encoding rabbit and human serum paraoxonase: The mature protein retains its signal sequence

    Energy Technology Data Exchange (ETDEWEB)

    Hassett, C.; Richter, R.J.; Humbert, R.; Omiecinski, C.J.; Furlong, C.E. (Univ. of Washington, Seattle (United States)); Chapline, C.; Crabb, J.W. (W.Alton Jones Cell Science Center, Lake Placid, NY (United States))

    1991-10-22

    Serum paraoxonase hydrolyzes the toxic metabolites of a variety of organophosphorus insecticides. High serum paraoxonase levels appear to protect against the neurotoxic effects of organophosphorus substrates of this enzyme. The amino acid sequence accounting for 42% of rabbit paraoxonase was determined. From these data, two oligonucleotide probes were synthesized and used to screen a rabbit liver cDNA library. Human paraoxonase clones were isolated from a liver cDNA library by using the rabbit cDNA as a hybridization probe. Inserts from three of the longest clones were sequenced, and one full-length clone contained an open reading frame encoding 355 amino acids, four less than the rabbit paraoxonase protein. Amino-terminal sequences derived from purified rabbit and human paraoxonase proteins suggested that the signal sequence is retained, with the exception of the initiator methionine residue. Characterization of the rabbit and human paraoxonase cDNA clones confirms that the signal sequences are not processed, except for the N-terminal methionine residue. The rabbit and human cDNA clones demonstrate striking nucleotide and deduced amino acid similarities (greater than 85%), suggesting an important metabolic role and constraints on the evolution of this protein.

  5. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: Effects of sequence length and practice

    Directory of Open Access Journals (Sweden)

    Kevin J Reilly

    2013-10-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, 5 adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria are consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria.

  6. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

    Science.gov (United States)

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria are consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121

  7. Rapid infectious disease identification by next-generation DNA sequencing.

    Science.gov (United States)

    Ellis, Jeremy E; Missan, Dara S; Shabilla, Matthew; Martinez, Delyn; Fry, Stephen E

    2017-07-01

    Currently, there is a critical need to rapidly identify infectious organisms in clinical samples. Next-Generation Sequencing (NGS) could surmount the deficiencies of culture-based methods; however, there are no standardized, automated programs to process NGS data. To address this deficiency, we developed the Rapid Infectious Disease Identification (RIDI™) system. The system requires minimal guidance, which reduces operator errors. The system is compatible with the three major NGS platforms. It automatically interfaces with the sequencing system, detects their data format, configures the analysis type, applies appropriate quality control, and analyzes the results. Sequence information is characterized using both the NCBI database and RIDI™ specific databases. RIDI™ was designed to identify high probability sequence matches and more divergent matches that could represent different or novel species. We challenged the system using defined American Type Culture Collection (ATCC) reference standards of 27 species, both individually and in varying combinations. The system was able to rapidly detect known organisms in DNA sequence reads at the genus-level and 75.3% at the species-level in reference standards. It has a limit of detection of 146 cells/ml in simulated clinical samples, and is also able to identify the components of polymicrobial samples with 16.9% discrepancy at the genus-level and 31.2% at the species-level. Thus, the system's effectiveness may exceed current methods, especially in situations where culture methods could produce false negatives or where rapid results would influence patient outcomes. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties

    Directory of Open Access Journals (Sweden)

    Gaofeng Pan

    2018-02-01

    DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated under the specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to do DNA methylation prediction. Five criteria are used to evaluate the prediction results of our method: area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity. On the benchmark dataset, we could reach 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cells datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them on the accuracy of prediction. The improvement of AUC by our method compared to other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.
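
    As an illustration of the k-gram feature mentioned in this abstract, here is a minimal sketch that turns a DNA window into a fixed-length vector of normalised k-gram frequencies. The function name `kgram_features` and the choice to skip windows containing non-ACGT symbols are assumptions made for the example; the authors' actual feature definition may differ.

```python
from collections import Counter
from itertools import product


def kgram_features(seq, k=2):
    """Return normalised k-gram frequencies in fixed lexicographic order.

    Hypothetical helper for illustration; not the authors' published code.
    """
    seq = seq.upper()
    # All 4**k possible k-grams over the DNA alphabet, e.g. AA, AC, ..., TT.
    grams = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count windows made purely of A/C/G/T (windows with N are ignored).
    total = sum(counts[g] for g in grams)
    return [counts[g] / total if total else 0.0 for g in grams]


if __name__ == "__main__":
    vector = kgram_features("ACGTTGCAACGT", k=2)
    print(len(vector), round(sum(vector), 6))
```

    A fixed ordering of the k-grams matters because a downstream classifier expects each vector position to mean the same thing for every sequence.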

  9. A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties.

    Science.gov (United States)

    Pan, Gaofeng; Jiang, Limin; Tang, Jijun; Guo, Fei

    2018-02-08

    DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated under the specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to do DNA methylation prediction. Five criteria are used to evaluate the prediction results of our method: area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity. On the benchmark dataset, we could reach 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cells datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them on the accuracy of prediction. The improvement of AUC by our method compared to other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.
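
    Four of the evaluation criteria named in this abstract (ACC, SN, specificity and MCC) follow directly from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the counts in the usage example are made up for illustration and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute ACC, SN, SP and MCC from confusion-matrix counts.

    Standard definitions; undefined ratios are reported as 0.0 by assumption.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    # Invented counts, purely to show the call.
    print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

    MCC is often reported alongside accuracy because it stays informative when methylated and unmethylated sites are unevenly represented.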

  10. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger.

    Science.gov (United States)

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-10-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the most well-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal matches and 7.4% as weak matches. Of the ESTs, 28.0% were related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism, and 3.2% to cell cycle, while 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for functional genomic research on Siberian tigers.

  11. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

    Science.gov (United States)

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-01-01

    In this study we successfully constructed a full-length cDNA library from the Siberian tiger, Panthera tigris altaica, the most well-known wild animal. Total RNA was extracted from Siberian tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.30×10^6 pfu/ml and 1.62×10^9 pfu/ml, respectively. The proportion of recombinants in the unamplified library was 90.5% and the average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142 bp were then analyzed. The BLASTX scores revealed that 53.9% of the sequences were classified as strong matches, 38.6% as nominal matches and 7.4% as weak matches. Of the ESTs, 28.0% were related to enzyme/catalytic protein, 20.9% to metabolism, 13.1% to transport, 12.1% to signal transducer/cell communication, 9.9% to structure protein, 3.9% to immunity protein/defense metabolism, and 3.2% to cell cycle, while 8.9% were classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for functional genomic research on Siberian tigers. PMID:20941376

  12. Determination of cDNA and genomic DNA sequences of hevamine, a chitinase from the rubber tree Hevea brasiliensis

    NARCIS (Netherlands)

    Bokma, E; Spiering, M; Chow, KS; Mulder, PPMFA; Subroto, T; Beintema, JJ

    Hevamine is a chitinase from the rubber tree Hevea brasiliensis and belongs to the family 18 glycosyl hydrolases. This paper describes the cloning of hevamine DNA and cDNA sequences. Hevamine contains a signal peptide at the N-terminus and a putative vacuolar targeting sequence at the C-terminus

  13. The evolution processes of DNA sequences, languages and carols

    Science.gov (United States)

    Hauck, Jürgen; Henkel, Dorothea; Mika, Klaus

    2001-04-01

    The sequences of bases A, T, C and G of about 100 enolase, secA and cytochrome DNA were analyzed for attractive or repulsive interactions by the numbers T1, T2, T3 of nearest, next-nearest and third-neighbor bases of the same kind and the concentration ratio r = other bases/analyzed base. The area of possible T1, T2 values is limited by the linear borders T2 = 2T1 - 2, T2 = 0 or T1 = 0 for clustering, attractive or repulsive interactions, and by the border T2 = -2T1 + 2(2 - r) for a variation from repulsive to attractive interactions at r ≤ 2. Clustering is preferred by most bases in sequences of enolases and secAs. Major deviations with repulsive interactions of some bases are observed for archaea bacteria in secA and for highly developed animals and the human species in enolase sequences. The borders of the structure map for enthalpy stabilized structures with maximum interactions are approached in few cases. Most letters of the natural languages and some music notes are at the borders of the structure map.
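
    The neighbor numbers in this abstract can be sketched in a few lines. The code below rests on an assumed reading of the definitions: T1, T2 and T3 are taken as the average number of same-kind bases found at distances 1, 2 and 3 (counting both sides) from each occurrence of the analyzed base, and r as the count of all other bases divided by the count of the analyzed base. The function name `neighbor_numbers` is hypothetical, and the sequence is treated as linear, so bases near the ends have fewer neighbors.

```python
def neighbor_numbers(seq, base, max_shell=3):
    """Return ([T1, ..., T_max_shell], r) for one base in a linear sequence.

    Assumed definitions (see lead-in); not taken verbatim from the paper.
    """
    seq = seq.upper()
    base = base.upper()
    n = len(seq)
    sites = [i for i, b in enumerate(seq) if b == base]
    if not sites:
        raise ValueError("base %r does not occur in the sequence" % base)
    shells = []
    for d in range(1, max_shell + 1):
        # Count same-kind neighbors d positions to the left and to the right.
        same = sum(
            (i - d >= 0 and seq[i - d] == base) + (i + d < n and seq[i + d] == base)
            for i in sites
        )
        shells.append(same / len(sites))
    r = (n - len(sites)) / len(sites)
    return shells, r


if __name__ == "__main__":
    t, r = neighbor_numbers("ATATATGGAAAT", "A")
    print([round(x, 3) for x in t], round(r, 3))
```

    Under this reading, a fully clustered run of one base in a long sequence approaches T1 = T2 = 2, which lies on the clustering border T2 = 2T1 - 2 quoted in the abstract.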

  14. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    Science.gov (United States)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single-base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  15. FastGroup: A program to dereplicate libraries of 16S rDNA sequences

    Directory of Open Access Journals (Sweden)

    Rohwer Forest

    2001-10-01

    Background: Ribosomal 16S DNA sequences are an essential tool for identifying and classifying microbes. High-throughput DNA sequencing now makes it economically possible to produce very large datasets of 16S rDNA sequences in short time periods, necessitating new computer tools for analyses. Here we describe FastGroup, a Java program designed to dereplicate libraries of 16S rDNA sequences. By dereplication we mean to: (1) compare all the sequences in a data set to each other, (2) group similar sequences together, and (3) output a representative sequence from each group. In this way, duplicate sequences are removed from a library. Results: FastGroup was tested using a library of single-pass, bacterial 16S rDNA sequences cloned from coral-associated bacteria. We found that the optimal strategy for dereplicating these sequences was to: (1) trim ambiguous bases from the 5' end of the sequences and all sequence 3' of the conserved Bact517 site, (2) match the sequences from the 3' end, and (3) group sequences ≥97% identical to each other. Conclusions: The FastGroup program simplifies the dereplication of 16S rDNA sequence libraries and prepares the raw sequences for subsequent analyses.
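
    The three dereplication steps (compare, group, output a representative) can be sketched as a greedy grouping. This is not FastGroup's algorithm, which is written in Java and not described in detail here: the sketch assumes already-trimmed sequences, an ungapped comparison anchored at the 3' end, and that the first sequence placed in a group serves as its representative. The names `identity_from_3prime` and `dereplicate` are hypothetical.

```python
def identity_from_3prime(a, b):
    """Fraction of matching bases when a and b are aligned at their 3' ends.

    Ungapped comparison over the shorter length; a simplifying assumption.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(reversed(a), reversed(b)))
    return matches / n


def dereplicate(seqs, threshold=0.97):
    """Greedily group sequences; the first member of each group is its representative."""
    groups = []
    for s in seqs:
        for g in groups:
            if identity_from_3prime(s, g[0]) >= threshold:
                g.append(s)
                break
        else:
            # No existing representative is similar enough: start a new group.
            groups.append([s])
    return groups


if __name__ == "__main__":
    a = "ACGT" * 25
    b = "T" + a[1:]          # one mismatch in 100 bases
    c = "TTTT" * 25
    for group in dereplicate([a, b, c]):
        print(len(group), group[0][:12])
```

    Greedy grouping depends on input order, so sorting the library (for example by length or quality) before calling `dereplicate` is a common design choice.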

  16. Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima

    DEFF Research Database (Denmark)

    Worning, Peder; Jensen, Lars Juhl; Nelson, K. E.

    2000-01-01

    The recently published complete DNA sequence of the bacterium Thermotoga maritima provides evidence, based on protein sequence conservation, for lateral gene transfer between Archaea and Bacteria. We introduce a new method of periodicity analysis of DNA sequences, based on structural parameters......, which brings independent evidence for the lateral gene transfer in the genome of T. maritima. The structural analysis relates the Archaea-like DNA sequences to the genome of Pyrococcus horikoshii. Analysis of 24 complete genomic DNA sequences shows different periodicity patterns for organisms...

  17. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer, Random DNA Analyser; GUI, graphical user interface; XAML, Extensible Application Markup Language.
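
    For readers unfamiliar with the Nussinov algorithm named above, here is a minimal sketch of its classic form, which maximises the number of complementary base pairs by dynamic programming. It is not RDNAnalyzer's extended version (that tool is written in C#); the function name `nussinov_max_pairs`, the restriction to Watson-Crick A-T and G-C pairs, and the `min_loop` parameter are assumptions made for the example.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested A-T / G-C pairs in a DNA sequence.

    Classic Nussinov recursion; min_loop is the minimum number of unpaired
    bases required between two paired positions (an assumed default).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    dp = [[0] * n for _ in range(n)]
    # Fill the table by increasing distance between i and j.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j left unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # split into two subproblems
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAATCCC"))
```

    The recursion runs in cubic time in the sequence length, which is adequate for short oligonucleotides but slow for long sequences.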

  18. DNA fingerprinting of Mycobacterium tuberculosis: from phage typing to whole-genome sequencing.

    Science.gov (United States)

    Schürch, Anita C; van Soolingen, Dick

    2012-06-01

    Current typing methods for Mycobacterium tuberculosis complex evolved from simple phenotypic approaches like phage typing and drug susceptibility profiling to DNA-based strain typing methods, such as IS6110-restriction fragment length polymorphisms (RFLP) and variable number of tandem repeats (VNTR) typing. Examples of the usefulness of molecular typing are source case finding and epidemiological linkage of tuberculosis (TB) cases, international transmission of MDR/XDR-TB, the discrimination between endogenous reactivation and exogenous re-infection as a cause of relapses after curative treatment of tuberculosis, the evidence of multiple M. tuberculosis infections, and the disclosure of laboratory cross-contaminations. Simultaneously, phylogenetic analyses were developed based on single nucleotide polymorphisms (SNPs), genomic deletions usually referred to as regions of difference (RDs) and spoligotyping which served both strain typing and phylogenetic analysis. National and international initiatives that rely on the application of these typing methods have brought significant insight into the molecular epidemiology of tuberculosis. However, current DNA fingerprinting methods have important limitations. They can often not distinguish between genetically closely related strains and the turn-over of these markers is variable. Moreover, the suitability of most DNA typing methods for phylogenetic reconstruction is limited as they show a high propensity of convergent evolution or misinfer genetic distances. In order to fully explore the possibilities of genotyping in the molecular epidemiology of tuberculosis and to study the phylogeny of the causative bacteria reliably, the application of whole-genome sequencing (WGS) analysis for all M. tuberculosis isolates is the optimal, although currently still a costly solution. In the last years WGS for typing of pathogens has been explored and yielded important additional information on strain diversity in comparison to the

  19. Sequencing degraded DNA from non-destructively sampled museum specimens for RAD-tagging and low-coverage shotgun phylogenetics.

    Directory of Open Access Journals (Sweden)

    Mandy Man-Ying Tin

    Ancient and archival DNA samples are valuable resources for the study of diverse historical processes. In particular, museum specimens provide access to biotas distant in time and space, and can provide insights into ecological and evolutionary changes over time. However, archival specimens are difficult to handle; they are often fragile and irreplaceable, and typically contain only short segments of denatured DNA. Here we present a set of tools for processing such samples for state-of-the-art genetic analysis. First, we report a protocol for minimally destructive DNA extraction of insect museum specimens, which produced sequenceable DNA from all of the samples assayed. The 11 specimens analyzed had fragmented DNA, rarely exceeding 100 bp in length, and could not be amplified by conventional PCR targeting the mitochondrial cytochrome oxidase I gene. Our approach made these samples amenable to analysis with commonly used next-generation sequencing-based molecular analytic tools, including RAD-tagging and shotgun genome re-sequencing. First, we used museum ant specimens from three species, each with its own reference genome, for RAD-tag mapping. We were able to use the degraded DNA sequences, which were sequenced in full, to identify duplicate reads and filter them prior to base calling. Second, we re-sequenced six Hawaiian Drosophila species, with millions of years of divergence, but with only a single available reference genome. Despite a shallow coverage of 0.37 ± 0.42 per base, we could recover a sufficient number of overlapping SNPs to fully resolve the species tree, which was consistent with earlier karyotypic studies, and previous molecular studies, at least in the regions of the tree that these studies could resolve. Although developed for use with degraded DNA, all of these techniques are readily applicable to more recent tissue, and are suitable for liquid handling automation.

  20. Sequencing degraded DNA from non-destructively sampled museum specimens for RAD-tagging and low-coverage shotgun phylogenetics.

    Science.gov (United States)

    Tin, Mandy Man-Ying; Economo, Evan Philip; Mikheyev, Alexander Sergeyevich

    2014-01-01

    Ancient and archival DNA samples are valuable resources for the study of diverse historical processes. In particular, museum specimens provide access to biotas distant in time and space, and can provide insights into ecological and evolutionary changes over time. However, archival specimens are difficult to handle; they are often fragile and irreplaceable, and typically contain only short segments of denatured DNA. Here we present a set of tools for processing such samples for state-of-the-art genetic analysis. First, we report a protocol for minimally destructive DNA extraction of insect museum specimens, which produced sequenceable DNA from all of the samples assayed. The 11 specimens analyzed had fragmented DNA, rarely exceeding 100 bp in length, and could not be amplified by conventional PCR targeting the mitochondrial cytochrome oxidase I gene. Our approach made these samples amenable to analysis with commonly used next-generation sequencing-based molecular analytic tools, including RAD-tagging and shotgun genome re-sequencing. First, we used museum ant specimens from three species, each with its own reference genome, for RAD-tag mapping. We were able to use the degraded DNA sequences, which were sequenced in full, to identify duplicate reads and filter them prior to base calling. Second, we re-sequenced six Hawaiian Drosophila species, with millions of years of divergence, but with only a single available reference genome. Despite a shallow coverage of 0.37 ± 0.42 per base, we could recover a sufficient number of overlapping SNPs to fully resolve the species tree, which was consistent with earlier karyotypic studies, and previous molecular studies, at least in the regions of the tree that these studies could resolve. Although developed for use with degraded DNA, all of these techniques are readily applicable to more recent tissue, and are suitable for liquid handling automation.

  1. Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping

    Science.gov (United States)

    K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale

    1998-01-01

    DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...

  2. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human.

    Science.gov (United States)

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-02-16

    DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.
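
    The 65 DNA composition features named in this abstract (trinucleotide frequency plus GC content) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `composition_features`, the use of overlapping 3-mers, and the handling of ambiguous bases are all assumptions, and the seven sequence complexity features are not covered.

```python
from itertools import product

BASES = "ACGT"
# All 64 possible trinucleotides in a fixed order, so feature vectors are comparable.
TRINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=3)]


def composition_features(window: str) -> list[float]:
    """Return 64 trinucleotide frequencies followed by the GC fraction (65 values).

    Hypothetical helper: overlapping 3-mers are counted, which is an assumption.
    """
    seq = window.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    # Slide a 3-base window one position at a time; skip 3-mers containing N etc.
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    freqs = [counts[k] / total if total else 0.0 for k in TRINUCLEOTIDES]
    # GC fraction is computed over unambiguous bases only.
    acgt = sum(seq.count(b) for b in BASES)
    gc = (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
    return freqs + [gc]


if __name__ == "__main__":
    features = composition_features("ACGTACGTTTGCA")
    print(len(features))
```

    A vector of this kind, computed for each window (for example 600 bp) around a CpG site, is the sort of input a support-vector-machine classifier would take.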

  3. Magnitude and direction of DNA bending induced by screw-axis orientation: influence of sequence, mismatches and abasic sites.

    Science.gov (United States)

    Curuksu, Jeremy; Zakrzewska, Krystyna; Zacharias, Martin

    2008-04-01

    DNA-bending flexibility is central for its many biological functions. A new bending restraining method for use in molecular mechanics calculations and molecular dynamics simulations was developed. It is based on an average screw rotation axis definition for DNA segments and allows inducing continuous and smooth bending deformations of a DNA oligonucleotide. In addition to controlling the magnitude of induced bending it is also possible to control the bending direction so that the calculation of a complete (2-dimensional) directional DNA-bending map is now possible. The method was applied to several DNA oligonucleotides including A(adenine)-tract containing sequences known to form stable bent structures and to DNA containing mismatches or an abasic site. In case of G:A and C:C mismatches a greater variety of conformations bent in various directions compared to regular B-DNA was found. For comparison, a molecular dynamics implementation of the approach was also applied to calculate the free energy change associated with bending of A-tract containing DNA, including deformations significantly beyond the optimal curvature. Good agreement with available experimental data was obtained offering an atomic level explanation for stable bending of A-tract containing DNA molecules. The DNA-bending persistence length estimated from the explicit solvent simulations is also in good agreement with experiment whereas the adiabatic mapping calculations with a GB solvent model predict a bending rigidity roughly two times larger.

  4. Isolation of a full-length mitotic cyclin cDNA clone CycIIIMs from Medicago sativa: chromosomal mapping and expression.

    Science.gov (United States)

    Savouré, A; Fehér, A; Kaló, P; Petrovics, G; Csanádi, G; Szécsi, J; Kiss, G; Brown, S; Kondorosi, A; Kondorosi, E

    1995-03-01

    Cyclins in association with the protein kinase p34cdc2 and related cyclin-dependent protein kinases (cdks) are key regulatory elements in controlling the cell division cycle. Here, we describe the identification and characterization of a full-length cDNA clone of alfalfa mitotic cyclin, termed CycIIIMs. Computer analysis of known plant cyclin gene sequences revealed that this cyclin belongs to the same structural group as the other known partial alfalfa cyclin sequences. Genetic segregation analysis based on DNA-DNA hybridization data showed that the CycIIIMs gene(s) locates in a single chromosomal region on linkage group 5 of the alfalfa genetic map between RFLP markers UO89A and CG13. The assignment of this cyclin to the mitotic cyclin class was based on its cDNA-derived sequence and its differential expression during G2/M cell cycle phase transition of a partially synchronized alfalfa cell culture. Sequence analysis indicated common motifs with both the A- and B-types of mitotic cyclins similarly to the newly described B3-type of animal cyclins.

  5. Exploring possible DNA structures in real-time polymerase kinetics using Pacific Biosciences sequencer data.

    Science.gov (United States)

    Sawaya, Sterling; Boocock, James; Black, Michael A; Gemmell, Neil J

    2015-01-28

    Pausing of DNA polymerase can indicate the presence of a DNA structure that differs from the canonical double-helix. Here we detail a method to investigate how polymerase pausing in the Pacific Biosciences sequencer reads can be related to DNA sequences. The Pacific Biosciences sequencer uses optics to view a polymerase and its interaction with a single DNA molecule in real-time, offering a unique way to detect potential alternative DNA structures. We have developed a new way to examine polymerase kinetics data and relate it to the DNA sequence by using a wavelet transform of read information from the sequencer. We use this method to examine how polymerase kinetics are related to nucleotide base composition. We then examine tandem repeat sequences known for their ability to form different DNA structures: (CGG)n and (CG)n repeats which can, respectively, form G-quadruplex DNA and Z-DNA. We find pausing around the (CGG)n repeat that may indicate the presence of G-quadruplexes in some of the sequencer reads. The (CG)n repeat does not appear to cause polymerase pausing, but its kinetics signature nevertheless suggests the possibility that alternative nucleotide conformations may sometimes be present. We discuss the implications of using our method to discover DNA sequences capable of forming alternative structures. The analyses presented here can be reproduced on any Pacific Biosciences kinetics data for any DNA pattern of interest using an R package that we have made publicly available.

  6. Beyond DNA Sequencing in Space: Current and Future Omics Capabilities of the Biomolecule Sequencer Payload

    Science.gov (United States)

    Wallace, Sarah

    2017-01-01

    Why do we need a DNA sequencer to support the human exploration of space? (A) Operational environmental monitoring; (1) Identification of contaminating microbes, (2) Infectious disease diagnosis, (3) Reduce down mass (sample return for environmental monitoring, crew health, etc.). (B) Research; (1) Human, (2) Animal, (3) Microbes/Cell lines, (4) Plant. (C) Med Ops; (1) Response to countermeasures, (2) Radiation, (3) Real-time analysis can influence medical intervention. (D) Support astrobiology science investigations; (1) Technology superiorly suited to in situ nucleic acid-based life detection, (2) Functional testing for integration into robotics for extraplanetary exploration mission.

  7. Biomolecule Sequencer: Next-Generation DNA Sequencing Technology for In-Flight Environmental Monitoring, Research, and Beyond

    Science.gov (United States)

    Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.

    2016-01-01

    On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA, will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment, the first time DNA has ever been sequenced in space, and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human

  8. Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA.

    Science.gov (United States)

    Marine, Rachel; Polson, Shawn W; Ravel, Jacques; Hatfull, Graham; Russell, Daniel; Sullivan, Matthew; Syed, Fraz; Dumas, Michael; Wommack, K Eric

    2011-11-01

    Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.

  9. Analysis of plastid DNA-like sequences within the nuclear genomes of higher plants.

    Science.gov (United States)

    Ayliffe, M A; Scott, N S; Timmis, J N

    1998-06-01

    A wide-ranging examination of plastid (pt)DNA sequence homologies within higher plant nuclear genomes (promiscuous DNA) was undertaken. Digestion with methylation-sensitive restriction enzymes and Southern analysis was used to distinguish plastid and nuclear DNA in order to assess the extent of variability of promiscuous sequences within and between plant species. Some species, such as Gossypium hirsutum (cotton), Nicotiana tabacum (tobacco), and Chenopodium quinoa, showed homogeneity of these sequences, while intraspecific sequence variation was observed among different cultivars of Pisum sativum (pea), Hordeum vulgare (barley), and Triticum aestivum (wheat). Hypervariability of plastid sequence homologies was identified in the nuclear genomes of Spinacia oleracea (spinach) and Beta vulgaris (beet), in which individual plants were shown to possess a unique spectrum of nuclear sequences with ptDNA homology. This hypervariability apparently extended to somatic variation in B. vulgaris. No sequences with ptDNA homology were identified by this method in the nuclear genome of Arabidopsis thaliana.

  10. Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

    Directory of Open Access Journals (Sweden)

    Tadashi Imanishi

    2004-06-01

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA

  11. Complete DNA sequence of the linear mitochondrial genome of the pathogenic yeast Candida parapsilosis

    DEFF Research Database (Denmark)

    Nosek, J.; Novotna, M.; Hlavatovicova, Z.

    2004-01-01

    The complete sequence of the mitochondrial DNA of the opportunistic yeast pathogen Candida parapsilosis was determined. The mitochondrial genome is represented by linear DNA molecules terminating with tandem repeats of a 738-bp unit. The number of repeats varies, thus generating a population...... mitochondrial genome of its close relative C. albicans. The complete sequence has implications for both mitochondrial DNA replication and the evolution of linear DNA genomes....

  12. Predicting promoter activities of primary human DNA sequences

    Science.gov (United States)

    Irie, Takuma; Park, Sung-Joon; Yamashita, Riu; Seki, Masahide; Yada, Tetsushi; Sugano, Sumio; Nakai, Kenta; Suzuki, Yutaka

    2011-01-01

    We developed a computer program that can predict the intrinsic promoter activities of primary human DNA sequences. We observed promoter activity using a quantitative luciferase assay and generated a prediction model using multiple linear regression. Our program achieved a prediction accuracy correlation coefficient of 0.87 between the predicted and observed promoter activities. We evaluated the prediction accuracy of the program using massive sequencing analysis of transcriptional start sites in vivo. We found that it is still difficult to predict transcript levels in a strictly quantitative manner in vivo; however, it was possible to select active promoters in a given cell from the other silent promoters. Using this program, we analyzed the transcriptional landscape of the entire human genome. We demonstrate that many human genomic regions have potential promoter activity, and the expression of some previously uncharacterized putatively non-protein-coding transcripts can be explained by our prediction model. Furthermore, we found that nucleosomes occasionally formed open chromatin structures with RNA polymerase II recruitment where the program predicted significant promoter activities, although no transcripts were observed. PMID:21486745

  13. SMART amplification combined with cDNA size fractionation in order to obtain large full-length clones

    Directory of Open Access Journals (Sweden)

    Poustka Annemarie

    2004-06-01

    Background: cDNA libraries are widely used to identify genes and splice variants, and as a physical resource for full-length clones. Conventionally-generated cDNA libraries contain a high percentage of 5'-truncated clones. Current library construction methods that enrich for full-length mRNA are laborious, and involve several enzymatic steps performed on mRNA, which renders them sensitive to RNA degradation. The SMART technique for full-length enrichment is robust but results in limited cDNA insert size of the library. Results: We describe a method to construct SMART full-length enriched cDNA libraries with large insert sizes. Sub-libraries were generated from size-fractionated cDNA with an average insert size of up to seven kb. The percentage of full-length clones was calculated for different size ranges from BLAST results of over 12,000 5'ESTs. Conclusions: The presented technique is suitable to generate full-length enriched cDNA libraries with large average insert sizes in a straightforward and robust way. The representation of full-coding clones is high also for large cDNAs (70%, 4–10 kb) when high-quality starting mRNA is used.

  14. Dibenzotetraaza[14]annulene-adenine conjugate recognizes complementary poly dT among ss-DNA/ss-RNA sequences.

    Science.gov (United States)

    Radić Stojković, Marijana; Škugor, Marko; Tomić, Sanja; Grabar, Marina; Smrečki, Vilko; Dudek, Łukasz; Grolik, Jarosław; Eilmes, Julita; Piantanida, Ivo

    2013-06-28

    Among three novel DBTAA derivatives only the DBTAA-propyl-adenine conjugate showed recognition of the consecutive oligo dT sequence by increased affinity and specific induced chirooptical response in comparison to other single stranded RNA and DNA; whereby of particular importance is the up until now unique efficient differentiation between dT and rU. At variance, its close analogue DBTAA-hexyl-adenine did not reveal any selectivity between ss-DNA/RNA pointing out the important role of steric factors (linker length); moreover non-selectivity of the reference compound (, lacking adenine) stressed the importance of adenine interactions in the selectivity.

  15. A leaf sequencing algorithm to enlarge treatment field length in IMRT

    International Nuclear Information System (INIS)

    Xia Ping; Hwang, Andrew B.; Verhey, Lynn J.

    2002-01-01

    With MLC-based IMRT, the maximum usable field size is often smaller than the maximum field size for conventional treatments. This is due to the constraints of the overtravel distances of MLC leaves and/or jaws. Using a new leaf sequencing algorithm, the usable IMRT field length (perpendicular to the MLC motion) can be mostly made equal to the full length of the MLC field without violating the upper jaw overtravel limit. For any given intensity pattern, a criterion was proposed to assess whether an intensity pattern can be delivered without violation of the jaw position constraints. If the criterion is met, the new algorithm will consider the jaw position constraints during the segmentation for the step and shoot delivery method. The strategy employed by the algorithm is to connect the intensity elements outside the jaw overtravel limits with those inside the jaw overtravel limits. Several methods were used to establish these connections during segmentation by modifying a previously published algorithm (areal algorithm), including changing the intensity level, alternating the leaf-sequencing direction, or limiting the segment field size. The algorithm was tested with 1000 random intensity patterns with dimensions of 21 × 27 cm², 800 intensity patterns with higher intensity outside the jaw overtravel limit, and three different types of clinical treatment plans that were undeliverable using a segmentation method from a commercial treatment planning system. The new algorithm achieved a success rate of 100% with these test patterns. For the 1000 random patterns, the new algorithm yields a similar average number of segments of 36.9 ± 2.9 in comparison to 36.6 ± 1.3 when using the areal algorithm. For the 800 patterns with higher intensities outside the jaw overtravel limits, the new algorithm results in an increase of 25% in the average number of segments compared to the areal algorithm. However, the areal algorithm fails to create deliverable segments for 90% of these

  16. High Interlaboratory Reproducibility of DNA Sequence-based Typing of Bacteria in a Multicenter Study

    DEFF Research Database (Denmark)

    Sousa, MA de; Boye, Kit; Lencastre, H de

    2006-01-01

    Current DNA amplification-based typing methods for bacterial pathogens often lack interlaboratory reproducibility. In this international study, DNA sequence-based typing of the Staphylococcus aureus protein A gene (spa, 110 to 422 bp) showed 100% intra- and interlaboratory reproducibility without extensive harmonization of protocols for 30 blind-coded S. aureus DNA samples sent to 10 laboratories. Specialized software for automated sequence analysis ensured a common typing nomenclature.

  17. Generating Exome Enriched Sequencing Libraries from Formalin-Fixed, Paraffin-Embedded Tissue DNA for Next-Generation Sequencing.

    Science.gov (United States)

    Marosy, Beth A; Craig, Brian D; Hetrick, Kurt N; Witmer, P Dane; Ling, Hua; Griffith, Sean M; Myers, Benjamin; Ostrander, Elaine A; Stanford, Janet L; Brody, Lawrence C; Doheny, Kimberly F

    2017-01-11

    This unit describes a technique for generating exome-enriched sequencing libraries using DNA extracted from formalin-fixed paraffin-embedded (FFPE) samples. Utilizing commercially available kits, we present a low-input FFPE workflow starting with 50 ng of DNA. This procedure includes a repair step to address damage caused by FFPE preservation that improves sequence quality. Subsequently, libraries undergo an in-solution-targeted selection for exons, followed by sequencing using the Illumina next-generation short-read sequencing platform. © 2017 by John Wiley & Sons, Inc.

  18. Comparison of base composition analysis and Sanger sequencing of mitochondrial DNA for four U.S. population groups.

    Science.gov (United States)

    Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M

    2014-01-01

    A set of 711 samples from four U.S. population groups was analyzed using a novel mass spectrometry based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing derived data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays. Published by Elsevier Ireland Ltd.
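
    The discrimination capacity used as a metric in this abstract can be illustrated with a short sketch. It assumes the definition common in forensic mtDNA work (number of distinct haplotypes divided by number of samples); the abstract does not state its formula, and the function name and sample labels below are hypothetical.

```python
from collections.abc import Hashable, Iterable


def discrimination_capacity(haplotypes: Iterable[Hashable]) -> float:
    """Fraction of samples that carry a distinguishable haplotype.

    Assumed definition: distinct haplotypes / total samples.
    """
    observed = list(haplotypes)
    if not observed:
        raise ValueError("at least one haplotype is required")
    return len(set(observed)) / len(observed)


if __name__ == "__main__":
    # Hypothetical haplotype labels for four samples; two samples share H1.
    print(discrimination_capacity(["H1", "H1", "H2", "H3"]))
```

    Under this definition an assay that covers more nucleotides can only split haplotypes further, never merge them, which is consistent with the ranking reported above.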

  19. Transformation of Cowpea Vigna unguiculata with a Full-Length DNA Copy of Cowpea Mosaic Virus M-RNA

    NARCIS (Netherlands)

    Hille, Jacques; Goldbach, Rob

    1987-01-01

    A full-length DNA copy of the M-RNA of cowpea mosaic virus (CPMV), supplied with either the 35S promoter from cauliflower mosaic virus (CaMV) or the nopaline synthase promoter from Agrobacterium tumefaciens, was introduced into the T-DNA region of a Ti-plasmid-derived gene vector and transferred to

  20. Mapped DNA probes from loblolly pine can be used for restriction fragment length polymorphism mapping in other conifers

    Science.gov (United States)

    M.R. Ahuja; M.E. Devey; A.T. Groover; K.D. Jermstad; D.B. Neale

    1994-01-01

    A high-density genetic map based on restriction fragment length polymorphisms (RFLPs) is being constructed for loblolly pine (Pinus taeda L.). Consequently, a large number of DNA probes from loblolly pine are potentially available for use in other species. We have used some of these DNA probes to detect RFLPs in 12 conifers and an angiosperm....

  1. DNA fingerprinting of Mycobacterium leprae strains using variable number tandem repeat (VNTR) - fragment length analysis (FLA).

    Science.gov (United States)

    Jensen, Ronald W; Rivest, Jason; Li, Wei; Vissa, Varalakshmi

    2011-07-15

    presence of the desired DNA segments, and then submitted for fluorescent fragment length analysis (FLA) using capillary electrophoresis. DNA from armadillo passaged bacteria with a known number of repeat copies for each locus is used as a positive control. The FLA chromatograms are then examined using Peak Scanner software and fragment length is converted to number of VNTR copies (allele). Finally, the VNTR haplotypes are analyzed for patterns, and when combined with patient clinical data can be used to track distribution of strain types.
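
    The conversion of fragment length to number of VNTR copies mentioned above can be sketched as simple arithmetic, assuming the amplicon consists of a fixed flanking length plus a whole number of repeat units. The function name, parameters, and example values are hypothetical and are not taken from the protocol, which calibrates against a positive control with known copy numbers.

```python
def vntr_copies(fragment_bp: float, flank_bp: int, repeat_unit_bp: int) -> int:
    """Convert a measured amplicon length to a whole number of repeat copies.

    Assumed model: fragment_bp = flank_bp + copies * repeat_unit_bp.
    """
    if repeat_unit_bp <= 0:
        raise ValueError("repeat_unit_bp must be positive")
    copies = (fragment_bp - flank_bp) / repeat_unit_bp
    if copies < 0:
        raise ValueError("fragment is shorter than its flanking sequence")
    # Capillary electrophoresis sizing is not exact, so round to the nearest allele.
    return round(copies)


if __name__ == "__main__":
    # Hypothetical locus: 100 bp of flanking sequence and a 3 bp repeat unit.
    print(vntr_copies(151.6, flank_bp=100, repeat_unit_bp=3))
```

    Repeating this for each locus gives one allele per locus, and the resulting tuple of alleles is the VNTR haplotype of the strain.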

  2. Saddlebags: A software interface for submitting full-length HLA allele sequences to the EMBL-ENA nucleotide database.

    Science.gov (United States)

    Matern, B M; Groeneweg, M; Voorter, C E M; Tilanus, M G J

    2018-01-01

    Submission of full-length HLA allele sequences presents a unique challenge, both for high-throughput sequencing laboratories and smaller diagnostic laboratories. HLA's extensive polymorphism means that accurate representation and annotation of allele sequence is of critical importance, and curators of nucleotide databases must establish submission formats to ensure high-quality data and prevent ambiguities. The IPD-IMGT/HLA database is established as the standard repository for HLA sequences, and it is a major goal of the 17th International HLA and Immunogenetics Workshop to fill the IPD-IMGT/HLA database with full-length HLA sequences. The process of preparing sequence annotation and metadata is cumbersome and error prone, and it is desirable to create a straightforward and concise method of preparing sequence submissions. We introduce Saddlebags, a software tool for rapid generation of HLA (novel) full-length allele sequence submissions. HLA allele sequences are submitted first to EMBL European Nucleotide Archive (EMBL-ENA), and metadata is gathered for subsequent preparation of an IPD-IMGT/HLA formatted submission. Combining these steps into a pipeline reduces effort and minimizes errors for submitting laboratories. This software has been used by Maastricht University Medical Center Transplantation Immunology Laboratory to submit 79 novel alleles to EMBL-ENA, and the tool is freely available for the HLA community. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  3. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    Science.gov (United States)

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    To analyze and detect the whole genome sequence of human mitochondrial DNA (mtDNA) using the Ion Torrent PGM™ platform and to study the differences in mtDNA sequence among different tissues. Samples were collected from 6 unrelated individuals by forensic postmortem examination, including chest blood, hair, costal cartilage, nail, skeletal muscle and oral epithelium. The whole mtDNA genome was amplified with 4 primer pairs. Libraries were constructed with the Ion Shear™ Plus Reagents kit and the Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed on the Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions in the HVⅠ region. The whole mtDNA genome was amplified successfully from all samples. The six unrelated individuals belonged to 6 different haplotypes. Different tissues from one individual showed differences in heteroplasmy. The heteroplasmy positions and the mutation positions in the HVⅠ region were verified by Sanger sequencing. A consistency check by the Kappa method showed that the mtDNA sequencing results were highly consistent across tissues. The method used in the present study for sequencing the whole human mtDNA genome can detect heteroplasmy differences between tissues and gives consistent results. The results provide guidance for further applications of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine

  4. Statistical aspects of discerning indel-type structural variation via DNA sequence alignment

    Directory of Open Access Journals (Sweden)

    Wilson Richard K

    2009-08-01

    Background: Structural variations in the form of DNA insertions and deletions are an important aspect of human genetics and especially relevant to medical disorders. Investigations have shown that such events can be detected via tell-tale discrepancies in the aligned lengths of paired-end DNA sequencing reads. Quantitative aspects underlying this method remain poorly understood, despite its importance and conceptual simplicity. We report the statistical theory characterizing the length-discrepancy scheme for Gaussian libraries, including coverage-related effects that preceding models are unable to account for. Results: Deletion and insertion statistics both depend heavily on physical coverage, but otherwise differ dramatically, refuting a commonly held doctrine of symmetry. Specifically, coverage restrictions render insertions much more difficult to capture. Increased read length has the counterintuitive effect of worsening insertion detection characteristics of short inserts. Variance in library insert length is also a critical factor here and should be minimized to the greatest degree possible. Conversely, no significant improvement would be realized in lowering fosmid variances beyond current levels. Detection power is examined under a straightforward alternative hypothesis and found to be generally acceptable. We also consider the proposition of characterizing variation over the entire spectrum of variant sizes under constant risk of false-positive errors. At 1% risk, many designs will leave a significant gap in the 100 to 200 bp neighborhood, requiring unacceptably high redundancies to compensate. We show that a few modifications largely close this gap and we give a few examples of feasible spectrum-covering designs. Conclusion: The theory resolves several outstanding issues and furnishes a general methodology for designing future projects from the standpoint of a spectrum-wide constant risk.
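    The length-discrepancy idea in this record can be illustrated with a minimal Python sketch. It flags a read pair whose mapped span on the reference deviates strongly from the mean insert length of a Gaussian library. The function name `classify_span`, the z-score cutoff of 3 and the example numbers are assumptions made for illustration only; the paper's own statistical treatment, including coverage effects, is more involved and is not reproduced here.

```python
def classify_span(mapped_span, mean_insert, sd_insert, z_cutoff=3.0):
    """Classify one read pair by how far its mapped span departs from the library mean.

    mapped_span : distance between the two reads when aligned to the reference (bp)
    mean_insert : mean insert length of the library (bp)
    sd_insert   : standard deviation of the library insert length (bp)
    z_cutoff    : hypothetical threshold; not a value taken from the paper
    """
    if sd_insert <= 0:
        raise ValueError("sd_insert must be positive")
    z = (mapped_span - mean_insert) / sd_insert
    if z > z_cutoff:
        # The pair maps farther apart than expected: the sample lacks
        # sequence that is present in the reference.
        return "deletion"
    if z < -z_cutoff:
        # The pair maps closer together than expected: the sample carries
        # extra sequence that is absent from the reference.
        return "insertion"
    return "concordant"


# Hypothetical library: mean 2000 bp, standard deviation 100 bp.
for span in (3000, 1500, 2100):
    print(span, classify_span(span, 2000, 100))
```

    A smaller `sd_insert` makes a given length change stand out more clearly, which is in line with the abstract's remark that variance in library insert length should be minimized.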

  5. True single-molecule DNA sequencing of a pleistocene horse bone

    Science.gov (United States)

    Orlando, Ludovic; Ginolhac, Aurelien; Raghavan, Maanasa; Vilstrup, Julia; Rasmussen, Morten; Magnussen, Kim; Steinmann, Kathleen E.; Kapranov, Philipp; Thompson, John F.; Zazula, Grant; Froese, Duane; Moltke, Ida; Shapiro, Beth; Hofreiter, Michael; Al-Rasheid, Khaled A.S.; Gilbert, M. Thomas P.; Willerslev, Eske

    2011-01-01

    Second-generation sequencing platforms have revolutionized the field of ancient DNA, opening access to complete genomes of past individuals and extinct species. However, these platforms are dependent on library construction and amplification steps that may result in sequences that do not reflect the original DNA template composition. This is particularly true for ancient DNA, where templates have undergone extensive damage post-mortem. Here, we report the results of the first “true single molecule sequencing” of ancient DNA. We generated 115.9 Mb and 76.9 Mb of DNA sequences from a permafrost-preserved Pleistocene horse bone using the Helicos HeliScope and Illumina GAIIx platforms, respectively. We find that the percentage of endogenous DNA sequences derived from the horse is higher among the Helicos data than Illumina data. This result indicates that the molecular biology tools used to generate sequencing libraries of ancient DNA molecules, as required for second-generation sequencing, introduce biases into the data that reduce the efficiency of the sequencing process and limit our ability to fully explore the molecular complexity of ancient DNA extracts. We demonstrate that simple modifications to the standard Helicos DNA template preparation protocol further increase the proportion of horse DNA for this sample by threefold. Comparison of Helicos-specific biases and sequence errors in modern DNA with those in ancient DNA also reveals extensive cytosine deamination damage at the 3′ ends of ancient templates, indicating the presence of 3′-sequence overhangs. Our results suggest that paleogenomes could be sequenced in an unprecedented manner by combining current second- and third-generation sequencing approaches. PMID:21803858

  6. Rapid and affordable genome-wide bisulfite DNA sequencing by XmaI-reduced representation bisulfite sequencing.

    Science.gov (United States)

    Tanas, Alexander S; Borisova, Marina E; Kuznetsova, Ekaterina B; Rudenko, Viktoria V; Karandasheva, Kristina O; Nemtsova, Marina V; Izhevskaya, Vera L; Simonova, Olga A; Larin, Sergey S; Zaletaev, Dmitry V; Strelnikov, Vladimir V

    2017-06-01

    To develop a reduced representation bisulfite sequencing (RRBS) approach for rapid and affordable genome-wide DNA methylation analysis. We have selected restriction endonuclease XmaI to produce RRBS library fragments. After digestion and partial fill-in DNA fragments were ligated to barcoded adapters, bisulfite converted, size-selected, and sequenced on the Ion Torrent Personal Genome Machine. XmaI-RRBS results were compared with the previously published RRBS data. We have developed an XmaI-RRBS method for rapid and affordable genome-wide DNA methylation analysis, with library preparation taking only 4 days and sequencing possible within 4 h. We have also addressed several challenges in order to further improve the RRBS technology. XmaI-RRBS may be performed on degraded DNA samples and is compatible with the bench-top next-generation sequencing machines.

  7. Evaluation of PacBio sequencing for full-length bacterial 16S rRNA gene classification.

    Science.gov (United States)

    Wagner, Josef; Coupland, Paul; Browne, Hilary P; Lawley, Trevor D; Francis, Suzanna C; Parkhill, Julian

    2016-11-14

    Currently, bacterial 16S rRNA gene analyses are based on sequencing of individual variable regions of the 16S rRNA gene (Kozich, et al Appl Environ Microbiol 79:5112-5120, 2013). This short read approach can introduce biases. Thus, full-length bacterial 16S rRNA gene sequencing is needed to reduce biases. A new alternative for full-length bacterial 16S rRNA gene sequencing is offered by PacBio single molecule, real-time (SMRT) technology. The aim of our study was to validate PacBio P6 sequencing chemistry using three approaches: 1) sequencing the full-length bacterial 16S rRNA gene from a single bacterial species Staphylococcus aureus to analyze error modes and to optimize the bioinformatics pipeline; 2) sequencing the full-length bacterial 16S rRNA gene from a pool of 50 different bacterial colonies from human stool samples to compare with full-length bacterial 16S rRNA capillary sequence; and 3) sequencing the full-length bacterial 16S rRNA genes from 11 vaginal microbiome samples to compare with the in silico selected bacterial 16S rRNA V1V2 gene region and with bacterial 16S rRNA V1V2 gene regions sequenced using the Illumina MiSeq. Our optimized bioinformatics pipeline for PacBio sequence analysis was able to achieve an error rate of 0.007% on the Staphylococcus aureus full-length 16S rRNA gene. Capillary sequencing of the full-length bacterial 16S rRNA gene from the pool of 50 colonies from stool identified 40 bacterial species of which up to 80% could be identified by PacBio full-length bacterial 16S rRNA gene sequencing. Analysis of the human vaginal microbiome using the bacterial 16S rRNA V1V2 gene region on MiSeq generated 129 operational taxonomic units (OTUs) from which 70 species could be identified. For the PacBio, 36,000 sequences from over 58,000 raw reads could be assigned to a barcode, and the in silico selected bacterial 16S rRNA V1V2 gene region generated 154 OTUs grouped into 63 species, of which 62% were shared with the MiSeq dataset. The Pac

  8. Uncovering Trophic Interactions in Arthropod Predators through DNA Shotgun-Sequencing of Gut Contents.

    Directory of Open Access Journals (Sweden)

    Débora P Paula

    Characterizing trophic networks is fundamental to many questions in ecology, but this typically requires painstaking efforts, especially to identify the diet of small generalist predators. Several attempts have been devoted to develop suitable molecular tools to determine predatory trophic interactions through gut content analysis, and the challenge has been to achieve simultaneously high taxonomic breadth and resolution. General and practical methods are still needed, preferably independent of PCR amplification of barcodes, to recover a broader range of interactions. Here we applied shotgun-sequencing of the DNA from arthropod predator gut contents, extracted from four common coccinellid and dermapteran predators co-occurring in an agroecosystem in Brazil. By matching unassembled reads against six DNA reference databases obtained from public databases and newly assembled mitogenomes, and filtering for high overlap length and identity, we identified prey and other foreign DNA in the predator guts. Good taxonomic breadth and resolution was achieved (93% of prey identified to species or genus), but with low recovery of matching reads. Two to nine trophic interactions were found for these predators, some of which were only inferred by the presence of parasitoids and components of the microbiome known to be associated with aphid prey. Intraguild predation was also found, including among closely related ladybird species. Uncertainty arises from the lack of comprehensive reference databases and reliance on low numbers of matching reads accentuating the risk of false positives. We discuss caveats and some future prospects that could improve the use of direct DNA shotgun-sequencing to characterize arthropod trophic networks.

  9. Effect of intercalator substituent and nucleotide sequence on the stability of DNA- and RNA-naphthalimide complexes.

    Science.gov (United States)

    Johnson, Charles A; Hudson, Graham A; Hardebeck, Laura K E; Jolley, Elizabeth A; Ren, Yi; Lewis, Michael; Znosko, Brent M

    2015-07-01

    DNA intercalators are commonly used as anti-cancer and anti-tumor agents. As a result, it is imperative to understand how changes in intercalator structure affect binding affinity to DNA. Amonafide and mitonafide, two naphthalimide derivatives that are active against HeLa and KB cells in vitro, were previously shown to intercalate into DNA. Here, a systematic study was undertaken to change the 3-substituent on the aromatic intercalator 1,8-naphthalimide to determine how 11 different functional groups with a variety of physical and electronic properties affect binding of the naphthalimide to DNA and RNA duplexes of different sequence compositions and lengths. Wavelength scans, NMR titrations, and circular dichroism were used to investigate the binding mode of 1,8-naphthalimide derivatives to short synthetic DNA. Optical melting experiments were used to measure the change in melting temperature of the DNA and RNA duplexes due to intercalation, which ranged from 0 to 19.4°C. Thermal stabilities were affected by changing the substituent, and several patterns and idiosyncrasies were identified. By systematically varying the 3-substituent, the binding strength of the same derivative to various DNA and RNA duplexes was compared. The binding strength of different derivatives to the same DNA and RNA sequences was also compared. The results of these comparisons shed light on the complexities of site specificity and binding strength in DNA-intercalator complexes. For example, the consequences of adding a 5'-TpG-3' or 5'-GpT-3' step to a duplex is dependent on the sequence composition of the duplex. When added to a poly-AT duplex, naphthalimide binding was enhanced by 5.6-11.5°C, but when added to a poly-GC duplex, naphthalimide binding was diminished by 3.2-6.9°C. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. The mitochondrial DNA sequence specificity of the anti-tumour drug bleomycin using end-labeled DNA and capillary electrophoresis and a comparison with genome-wide DNA sequencing.

    Science.gov (United States)

    Chung, Long H; Murray, Vincent

    2016-01-01

    The DNA sequence specificity of the cancer chemotherapeutic agent, bleomycin, was investigated in two human mitochondrial DNA sequences. Bleomycin was found to cleave preferentially at 5'-TGT*A-3' DNA sequences (where * is the cleavage site). The bleomycin analysis using capillary electrophoresis with laser-induced fluorescence was determined on both DNA strands and each strand was independently fluorescently labelled at the 3'- and 5'-ends. There was a high level of correlation between the intensity of bleomycin cleavage sites analysed by 3'- and 5'-end labelling. This is the first occasion that a comprehensive comparison has been made between these two end-labelling procedures to quantify cleavage by a DNA damaging agent and to investigate end-label bias. A comparison was also made between the bleomycin DNA sequence specificity obtained from genome-wide next-generation sequencing with that obtained from purified plasmid DNA sequences. This was accomplished by cloning sections of human mitochondrial DNA and comparing these identical mitochondrial DNA in the human mitochondrial genome. At individual sites, there was a very low level of correlation between bleomycin cleavage in plasmid sequencing and genome-wide sequencing. However, the overall bleomycin DNA sequence specificity was very similar in the two environments, namely 5'-TGT*A-3'. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine

    Science.gov (United States)

    Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson

    2011-01-01

    Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...

  12. A 28,000 years old Cro-Magnon mtDNA sequence differs from all potentially contaminating modern sequences.

    Directory of Open Access Journals (Sweden)

    David Caramelli

    BACKGROUND: DNA sequences from ancient specimens may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts on the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal) and early modern (Cro-Magnoid) Europeans. METHODOLOGY/PRINCIPAL FINDINGS: We typed the mitochondrial DNA (mtDNA) hypervariable region I in a 28,000 years old Cro-Magnoid individual from the Paglicci cave, in Italy (Paglicci 23) and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence, and cannot possibly reflect contamination because it differs from all potentially contaminating modern sequences. CONCLUSIONS/SIGNIFICANCE: The Paglicci 23 individual carried a mtDNA sequence that is still common in Europe, and which radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to get insight for the first time into the nuclear genes of early modern Europeans.

  13. A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer

    DEFF Research Database (Denmark)

    Álvarez-Martos, Isabel; Ferapontova, Elena

    2017-01-01

    A unique specificity of the aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding...... of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained...... by the replacement of the RNA aptamer bases for their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence, and, thus...

  14. CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.

    Science.gov (United States)

    Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan

    2017-06-24

    Multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or contain implicit assumptions that limit their generality. First, the properties of users' sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and are generally unknown before submission, which previous work has unfortunately ignored. Second, the center star strategy is suited for aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by running the workload on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPU is a promising approach to
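    For orientation, the first stage of the center star strategy mentioned in this record can be sketched in Python as below. This is the textbook baseline, in which the center is the sequence with the smallest total pairwise distance to all others; it is not CMSA's bitmap-based algorithm, which the abstract names but does not describe. The function names and the choice of plain edit distance as the distance measure are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def pick_center(seqs):
    """Return the index of the sequence with the smallest summed distance to all others.

    Every pair of sequences is compared, which is why this selection step
    dominates the running time for large inputs.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    return min(range(len(seqs)), key=totals.__getitem__)


# Hypothetical toy input of similar sequences.
reads = ["ACGT", "ACGA", "ACGT", "TCGT"]
print(pick_center(reads))
```

    In the full strategy, every other sequence is then aligned pairwise to the chosen center and the pairwise alignments are merged into one multiple alignment.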

  15. Using specific length amplified fragment sequencing to construct the high-density genetic map for Vitis (Vitis vinifera L. × Vitis amurensis Rupr.)

    Directory of Open Access Journals (Sweden)

    Yinshan Guo

    2015-06-01

    In this study, 149 F1 plants from the interspecific cross between ‘Red Globe’ (Vitis vinifera L.) and ‘Shuangyou’ (Vitis amurensis Rupr.) and the parent were used to construct a molecular genetic linkage map by using the specific length amplified fragment sequencing technique. DNA sequencing generated 41.282 Gb data consisting of 206,411,693 paired-end reads. The average sequencing depths were 68.35 for ‘Red Globe,’ 63.65 for ‘Shuangyou,’ and 8.01 for each progeny. In all, 115,629 high-quality specific length amplified fragments were detected, of which 42,279 were polymorphic. The genetic map was constructed using 7,199 of these polymorphic markers. These polymorphic markers were assigned to 19 linkage groups; the total length of the map was 1929.13 cM, with an average distance of 0.28 cM between adjacent markers. To our knowledge, the genetic maps constructed in this study contain the largest number of molecular markers. These high-density genetic maps might form the basis for the fine quantitative trait loci mapping and molecular-assisted breeding of grape.

  16. Methylation-capture and Next-Generation Sequencing of free circulating DNA from human plasma.

    Science.gov (United States)

    Warton, Kristina; Lin, Vita; Navin, Tina; Armstrong, Nicola J; Kaplan, Warren; Ying, Kevin; Gloss, Brian; Mangs, Helena; Nair, Shalima S; Hacker, Neville F; Sutherland, Robert L; Clark, Susan J; Samimi, Goli

    2014-06-15

    Free circulating DNA (fcDNA) has many potential clinical applications, due to the non-invasive way in which it is collected. However, because of the low concentration of fcDNA in blood, genome-wide analysis carries many technical challenges that must be overcome before fcDNA studies can reach their full potential. There are currently no definitive standards for fcDNA collection, processing and whole-genome sequencing. We report novel detailed methodology for the capture of high-quality methylated fcDNA, library preparation and downstream genome-wide Next-Generation Sequencing. We also describe the effects of sample storage, processing and scaling on fcDNA recovery and quality. Use of serum versus plasma, and storage of blood prior to separation resulted in genomic DNA contamination, likely due to leukocyte lysis. Methylated fcDNA fragments were isolated from 5 donors using a methyl-binding protein-based protocol and appear as a discrete band of ~180 bases. This discrete band allows minimal sample loss at the size restriction step in library preparation for Next-Generation Sequencing, allowing for high-quality sequencing from minimal amounts of fcDNA. Following sequencing, we obtained 37 × 10^6 to 86 × 10^6 unique mappable reads, representing more than 50% of total mappable reads. The methylation status of 9 genomic regions as determined by DNA capture and sequencing was independently validated by clonal bisulphite sequencing. Our optimized methods provide high-quality methylated fcDNA suitable for whole-genome sequencing, and allow good library complexity and accurate sequencing, despite using less than half of the recommended minimum input DNA.

  17. Protein and DNA sequence determinants of thermophilic adaptation.

    Directory of Open Access Journals (Sweden)

    Konstantin B Zeldovich

    2007-01-01

    There have been considerable attempts in the past to relate phenotypic trait--habitat temperature of organisms--to their genotypes, most importantly compositions of their genomes and proteomes. However, despite accumulation of anecdotal evidence, an exact and conclusive relationship between the former and the latter has been elusive. We present an exhaustive study of the relationship between amino acid composition of proteomes, nucleotide composition of DNA, and optimal growth temperature (OGT) of prokaryotes. Based on 204 complete proteomes of archaea and bacteria spanning the temperature range from -10 degrees C to 110 degrees C, we performed an exhaustive enumeration of all possible sets of amino acids and found a set of amino acids whose total fraction in a proteome is correlated, to a remarkable extent, with the OGT. The universal set is Ile, Val, Tyr, Trp, Arg, Glu, Leu (IVYWREL), and the correlation coefficient is as high as 0.93. We also found that the G + C content in 204 complete genomes does not exhibit a significant correlation with OGT (R = -0.10). On the other hand, the fraction of A + G in coding DNA is correlated with temperature, to a considerable extent, due to codon patterns of IVYWREL amino acids. Further, we found strong and independent correlation between OGT and the frequency with which pairs of A and G nucleotides appear as nearest neighbors in genome sequences. This adaptation is achieved via codon bias. These findings present a direct link between principles of protein structure and stability and evolutionary mechanisms of thermophilic adaptation. On the nucleotide level, the analysis provides an example of how nature utilizes codon bias for evolutionary adaptation to extreme conditions. Together these results provide a complete picture of how compositions of proteomes and genomes in prokaryotes adjust to the extreme conditions of the environment.
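    The quantity this record correlates with optimal growth temperature, the total fraction of Ile, Val, Tyr, Trp, Arg, Glu and Leu residues, is simple to compute. The Python sketch below does so for a single protein sequence in one-letter code. The function name and the toy sequences are assumptions made for illustration; the study itself evaluated whole proteomes, and its regression against OGT is not reproduced here.

```python
IVYWREL = frozenset("IVYWREL")  # Ile, Val, Tyr, Trp, Arg, Glu, Leu


def ivywrel_fraction(protein):
    """Fraction of residues in a one-letter protein sequence that belong to IVYWREL.

    Non-letter characters (gaps, stop symbols, whitespace) are ignored.
    """
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    return sum(aa in IVYWREL for aa in residues) / len(residues)


# Hypothetical toy sequences, not real proteins.
print(ivywrel_fraction("IVAA"))
print(ivywrel_fraction("MKT*"))
```

    For a proteome-level value, one would pool the residues of all proteins before dividing, so that long proteins weigh more than short ones.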

  18. Demonstration of 5-Methylcytosine-Rich DNA Sequences in Chiroptera.

    Science.gov (United States)

    Schmid, Michael; Steinlein, Claus; Lomb, Christian; Volleth, Marianne

    2017-01-01

    5-Methylcytosine-rich heterochromatic regions were demonstrated in metaphase chromosomes of 5 species of Chiroptera by indirect immunofluorescence using a monoclonal anti-5-methylcytosine antibody. These species belong to 4 genera and 2 families and are characterized by divergent karyotypes. One species (Glauconycteris beatrix) has an extremely low diploid chromosome number of 2n = 22 with only meta- to submetacentric elements and remarkably large amounts of constitutive heterochromatin located in the centromeric and pericentromeric regions of all chromosome pairs. Two species (G. beatrix and Neoromicia cf. guineensis) possess X-autosome translocations. In all species, the hypermethylated chromosome segments correspond to constitutive heterochromatin, and the numbers and positions of hypermethylated chromosome segments in the karyotypes are constant and species-specific. In some species (Pipistrellus hesperidus, Neoromicia cf. somalicus), there are several smaller chromosome pairs in which the bright anti-5-methylcytosine antibody labeling is not restricted to constitutively heterochromatic regions but is observed along the whole lengths of these chromosomes. The nature of these additional hypermethylated regions is discussed. The analysis of 5-methylcytosine-rich chromosome regions elucidates valuable data for chiropteran cytogenetics and reflects the high pace of evolution of the repetitive DNA fraction in their genomes. © 2017 S. Karger AG, Basel.

  19. Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

    Directory of Open Access Journals (Sweden)

    Rachel Caldwell

    2015-01-01

    There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length.

  20. Phylogenetic analysis of Gossypium L. using restriction fragment length polymorphism of repeated sequences.

    Science.gov (United States)

    Zhang, Meiping; Rong, Ying; Lee, Mi-Kyung; Zhang, Yang; Stelly, David M; Zhang, Hong-Bin

    2015-10-01

    Cotton is the world's leading textile fiber crop and is also grown as a bioenergy and food crop. Knowledge of the phylogeny of closely related species and the genome origin and evolution of polyploid species is significant for advanced genomics research and breeding. We have reconstructed the phylogeny of the cotton genus, Gossypium L., and deciphered the genome origin and evolution of its five polyploid species by restriction fragment analysis of repeated sequences. Nuclear DNA of 84 accessions representing 35 species and all eight genomes of the genus were analyzed. The phylogenetic tree of the genus was reconstructed using the parsimony method on 1033 polymorphic repeated sequence restriction fragments. The genome origin of its polyploids was determined by calculating the diploid-polyploid restriction fragment correspondence (RFC). The tree is consistent with the morphological classification, genome designation and geographic distribution of the species at subgenus, section and subsection levels. Gossypium lobatum (D7) was unambiguously shown to have the highest RFC with the D-subgenomes of all five polyploids of the genus, while the common ancestor of Gossypium herbaceum (A1) and Gossypium arboreum (A2) likely contributed to the A-subgenomes of the polyploids. These results provide a comprehensive phylogenetic tree of the cotton genus and new insights into the genome origin and evolution of its polyploid species. The results also further demonstrate a simple, rapid and inexpensive method suitable for phylogenetic analysis of closely related species, especially congeneric species, and the inference of genome origin of polyploids that constitute over 70 % of flowering plants.

  1. Complete sequences of the mitochondrial DNA of the wild Gracilariopsis lemaneiformis and two mutagenic cultivated breeds (Gracilariaceae, Rhodophyta).

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    The complete mitochondrial DNA (mtDNA) of Gracilariopsis lemaneiformis was sequenced (25883 bp) and mapped to a circular model. The A+T composition was 72.5%. Forty six genes and two potentially functional open reading frames were identified. They include 24 protein-coding genes, 2 rRNA genes, 20 tRNA genes and 2 ORFs (orf60, orf142). There is considerable sequence synteny across the five red algal mtDNAs falling into Florideophyceae including Gr. lemaneiformis in this study and previously sequenced species. A long stem-loop and a hairpin structure were identified in intergenic regions of mt genome of Gr. lemaneiformis, which are believed to be involved with transcription and replication. In addition, the mtDNAs of two mutagenic cultivated breeds ("981" and "07-2") were also sequenced. Compared with the mtDNA of wild Gr. lemaneiformis, the genome size and gene length and order of three strains were completely identical except nine base mutations including eight in the protein-coding genes and one in the tRNA gene. None of the base mutations caused frameshift or a premature stop codon in the mtDNA genes. Phylogenetic analyses based on mitochondrial protein-coding genes and rRNA genes demonstrated Gracilariopsis andersonii had closer phylogenetic relationship with its parasite Gracilariophila oryzoides than Gracilariopsis lemaneiformis which was from the same genus of Gracilariopsis.

  2. Complete sequences of the mitochondrial DNA of the wild Gracilariopsis lemaneiformis and two mutagenic cultivated breeds (Gracilariaceae, Rhodophyta).

    Science.gov (United States)

    Zhang, Lei; Wang, Xumin; Qian, Hao; Chi, Shan; Liu, Cui; Liu, Tao

    2012-01-01

    The complete mitochondrial DNA (mtDNA) of Gracilariopsis lemaneiformis was sequenced (25883 bp) and mapped to a circular model. The A+T composition was 72.5%. Forty six genes and two potentially functional open reading frames were identified. They include 24 protein-coding genes, 2 rRNA genes, 20 tRNA genes and 2 ORFs (orf60, orf142). There is considerable sequence synteny across the five red algal mtDNAs falling into Florideophyceae including Gr. lemaneiformis in this study and previously sequenced species. A long stem-loop and a hairpin structure were identified in intergenic regions of mt genome of Gr. lemaneiformis, which are believed to be involved with transcription and replication. In addition, the mtDNAs of two mutagenic cultivated breeds ("981" and "07-2") were also sequenced. Compared with the mtDNA of wild Gr. lemaneiformis, the genome size and gene length and order of three strains were completely identical except nine base mutations including eight in the protein-coding genes and one in the tRNA gene. None of the base mutations caused frameshift or a premature stop codon in the mtDNA genes. Phylogenetic analyses based on mitochondrial protein-coding genes and rRNA genes demonstrated Gracilariopsis andersonii had closer phylogenetic relationship with its parasite Gracilariophila oryzoides than Gracilariopsis lemaneiformis which was from the same genus of Gracilariopsis.

  3. Factors that affect large subunit ribosomal DNA amplicon sequencing studies of fungal communities: classification method, primer choice, and error.

    Directory of Open Access Journals (Sweden)

    Teresita M Porter

    Nuclear large subunit ribosomal DNA is widely used in fungal phylogenetics and, to an increasing extent, in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: (1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); (2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and (3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross-validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and NBC ran significantly faster than the other tested methods. All methods performed poorly with the shortest (50-100 bp) sequences. Increasing simulated sequence error reduced classification accuracy. Community shifts were detected due to sequence error and primer selection even though there was no change in the underlying community composition. Short-read datasets from individual primers, as well as pooled datasets, appear to only approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys.

  4. Analysis of integrated human papillomavirus type 16 DNA in cervical cancers: amplification of viral sequences together with cellular flanking sequences.

    Science.gov (United States)

    Wagatsuma, M; Hashimoto, K; Matsukura, T

    1990-01-01

    We have isolated four clones of integrated human papillomavirus type 16 (HPV-16) DNA from four different primary cervical cancer specimens. All clones were found to be monomeric or dimeric forms of HPV-16 DNA with cellular flanking sequences at both ends. Analysis of the viral sequences in these clones showed that E6/E7 open reading frames and the long control region were conserved and that no region specific for the integration was detected. Analysis of the cellular flanking sequences revealed no significant homology with any known human DNA sequences, except Alu sequences, and no homology among the clones, indicating no cellular sequence specific for the integration. By probing with single-copy cellular flanking sequences from the clones, it was demonstrated that the integrated HPV-16 DNAs, with different sizes in the same specimens, shared the same cellular flanking sequences at the ends. Furthermore, it was shown that the viral sequences together with cellular flanking sequences were amplified. The possible process of viral integration into cell chromosomes in cervical cancer is discussed. PMID:2153245

  5. DNA breaks and repair in interstitial telomere sequences: Influence of chromatin structure

    International Nuclear Information System (INIS)

    Revaud, D.

    2009-06-01

    Interstitial Telomeric Sequences (ITS) are over-involved in spontaneous and radiation-induced chromosome aberrations in Chinese hamster cells. We have performed a study to investigate the origin of their instability, whether spontaneous or following low-dose irradiation. Our results demonstr