WorldWideScience

Sample records for length dna sequences

  1. DNA interactions with a Methylene Blue redox indicator depend on the DNA length and are sequence specific.

    Science.gov (United States)

    Farjami, Elaheh; Clima, Lilia; Gothelf, Kurt V; Ferapontova, Elena E

    2010-06-01

    A DNA molecular beacon approach was used for the analysis of interactions between DNA and Methylene Blue (MB) as a redox indicator of a hybridization event. DNA hairpin structures of different length and guanine (G) content were immobilized onto gold electrodes in their folded states through the alkanethiol linker at the 5'-end. Binding of MB to the folded hairpin DNA was electrochemically studied and compared with binding to the duplex structure formed by hybridization of the hairpin DNA to a complementary DNA strand. Variation of the electrochemical signal from the DNA-MB complex was shown to depend primarily on the DNA length and sequence used: the G-C base pairs were the preferential sites of MB binding in the duplex. For short 20 nts long DNA sequences, the increased electrochemical response from MB bound to the duplex structure was consistent with the increased amount of bound and electrochemically readable MB molecules (i.e. MB molecules that are available for the electron transfer (ET) reaction with the electrode). With longer DNA sequences, the balance between the amounts of the electrochemically readable MB molecules bound to the hairpin DNA and to the hybrid was opposite: a part of the MB molecules bound to the long-sequence DNA duplex seem to be electrochemically mute due to long ET distance. The increasing electrochemical response from MB bound to the short-length DNA hybrid contrasts with the decreasing signal from MB bound to the long-length DNA hybrid and allows an "off"-"on" genosensor development.

  2. The function analysis of full-length cDNA sequence from IRM-2 mouse cDNA library

    International Nuclear Information System (INIS)

    Wang Qin; Liu Xiaoqiu; Xu Chang; Du Liqing; Sun Zhijuan; Wang Yan; Liu Qiang; Song Li; Li Jin; Fan Feiyue

    2013-01-01

    Objective: To identify the function of full-length cDNA sequence from IRM-2 mouse cDNA library. Methods: Full-length cDNA products were amplified by PCR from IRM-2 mouse cDNA library according to twenty-one pieces of expressed sequence tag. The expression of full-length cDNAs were detected after mouse embryonic fibroblasts were exposed to 6.5 Gy γ-ray radiation. And the effect on the growth of radiosensitivity cells AT5B1VA transfected with full-length cDNAs was investigated. Results: The expression of No.4, 5 and 2 full-length cDNAs from IRM-2 mouse were higher than that of parental ICR and 615 mouse after mouse embryonic fibroblasts irradiated with γ-ray radiation. And the survival rate of AT5B1VA cells transfected with No.4, 5 and 2 full-length cDNAs was high. Conclusion: No.4, 5 and 2 full-length cDNAs of IRM-2 mouse are of high radioresistance. (authors)

  3. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and therefore is an important group for understanding the evolution of vertebrate genomes including the human genome. Our laboratory has proposed elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  4. Characterization of full-length sequenced cDNA inserts (FLIcs from Atlantic salmon (Salmo salar

    Directory of Open Access Journals (Sweden)

    Lunner Sigbjørn

    2009-10-01

    Full Text Available Abstract Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to be helpful in order to distinguish between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP, the number of sequences where the full-length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91% of the transcripts were annotated using Gene Ontology (GO terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS. The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS. This

  5. Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

    Science.gov (United States)

    Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

    2009-01-01

    Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to be helpful in order to distinguish between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full-length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining cDNA

  6. Full-length cDNA sequences from Rhesus monkey placenta tissue: analysis and utility for comparative mapping

    Directory of Open Access Journals (Sweden)

    Lee Sang-Rae

    2010-07-01

    Full Text Available Abstract Background Rhesus monkeys (Macaca mulatta are widely-used as experimental animals in biomedical research and are closely related to other laboratory macaques, such as cynomolgus monkeys (Macaca fascicularis, and to humans, sharing a last common ancestor from about 25 million years ago. Although rhesus monkeys have been studied extensively under field and laboratory conditions, research has been limited by the lack of genetic resources. The present study generated placenta full-length cDNA libraries, characterized the resulting expressed sequence tags, and described their utility for comparative mapping with human RefSeq mRNA transcripts. Results From rhesus monkey placenta full-length cDNA libraries, 2000 full-length cDNA sequences were determined and 1835 rhesus placenta cDNA sequences longer than 100 bp were collected. These sequences were annotated based on homology to human genes. Homology search against human RefSeq mRNAs revealed that our collection included the sequences of 1462 putative rhesus monkey genes. Moreover, we identified 207 genes containing exon alterations in the coding region and the untranslated region of rhesus monkey transcripts, despite the highly conserved structure of the coding regions. Approximately 10% (187 of all full-length cDNA sequences did not represent any public human RefSeq mRNAs. Intriguingly, two rhesus monkey specific exons derived from the transposable elements of AluYRa2 (SINE family and MER11B (LTR family were also identified. Conclusion The 1835 rhesus monkey placenta full-length cDNA sequences described here could expand genomic resources and information of rhesus monkeys. This increased genomic information will greatly contribute to the development of evolutionary biology and biomedical research.

  7. An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

    Science.gov (United States)

    Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

    2011-01-01

    cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.

  8. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Full Text Available Abstract Background Castor seeds are a major source for ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergic proteins in seeds. Results Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12 gene that is responsible for ricinoleate biosynthesis. The role(s of the lipases in developing castor seeds are not clear, and co-expressing of a lipase and the FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2 gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion Our results suggest that transcriptional regulation of FAD2 and FAH12 genes maybe one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, which whole sequence is being generated by shotgun sequencing at

  9. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Directory of Open Access Journals (Sweden)

    Bendahmane Abdelhafid

    2011-05-01

    Full Text Available Abstract Background Melon (Cucumis melo, an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs and 3,073 single nucleotide polymorphisms (SNPs in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but

  10. Sequence-length variation of mtDNA HVS-I C-stretch in Chinese ethnic groups.

    Science.gov (United States)

    Chen, Feng; Dang, Yong-hui; Yan, Chun-xia; Liu, Yan-ling; Deng, Ya-jun; Fulton, David J R; Chen, Teng

    2009-10-01

    The purpose of this study was to investigate mitochondrial DNA (mtDNA) hypervariable segment-I (HVS-I) C-stretch variations and explore the significance of these variations in forensic and population genetics studies. The C-stretch sequence variation was studied in 919 unrelated individuals from 8 Chinese ethnic groups using both direct and clone sequencing approaches. Thirty eight C-stretch haplotypes were identified, and some novel and population specific haplotypes were also detected. The C-stretch genetic diversity (GD) values were relatively high, and probability (P) values were low. Additionally, C-stretch length heteroplasmy was observed in approximately 9% of individuals studied. There was a significant correlation (r=-0.961, Ppopulations. The results from the Fst and dA genetic distance matrix, neighbor-joining tree, and principal component map also suggest that C-stretch could be used as a reliable genetic marker in population genetics.

  11. Human uroporphyrinogen III synthase: Molecular cloning, nucleotide sequence, and expression of a full-length cDNA

    International Nuclear Information System (INIS)

    Tsai, Shihfeng; Bishop, D.F.; Desnick, R.J.

    1988-01-01

    Uroporphyrinogen III synthase, the fourth enzyme in the heme biosynthetic pathway, is responsible for conversion of the linear tetrapyrrole, hydroxymethylbilane, to the cyclic tetrapyrrole, uroporphyrinogen III. The deficient activity of URO-synthase is the enzymatic defect in the autosomal recessive disorder congenital erythropoietic porphyria. To facilitate the isolation of a full-length cDNA for human URO-synthase, the human erythrocyte enzyme was purified to homogeneity and 81 nonoverlapping amino acids were determined by microsequencing the N terminus and four tryptic peptides. Two synthetic oligonucleotide mixtures were used to screen 1.2 x 10 6 recombinants from a human adult liver cDNA library. Eight clones were positive with both oligonucleotide mixtures. Of these, dideoxy sequencing of the 1.3 kilobase insert from clone pUROS-2 revealed 5' and 3' untranslated sequences of 196 and 284 base pairs, respectively, and an open reading frame of 798 base pairs encoding a protein of 265 amino acids with a predicted molecular mass of 28,607 Da. The isolation and expression of this full-length cDNA for human URO-synthase should facilitate studies of the structure, organization, and chromosomal localization of this heme biosynthetic gene as well as the characterization of the molecular lesions causing congenital erythropoietic porphyria

  12. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

    Science.gov (United States)

    Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

    2010-05-07

    Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.

  13. Full-Length Venom Protein cDNA Sequences from Venom-Derived mRNA: Exploring Compositional Variation and Adaptive Multigene Evolution.

    Science.gov (United States)

    Modahl, Cassandra M; Mackessy, Stephen P

    2016-06-01

    Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made that much more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs are obtained by the use of primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also be used to aid in characterizing venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides

  14. Increased mRNA expression of a laminin-binding protein in human colon carcinoma: Complete sequence of a full-length cDNA encoding the protein

    International Nuclear Information System (INIS)

    Yow, Hsiukang; Wong, Jau Min; Chen, Hai Shiene; Lee, C.; Steele, G.D. Jr.; Chen, Lanbo

    1988-01-01

    Reliable markers to distinguish human colon carcinoma from normal colonic epithelium are needed particularly for poorly differentiated tumors where no useful marker is currently available. To search for markers the authors constructed cDNA libraries from human colon carcinoma cell lines and screened for clones that hybridize to a greater degree with mRNAs of colon carcinomas than with their normal counterparts. Here they report one such cDNA clone that hybridizes with a 1.2-kilobase (kb) mRNA, the level of which is ∼9-fold greater in colon carcinoma than in adjacent normal colonic epithelium. Blot hybridization of total RNA from a variety of human colon carcinoma cell lines shows that the level of this 1.2-kb mRNA in poorly differentiated colon carcinomas is as high as or higher than that in well-differentiated carcinomas. Molecular cloning and complete sequencing of cDNA corresponding to the full-length open reading frame of this 1.2-kb mRNA unexpectedly show it to contain all the partial cDNA sequence encoding 135 amino acid residues previously reported for a human laminin receptor. The deduced amino acid sequence suggests that this putative laminin-binding protein from human colon carcinomas consists of 295 amino acid residues with interesting features. There is an unusual C-terminal 70-amino acid segment, which is trypsin-resistant and highly negatively charged

  15. Generation and analysis of large-scale expressed sequence tags (ESTs from a full-length enriched cDNA library of porcine backfat tissue

    Directory of Open Access Journals (Sweden)

    Lee Hae-Young

    2006-02-01

    Full Text Available Abstract Background Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. Results We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761. For all the expressed sequence tags (ESTs, approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp. Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46% and 3,232 singleton (65.54% ESTs. From a total of 5,008 unique sequences, 3,154 (62.98% were similar to other sequences, and 1,854 (37.02% were identified as having no hit or low identity (Sus scrofa. Gene ontology (GO annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64% and a small proportion of contigs (13.36%. Conclusion The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences. The isolation of genes expressed in backfat tissue is the

  16. Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns

    Directory of Open Access Journals (Sweden)

    Hayashizaki Yoshihide

    2009-06-01

    Full Text Available Abstract Background Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable carrying out comparative genomics in cereals. Results As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts within the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate (Ka/Ks of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the genes with the higher variability expressed in these tissues is under adaptive selection. Conclusion We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the

  17. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  18. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Science.gov (United States)

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild Animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 106 pfu/mL and 1.56 × 109 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bps were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coded for the TAT-Ferritin fusion protein with two 6× His-tags in N and C-terminal. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  19. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    Directory of Open Access Journals (Sweden)

    Changqing Liu

    2013-05-01

    Full Text Available In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild Animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 106 pfu/mL and 1.56 × 109 pfu/mL respectively. The percentage of recombinants from unamplified library was 90.2% and average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bps were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coded for the TAT-Ferritin fusion protein with two 6× His-tags in N and C-terminal. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  20. Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

    Directory of Open Access Journals (Sweden)

    Holt Robert A

    2010-04-01

    Full Text Available Abstract Background Salmonids are one of the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar, but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate.

  1. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  2. Gene discovery from Jatropha curcas by sequencing of ESTs from normalized and full-length enriched cDNA library from developing seeds

    Directory of Open Access Journals (Sweden)

    Sugantham Priyanka Annabel

    2010-10-01

    Full Text Available Abstract Background Jatropha curcas L. is promoted as an important non-edible biodiesel crop worldwide. Jatropha oil, which is a triacylglycerol, can be directly blended with petro-diesel or transesterified with methanol and used as biodiesel. Genetic improvement in jatropha is needed to increase the seed yield, oil content, drought and pest resistance, and to modify oil composition so that it becomes a technically and economically preferred source for biodiesel production. However, genetic improvement efforts in jatropha could not take advantage of genetic engineering methods due to lack of cloned genes from this species. To overcome this hurdle, the current gene discovery project was initiated with an objective of isolating as many functional genes as possible from J. curcas by large scale sequencing of expressed sequence tags (ESTs. Results A normalized and full-length enriched cDNA library was constructed from developing seeds of J. curcas. The cDNA library contained about 1 × 106 clones and average insert size of the clones was 2.1 kb. Totally 12,084 ESTs were sequenced to average high quality read length of 576 bp. Contig analysis revealed 2258 contigs and 4751 singletons. Contig size ranged from 2-23 and there were 7333 ESTs in the contigs. This resulted in 7009 unigenes which were annotated by BLASTX. It showed 3982 unigenes with significant similarity to known genes and 2836 unigenes with significant similarity to genes of unknown, hypothetical and putative proteins. The remaining 191 unigenes which did not show similarity with any genes in the public database may encode for unique genes. Functional classification revealed unigenes related to broad range of cellular, molecular and biological functions. Among the 7009 unigenes, 6233 unigenes were identified to be potential full-length genes. Conclusions The high quality normalized cDNA library was constructed from developing seeds of J. curcas for the first time and 7009 unigenes coding

  3. Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale

    Directory of Open Access Journals (Sweden)

    Liu He

    2017-10-01

    Full Text Available Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS and single-molecule real-time (SMRT sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET and one sucrose transporter (SUT are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT and four cellulose synthase (Ces genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.

  4. Genotyping of major histocompatibility complex Class II DRB gene in Rohilkhandi goats by polymerase chain reaction-restriction fragment length polymorphism and DNA sequencing

    Directory of Open Access Journals (Sweden)

    Kush Shrivastava

    2015-10-01

    Full Text Available Aim: To study the major histocompatibility complex (MHC Class II DRB1 gene polymorphism in Rohilkhandi goat using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP and nucleotide sequencing techniques. Materials and Methods: DNA was isolated from 127 Rohilkhandi goats maintained at sheep and goat farm, Indian Veterinary Research Institute, Izatnagar, Bareilly. A 284 bp fragment of exon 2 of DRB1 gene was amplified and digested using BsaI and TaqI restriction enzymes. Population genetic parameters were calculated using Popgene v 1.32 and SAS 9.0. The genotypes were then sequenced using Sanger dideoxy chain termination method and were compared with related breeds/species using MEGA 6.0 and Megalign (DNASTAR software. Results: TaqI locus showed three and BsaI locus showed two genotypes. Both the loci were found to be in Hardy–Weinberg equilibrium (HWE, however, population genetic parameters suggest that heterozygosity is still maintained in the population at both loci. Percent diversity and divergence matrix, as well as phylogenetic analysis revealed that the MHC Class II DRB1 gene of Rohilkhandi goats was found to be in close cluster with Garole and Scottish blackface sheep breeds as compared to other goat breeds included in the sequence comparison. Conclusion: The PCR-RFLP patterns showed population to be in HWE and absence of one genotype at one locus (BsaI, both the loci showed excess of one or the other homozygote genotype, however, effective number of alleles showed that allelic diversity is present in the population. Sequence comparison of DRB1 gene of Rohilkhandi goat with other sheep and goat breed assigned Rohilkhandi goat in divergence with Jamanupari and Angora goats.

  5. Complementary DNA-amplified fragment length polymorphism ...

    African Journals Online (AJOL)

    Complementary DNA-amplified fragment length polymorphism (AFLP-cDNA) analysis of differential gene expression from the xerophyte Ammopiptanthus mongolicus in response to cold, drought and cold together with drought.

  6. Characterization of Mycoplasma hyosynoviae strains by amplified fragment length polymorphism analysis, pulsed-field gel electrophoresis and 16S ribosomal DNA sequencing

    DEFF Research Database (Denmark)

    Kokotovic, Branko; Friis, N.F.; Ahrens, Peter

    2002-01-01

    , were investigated by analysis of amplified fragment length polymorphisms of the Bgl II and Mfe I restriction sites and by pulsed-field gel electrophoresis of a Bss HII digest of chromosomal DNA. Both methods allowed unambiguous differentiation of the analysed strains and showed similar discriminatory...

  7. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    Science.gov (United States)

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.

  8. Fragment Length of Circulating Tumor DNA.

    Science.gov (United States)

    Underhill, Hunter R; Kitzman, Jacob O; Hellwig, Sabine; Welker, Noah C; Daza, Riza; Baker, Daniel N; Gligorich, Keith M; Rostomily, Robert C; Bronner, Mary P; Shendure, Jay

    2016-07-01

    Malignant tumors shed DNA into the circulation. The transient half-life of circulating tumor DNA (ctDNA) may afford the opportunity to diagnose, monitor recurrence, and evaluate response to therapy solely through a non-invasive blood draw. However, detecting ctDNA against the normally occurring background of cell-free DNA derived from healthy cells has proven challenging, particularly in non-metastatic solid tumors. In this study, distinct differences in fragment length size between ctDNAs and normal cell-free DNA are defined. Human ctDNA in rat plasma derived from human glioblastoma multiforme stem-like cells in the rat brain and human hepatocellular carcinoma in the rat flank were found to have a shorter principal fragment length than the background rat cell-free DNA (134-144 bp vs. 167 bp, respectively). Subsequently, a similar shift in the fragment length of ctDNA in humans with melanoma and lung cancer was identified compared to healthy controls. Comparison of fragment lengths from cell-free DNA between a melanoma patient and healthy controls found that the BRAF V600E mutant allele occurred more commonly at a shorter fragment length than the fragment length of the wild-type allele (132-145 bp vs. 165 bp, respectively). Moreover, size-selecting for shorter cell-free DNA fragment lengths substantially increased the EGFR T790M mutant allele frequency in human lung cancer. These findings provide compelling evidence that experimental or bioinformatic isolation of a specific subset of fragment lengths from cell-free DNA may improve detection of ctDNA.

  9. Repeated DNA sequences in fungi

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K

    1974-11-01

    Several fungal species, representatives of all broad groups like basidiomycetes, ascomycetes and phycomycetes, were examined for the nature of repeated DNA sequences by DNA:DNA reassociation studies using hydroxyapatite chromatography. All of the fungal species tested contained 10 to 20 percent repeated DNA sequences. There are approximately 100 to 110 copies of repeated DNA sequences of approximately 4 x 10/sup 7/ daltons piece size of each. Repeated DNA sequence homoduplexes showed on average 5/sup 0/C difference of T/sub e/50 (temperature at which 50 percent duplexes dissociate) values from the corresponding homoduplexes of unfractionated whole DNA. It is suggested that a part of repetitive sequences in fungi constitutes mitochondrial DNA and a part of it constitutes nuclear DNA. (auth)

  10. Characterization of Erwinia amylovora strains from different host plants using repetitive-sequences PCR analysis, and restriction fragment length polymorphism and short-sequence DNA repeats of plasmid pEA29.

    Science.gov (United States)

    Barionovi, D; Giorgi, S; Stoeger, A R; Ruppitsch, W; Scortichini, M

    2006-05-01

    The three main aims of the study were the assessment of the genetic relationship between a deviating Erwinia amylovora strain isolated from Amelanchier sp. (Maloideae) grown in Canada and other strains from Maloideae and Rosoideae, the investigation of the variability of the PstI fragment of the pEA29 plasmid using restriction fragment length polymorphism (RFLP) analysis and the determination of the number of short-sequence DNA repeats (SSR) by DNA sequence analysis in representative strains. Ninety-three strains obtained from 12 plant genera and different geographical locations were examined by repetitive-sequences PCR using Enterobacterial Repetitive Intergenic Consensus, BOX and Repetitive Extragenic Palindromic primer sets. Upon the unweighted pair group method with arithmetic mean analysis, a deviating strain from Amelanchier sp. was analysed using amplified ribosomal DNA restriction analysis (ARDRA) analysis and the sequencing of the 16S rDNA gene. This strain showed 99% similarity to other E. amylovora strains in the 16S gene and the same banding pattern with ARDRA. The RFLP analysis of pEA29 plasmid using MspI and Sau3A restriction enzymes showed a higher variability than that previously observed and no clear-cut grouping of the strains was possible. The number of SSR units reiterated two to 12 times. The strains obtained from pear orchards showing for the first time symptoms of fire blight had a low number of SSR units. The strains from Maloideae exhibit a wider genetic variability than previously thought. The RFLP analysis of a fragment of the pEA29 plasmid would not seem a reliable method for typing E. amylovora strains. A low number of SSR units was observed with first epidemics of fire blight. The current detection techniques are mainly based on the genetic similarities observed within the strains from the cultivated tree-fruit crops. For a more reliable detection of the fire blight pathogen also in wild and ornamentals Rosaceous plants the genetic

  11. Analysis of a cDNA clone expressing a human autoimmune antigen: full-length sequence of the U2 small nuclear RNA-associated B antigen

    International Nuclear Information System (INIS)

    Habets, W.J.; Sillekens, P.T.G.; Hoet, M.H.; Schalken, J.A.; Roebroek, A.J.M.; Leunissen, J.A.M.; Van de Ven, W.J.M.; Van Venrooij, W.J.

    1987-01-01

    A U2 small nuclear RNA-associated protein, designated B'', was recently identified as the target antigen for autoimmune sera from certain patients with systemic lupus erythematosus and other rheumatic diseases. Such antibodies enabled them to isolate cDNA clone λHB''-1 from a phage λgt11 expression library. This clone appeared to code for the B'' protein as established by in vitro translation of hybrid-selected mRNA. The identity of clone λHB''-1 was further confirmed by partial peptide mapping and analysis of the reactivity of the recombinant antigen with monospecific and monoclonal antibodies. Analysis of the nucleotide sequence of the 1015-base-pair cDNA insert of clone λHB''-1 revealed a large open reading frame of 800 nucleotides containing the coding sequence for a polypeptide of 25,457 daltons. In vitro transcription of the λHB''-1 cDNA insert and subsequent translation resulted in a protein product with the molecular size of the B'' protein. These data demonstrate that clone λHB''-1 contains the complete coding sequence of this antigen. The deduced polypeptide sequence contains three very hydrophilic regions that might constitute RNA binding sites and/or antigenic determinants. These findings might have implications both for the understanding of the pathogenesis of rheumatic diseases as well as for the elucidation of the biological function of autoimmune antigens

  12. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  13. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Full Text Available Abstract Background The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows to generate sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm allows to generate greater sets of sequences than with previous software and equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.

  14. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as the nucleotide correlations or the not uniform distribution of the motifs throughout genomic or mature mRNA sequences, exist and their significance can be checked by means of the Monte Carlo test. The test needs good quality random sequences in order to work, moreover they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is a threshold below which the resulting score is merely due to chance. We have developed RANDNA, a free software which allows to produce random DNA or RNA sequences setting both their length and the percentage of nucleotide composition. Sequences having the same nucleotide distribution of exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it possible to easily set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees a good randomness, a long cycle length and a high speed. We have checked the quality of sequences generated by the software, by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.

  15. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  16. Technology development for gene discovery and full-length sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  17. Signal sequence and keyword trap in silico for selection of full-length human cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries.

    Science.gov (United States)

    Otsuki, Tetsuji; Ota, Toshio; Nishikawa, Tetsuo; Hayashi, Koji; Suzuki, Yutaka; Yamamoto, Jun-ichi; Wakamatsu, Ai; Kimura, Kouichi; Sakamoto, Katsuhiko; Hatano, Naoto; Kawai, Yuri; Ishii, Shizuko; Saito, Kaoru; Kojima, Shin-ichi; Sugiyama, Tomoyasu; Ono, Tetsuyoshi; Okano, Kazunori; Yoshikawa, Yoko; Aotsuka, Satoshi; Sasaki, Naokazu; Hattori, Atsushi; Okumura, Koji; Nagai, Keiichi; Sugano, Sumio; Isogai, Takao

    2005-01-01

    We have developed an in silico method of selection of human full-length cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. Fullness rates were increased to about 80% by combination of the oligo-capping method and ATGpr, software for prediction of translation start point and the coding potential. Then, using 5'-end single-pass sequences, cDNAs having the signal sequence were selected by PSORT ('signal sequence trap'). We also applied 'secretion or membrane protein-related keyword trap' based on the result of BLAST search against the SWISS-PROT database for the cDNAs which could not be selected by PSORT. Using the above procedures, 789 cDNAs were primarily selected and subjected to full-length sequencing, and 334 of these cDNAs were finally selected as novel. Most of the cDNAs (295 cDNAs: 88.3%) were predicted to encode secretion or membrane proteins. In particular, 165(80.5%) of the 205 cDNAs selected by PSORT were predicted to have signal sequences, while 70 (54.2%) of the 129 cDNAs selected by 'keyword trap' preserved the secretion or membrane protein-related keywords. Many important cDNAs were obtained, including transporters, receptors, and ligands, involved in significant cellular functions. Thus, an efficient method of selecting secretion or membrane protein-encoding cDNAs was developed by combining the above four procedures.

  18. Duplication in DNA Sequences

    Science.gov (United States)

    Ito, Masami; Kari, Lila; Kincaid, Zachary; Seki, Shinnosuke

    The duplication and repeat-deletion operations are the basis of a formal language theoretic model of errors that can occur during DNA replication. During DNA replication, subsequences of a strand of DNA may be copied several times (resulting in duplications) or skipped (resulting in repeat-deletions). As formal language operations, iterated duplication and repeat-deletion of words and languages have been well studied in the literature. However, little is known about single-step duplications and repeat-deletions. In this paper, we investigate several properties of these operations, including closure properties of language families in the Chomsky hierarchy and equations involving these operations. We also make progress toward a characterization of regular languages that are generated by duplicating a regular language.

  19. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  20. Generation and analysis of a large-scale expressed sequence Tag database from a full-length enriched cDNA library of developing leaves of Gossypium hirsutum L.

    Directory of Open Access Journals (Sweden)

    Min Lin

    Full Text Available BACKGROUND: Cotton (Gossypium hirsutum L. is one of the world's most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. METHODOLOGY/PRINCIPAL FINDINGS: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representative 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR, which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes express express preferentially in senescent leaves. CONCLUSIONS/SIGNIFICANCE: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence

  1. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, one of which is leukemia disease or better known as blood cancer. The cancer cell can be detected by taking DNA in laboratory test. This study focused on local alignment of leukemia and non leukemia data resulting from NCBI in the form of DNA sequences by using Smith-Waterman algorithm. SmithWaterman algorithm was invented by TF Smith and MS Waterman in 1981. These algorithms try to find as much as possible similarity of a pair of sequences, by giving a negative value to the unequal base pair (mismatch), and positive values on the same base pair (match). So that will obtain the maximum positive value as the end of the alignment, and the minimum value as the initial alignment. This study will use sequences of leukemia and 3 sequences of non leukemia.

  2. Genotypic Characterization of Bradyrhizobium Strains Nodulating Endemic Woody Legumes of the Canary Islands by PCR-Restriction Fragment Length Polymorphism Analysis of Genes Encoding 16S rRNA (16S rDNA) and 16S-23S rDNA Intergenic Spacers, Repetitive Extragenic Palindromic PCR Genomic Fingerprinting, and Partial 16S rDNA Sequencing

    Science.gov (United States)

    Vinuesa, Pablo; Rademaker, Jan L. W.; de Bruijn, Frans J.; Werner, Dietrich

    1998-01-01

    We present a phylogenetic analysis of nine strains of symbiotic nitrogen-fixing bacteria isolated from nodules of tagasaste (Chamaecytisus proliferus) and other endemic woody legumes of the Canary Islands, Spain. These and several reference strains were characterized genotypically at different levels of taxonomic resolution by computer-assisted analysis of 16S ribosomal DNA (rDNA) PCR-restriction fragment length polymorphisms (PCR-RFLPs), 16S-23S rDNA intergenic spacer (IGS) RFLPs, and repetitive extragenic palindromic PCR (rep-PCR) genomic fingerprints with BOX, ERIC, and REP primers. Cluster analysis of 16S rDNA restriction patterns with four tetrameric endonucleases grouped the Canarian isolates with the two reference strains, Bradyrhizobium japonicum USDA 110spc4 and Bradyrhizobium sp. strain (Centrosema) CIAT 3101, resolving three genotypes within these bradyrhizobia. In the analysis of IGS RFLPs with three enzymes, six groups were found, whereas rep-PCR fingerprinting revealed an even greater genotypic diversity, with only two of the Canarian strains having similar fingerprints. Furthermore, we show that IGS RFLPs and even very dissimilar rep-PCR fingerprints can be clustered into phylogenetically sound groupings by combining them with 16S rDNA RFLPs in computer-assisted cluster analysis of electrophoretic patterns. The DNA sequence analysis of a highly variable 264-bp segment of the 16S rRNA genes of these strains was found to be consistent with the fingerprint-based classification. Three different DNA sequences were obtained, one of which was not previously described, and all belonged to the B. japonicum/Rhodopseudomonas rDNA cluster. Nodulation assays revealed that none of the Canarian isolates nodulated Glycine max or Leucaena leucocephala, but all nodulated Acacia pendula, C. proliferus, Macroptilium atropurpureum, and Vigna unguiculata. PMID:9603820

  3. In situ detection of tandem DNA repeat length

    Energy Technology Data Exchange (ETDEWEB)

    Yaar, R.; Szafranski, P.; Cantor, C.R.; Smith, C.L. [Boston Univ., MA (United States)

    1996-11-01

    A simple method for scoring short tandem DNA repeats is presented. An oligonucleotide target, containing tandem repeats embedded in a unique sequence, was hybridized to a set of complementary probes, containing tandem repeats of known lengths. Single-stranded loop structures formed on duplexes containing a mismatched (different) number of tandem repeats. No loop structure formed on duplexes containing a matched (identical) number of tandem repeats. The matched and mismatched loop structures were enzymatically distinguished and differentially labeled by treatment with S1 nuclease and the Klenow fragment of DNA polymerase. 7 refs., 4 figs.

  4. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible by the advent of modern sequencing technologies that was a result of step by step advances with a contribution of academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this prospective, we wish to overview the role of capillary electrophoresis in DNA sequencing based in part of several of our articles in this journal. PMID:19517496

  5. Enhanced throughput for infrared automated DNA sequencing

    Science.gov (United States)

    Middendorf, Lyle R.; Gartside, Bill O.; Humphrey, Pat G.; Roemer, Stephen C.; Sorensen, David R.; Steffens, David L.; Sutter, Scott L.

    1995-04-01

    Several enhancements have been developed and applied to infrared automated DNA sequencing resulting in significantly higher throughput. A 41 cm sequencing gel (31 cm well- to-read distance) combines high resolution of DNA sequencing fragments with optimized run times yielding two runs per day of 500 bases per sample. A 66 cm sequencing gel (56 cm well-to-read distance) produces sequence read lengths of up to 1000 bases for ds and ss templates using either T7 polymerase or cycle-sequencing protocols. Using a multichannel syringe to load 64 lanes allows 16 samples (compatible with 96-well format) to be visualized for each run. The 41 cm gel configuration allows 16,000 bases per day (16 samples X 500 bases/sample X 2 ten hour runs/day) to be sequenced with the advantages of infrared technology. Enhancements to internal labeling techniques using an infrared-labeled dATP molecule (Boehringer Mannheim GmbH, Penzberg, Germany; Sequenase (U.S. Biochemical) have also been made. The inclusion of glycerol in the sequencing reactions yields greatly improved results for some primer and template combinations. The inclusion of (alpha) -Thio-dNTP's in the labeling reaction increases signal intensity two- to three-fold.

  6. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance,and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how can we distinguish coding and noncoding sequences, and the problems of classification and evolution relationship of organisms are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view)to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  7. Full-length cDNA sequence cloning and analysis of Ghrelin in Cervus nippon%梅花鹿Ghrelin全长cDNA克隆及其序列分析

    Institute of Scientific and Technical Information of China (English)

    张曼; 金鑫; 田巧珍; 刘骄; 王云鹤; 杨银凤

    2017-01-01

    为获得梅花鹿Ghrelin eDNA全序列,以梅花鹿皱胃黏膜上皮组织提取的总RNA为模板,通过RT-PCR和RACE法克隆了梅花鹿皱胃中Ghrelin基因eDNA的全序列.结果表明梅花鹿Ghrelin eDNA序列全长为539 bp,其中5’非翻译区(5'UTR)为46 bp,3'UTR为128 bp,开放阅读框(ORF)为351 bp,该ORF编码116个氨基酸残基.将梅花鹿Ghrelin基因的eDNA与人和其他动物的Ghrelin相比,发现:梅花鹿Ghrelin与驯鹿、山羊、绵羊和牛的同源性达90.4%~99.1%;与恒河猴、人、猪、犬的同源性达76.6%~66.9%;与鸡和野鸽的同源性分别为36.4%和35.4%.研究表明Ghrelin的结构具有明显的种属特异性,因此Ghrelin在反刍动物体内可能有着重要的生理功能.%In order to obtain the full-length cDNA of Ghrelin in Cervus nippon,RT-PCR and RACE methods were used by using total RNA of abomasus tissue in C.nippon as template.The results of sequence analysis revealed a 539 bp length cDNA containing 46 bp 5'-untranslated region (5'UTR),128 bp 3'-untranslated region (3'UTR) and 351 bp open reading frame (ORF) encoding 116 amino acids.The cDNA sequence alignments of C.nippon Ghrelin gene with human and other animals showed that the cDNA sequence homology of C.nippon Ghrelin was 90.4%-99.1% to reindeer,goat,sheep and cattle,66.9%-76.6% with rhesus monkey,human,pig and dog,only 36.4% with chicken and C.livia.These results indicated that the structure of Ghrelin displayed an obvious varietal specificity,suggesting that Ghrelin might play an important physiological function role in ruminants.

  8. Length quantization of DNA partially expelled from heads of a bacteriophage T3 mutant

    Energy Technology Data Exchange (ETDEWEB)

    Serwer, Philip, E-mail: serwer@uthscsa.edu [Department of Biochemistry, The University of Texas Health Science Center, 7703 Floyd Curl Drive, San Antonio, TX 78229-3900 (United States); Wright, Elena T. [Department of Biochemistry, The University of Texas Health Science Center, 7703 Floyd Curl Drive, San Antonio, TX 78229-3900 (United States); Liu, Zheng; Jiang, Wen [Markey Center for Structural Biology, Department of Biological Sciences, Purdue University, West Lafayette, IN 47907 (United States)

    2014-05-15

    DNA packaging of phages phi29, T3 and T7 sometimes produces incompletely packaged DNA with quantized lengths, based on gel electrophoretic band formation. We discover here a packaging ATPase-free, in vitro model for packaged DNA length quantization. We use directed evolution to isolate a five-site T3 point mutant that hyper-produces tail-free capsids with mature DNA (heads). Three tail gene mutations, but no head gene mutations, are present. A variable-length DNA segment leaks from some mutant heads, based on DNase I-protection assay and electron microscopy. The protected DNA segment has quantized lengths, based on restriction endonuclease analysis: six sharp bands of DNA missing 3.7–12.3% of the last end packaged. Native gel electrophoresis confirms quantized DNA expulsion and, after removal of external DNA, provides evidence that capsid radius is the quantization-ruler. Capsid-based DNA length quantization possibly evolved via selection for stalling that provides time for feedback control during DNA packaging and injection. - Graphical abstract: Highlights: • We implement directed evolution- and DNA-sequencing-based phage assembly genetics. • We purify stable, mutant phage heads with a partially leaked mature DNA molecule. • Native gels and DNase-protection show leaked DNA segments to have quantized lengths. • Native gels after DNase I-removal of leaked DNA reveal the capsids to vary in radius. • Thus, we hypothesize leaked DNA quantization via variably quantized capsid radius.

  9. Fast and secure retrieval of DNA sequences

    NARCIS (Netherlands)

    2014-01-01

    Sequence models are retrieved from a sequences index. The sequence models model DNA or RNA sequences stored in a database, and each comprises a finite memory tree source model and parameters for the finite memory tree source model. One or more DNA or RNA sequences stored in the database are

  10. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)

  11. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows to extract meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.

  12. Toward a Better Compression for DNA Sequences Using Huffman Encoding.

    Science.gov (United States)

    Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    2017-04-01

    Due to the significant amount of DNA data that are being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data to a less space and a faster transmission. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. The research suggests an improvement on all DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016 ).

  13. DNA Replication Profiling Using Deep Sequencing.

    Science.gov (United States)

    Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W

    2018-01-01

    Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.

  14. Blind sequence-length estimation of low-SNR cyclostationary sequences

    CSIR Research Space (South Africa)

    Vlok, JD

    2014-06-01

    Full Text Available Several existing direct-sequence spread spectrum (DSSS) detection and estimation algorithms assume prior knowledge of the symbol period or sequence length, although very few sequence-length estimation techniques are available in the literature...

  15. Photoluminescence Enhancement of Poly(3-methylthiophene Nanowires upon Length Variable DNA Hybridization

    Directory of Open Access Journals (Sweden)

    Jingyuan Huang

    2018-01-01

    Full Text Available The use of low-dimensional inorganic or organic nanomaterials has advantages for DNA and protein recognition due to their sensitivity, accuracy, and physical size matching. In this research, poly(3-methylthiophene (P3MT nanowires (NWs are electrochemically prepared with dopant followed by functionalization with probe DNA (pDNA sequence through electrostatic interaction. Various lengths of pDNA sequences (10-, 20- and 30-mer are conjugated to the P3MT NWs respectively followed with hybridization with their complementary target DNA (tDNA sequences. The nanoscale photoluminescence (PL properties of the P3MT NWs are studied throughout the whole process at solid state. In addition, the correlation between the PL enhancement and the double helix DNA with various lengths is demonstrated.

  16. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Full Text Available Abstract Background Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  17. DNA sequence modeling based on context trees

    NARCIS (Netherlands)

    Kusters, C.J.; Ignatenko, T.; Roland, J.; Horlin, F.

    2015-01-01

    Genomic sequences contain instructions for protein and cell production. Therefore understanding and identification of biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling of DNA sequences in its turn can help to better understand and identify such

  18. One-dimensional TRFLP-SSCP is an effective DNA fingerprinting strategy for soil Archaea that is able to simultaneously differentiate broad taxonomic clades based on terminal fragment length polymorphisms and closely related sequences based on single stranded conformation polymorphisms.

    Science.gov (United States)

    Swanson, Colby A; Sliwinski, Marek K

    2013-09-01

    DNA fingerprinting methods provide a means to rapidly compare microbial assemblages from environmental samples without the need to first cultivate species in the laboratory. The profiles generated by these techniques are able to identify statistically significant temporal and spatial patterns, correlations to environmental gradients, and biological variability to estimate the number of replicates for clone libraries or next generation sequencing (NGS) surveys. Here we describe an improved DNA fingerprinting technique that combines terminal restriction fragment length polymorphisms (TRFLP) and single stranded conformation polymorphisms (SSCP) so that both can be used to profile a sample simultaneously rather than requiring two sequential steps as in traditional two-dimensional (2-D) gel electrophoresis. For the purpose of profiling Archaeal 16S rRNA genes from soil, the dynamic range of this combined 1-D TRFLP-SSCP approach was superior to TRFLP and SSCP. 1-D TRFLP-SSCP was able to distinguish broad taxonomic clades with genetic distances greater than 10%, such as Euryarchaeota and the Thaumarchaeal clades g_Ca. Nitrososphaera (formerly 1.1b) and o_NRP-J (formerly 1.1c) better than SSCP. In addition, 1-D TRFLP-SSCP was able to simultaneously distinguish closely related clades within a genus such as s_SCA1145 and s_SCA1170 better than TRFLP. We also tested the utility of 1-D TRFLP-SSCP fingerprinting of environmental assemblages by comparing this method to the generation of a 16S rRNA clone library of soil Archaea from a restored Tallgrass prairie. This study shows 1-D TRFLP-SSCP fingerprinting provides a rapid and phylogenetically informative screen of Archaeal 16S rRNA genes in soil samples. © 2013.

  19. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  20. Image correlation method for DNA sequence alignment.

    Science.gov (United States)

    Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

    2012-01-01

    The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation and sequence alignment is handled as object recognition in a scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, one million base pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%), specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed process of a real experimental optical correlator. By doing this, we expect to fully exploit optical correlation light properties. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods on sequence alignment.

  1. DNA Length Modulates the Affinity of Fragments of Genomic DNA for the Nuclear Matrix In Vitro.

    Science.gov (United States)

    García-Vilchis, David; Aranda-Anzaldo, Armando

    2017-12-01

    Classical observations have shown that during the interphase the chromosomal DNA of metazoans is organized in supercoiled loops attached to a compartment known as the nuclear matrix (NM). Fragments of chromosomal DNA able to bind the isolated NM in vitro are known as matrix associated/attachment/addressed regions or MARs. No specific consensus sequence or motif has been found that may constitute a universal, defining feature of MARs. On the other hand, high-salt resistant DNA-NM interactions in situ define true DNA loop anchorage regions or LARs, that might correspond to a subset of the potential MARs but are not necessarily identical to MARs characterized in vitro, since there are several examples of MARs able to bind the NM in vitro but which are not actually bound to the NM in situ. In the present work we assayed the capacity of two LARs, as well as of shorter fragments within such LARs, for binding to the NM in vitro. Paradoxically the isolated (≈2 kb) LARs cannot bind to the NM in vitro while their shorter (≈300 pb) sub-fragments and other non-related but equally short DNA fragments, bind to the NM in a high-salt resistant fashion. Our results suggest that the ability of a given DNA fragment for binding to the NM in vitro primarily depends on the length of the fragment, suggesting that binding to the NM is modulated by the local topology of the DNA fragment in suspension that it is known to depend on the DNA length. J. Cell. Biochem. 118: 4487-4497, 2017. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  2. Nucleotide sequence preservation of human mitochondrial DNA

    International Nuclear Information System (INIS)

    Monnat, R.J. Jr.; Loeb, L.A.

    1985-01-01

    Recombinant DNA techniques have been used to quantitate the amount of nucleotide sequence divergence in the mitochondrial DNA population of individual normal humans. Mitochondrial DNA was isolated from the peripheral blood lymphocytes of five normal humans and cloned in M13 mp11; 49 kilobases of nucleotide sequence information was obtained from 248 independently isolated clones from the five normal donors. Both between- and within-individual differences were identified. Between-individual differences were identified in approximately = to 1/200 nucleotides. In contrast, only one within-individual difference was identified in 49 kilobases of nucleotide sequence information. This high degree of mitochondrial nucleotide sequence homogeneity in human somatic cells is in marked contrast to the rapid evolutionary divergence of human mitochondrial DNA and suggests the existence of mechanisms for the concerted preservation of mammalian mitochondrial DNA sequences in single organisms

  3. Human Chromosome 7: DNA Sequence and Biology

    OpenAIRE

    Scherer, Stephen W.; Cheung, Joseph; MacDonald, Jeffrey R.; Osborne, Lucy R.; Nakabayashi, Kazuhiko; Herbrick, Jo-Anne; Carson, Andrew R.; Parker-Katiraee, Layla; Skaug, Jennifer; Khaja, Razi; Zhang, Junjun; Hudek, Alexander K.; Li, Martin; Haddad, May; Duggan, Gavin E.

    2003-01-01

    DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures, are presented. To generate a higher order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. This approach enabled the discovery of candidate gene...

  4. Multiple tag labeling method for DNA sequencing

    Science.gov (United States)

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  5. mtDNA point and length heteroplasmy in high- and low radiation areas of Kerala

    International Nuclear Information System (INIS)

    Forster, L.; Forster, P.; Gurney, S.M.; Spencer, M.; Huang, C.; Röhl, A.; Brinkmann, B.

    2010-01-01

    A coastal peninsula in Kerala (India) contains the world's highest level of natural radioactivity in a densely populated area, offering an opportunity to characterize radiation-associated DNA mutations. Here, we focus on mitochondrial DNA (mtDNA) mutations, which are passed exclusively from the mother to her children. To analyse point mutations, we sampled 248 pedigrees (988 individuals) in the high-radiation peninsula and in nearby low-radiation islands as a control population. Then, in an extended sample of 1,172 mtDNA sequences (containing some non-Indians for comparison), we also analysed length mutations, which in mtDNA can lead to the phenomenon of length heteroplasmy, i.e. the existence of different DNA types in the same cell. We wished to find out how fast mtDNA mutates between generations, and whether the mutation rate is increased in radioactive conditions compared to the low-irradiation sample

  6. The relationship of protein conservation and sequence length

    Directory of Open Access Journals (Sweden)

    Panchenko Anna R

    2002-11-01

    Full Text Available Abstract Background In general, the length of a protein sequence is determined by its function and the wide variance in the lengths of an organism's proteins reflects the diversity of specific functional roles for these proteins. However, additional evolutionary forces that affect the length of a protein may be revealed by studying the length distributions of proteins evolving under weaker functional constraints. Results We performed sequence comparisons to distinguish highly conserved and poorly conserved proteins from the bacterium Escherichia coli, the archaeon Archaeoglobus fulgidus, and the eukaryotes Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. For all organisms studied, the conserved and nonconserved proteins have strikingly different length distributions. The conserved proteins are, on average, longer than the poorly conserved ones, and the length distributions for the poorly conserved proteins have a relatively narrow peak, in contrast to the conserved proteins whose lengths spread over a wider range of values. For the two prokaryotes studied, the poorly conserved proteins approximate the minimal length distribution expected for a diverse range of structural folds. Conclusions There is a relationship between protein conservation and sequence length. For all the organisms studied, there seems to be a significant evolutionary trend favoring shorter proteins in the absence of other, more specific functional constraints.

  7. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Full Text Available Abstract Background Highly repetitive nucleotide sequences are commonly found in nature e.g. in telomeres, microsatellite DNA, polyadenine (poly(A tails of eukaryotic messenger RNA as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites resulting in unspecific products. Results For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusions proteins and provide an example for their applicability in filter retardation assays. Conclusion Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  8. Chromatid interchanges at intrachromosomal telomeric DNA sequences

    International Nuclear Information System (INIS)

    Fernandez, J.L.; Vazquez-Gundin, F.; Bilbao, A.; Gosalvez, J.; Goyanes, V.

    1997-01-01

    Chinese hamster Don cells were exposed to X-rays, mitomycin C and teniposide (VM-26) to induce chromatid exchanges (quadriradials and triradials). After fluorescence in situ hybridization (FISH) of telomere sequences it was found that interstitial telomere-like DNA sequence arrays presented around five times more breakage-rearrangements than the genome overall. This high recombinogenic capacity was independent of the clastogen, suggesting that this susceptibility is not related to the initial mechanisms of DNA damage. (author)

  9. Mitochondrial DNA sequence evolution in shorebird populations

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons

  10. Recurrence plot analysis of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    Recurrence plot technique of DNA sequences is established on metric representation and employed to analyze correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, but a yeast chromosome's correlation distance has a constant increasing.

  11. On site DNA barcoding by nanopore sequencing.

    Directory of Open Access Journals (Sweden)

    Michele Menegon

    Full Text Available Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a mimicking tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error-rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time-on-site DNA sequencing thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.

  12. Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    2009-11-01

    Full Text Available Full-length cDNA (FLcDNA sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb including 218 bases and 321 bases of 5' and 3' UTR, respectively, with 8.6% of the FLcDNAs encoding predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs, only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well-enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present in rice, sorghum, Arabidopsis, or poplar annotated genes. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, where semi-strict parameters were used to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced from this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org.

  13. Detecting Scareware by Mining Variable Length Instruction Sequences

    OpenAIRE

    Shahzad, Raja Khurram; Lavesson, Niklas

    2011-01-01

    Scareware is a recent type of malicious software that may pose financial and privacy-related threats to novice users. Traditional countermeasures, such as anti-virus software, require regular updates and often lack the capability of detecting novel (unseen) instances. This paper presents a scareware detection method that is based on the application of machine learning algorithms to learn patterns in extracted variable length opcode sequences derived from instruction sequences of binary files....

  14. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  15. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  16. Direct chromosome-length haplotyping by single-cell sequencing

    NARCIS (Netherlands)

    Porubský, David; Sanders, Ashley D; van Wietmarschen, Niek; Falconer, Ester; Hills, Mark; Spierings, Diana C J; Bevova, Marianna R; Guryev, Victor; Lansdorp, Peter Michael

    Haplotypes are fundamental to fully characterize the diploid genome of an individual, yet methods to directly chart the unique genetic makeup of each parental chromosome are lacking. Here we introduce single-cell DNA template strand sequencing (Strand-seq) as a novel approach to phasing diploid

  17. Bacterial identification and subtyping using DNA microarray and DNA sequencing.

    Science.gov (United States)

    Al-Khaldi, Sufian F; Mossoba, Magdi M; Allard, Marc M; Lienau, E Kurt; Brown, Eric D

    2012-01-01

    The era of fast and accurate discovery of biological sequence motifs in prokaryotic and eukaryotic cells is here. The co-evolution of direct genome sequencing and DNA microarray strategies not only will identify, isotype, and serotype pathogenic bacteria, but also it will aid in the discovery of new gene functions by detecting gene expressions in different diseases and environmental conditions. Microarray bacterial identification has made great advances in working with pure and mixed bacterial samples. The technological advances have moved beyond bacterial gene expression to include bacterial identification and isotyping. Application of new tools such as mid-infrared chemical imaging improves detection of hybridization in DNA microarrays. The research in this field is promising and future work will reveal the potential of infrared technology in bacterial identification. On the other hand, DNA sequencing by using 454 pyrosequencing is so cost effective that the promise of $1,000 per bacterial genome sequence is becoming a reality. Pyrosequencing technology is a simple to use technique that can produce accurate and quantitative analysis of DNA sequences with a great speed. The deposition of massive amounts of bacterial genomic information in databanks is creating fingerprint phylogenetic analysis that will ultimately replace several technologies such as Pulsed Field Gel Electrophoresis. In this chapter, we will review (1) the use of DNA microarray using fluorescence and infrared imaging detection for identification of pathogenic bacteria, and (2) use of pyrosequencing in DNA cluster analysis to fingerprint bacterial phylogenetic trees.

  18. An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

    Directory of Open Access Journals (Sweden)

    Md. Rezaul Karim

    2012-03-01

    Full Text Available Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.

  19. Fiscal 2000 report on result of the full-length cDNA structure analysis; 2000 nendo kanzen cho cDNA kozo kaiseki seika hokokusho

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-03-01

    This paper explains the results of research on full-length cDNA structure analysis for the period from April, 2000 to March, 2001. The outline of human genome sequence was published in June, 2000. In Japan, human gene analysis was such that, as the basic technology of the bio industry, a millennium project was decided in the budget of fiscal 2000. The full-length cDNA structure analysis is the core of the project. The libraries of cDNA were prepared using full-length and more than 4-5kbp-long cDNAs by oligo-capping method. It began from determining partial sequence data at end cDNA, and then, with new clones selected therefrom, full-length human cDNA sequence data were determined. The partial sequence data determined by fiscal 2000 were 1,035,000 clones while the full-length sequence data were 12,144 clones. The sequence data obtained were analyzed by homology search and translated into amino acid coding sequences, with predictions conducted on protein functions. A clustering method was examined that selects new clones from partial sequences. Database was constructed on gene expression profiles and disease-related gene sequence data. (NEDO)

  20. 3G vector-primer plasmid for constructing full-length-enriched cDNA libraries.

    Science.gov (United States)

    Zheng, Dong; Zhou, Yanna; Zhang, Zidong; Li, Zaiyu; Liu, Xuedong

    2008-09-01

    We designed a 3G vector-primer plasmid for the generation of full-length-enriched complementary DNA (cDNA) libraries. By employing the terminal transferase activity of reverse transcriptase and the modified strand replacement method, this plasmid (assembled with a polydT end and a deoxyguanosine [dG] end) combines priming full-length cDNA strand synthesis and directional cDNA cloning. As a result, the number of steps involved in cDNA library preparation is decreased while simplifying downstream gene manipulation, sequencing, and subcloning. The 3G vector-primer plasmid method yields fully represented plasmid primed libraries that are equivalent to those made by the SMART (switching mechanism at 5' end of RNA transcript) approach.

  1. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  2. Understanding human DNA sequence variation.

    Science.gov (United States)

    Kidd, K K; Pakstis, A J; Speed, W C; Kidd, J R

    2004-01-01

    Over the past century researchers have identified normal genetic variation and studied that variation in diverse human populations to determine the amounts and distributions of that variation. That information is being used to develop an understanding of the demographic histories of the different populations and the species as a whole, among other studies. With the advent of DNA-based markers in the last quarter century, these studies have accelerated. One of the challenges for the next century is to understand that variation. One component of that understanding will be population genetics. We present here examples of many of the ways these new data can be analyzed from a population perspective using results from our laboratory on multiple individual DNA-based polymorphisms, many clustered in haplotypes, studied in multiple populations representing all major geographic regions of the world. These data support an "out of Africa" hypothesis for human dispersal around the world and begin to refine the understanding of population structures and genetic relationships. We are also developing baseline information against which we can compare findings at different loci to aid in the identification of loci subject, now and in the past, to selection (directional or balancing). We do not yet have a comprehensive understanding of the extensive variation in the human genome, but some of that understanding is coming from population genetics.

  3. DNA Sequencing in Cultural Heritage.

    Science.gov (United States)

    Vai, Stefania; Lari, Martina; Caramelli, David

    2016-02-01

    During the last three decades, DNA analysis on degraded samples revealed itself as an important research tool in anthropology, archaeozoology, molecular evolution, and population genetics. Application on topics such as determination of species origin of prehistoric and historic objects, individual identification of famous personalities, characterization of particular samples important for historical, archeological, or evolutionary reconstructions, confers to the paleogenetics an important role also for the enhancement of cultural heritage. A really fast improvement in methodologies in recent years led to a revolution that permitted recovering even complete genomes from highly degraded samples with the possibility to go back in time 400,000 years for samples from temperate regions and 700,000 years for permafrozen remains and to analyze even more recent material that has been subjected to hard biochemical treatments. Here we propose a review on the different methodological approaches used so far for the molecular analysis of degraded samples and their application on some case studies.

  4. Anchoring a Defined Sequence to the 55' Ends of mRNAs : The Bolt to Clone Rare Full Length mRNAs and Generate cDNA Libraries porn a Few Cells.

    Science.gov (United States)

    Baptiste, J; Milne Edwards, D; Delort, J; Mallet, J

    1993-01-01

    Among numerous applications, the polymerase chain reaction (PCR) (1,2) provides a convenient means to clone 5' ends of rare mRNAs and to generate cDNA libraries from tissue available in amounts too low to be processed by conventional methods. Basically, the amplification of cDNAs by the PCR requires the availability of the sequences of two stretches of the molecule to be amplified. A sequence can easily be imposed at the 5' end of the first-strand cDNAs (corresponding to the 3' end of the mRNAs) by priming the reverse transcription with a specific primer (for cloning the 5' end of rare messenger) or with an oligonucleotide tailored with a poly (dT) stretch (for cDNA library construction), taking advantage of the poly (A) sequence that is located at the 3' end of mRNAs. Several strategies have been devised to tag the 3' end of the ss-cDNAs (corresponding to the 55' end of the mRNAs). We (3) and others have described strategies based on the addition of a homopolymeric dG (4,5) or dA (6,7) tail using terminal deoxyribonucleotide transferase (TdT) ("anchor-PCR" [4]). However, this strategy has important limitations. The TdT reaction is difficult to control and has a low efficiency (unpublished observations). But most importantly, the return primers containing a homopolymeric (dC or dT) tail generate nonspecific amplifications, a phenomenon that prevents the isolation of low abundance mRNA species and/or interferes with the relative abundance of primary clones in the library. To circumvent these drawbacks, we have used two approaches. First, we devised a strategy based on a cRNA enrichment procedure, which has been useful to eliminate nonspecific-PCR products and to allow detection and cloning of cDNAs of low abundance (3). More recently, to avoid the nonspecific amplification resulting from the annealing of the homopolymeric tail oligonucleotide, we have developed a novel anchoring strategy that is based on the ligation of an oligonucleotide to the 35' end of ss

  5. Tandemly repeated sequence in 5'end of mtDNA control region of ...

    African Journals Online (AJOL)

    Extensive length variability was observed in 5' end sequence of the mitochondrial DNA control region of the Japanese Spanish mackerel (Scomberomorus niphonius). This length variability was due to the presence of varying numbers of a 56-bp tandemly repeated sequence and a 46-bp insertion/deletion (indel).

  6. DNA origami-based nanoribbons: assembly, length distribution, and twist

    Energy Technology Data Exchange (ETDEWEB)

    Jungmann, Ralf; Scheible, Max; Kuzyk, Anton; Pardatscher, Guenther; Simmel, Friedrich C [Lehrstuhl fuer Bioelektronik, Physik-Department and ZNN/WSI, Technische Universitaet Muenchen, Am Coulombwall 4a, 85748 Garching (Germany); Castro, Carlos E, E-mail: simmel@ph.tum.de [Labor fuer Biomolekulare Nanotechnologie, Physik-Department and ZNN/WSI, Technische Universitaet Muenchen, Am Coulombwall 4a, 85748 Garching (Germany)

    2011-07-08

    A variety of polymerization methods for the assembly of elongated nanoribbons from rectangular DNA origami structures are investigated. The most efficient method utilizes single-stranded DNA oligonucleotides to bridge an intermolecular scaffold seam between origami monomers. This approach allows the fabrication of origami ribbons with lengths of several micrometers, which can be used for long-range ordered arrangement of proteins. It is quantitatively shown that the length distribution of origami ribbons obtained with this technique follows the theoretical prediction for a simple linear polymerization reaction. The design of flat single layer origami structures with constant crossover spacing inevitably results in local underwinding of the DNA helix, which leads to a global twist of the origami structures that also translates to the nanoribbons.

  7. DNA origami-based nanoribbons: assembly, length distribution, and twist

    International Nuclear Information System (INIS)

    Jungmann, Ralf; Scheible, Max; Kuzyk, Anton; Pardatscher, Guenther; Simmel, Friedrich C; Castro, Carlos E

    2011-01-01

    A variety of polymerization methods for the assembly of elongated nanoribbons from rectangular DNA origami structures are investigated. The most efficient method utilizes single-stranded DNA oligonucleotides to bridge an intermolecular scaffold seam between origami monomers. This approach allows the fabrication of origami ribbons with lengths of several micrometers, which can be used for long-range ordered arrangement of proteins. It is quantitatively shown that the length distribution of origami ribbons obtained with this technique follows the theoretical prediction for a simple linear polymerization reaction. The design of flat single layer origami structures with constant crossover spacing inevitably results in local underwinding of the DNA helix, which leads to a global twist of the origami structures that also translates to the nanoribbons.

  8. OPTSDNA: Performance evaluation of an efficient distributed bioinformatics system for DNA sequence analysis.

    Science.gov (United States)

    Khan, Mohammad Ibrahim; Sheel, Chotan

    2013-01-01

    Storage of sequence data is a big concern as the amount of data generated is exponential in nature at several locations. Therefore, there is a need to develop techniques to store data using compression algorithm. Here we describe optimal storage algorithm (OPTSDNA) for storing large amount of DNA sequences of varying length. This paper provides performance analysis of optimal storage algorithm (OPTSDNA) of a distributed bioinformatics computing system for analysis of DNA sequences. OPTSDNA algorithm is used for storing various sizes of DNA sequences into database. DNA sequences of different lengths were stored by using this algorithm. These input DNA sequences are varied in size from very small to very large. Storage size is calculated by this algorithm. Response time is also calculated in this work. The efficiency and performance of the algorithm is high (in size calculation with percentage) when compared with other known with sequential approach.

  9. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    Directory of Open Access Journals (Sweden)

    Moore JE

    2006-01-01

    Full Text Available Abstract Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When cloning, sequencing and comparison of the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France, nucleotide sequence differences were demonstrated at the six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted.

  10. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    Science.gov (United States)

    Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC

    2006-01-01

    Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When cloning, sequencing and comparison of the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France, nucleotide sequence differences were demonstrated at the six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935

  11. Concentration and length dependence of DNA looping in transcriptional regulation.

    Directory of Open Access Journals (Sweden)

    Lin Han

    2009-05-01

    Full Text Available In many cases, transcriptional regulation involves the binding of transcription factors at sites on the DNA that are not immediately adjacent to the promoter of interest. This action at a distance is often mediated by the formation of DNA loops: Binding at two or more sites on the DNA results in the formation of a loop, which can bring the transcription factor into the immediate neighborhood of the relevant promoter. These processes are important in settings ranging from the historic bacterial examples (bacterial metabolism and the lytic-lysogeny decision in bacteriophage, to the modern concept of gene regulation to regulatory processes central to pattern formation during development of multicellular organisms. Though there have been a variety of insights into the combinatorial aspects of transcriptional control, the mechanism of DNA looping as an agent of combinatorial control in both prokaryotes and eukaryotes remains unclear. We use single-molecule techniques to dissect DNA looping in the lac operon. In particular, we measure the propensity for DNA looping by the Lac repressor as a function of the concentration of repressor protein and as a function of the distance between repressor binding sites. As with earlier single-molecule studies, we find (at least two distinct looped states and demonstrate that the presence of these two states depends both upon the concentration of repressor protein and the distance between the two repressor binding sites. We find that loops form even at interoperator spacings considerably shorter than the DNA persistence length, without the intervention of any other proteins to prebend the DNA. The concentration measurements also permit us to use a simple statistical mechanical model of DNA loop formation to determine the free energy of DNA looping, or equivalently, the for looping.

  12. The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.

    Science.gov (United States)

    Khoe, Clairine V; Chung, Long H; Murray, Vincent

    2018-06-01

    The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.

  13. Special Issue: Next Generation DNA Sequencing

    Directory of Open Access Journals (Sweden)

    Paul Richardson

    2010-10-01

    Full Text Available Next Generation Sequencing (NGS refers to technologies that do not rely on traditional dideoxy-nucleotide (Sanger sequencing where labeled DNA fragments are physically resolved by electrophoresis. These new technologies rely on different strategies, but essentially all of them make use of real-time data collection of a base level incorporation event across a massive number of reactions (on the order of millions versus 96 for capillary electrophoresis for instance. The major commercial NGS platforms available to researchers are the 454 Genome Sequencer (Roche, Illumina (formerly Solexa Genome analyzer, the SOLiD system (Applied Biosystems/Life Technologies and the Heliscope (Helicos Corporation. The techniques and different strategies utilized by these platforms are reviewed in a number of the papers in this special issue. These technologies are enabling new applications that take advantage of the massive data produced by this next generation of sequencing instruments. [...

  14. Simulating efficiently the evolution of DNA sequences.

    Science.gov (United States)

    Schöniger, M; von Haeseler, A

    1995-02-01

    Two menu-driven FORTRAN programs are described that simulate the evolution of DNA sequences in accordance with a user-specified model. This general stochastic model allows for an arbitrary stationary nucleotide composition and any transition-transversion bias during the process of base substitution. In addition, the user may define any hypothetical model tree according to which a family of sequences evolves. The programs suggest the computationally most inexpensive approach to generate nucleotide substitutions. Either reproducible or non-repeatable simulations, depending on the method of initializing the pseudo-random number generator, can be performed. The corresponding options are offered by the interface menu.

  15. Genomic signal processing for DNA sequence clustering.

    Science.gov (United States)

    Mendizabal-Ruiz, Gerardo; Román-Godínez, Israel; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A; Vélez-Pérez, Hugo; Morales, J Alejandro

    2018-01-01

    Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.

  16. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  17. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW. At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  18. Next generation sequencing of DNA-launched Chikungunya vaccine virus

    Energy Technology Data Exchange (ETDEWEB)

    Hidajat, Rachmat; Nickols, Brian [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States); Forrester, Naomi [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States); Tretyakova, Irina [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States); Weaver, Scott [Institute for Human Infections and Immunity, Sealy Center for Vaccine Development and Department of Pathology, University of Texas Medical Branch, GNL, 301 University Blvd., Galveston, TX 77555 (United States); Pushko, Peter, E-mail: ppushko@medigen-usa.com [Medigen, Inc., 8420 Gas House Pike, Suite S, Frederick, MD 21701 (United States)

    2016-03-15

    Chikungunya virus (CHIKV) represents a pandemic threat with no approved vaccine available. Recently, we described a novel vaccination strategy based on iDNA® infectious clone designed to launch a live-attenuated CHIKV vaccine from plasmid DNA in vitro or in vivo. As a proof of concept, we prepared iDNA plasmid pCHIKV-7 encoding the full-length cDNA of the 181/25 vaccine. The DNA-launched CHIKV-7 virus was prepared and compared to the 181/25 virus. Illumina HiSeq2000 sequencing revealed that with the exception of the 3′ untranslated region, CHIKV-7 viral RNA consistently showed a lower frequency of single-nucleotide polymorphisms than the 181/25 RNA including at the E2-12 and E2-82 residues previously identified as attenuating mutations. In the CHIKV-7, frequencies of reversions at E2-12 and E2-82 were 0.064% and 0.086%, while in the 181/25, frequencies were 0.179% and 0.133%, respectively. We conclude that the DNA-launched virus has a reduced probability of reversion mutations, thereby enhancing vaccine safety. - Highlights: • Chikungunya virus (CHIKV) is an emerging pandemic threat. • In vivo DNA-launched attenuated CHIKV is a novel vaccine technology. • DNA-launched virus was sequenced using HiSeq2000 and compared to the 181/25 virus. • DNA-launched virus has lower frequency of SNPs at E2-12 and E2-82 attenuation loci.

  19. Next generation sequencing of DNA-launched Chikungunya vaccine virus

    International Nuclear Information System (INIS)

    Hidajat, Rachmat; Nickols, Brian; Forrester, Naomi; Tretyakova, Irina; Weaver, Scott; Pushko, Peter

    2016-01-01

    Chikungunya virus (CHIKV) represents a pandemic threat with no approved vaccine available. Recently, we described a novel vaccination strategy based on iDNA® infectious clone designed to launch a live-attenuated CHIKV vaccine from plasmid DNA in vitro or in vivo. As a proof of concept, we prepared iDNA plasmid pCHIKV-7 encoding the full-length cDNA of the 181/25 vaccine. The DNA-launched CHIKV-7 virus was prepared and compared to the 181/25 virus. Illumina HiSeq2000 sequencing revealed that with the exception of the 3′ untranslated region, CHIKV-7 viral RNA consistently showed a lower frequency of single-nucleotide polymorphisms than the 181/25 RNA including at the E2-12 and E2-82 residues previously identified as attenuating mutations. In the CHIKV-7, frequencies of reversions at E2-12 and E2-82 were 0.064% and 0.086%, while in the 181/25, frequencies were 0.179% and 0.133%, respectively. We conclude that the DNA-launched virus has a reduced probability of reversion mutations, thereby enhancing vaccine safety. - Highlights: • Chikungunya virus (CHIKV) is an emerging pandemic threat. • In vivo DNA-launched attenuated CHIKV is a novel vaccine technology. • DNA-launched virus was sequenced using HiSeq2000 and compared to the 181/25 virus. • DNA-launched virus has lower frequency of SNPs at E2-12 and E2-82 attenuation loci.

  20. Aspects of coverage in medical DNA sequencing

    Directory of Open Access Journals (Sweden)

    Wilson Richard K

    2008-05-01

    Full Text Available Abstract Background DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations. Results We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage, as well as its generalization for aneuploidy are derived and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8× to 10× redundancy, i.e. for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26× and 21×, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21× value for normal samples is essentially a constant. Conclusion Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.

  1. Role of DNA deletion length in mutation and cell survival

    International Nuclear Information System (INIS)

    Braby, L.A.; Morgan, T.L.

    1992-01-01

    A model is presented which is based on the assumption that malignant transformation, mutation, chromosome aberration, and reproductive death of cells are all manifestations of radiation induced deletions in the DNA of the cell, and that the size of the deletion in relation to the spacing of essential genes determines the consequences of that deletion. It is assumed that two independent types of potentially lethal lesions can result in DNA deletions, and that the relative numbers of these types of damage is dependent on radiation quality. The repair of the damage reduces the length of a deletion, but does not always eliminate it. The predictions of this model are in good agreement with a wide variety of experimental evidence. (author)

  2. cDNA sequence quality data - Budding yeast cDNA sequencing project | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Budding yeast cDNA sequencing project cDNA sequence quality data Data detail Data name cDNA sequence quality... data DOI 10.18908/lsdba.nbdc00838-003 Description of data contents Phred's quality score. P...tion Download License Update History of This Database Site Policy | Contact Us cDNA sequence quality

  3. Laser mass spectrometry for DNA sequencing, disease diagnosis, and fingerprinting

    Energy Technology Data Exchange (ETDEWEB)

    Winston Chen, C.H.; Taranenko, N.I.; Zhu, Y.F.; Chung, C.N.; Allman, S.L.

    1997-03-01

    Since laser mass spectrometry has the potential for achieving very fast DNA analysis, the authors recently applied it to DNA sequencing, DNA typing for fingerprinting, and DNA screening for disease diagnosis. Two different approaches for sequencing DNA have been successfully demonstrated. One is to sequence DNA with DNA ladders produced from Snager`s enzymatic method. The other is to do direct sequencing without DNA ladders. The need for quick DNA typing for identification purposes is critical for forensic application. The preliminary results indicate laser mass spectrometry can possibly be used for rapid DNA fingerprinting applications at a much lower cost than gel electrophoresis. Population screening for certain genetic disease can be a very efficient step to reducing medical costs through prevention. Since laser mass spectrometry can provide very fast DNA analysis, the authors applied laser mass spectrometry to disease diagnosis. Clinical samples with both base deletion and point mutation have been tested with complete success.

  4. Method for priming and DNA sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Mugasimangalam, R.C.; Ulanovsky, L.E.

    1997-12-01

    A method is presented for improving the priming specificity of an oligonucleotide primer that is non-unique in a nucleic acid template which includes selecting a continuous stretch of several nucleotides in the template DNA where one of the four bases does not occur in the stretch. This also includes bringing the template DNA in contract with a non-unique primer partially or fully complimentary to the sequence immediately upstream of the selected sequence stretch. This results in polymerase-mediated differential extension of the primer in the presence of a subset of deoxyribonucleotide triphosphates that does not contain the base complementary to the base absent in the selected sequence stretch. These reactions occur at a temperature sufficiently low for allowing the extension of the non-unique primer. The method causes polymerase-mediated extension reactions in the presence of all four natural deoxyribonucleotide triphosphates or modifications. At this high temperature discrimination occurs against priming sites of the non-unique primer where the differential extension has not made the primer sufficiently stable to prime. However, the primer extended at the selected stretch is sufficiently stable to prime.

  5. A colorimetric platform for sensitively differentiating telomere DNA with different lengths, monitoring G-quadruplex and dsDNA based on silver nanoclusters and unmodified gold nanoparticles

    Science.gov (United States)

    Qu, Fei; Chen, Zeqiu; You, Jinmao; Song, Cuihua

    2018-05-01

    Human telomere DNA plays a vital role in genome integrity control and carcinogenesis as an indication for extensive cell proliferation. Herein, silver nanoclusters (Ag NCs) templated by polymer and unmodified gold nanoparticles (Au NPs) are designed as a new colorimetric platform for sensitively differentiating telomere DNA with different lengths, monitoring G-quadruplex and dsDNA. Ag NCs can produce the aggregation of Au NPs, so the color of Au NPs changes to blue and the absorption peak moves to 700 nm. While the telomere DNA can protect Au NPs from aggregation, the color turns to red again and the absorption band blue shift. Benefiting from the obvious color change, we can differentiate the length of telomere DNA by naked eyes. As the length of telomere DNA is longer, the variation of color becomes more noticeable. The detection limits of telomere DNA containing 10, 22, 40, 64 bases are estimated to be 1.41, 1.21, 0.23 and 0.22 nM, respectively. On the other hand, when telomere DNA forms G-quadruplex in the presence of K+, or dsDNA with complementary sequence, both G-quadruplex and dsDNA can protect Au NPs better than the unfolded telomere DNA. Hence, a new colorimetric platform for monitoring structure conversion of DNA is established by Ag NCs-Au NPs system, and to prove this type of application, a selective K+ sensor is developed.

  6. Statistical length of DNA based on AFM image measured by a computer

    International Nuclear Information System (INIS)

    Chen Xinqing; Qiu Xijun; Zhang Yi; Hu Jun; Wu Shiying; Huang Yibo; Ai Xiaobai; Li Minqian

    2001-01-01

    Taking advantage of image processing technology, the contour length of DNA molecule was measured automatically by a computer. Based on the AFM image of DNA, the topography of DNA was simulated into a curve. Then the DNA length was measured automatically by inserting mode. It was shown that the experimental length of a naturally deposited DNA (180.4 +- 16.4 nm) was well consistent with the theoretical length (185.0 nm). Comparing to other methods, the present approach had advantages of precision and automatism. The stretched DNA was also measured. It present approach had advantages of precision and automatism. The stretched DNA was also measured. It was shown that the experimental length (343.6 +- 20.7 nm) was much longer than the theoretical length (307.0 nm). This result indicated that the stretching process had a distinct effect on the DNA length. However, the method provided here avoided the DNA-stretching effect

  7. DNA Duplex Length and Salt Concentration Dependence of Enthalpy−Entropy Compensation Parameters for DNA Melting

    KAUST Repository

    Starikov, E. B.

    2009-08-20

    Systematical differential calorimetry experiments on DNA oligomers with different lengths and placed in water solutions with various added salt concentrations may, in principle, unravel important information about the structure and dynamics of the DNA and their water-counterion surrounding. With this in mind, to reinterpret the most recent results of calorimetric experiments on DNA oligomers of such a kind, the recent enthalpy-entropy compensation theory has been used. It is demonstrated that the application of the latter could enable direct estimation of thermodynamic parameters of the microphase transitions connected to the changes in DNA dynamical regimes versus the length of the biopolymers and the ionic strengths of their water solutions, and this calls for much more systematical experimental and theoretical studies in this field. © 2009 American Chemical Society.

  8. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  9. Poincaré recurrences of DNA sequences

    Science.gov (United States)

    Frahm, K. M.; Shepelyansky, D. L.

    2012-01-01

    We analyze the statistical properties of Poincaré recurrences of Homo sapiens, mammalian, and other DNA sequences taken from the Ensembl Genome data base with up to 15 billion base pairs. We show that the probability of Poincaré recurrences decays in an algebraic way with the Poincaré exponent β≈4 even if the oscillatory dependence is well pronounced. The correlations between recurrences decay with an exponent ν≈0.6 that leads to an anomalous superdiffusive walk. However, for Homo sapiens sequences, with the largest available statistics, the diffusion coefficient converges to a finite value on distances larger than one million base pairs. We argue that the approach based on Poncaré recurrences determines new proximity features between different species and sheds a new light on their evolution history.

  10. Cloning and sequencing of Indian Water buffalo (Bubalus bubalis) interleukin-3 cDNA

    KAUST Repository

    Sugumar, Thennarasu; Harishankar, M.; Dhinakar Raj, G.

    2011-01-01

    Full-length cDNA (435 bp) of the interleukin-3(IL-3) gene of the Indian water buffalo was amplified by reverse transcriptase-polymerase chain reaction and sequenced. This sequence had 96% nucleotide identity and 92% amino acid identity with bovine

  11. Authentication of Botanical Origin in Herbal Teas by Plastid Noncoding DNA Length Polymorphisms.

    Science.gov (United States)

    Uncu, Ali Tevfik; Uncu, Ayse Ozgur; Frary, Anne; Doganlar, Sami

    2015-07-01

    The aim of this study was to develop a DNA barcode assay to authenticate the botanical origin of herbal teas. To reach this aim, we tested the efficiency of a PCR-capillary electrophoresis (PCR-CE) approach on commercial herbal tea samples using two noncoding plastid barcodes, the trnL intron and the intergenic spacer between trnL and trnF. Barcode DNA length polymorphisms proved successful in authenticating the species origin of herbal teas. We verified the validity of our approach by sequencing species-specific barcode amplicons from herbal tea samples. Moreover, we displayed the utility of PCR-CE assays coupled with sequencing to identify the origin of undeclared plant material in herbal tea samples. The PCR-CE assays proposed in this work can be applied as routine tests for the verification of botanical origin in herbal teas and can be extended to authenticate all types of herbal foodstuffs.

  12. A microfabricated hybrid device for DNA sequencing.

    Science.gov (United States)

    Liu, Shaorong

    2003-11-01

    We have created a hybrid device of a microfabricated round-channel twin-T injector incorporated with a separation capillary in order to extend the straight separation distance for high speed and long readlength DNA sequencing. Semicircular grooves on glass wafers are obtained using a photomask with a narrow line-width and a standard isotropic photolithographic etching process. Round channels are made when two etched wafers are face-to-face aligned and bonded. A two-mask fabrication process has been developed to make channels of two different diameters. The twin-T injector is formed by the smaller channels whose diameter matches the bore of the separation capillary, and the "usual" separation channel, now called the connection channel, is formed by the larger ones whose diameter matches the outer diameter of the separation capillary. The separation capillary is inserted through the connection channel all the way to the twin-T injector to allow the capillary bore flush with the twin-T injector channels. The total dead-volume of the connection is estimated to be approximately 5 pL. To demonstrate the efficiency of this hybrid device, we have performed four-color DNA sequencing on it. Using a 200 microm twin-T injector coupled with a separation capillary of 20 cm effective separation distance, we have obtained readlengths of 800 plus bases at an accuracy of 98.5% in 56 min, compared to about 650 bases in 100 min on a conventional 40 cm long capillary sequencing machine under similar conditions. At an increased separation field strength and using a diluted sieving matrix, the separation time has been reduced to 20 min with a readlength of 700 bases at 98.5% base-calling accuracy.

  13. Identification of Meconopsis species by a DNA barcode sequence ...

    African Journals Online (AJOL)

    Deoxyribonucleic acid (DNA) barcoding is a novel technology that uses a standard DNA sequence to facilitate species identification. Species identification is necessary for the authentication of traditional plant based medicines. Although a consensus has not been agreed regarding which DNA sequences can be used as ...

  14. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT – the bionic wavelet transform (BWT – for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.

  15. Phylogeny of the Serrasalmidae (Characiformes based on mitochondrial DNA sequences

    Directory of Open Access Journals (Sweden)

    Guillermo Ortí

    2008-01-01

    Full Text Available Previous studies based on DNA sequences of mitochondrial (mt rRNA genes showed three main groups within the subfamily Serrasalminae: (1 a "pacu" clade of herbivores (Colossoma, Mylossoma, Piaractus; (2 the "Myleus" clade (Myleus, Mylesinus, Tometes, Ossubtus; and (3 the "piranha" clade (Serrasalmus, Pygocentrus, Pygopristis, Pristobrycon, Catoprion, Metynnis. The genus Acnodon was placed as the sister taxon of clade (2+3. However, poor resolution within each clade was obtained due to low levels of variation among rRNA gene sequences. Complete sequences of the hypervariable mtDNA control region for a total of 45 taxa, and additional sequences of 12S and 16S rRNA from a total of 74 taxa representing all genera in the family are now presented to address intragroup relationships. Control region sequences of several serrasalmid species exhibit tandem repeats of short motifs (12 to 33 bp in the 3' end of this region, accounting for substantial length variation. Bayesian inference and maximum parsimony analyses of these sequences identify the same groupings as before and provide further evidence to support the following observations: (a Serrasalmus gouldingi and species of Pristobrycon (non-striolatus form a monophyletic group that is the sister group to other species of Serrasalmus and Pygocentrus; (b Catoprion, Pygopristis, and Pristobrycon striolatus form a well supported clade, sister to the group described above; (c some taxa assigned to the genus Myloplus (M. asterias, M tiete, M ternetzi, and M rubripinnis form a well supported group whereas other Myloplus species remain with uncertain affinities (d Mylesinus, Tometes and Myleus setiger form a monophyletic group.

  16. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Rogelio Alcántara-Silva

    2017-03-01

    Full Text Available Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .

  17. Next Generation DNA Sequencing and the Future of Genomic Medicine

    OpenAIRE

    Anderson, Matthew W.; Schrijver, Iris

    2010-01-01

    In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpreta...

  18. SWORDS: A statistical tool for analysing large DNA sequences

    Indian Academy of Sciences (India)

    Unknown

    These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software called SWORDS. Using sequences available in ... tions with the cellular processes like recombination, replication .... in DNA sequences using certain specific probability laws. (Pevzner et al ...

  19. Full-length sequencing and identification of novel polymorphisms in ...

    Indian Academy of Sciences (India)

    The aim of this work was to sequence the entirecoding region of ACACA gene in Valle del Belice sheep breed to identify polymorphic sites. A total of 51 coding exons of ACACA gene were sequenced in 32 individuals of Valle del Belice sheep breed. Sequencing analysis and alignment of obtained sequences showed the ...

  20. A sequence-dependent rigid-base model of DNA

    Science.gov (United States)

    Gonzalez, O.; Petkevičiutė, D.; Maddocks, J. H.

    2013-02-01

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can

  1. A sequence-dependent rigid-base model of DNA.

    Science.gov (United States)

    Gonzalez, O; Petkevičiūtė, D; Maddocks, J H

    2013-02-07

    A novel hierarchy of coarse-grain, sequence-dependent, rigid-base models of B-form DNA in solution is introduced. The hierarchy depends on both the assumed range of energetic couplings, and the extent of sequence dependence of the model parameters. A significant feature of the models is that they exhibit the phenomenon of frustration: each base cannot simultaneously minimize the energy of all of its interactions. As a consequence, an arbitrary DNA oligomer has an intrinsic or pre-existing stress, with the level of this frustration dependent on the particular sequence of the oligomer. Attention is focussed on the particular model in the hierarchy that has nearest-neighbor interactions and dimer sequence dependence of the model parameters. For a Gaussian version of this model, a complete coarse-grain parameter set is estimated. The parameterized model allows, for an oligomer of arbitrary length and sequence, a simple and explicit construction of an approximation to the configuration-space equilibrium probability density function for the oligomer in solution. The training set leading to the coarse-grain parameter set is itself extracted from a recent and extensive database of a large number of independent, atomic-resolution molecular dynamics (MD) simulations of short DNA oligomers immersed in explicit solvent. The Kullback-Leibler divergence between probability density functions is used to make several quantitative assessments of our nearest-neighbor, dimer-dependent model, which is compared against others in the hierarchy to assess various assumptions pertaining both to the locality of the energetic couplings and to the level of sequence dependence of its parameters. It is also compared directly against all-atom MD simulation to assess its predictive capabilities. The results show that the nearest-neighbor, dimer-dependent model can successfully resolve sequence effects both within and between oligomers. For example, due to the presence of frustration, the model can

  2. Is DNA a worm-like chain in Couette flow? In search of persistence length, a critical review.

    Science.gov (United States)

    Rittman, Martyn; Gilroy, Emma; Koohya, Hashem; Rodger, Alison; Richards, Adair

    2009-01-01

    Persistence length is the foremost measure of DNA flexibility. Its origins lie in polymer theory which was adapted for DNA following the determination of BDNA structure in 1953. There is no single definition of persistence length used, and the links between published definitions are based on assumptions which may, or may not be, clearly stated. DNA flexibility is affected by local ionic strength, solvent environment, bound ligands and intrinsic sequence-dependent flexibility. This article is a review of persistence length providing a mathematical treatment of the relationships between four definitions of persistence length, including: correlation, Kuhn length, bending, and curvature. Persistence length has been measured using various microscopy, force extension and solution methods such as linear dichroism and transient electric birefringence. For each experimental method a model of DNA is required to interpret the data. The importance of understanding the underlying models, along with the assumptions required by each definition to determine a value of persistence length, is highlighted for linear dichroism data, where it transpires that no model is currently available for long DNA or medium to high shear rate experiments.

  3. Analysis of mutation/rearrangement frequencies and methylation patterns at a given DNA locus using restriction fragment length polymorphism.

    Science.gov (United States)

    Boyko, Alex; Kovalchuk, Igor

    2010-01-01

    Restriction fragment length polymorphism (RFLP) is a difference in DNA sequences of organisms belonging to the same species. RFLPs are typically detected as DNA fragments of different lengths after digestion with various restriction endonucleases. The comparison of RFLPs allows investigators to analyze the frequency of occurrence of mutations, such as point mutations, deletions, insertions, and gross chromosomal rearrangements, in the progeny of stressed plants. The assay involves restriction enzyme digestion of DNA followed by hybridization of digested DNA using a radioactively or enzymatically labeled probe. Since DNA can be digested with methylation sensitive enzymes, the assay can also be used to analyze a methylation pattern of a particular locus. Here, we describe RFLP analysis using methylation-insensitive and methylation-sensitive enzymes.

  4. A novel constraint for thermodynamically designing DNA sequences.

    Directory of Open Access Journals (Sweden)

    Qiang Zhang

    Full Text Available Biotechnological and biomolecular advances have introduced novel uses for DNA such as DNA computing, storage, and encryption. For these applications, DNA sequence design requires maximal desired (and minimal undesired hybridizations, which are the product of a single new DNA strand from 2 single DNA strands. Here, we propose a novel constraint to design DNA sequences based on thermodynamic properties. Existing constraints for DNA design are based on the Hamming distance, a constraint that does not address the thermodynamic properties of the DNA sequence. Using a unique, improved genetic algorithm, we designed DNA sequence sets which satisfy different distance constraints and employ a free energy gap based on a minimum free energy (MFE to gauge DNA sequences based on set thermodynamic properties. When compared to the best constraints of the Hamming distance, our method yielded better thermodynamic qualities. We then used our improved genetic algorithm to obtain lower-bound DNA sequence sets. Here, we discuss the effects of novel constraint parameters on the free energy gap.

  5. An automated annotation tool for genomic DNA sequences using

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  6. The nucleotide sequence of human transition protein 1 cDNA

    Energy Technology Data Exchange (ETDEWEB)

    Luerssen, H; Hoyer-Fender, S; Engel, W [Universitaet Goettingen (West Germany)

    1988-08-11

    The authors have screened a human testis cDNA library with an oligonucleotide of 81 mer prepared according to a part of the published nucleotide sequence of the rat transition protein TP 1. They have isolated a cDNA clone with the length of 441 bp containing the coding region of 162 bp for human transition protein 1. There is about 84% homology in the coding region of the sequence compared to rat. The human cDNA-clone encodes a polypeptide of 54 amino acids of which 7 are different to that of rat.

  7. Hypervariable minisatellite DNA sequences in the Indian peafowl Pavo cristatus.

    Science.gov (United States)

    Hanotte, O; Burke, T; Armour, J A; Jeffreys, A J

    1991-04-01

    We report here for the first time the large-scale isolation of hypervariable minisatellite DNA sequences from a non-human species, the Indian peafowl (Pavo cristatus). A size-selected genomic DNA fraction, rich in hypervariable minisatellites, was cloned into Charomid 9-36. This library was screened using two multilocus hypervariable probes, 33.6 and 33.15 and also, in a "probe-walking" approach, with five of the peafowl minisatellites initially isolated. Forty-eight positively hybridizing clones were characterized and found to originate from 30 different loci, 18 of which were polymorphic. Five of these variable minisatellite loci were studied further. They all showed Mendelian inheritance. The heterozygosities of these loci were relatively low (range 22-78%) in comparison with those of previously cloned human loci, as expected in view of inbreeding in our semicaptive study population. No new length allele mutations were observed in families and the mean mutation rate per locus is low (less than 0.004, 95% confidence maximum). These loci were also investigated by cross-species hybridization in related taxa. The ability of the probes to detect hypervariable sequences in other species within the same avian family was found to vary, from those probes that are species-specific to those that are apparently general to the family. We also illustrate the potential usefulness of these probes for paternity analysis in a study of sexual selection, and discuss the general application of specific hypervariable probes in behavioral and evolutionary studies.

  8. Mouse tetranectin: cDNA sequence, tissue-specific expression, and chromosomal mapping

    DEFF Research Database (Denmark)

    Ibaraki, K; Kozak, C A; Wewer, U M

    1995-01-01

    regulation, mouse tetranectin cDNA was cloned from a 16-day-old mouse embryo library. Sequence analysis revealed a 992-bp cDNA with an open reading frame of 606 bp, which is identical in length to the human tetranectin cDNA. The deduced amino acid sequence showed high homology to the human cDNA with 76......(s) of tetranectin. The sequence analysis revealed a difference in both sequence and size of the noncoding regions between mouse and human cDNAs. Northern analysis of the various tissues from mouse, rat, and cow showed the major transcript(s) to be approximately 1 kb, which is similar in size to that observed...

  9. Sequence-specific activation of the DNA sensor cGAS by Y-form DNA structures as found in primary HIV-1 cDNA.

    Science.gov (United States)

    Herzner, Anna-Maria; Hagmann, Cristina Amparo; Goldeck, Marion; Wolter, Steven; Kübler, Kirsten; Wittmann, Sabine; Gramberg, Thomas; Andreeva, Liudmila; Hopfner, Karl-Peter; Mertens, Christina; Zillinger, Thomas; Jin, Tengchuan; Xiao, Tsan Sam; Bartok, Eva; Coch, Christoph; Ackermann, Damian; Hornung, Veit; Ludwig, Janos; Barchet, Winfried; Hartmann, Gunther; Schlee, Martin

    2015-10-01

    Cytosolic DNA that emerges during infection with a retrovirus or DNA virus triggers antiviral type I interferon responses. So far, only double-stranded DNA (dsDNA) over 40 base pairs (bp) in length has been considered immunostimulatory. Here we found that unpaired DNA nucleotides flanking short base-paired DNA stretches, as in stem-loop structures of single-stranded DNA (ssDNA) derived from human immunodeficiency virus type 1 (HIV-1), activated the type I interferon-inducing DNA sensor cGAS in a sequence-dependent manner. DNA structures containing unpaired guanosines flanking short (12- to 20-bp) dsDNA (Y-form DNA) were highly stimulatory and specifically enhanced the enzymatic activity of cGAS. Furthermore, we found that primary HIV-1 reverse transcripts represented the predominant viral cytosolic DNA species during early infection of macrophages and that these ssDNAs were highly immunostimulatory. Collectively, our study identifies unpaired guanosines in Y-form DNA as a highly active, minimal cGAS recognition motif that enables detection of HIV-1 ssDNA.

  10. Levenshtein error-correcting barcodes for multiplexed DNA sequencing

    NARCIS (Netherlands)

    Buschmann, Tilo; Bystrykh, Leonid V.

    2013-01-01

    Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fraction of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called

  11. Molecular design of sequence specific DNA alkylating agents.

    Science.gov (United States)

    Minoshima, Masafumi; Bando, Toshikazu; Shinohara, Ken-ichi; Sugiyama, Hiroshi

    2009-01-01

    Sequence-specific DNA alkylating agents have great interest for novel approach to cancer chemotherapy. We designed the conjugates between pyrrole (Py)-imidazole (Im) polyamides and DNA alkylating chlorambucil moiety possessing at different positions. The sequence-specific DNA alkylation by conjugates was investigated by using high-resolution denaturing polyacrylamide gel electrophoresis (PAGE). The results showed that polyamide chlorambucil conjugates alkylate DNA at flanking adenines in recognition sequences of Py-Im polyamides, however, the reactivities and alkylation sites were influenced by the positions of conjugation. In addition, we synthesized conjugate between Py-Im polyamide and another alkylating agent, 1-(chloromethyl)-5-hydroxy-1,2-dihydro-3H-benz[e]indole (seco-CBI). DNA alkylation reactivies by both alkylating polyamides were almost comparable. In contrast, cytotoxicities against cell lines differed greatly. These comparative studies would promote development of appropriate sequence-specific DNA alkylating polyamides against specific cancer cells.

  12. Full-length enriched multistage cDNA library construction covering ...

    African Journals Online (AJOL)

    DR TONUKARI NYEROVWO

    2012-04-10

    Apr 10, 2012 ... Full Length Research Paper. Full-length enriched ... complementary DNA; pfu, plaque-forming unit. ... Chinese-native tree species in Populus section Leuce ... the infected bacteria, 2 ml melted top agar was added, and the.

  13. Food Fish Identification from DNA Extraction through Sequence Analysis

    Science.gov (United States)

    Hallen-Adams, Heather E.

    2015-01-01

    This experiment exposed 3rd and 4th y undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…

  14. DNA Polymerases Drive DNA Sequencing-by-Synthesis Technologies: Both Past and Present

    Directory of Open Access Journals (Sweden)

    Cheng-Yao eChen

    2014-06-01

    Full Text Available Next-generation sequencing (NGS technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. E. coli DNA polymerase I proteolytic (Klenow fragment was originally utilized in Sanger's dideoxy chain terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific, genome sequence information accessible via public database. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today's standard capillary electrophoresis (CE and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides respectively. Third generation, real-time single molecule sequencing platform requires slightly different enzyme properties. Enterobacterial phage ⱷ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, ⱷ29 enzyme has also been utilized in emerging DNA sequencing technologies including nanopore-, and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies.

  15. Sequence of human protamine 2 cDNA

    Energy Technology Data Exchange (ETDEWEB)

    Domenjoud, L; Fronia, C; Uhde, F; Engel, W [Universitaet Goettingen (West Germany)

    1988-08-11

    The authors report the cloning and sequencing of a cDNA clone for human protamine 2 (hp2), isolated from a human testis cDNA library cloned in the vector {lambda}-gt11. A 66mer oligonucleotide, that corresponds to an amino acid sequence which is highly conserved between hp2 and mouse protamine 2 (mp2) served as hybridization probe. The homology between the amino acid sequence deduced from our cDNA and the published amino acid sequence for hp2 is 100%.

  16. Sequence periodicity in nucleosomal DNA and intrinsic curvature.

    Science.gov (United States)

    Nair, T Murlidharan

    2010-05-17

    Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy the more rigid the DNA helix, thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understand their sequence-dependent structures. Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequence from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher order structures associated with nucleosomes of C.elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results points to the possibility of context dependent curvature of varying degrees to be associated with nucleosomal DNA.

  17. Rapid Multiplex Small DNA Sequencing on the MinION Nanopore Sequencing Platform

    Directory of Open Access Journals (Sweden)

    Shan Wei

    2018-05-01

    Full Text Available Real-time sequencing of short DNA reads has a wide variety of clinical and research applications including screening for mutations, target sequences and aneuploidy. We recently demonstrated that MinION, a nanopore-based DNA sequencing device the size of a USB drive, could be used for short-read DNA sequencing. In this study, an ultra-rapid multiplex library preparation and sequencing method for the MinION is presented and applied to accurately test normal diploid and aneuploidy samples’ genomic DNA in under three hours, including library preparation and sequencing. This novel method shows great promise as a clinical diagnostic test for applications requiring rapid short-read DNA sequencing.

  18. Substrate sequence selectivity of APOBEC3A implicates intra-DNA interactions.

    Science.gov (United States)

    Silvas, Tania V; Hou, Shurong; Myint, Wazo; Nalivaika, Ellen; Somasundaran, Mohan; Kelch, Brian A; Matsuo, Hiroshi; Kurt Yilmaz, Nese; Schiffer, Celia A

    2018-05-14

    The APOBEC3 (A3) family of human cytidine deaminases is renowned for providing a first line of defense against many exogenous and endogenous retroviruses. However, the ability of these proteins to deaminate deoxycytidines in ssDNA makes A3s a double-edged sword. When overexpressed, A3s can mutate endogenous genomic DNA resulting in a variety of cancers. Although the sequence context for mutating DNA varies among A3s, the mechanism for substrate sequence specificity is not well understood. To characterize substrate specificity of A3A, a systematic approach was used to quantify the affinity for substrate as a function of sequence context, length, secondary structure, and solution pH. We identified the A3A ssDNA binding motif as (T/C)TC(A/G), which correlated with enzymatic activity. We also validated that A3A binds RNA in a sequence specific manner. A3A bound tighter to substrate binding motif within a hairpin loop compared to linear oligonucleotide, suggesting A3A affinity is modulated by substrate structure. Based on these findings and previously published A3A-ssDNA co-crystal structures, we propose a new model with intra-DNA interactions for the molecular mechanism underlying A3A sequence preference. Overall, the sequence and structural preferences identified for A3A leads to a new paradigm for identifying A3A's involvement in mutation of endogenous or exogenous DNA.

  19. Low-Energy Electron-Induced Strand Breaks in Telomere-Derived DNA Sequences-Influence of DNA Sequence and Topology.

    Science.gov (United States)

    Rackwitz, Jenny; Bald, Ilko

    2018-03-26

    During cancer radiation therapy high-energy radiation is used to reduce tumour tissue. The irradiation produces a shower of secondary low-energy (DNA very efficiently by dissociative electron attachment. Recently, it was suggested that low-energy electron-induced DNA strand breaks strongly depend on the specific DNA sequence with a high sensitivity of G-rich sequences. Here, we use DNA origami platforms to expose G-rich telomere sequences to low-energy (8.8 eV) electrons to determine absolute cross sections for strand breakage and to study the influence of sequence modifications and topology of telomeric DNA on the strand breakage. We find that the telomeric DNA 5'-(TTA GGG) 2 is more sensitive to low-energy electrons than an intermixed sequence 5'-(TGT GTG A) 2 confirming the unique electronic properties resulting from G-stacking. With increasing length of the oligonucleotide (i.e., going from 5'-(GGG ATT) 2 to 5'-(GGG ATT) 4 ), both the variety of topology and the electron-induced strand break cross sections increase. Addition of K + ions decreases the strand break cross section for all sequences that are able to fold G-quadruplexes or G-intermediates, whereas the strand break cross section for the intermixed sequence remains unchanged. These results indicate that telomeric DNA is rather sensitive towards low-energy electron-induced strand breakage suggesting significant telomere shortening that can also occur during cancer radiation therapy. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Isolation of full-length putative rat lysophospholipase cDNA using improved methods for mRNA isolation and cDNA cloning

    International Nuclear Information System (INIS)

    Han, J.H.; Stratowa, C.; Rutter, W.J.

    1987-01-01

    The authors have cloned a full-length putative rat pancreatic lysophospholipase cDNA by an improved mRNA isolation method and cDNA cloning strategy using [ 32 P]-labelled nucleotides. These new methods allow the construction of a cDNA library from the adult rat pancreas in which the majority of recombinant clones contained complete sequences for the corresponding mRNAs. A previously recognized but unidentified long and relatively rare cDNA clone containing the entire sequence from the cap site at the 5' end to the poly(A) tail at the 3' end of the mRNA was isolated by single-step screening of the library. The size, amino acid composition, and the activity of the protein expressed in heterologous cells strongly suggest this mRNA codes for lysophospholipase

  1. Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations.

    Science.gov (United States)

    Oikonomopoulos, Spyros; Wang, Yu Chang; Djambazian, Haig; Badescu, Dunarel; Ragoussis, Jiannis

    2016-08-24

    To assess the performance of the Oxford Nanopore Technologies MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts with known concentration. cDNA libraries were generated using a template switching protocol to facilitate the direct comparison between different sequencing platforms. The MinION performance was assessed for its ability to sequence the cDNAs directly with good accuracy in terms of abundance and full length. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentration. No length or GC content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from a human HEK-293 cell line was sequenced on an Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. We observed that there was a good agreement in the measured cDNA abundance between PacBio RS II and ONT MinION (rpearson = 0.82, isoforms with length more than 700bp) and between Illumina HiSeq 2500 and ONT MinION (rpearson = 0.75). This indicates that the ONT MinION can sequence quantitatively both long and short full length cDNA molecules.

  2. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.; Lobzin, V.V.

    2004-01-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with the model examples and analysis of intervening exon-intron segments in the protein-coding regions

  3. Adenoviral DNA replication: DNA sequences and enzymes required for initiation in vitro

    International Nuclear Information System (INIS)

    Stillman, B.W.; Tamanoi, F.

    1983-01-01

    In this paper evidence is provided that the 140,000-dalton DNA polymerase is encoded by the adenoviral genome and is required for the initiation of DNA replication in vitro. The DNA sequences in the template DNA that are required for the initiation of replication have also been identified, using both plasmid DNAs and synthetic oligodeoxyribonucleotides. 48 references, 7 figures, 1 table

  4. A two-locus DNA sequence database for typing plant and human pathogens within the Fusarium oxysporum species complex

    DEFF Research Database (Denmark)

    O'Donnell, Kerry; Gueidan, C; Sink, S

    2009-01-01

    We constructed a two-locus database, comprising partial translation elongation factor (EF-1alpha) gene sequences and nearly full-length sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA) for 850 isolates spanning the phylogenetic breadth of the Fusarium oxysporum species compl...... of the IGS rDNA sequences may be non-orthologous. We also evaluated enniatin, fumonisin and moniliformin mycotoxin production in vitro within a phylogenetic framework....

  5. Isolation and characterization of full-length cDNA clones coding for cholinesterase from fetal human tissues

    International Nuclear Information System (INIS)

    Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.; Goldberg, O.; Soreq, H.

    1987-01-01

    To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A) + RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These finding demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species

  6. Next Generation Sequencing of Ancient DNA: Requirements, Strategies and Perspectives

    Directory of Open Access Journals (Sweden)

    Michael Knapp

    2010-07-01

    Full Text Available The invention of next-generation-sequencing has revolutionized almost all fields of genetics, but few have profited from it as much as the field of ancient DNA research. From its beginnings as an interesting but rather marginal discipline, ancient DNA research is now on its way into the centre of evolutionary biology. In less than a year from its invention next-generation-sequencing had increased the amount of DNA sequence data available from extinct organisms by several orders of magnitude. Ancient DNA  research is now not only adding a temporal aspect to evolutionary studies and allowing for the observation of evolution in real time, it also provides important data to help understand the origins of our own species. Here we review progress that has been made in next-generation-sequencing of ancient DNA over the past five years and evaluate sequencing strategies and future directions.

  7. Lack of association of colonic epithelium telomere length and oxidative DNA damage in Type 2 diabetes under good metabolic control

    Directory of Open Access Journals (Sweden)

    Kennedy Hugh

    2008-10-01

    Full Text Available Abstract Background Telomeres are DNA repeat sequences necessary for DNA replication which shorten at cell division at a rate directly related to levels of oxidative stress. Critical telomere shortening predisposes to cell senescence and to epithelial malignancies. Type 2 diabetes is characterised by increased oxidative DNA damage, telomere attrition, and an increased risk of colonic malignancy. We hypothesised that the colonic mucosa in Type 2 diabetes would be characterised by increased DNA damage and telomere shortening. Methods We examined telomere length (by flow fluorescent in situ hybridization and oxidative DNA damage (flow cytometry of 8 – oxoguanosine in the colonic mucosal cells of subjects with type 2 diabetes (n = 10; mean age 62.2 years, mean HbA1c 6.9% and 22 matched control subjects. No colonic pathology was apparent in these subjects at routine gastrointestinal investigations. Results Mean colonic epithelial telomere length in the diabetes group was not significantly different from controls (10.6 [3.6] vs. 12.1 [3.4] Molecular Equivalent of Soluble Fluorochrome Units [MESF]; P = 0.5. Levels of oxidative DNA damage were similar in both T2DM and control groups (2.6 [0.6] vs. 2.5 [0.6] Mean Fluorescent Intensity [MFI]; P = 0.7. There was no significant relationship between oxidative DNA damage and telomere length in either group (both p > 0.1. Conclusion Colonic epithelium in Type 2 diabetes does not differ significantly from control colonic epithelium in oxidative DNA damage or telomere length. There is no evidence in this study for increased oxidative DNA damage or significant telomere attrition in colonic mucosa as a carcinogenic mechanism.

  8. Thermodynamics of sequence-specific binding of PNA to DNA

    DEFF Research Database (Denmark)

    Ratilainen, T; Holmén, A; Tuite, E

    2000-01-01

    For further characterization of the hybridization properties of peptide nucleic acids (PNAs), the thermodynamics of hybridization of mixed sequence PNA-DNA duplexes have been studied. We have characterized the binding of PNA to DNA in terms of binding affinity (perfectly matched duplexes) and seq......For further characterization of the hybridization properties of peptide nucleic acids (PNAs), the thermodynamics of hybridization of mixed sequence PNA-DNA duplexes have been studied. We have characterized the binding of PNA to DNA in terms of binding affinity (perfectly matched duplexes...

  9. Concentrating Genomic Length DNA in a Microfabricated Array

    DEFF Research Database (Denmark)

    Chen, Yu; Abrams, Ezra S.; Boles, T. Christian

    2015-01-01

    the DNA molecules to minimal coil size using polyethylene glycol (PEG) derived depletion forces. We map out the sweet spot, where concentration occurs, as a function of PEG concentration and flow speed using a combination of theoretical analysis and experiment. Purification of DNA from enzymatic reactions...

  10. Cloning, sequencing, and expression of cDNA for human β-glucuronidase

    International Nuclear Information System (INIS)

    Oshima, A.; Kyle, J.W.; Miller, R.D.

    1987-01-01

    The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH 2 -terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length

  11. Transcriptional blockages in a cell-free system by sequence-selective DNA alkylating agents.

    Science.gov (United States)

    Ferguson, L R; Liu, A P; Denny, W A; Cullinane, C; Talarico, T; Phillips, D R

    2000-04-14

    There is considerable interest in DNA sequence-selective DNA-binding drugs as potential inhibitors of gene expression. Five compounds with distinctly different base pair specificities were compared in their effects on the formation and elongation of the transcription complex from the lac UV5 promoter in a cell-free system. All were tested at drug levels which killed 90% of cells in a clonogenic survival assay. Cisplatin, a selective alkylator at purine residues, inhibited transcription, decreasing the full-length transcript, and causing blockage at a number of GG or AG sequences, making it probable that intrastrand crosslinks are the blocking lesions. A cyclopropylindoline known to be an A-specific alkylator also inhibited transcription, with blocks at adenines. The aniline mustard chlorambucil, that targets primarily G but also A sequences, was also effective in blocking the formation of full-length transcripts. It produced transcription blocks either at, or one base prior to, AA or GG sequences, suggesting that intrastrand crosslinks could again be involved. The non-alkylating DNA minor groove binder Hoechst 33342 (a bisbenzimidazole) blocked formation of the full-length transcript, but without creating specific blockage sites. A bisbenzimidazole-linked aniline mustard analogue was a more effective transcription inhibitor than either chlorambucil or Hoechst 33342, with different blockage sites occurring immediately as compared with 2 h after incubation. The blockages were either immediately prior to AA or GG residues, or four to five base pairs prior to such sites, a pattern not predicted from in vitro DNA-binding studies. Minor groove DNA-binding ligands are of particular interest as inhibitors of gene expression, since they have the potential ability to bind selectively to long sequences of DNA. The results suggest that the bisbenzimidazole-linked mustard does cause alkylation and transcription blockage at novel DNA sites. in addition to sites characteristic of

  12. Characteristics of alternating current hopping conductivity in DNA sequences

    International Nuclear Information System (INIS)

    Song-Shan, Ma; Hui, Xu; Huan-You, Wang; Rui, Guo

    2009-01-01

    This paper presents a model to describe alternating current (AC) conductivity of DNA sequences, in which DNA is considered as a one-dimensional (1D) disordered system, and electrons transport via hopping between localized states. It finds that AC conductivity in DNA sequences increases as the frequency of the external electric field rises, and it takes the form of ø ac (ω) ∼ ω 2 ln 2 (1/ω). Also AC conductivity of DNA sequences increases with the increase of temperature, this phenomenon presents characteristics of weak temperature-dependence. Meanwhile, the AC conductivity in an off-diagonally correlated case is much larger than that in the uncorrelated case of the Anderson limit in low temperatures, which indicates that the off-diagonal correlations in DNA sequences have a great effect on the AC conductivity, while at high temperature the off-diagonal correlations no longer play a vital role in electric transport. In addition, the proportion of nucleotide pairs p also plays an important role in AC electron transport of DNA sequences. For p < 0.5, the conductivity of DNA sequence decreases with the increase of p, while for p ≥ 0.5, the conductivity increases with the increase of p. (cross-disciplinary physics and related areas of science and technology)

  13. Sequence-dependent DNA deformability studied using molecular dynamics simulations.

    Science.gov (United States)

    Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori

    2007-01-01

    Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained by the simulations is consistent with that by the crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can highly depend on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs and the xYRx show by contrast the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.

  14. Characteristics of alternating current hopping conductivity in DNA sequences

    Institute of Scientific and Technical Information of China (English)

    Ma Song-Shan; Xu Hui; Wang Huan-You; Guo Rui

    2009-01-01

    This paper presents a model to describe alternating current (AC) conductivity of DNA sequences,in which DNA is considered as a one-dimensional (1D) disordered system,and electrons transport via hopping between localized states.It finds that AC conductivity in DNA sequences increases as the frequency of the external electric field rises,and it takes the form of σac(ω)~ω2 ln2(1/ω).Also AC conductivity of DNA sequences increases with the increase of temperature,this phenomenon presents characteristics of weak temperature-dependence.Meanwhile,the AC conductivity in an off diagonally correlated case is much larger than that in the uncorrelated case of the Anderson limit in low temperatures,which indicates that the off-diagonal correlations in DNA sequences have a great effect on the AC conductivity,while at high temperature the off-diagonal correlations no longer play a vital role in electric transport. In addition,the proportion of nucleotide pairs p also plays an important role in AC electron transport of DNA sequences.For p<0.5,the conductivity of DNA sequence decreases with the increase of p,while for p > 0.5,the conductivity increases with the increase of p.

  15. Nucleotide sequence analysis of regions of adenovirus 5 DNA containing the origins of DNA replication

    International Nuclear Information System (INIS)

    Steenbergh, P.H.

    1979-01-01

    The purpose of the investigations described is the determination of nucleotide sequences at the molecular ends of the linear adenovirus type 5 DNA. Knowledge of the primary structure at the termini of this DNA molecule is of particular interest in the study of the mechanism of replication of adenovirus DNA. The initiation- and termination sites of adenovirus DNA replication are located at the ends of the DNA molecule. (Auth.)

  16. High-Throughput Block Optical DNA Sequence Identification.

    Science.gov (United States)

    Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant

    2018-01-01

    Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm -1 . Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Sequence length variation, indel costs, and congruence in sensitivity analysis

    DEFF Research Database (Denmark)

    Aagesen, Lone; Petersen, Gitte; Seberg, Ole

    2005-01-01

    The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which...... the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously...... preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation...

  18. Cloning, sequencing and expression of cDNA encoding growth ...

    Indian Academy of Sciences (India)

    Unknown

    of medicine, animal husbandry, fish farming and animal ..... northern pike (Esox lucius) growth hormone; Mol. Mar. Biol. ... prolactin 1-luciferase fusion gene in African catfish and ... 1988 Cloning and sequencing of cDNA that encodes goat.

  19. DNA Nucleotide Sequence Restricted by the RI Endonuclease

    Science.gov (United States)

    Hedgpeth, Joe; Goodman, Howard M.; Boyer, Herbert W.

    1972-01-01

    The sequence of DNA base pairs adjacent to the phosphodiester bonds cleaved by the RI restriction endonuclease in unmodified DNA from coliphage λ has been determined. The 5′-terminal nucleotide labeled with 32P and oligonucleotides up to the heptamer were analyzed from a pancreatic DNase digest. The following sequence of nucleotides adjacent to the RI break made in λ DNA was deduced from these data and from the 3′-dinucleotide sequence and nearest-neighbor analysis obtained from repair synthesis with the DNA polymerase of Rous sarcoma virus [Formula: see text] The RI endonuclease cleavage of the phosphodiester bonds (indicated by arrows) generates 5′-phosphoryls and short cohesive termini of four nucleotides, pApApTpT. The most striking feature of the sequence is its symmetry. PMID:4343974

  20. Capillary gel electrophoresis for rapid, high resolution DNA sequencing.

    OpenAIRE

    Swerdlow, H; Gesteland, R

    1990-01-01

    Capillary gel electrophoresis has been demonstrated for the separation and detection of DNA sequencing samples. Enzymatic dideoxy nucleotide chain termination was employed, using fluorescently tagged oligonucleotide primers and laser based on-column detection (limit of detection is 6,000 molecules per peak). Capillary gel separations were shown to be three times faster, with better resolution (2.4 x), and higher separation efficiency (5.4 x) than a conventional automated slab gel DNA sequenci...

  1. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments.

    Science.gov (United States)

    Dabney, Jesse; Knapp, Michael; Glocke, Isabelle; Gansauge, Marie-Theres; Weihmann, Antje; Nickel, Birgit; Valdiosera, Cristina; García, Nuria; Pääbo, Svante; Arsuaga, Juan-Luis; Meyer, Matthias

    2013-09-24

    Although an inverse relationship is expected in ancient DNA samples between the number of surviving DNA fragments and their length, ancient DNA sequencing libraries are strikingly deficient in molecules shorter than 40 bp. We find that a loss of short molecules can occur during DNA extraction and present an improved silica-based extraction protocol that enables their efficient retrieval. In combination with single-stranded DNA library preparation, this method enabled us to reconstruct the mitochondrial genome sequence from a Middle Pleistocene cave bear (Ursus deningeri) bone excavated at Sima de los Huesos in the Sierra de Atapuerca, Spain. Phylogenetic reconstructions indicate that the U. deningeri sequence forms an early diverging sister lineage to all Western European Late Pleistocene cave bears. Our results prove that authentic ancient DNA can be preserved for hundreds of thousand years outside of permafrost. Moreover, the techniques presented enable the retrieval of phylogenetically informative sequences from samples in which virtually all DNA is diminished to fragments shorter than 50 bp.

  2. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.

  3. An extended sequence specificity for UV-induced DNA damage.

    Science.gov (United States)

    Chung, Long H; Murray, Vincent

    2018-01-01

    The sequence specificity of UV-induced DNA damage was determined with a higher precision and accuracy than previously reported. UV light induces two major damage adducts: cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). Employing capillary electrophoresis with laser-induced fluorescence and taking advantages of the distinct properties of the CPDs and 6-4PPs, we studied the sequence specificity of UV-induced DNA damage in a purified DNA sequence using two approaches: end-labelling and a polymerase stop/linear amplification assay. A mitochondrial DNA sequence that contained a random nucleotide composition was employed as the target DNA sequence. With previous methodology, the UV sequence specificity was determined at a dinucleotide or trinucleotide level; however, in this paper, we have extended the UV sequence specificity to a hexanucleotide level. With the end-labelling technique (for 6-4PPs), the consensus sequence was found to be 5'-GCTC*AC (where C* is the breakage site); while with the linear amplification procedure, it was 5'-TCTT*AC. With end-labelling, the dinucleotide frequency of occurrence was highest for 5'-TC*, 5'-TT* and 5'-CC*; whereas it was 5'-TT* for linear amplification. The influence of neighbouring nucleotides on the degree of UV-induced DNA damage was also examined. The core sequences consisted of pyrimidine nucleotides 5'-CTC* and 5'-CTT* while an A at position "1" and C at position "2" enhanced UV-induced DNA damage. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  4. Sequence dependence of electron-induced DNA strand breakage revealed by DNA nanoarrays

    DEFF Research Database (Denmark)

    Keller, Adrian; Rackwitz, Jenny; Cauët, Emilie

    2014-01-01

    The electronic structure of DNA is determined by its nucleotide sequence, which is for instance exploited in molecular electronics. Here we demonstrate that also the DNA strand breakage induced by low-energy electrons (18 eV) depends on the nucleotide sequence. To determine the absolute cross sec...

  5. PREDICTION OF CHROMATIN STATES USING DNA SEQUENCE PROPERTIES

    KAUST Repository

    Bahabri, Rihab R.

    2013-06-01

    Activities of DNA are to a great extent controlled epigenetically through the internal struc- ture of chromatin. This structure is dynamic and is influenced by different modifications of histone proteins. Various combinations of epigenetic modification of histones pinpoint to different functional regions of the DNA determining the so-called chromatin states. How- ever, the characterization of chromatin states by the DNA sequence properties remains largely unknown. In this study we aim to explore whether DNA sequence patterns in the human genome can characterize different chromatin states. Using DNA sequence motifs we built binary classifiers for each chromatic state to eval- uate whether a given genomic sequence is a good candidate for belonging to a particular chromatin state. Of four classification algorithms (C4.5, Naive Bayes, Random Forest, and SVM) used for this purpose, the decision tree based classifiers (C4.5 and Random Forest) yielded best results among those we evaluated. Our results suggest that in general these models lack sufficient predictive power, although for four chromatin states (insulators, het- erochromatin, and two types of copy number variation) we found that presence of certain motifs in DNA sequences does imply an increased probability that such a sequence is one of these chromatin states.

  6. PNA Directed Sequence Addressed Self-Assembly of DNA Nanostructures

    DEFF Research Database (Denmark)

    Nielsen, Peter E.

    2008-01-01

    sequence specifically recognize another PNA oligomer. We describe how such three domain PNAs have utility for assembling dsDNA grid and clover leaf structures, and in combination with SNAP-tag technol. of protein dsDNA structures. (c) 2008 American Institute of Physics. [on SciFinder (R)] Udgivelsesdato...

  7. AU2EU : Privacy-preserving matching of DNA sequences

    NARCIS (Netherlands)

    Ignatenko, T.; Petkovic, M.; Naccache, D.; Sauveron, D.

    2014-01-01

    Advances in DNA sequencing create new opportunities for the use of DNA data in healthcare for diagnostic and treatment purposes, but also in many other health and well-being services. This brings new challenges with regard to the protection and use of this sensitive data. Thus, special technical

  8. Close sequence identity between ribosomal DNA episomes of the ...

    Indian Academy of Sciences (India)

    Unknown

    The restriction map of the E. dispar rDNA circle showed close simi- larity to EhR1 .... for 30 cycles in a DNA Thermal cycler (MJ Research,. USA). 3. .... by asterisk. The gaps show the variation between E. dispar and E. histolytica sequences.

  9. DNA Sequences of RAPD Fragments in the Egyptian cotton ...

    African Journals Online (AJOL)

    Random Amplified Polymorphic DNAs (RAPDs) is a DNA polymorphism assay based on the amplification of random DNA segments with single primers of arbitrary nucleotide sequence. Despite the fact that the RAPD technique has become a very powerful tool and has found use in numerous applications, yet, the nature of ...

  10. Effects of sequence on DNA wrapping around histones

    Science.gov (United States)

    Ortiz, Vanessa

    2011-03-01

    A central question in biophysics is whether the sequence of a DNA strand affects its mechanical properties. In epigenetics, these are thought to influence nucleosome positioning and gene expression. Theoretical and experimental attempts to answer this question have been hindered by an inability to directly resolve DNA structure and dynamics at the base-pair level. In our previous studies we used a detailed model of DNA to measure the effects of sequence on the stability of naked DNA under bending. Sequence was shown to influence DNA's ability to form kinks, which arise when certain motifs slide past others to form non-native contacts. Here, we have now included histone-DNA interactions to see if the results obtained for naked DNA are transferable to the problem of nucleosome positioning. Different DNA sequences interacting with the histone protein complex are studied, and their equilibrium and mechanical properties are compared among themselves and with the naked case. NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359).

  11. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    Science.gov (United States)

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  12. Highly multiplexed targeted DNA sequencing from single nuclei.

    Science.gov (United States)

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  13. Googling DNA sequences on the World Wide Web.

    Science.gov (United States)

    Hajibabaei, Mehrdad; Singer, Gregory A C

    2009-11-10

    New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment independent character based algorithm based on dividing a sequence library (DNA barcodes) and query sequence to words. The actual search is conducted by conventional search tools such as freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.

  14. Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

    Science.gov (United States)

    Shi, Jinming

    In this study, cytosine methylation on CCGG site and genomic DNA sequence changes of adult leaves of rice after seeds space flight were detected by methylation-sensitive amplification polymorphism (MSAP) and Amplified fragment length polymorphism (AFLP) technique respectively. Rice seeds were planted in the trial field after 4 days space flight on the shenzhou-6 Spaceship of China. Adult leaves of space-treated rice including 8 plants chosen randomly and 2 plants with phenotypic mutation were used for AFLP and MSAP analysis. Polymorphism of both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequency of the on-ground controls, space-treated plants and mutants are 1.3%, 3.1% and 11% respectively. For AFLP analysis, the average polymorphic frequencies are 1.4%, 2.9%and 8%respectively. Total 27 and 22 polymorphic fragments were cloned sequenced from MSAP and AFLP analysis respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. For the 22 polymorphic fragments from AFLP analysis, no one shows homology to mRNA sequence and eight fragments show homology to repeat region or retrotransposon sequence. These results suggest that although both genomic DNA sequence and cytosine methylation status can be effected by space flight, the genomic region homology to the fragments from genome DNA and cytosine methylation analysis were different.

  15. Complete cDNA sequence coding for human docking protein

    Energy Technology Data Exchange (ETDEWEB)

    Hortsch, M; Labeit, S; Meyer, D I

    1988-01-11

    Docking protein (DP, or SRP receptor) is a rough endoplasmic reticulum (ER)-associated protein essential for the targeting and translocation of nascent polypeptides across this membrane. It specifically interacts with a cytoplasmic ribonucleoprotein complex, the signal recognition particle (SRP). The nucleotide sequence of cDNA encoding the entire human DP and its deduced amino acid sequence are given.

  16. Phylogenetic analysis of the genus Hordeum using repetitive DNA sequences

    DEFF Research Database (Denmark)

    Svitashev, S.; Bryngelsson, T.; Vershinin, A.

    1994-01-01

    A set of six cloned barley (Hordeum vulgare) repetitive DNA sequences was used for the analysis of phylogenetic relationships among 31 species (46 taxa) of the genus Hordeum, using molecular hybridization techniques. In situ hybridization experiments showed dispersed organization of the sequences...

  17. Full-Length Sequence of Mouse Acupuncture-Induced 1-L (Aig1l Gene Including Its Transcriptional Start Site

    Directory of Open Access Journals (Sweden)

    Mika Ohta

    2011-01-01

    Full Text Available We have been investigating the molecular efficacy of electroacupuncture (EA, which is one type of acupuncture therapy. In our previous molecular biological study of acupuncture, we found an EA-induced gene, named acupuncture-induced 1-L (Aig1l, in mouse skeletal muscle. The aims of this study consisted of identification of the full-length cDNA sequence of Aig1l including the transcriptional start site, determination of the tissue distribution of Aig1l and analysis of the effect of EA on Aig1l gene expression. We determined the complete cDNA sequence including the transcriptional start site via cDNA cloning with the cap site hunting method. We then analyzed the tissue distribution of Aig1l by means of northern blot analysis and real-time quantitative polymerase chain reaction. We used the semiquantitative reverse transcriptase-polymerase chain reaction to examine the effect of EA on Aig1l gene expression. Our results showed that the complete cDNA sequence of Aig1l was 6073 bp long, and the putative protein consisted of 962 amino acids. All seven tissues that we analyzed expressed the Aig1l gene. In skeletal muscle, EA induced expression of the Aig1l gene, with high expression observed after 3 hours of EA. Our findings thus suggest that the Aig1l gene may play a key role in the molecular mechanisms of EA efficacy.

  18. Sequence specificity of DNA cleavage by Micrococcus luteus γ endonuclease

    International Nuclear Information System (INIS)

    Hentosh, P.; Henner, W.D.; Reynolds, R.J.

    1985-01-01

    DNA fragments of defined sequence have been used to determine the sites of cleavage by γ-endonuclease activity in extracts prepared from Micrococcus luteus. End-labeled DNA restriction fragments of pBR322 DNA that had been irradiated under nitrogen in the presence of potassium iodide or t-butanol were treated with M. luteus γ endonuclease and analyzed on irradiated DNA preferentially at the positions of cytosines and thymines. DNA cleavage occurred immediately to the 3' side of pyrimidines in irradiated DNA and resulted in fragments that terminate in a 5'-phosphoryl group. These studies indicate that both altered cytosines and thymines may be important DNA lesions requiring repair after exposure to γ radiation

  19. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian eWu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362Mb, 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of chloroplast genome.

  20. Next-generation sequencing offers new insights into DNA degradation

    DEFF Research Database (Denmark)

    Overballe-Petersen, Søren; Orlando, Ludovic Antoine Alexandre; Willerslev, Eske

    2012-01-01

    The processes underlying DNA degradation are central to various disciplines, including cancer research, forensics and archaeology. The sequencing of ancient DNA molecules on next-generation sequencing platforms provides direct measurements of cytosine deamination, depurination and fragmentation...... rates that previously were obtained only from extrapolations of results from in vitro kinetic experiments performed over short timescales. For example, recent next-generation sequencing of ancient DNA reveals purine bases as one of the main targets of postmortem hydrolytic damage, through base...... elimination and strand breakage. It also shows substantially increased rates of DNA base-loss at guanosine. In this review, we argue that the latter results from an electron resonance structure unique to guanosine rather than adenosine having an extra resonance structure over guanosine as previously suggested....

  1. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    Science.gov (United States)

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  2. Cloning and sequencing of Indian Water buffalo (Bubalus bubalis) interleukin-3 cDNA

    KAUST Repository

    Sugumar, Thennarasu

    2011-12-12

    Full-length cDNA (435 bp) of the interleukin-3(IL-3) gene of the Indian water buffalo was amplified by reverse transcriptase-polymerase chain reaction and sequenced. This sequence had 96% nucleotide identity and 92% amino acid identity with bovine IL-3. There are 10 amino acid substitutions in buffalo compared with that of bovine. The amino acid sequence of buffalo IL-3 also showed very high identity with that of other ruminants, indicating functional cross-reactivity. Structural homology modelling of buffalo IL-3 protein with human IL-3 showed the presence of five helical structures.

  3. Spreadsheet-based program for alignment of overlapping DNA sequences.

    Science.gov (United States)

    Anbazhagan, R; Gabrielson, E

    1999-06-01

    Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh operating system computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed customized workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.

  4. Effect of temperature and cycle length on microbial competition in PHB-producing sequencing batch reactor

    NARCIS (Netherlands)

    Jiang, Y.; Marang, L.; Kleerebezem, R.; Muyzer, G.; van Loosdrecht, M.C.M.

    2011-01-01

    The impact of temperature and cycle length on microbial competition between polyhydroxybutyrate (PHB)-producing populations enriched in feast-famine sequencing batch reactors (SBRs) was investigated at temperatures of 20 °C and 30 °C, and in a cycle length range of 1-18 h. In this study, the

  5. Differences in a ribosomal DNA sequence of Strongylus species allows identification of single eggs.

    Science.gov (United States)

    Campbell, A J; Gasser, R B; Chilton, N B

    1995-03-01

    In the current study, molecular techniques were evaluated for the species identification of individual strongyle eggs. Adult worms of Strongylus edentatus, S. equinus and S. vulgaris were collected at necropsy from horses from Australia and the U.S.A. Genomic DNA was isolated and a ribosomal transcribed spacer (ITS-2) amplified and sequenced using polymerase chain reaction (PCR) techniques. The length of the ITS-2 sequence of S. edentatus, S. equinus and S. vulgaris ranged between 217 and 235 nucleotides. Extensive sequence analysis demonstrated a low degree (0-0.9%) of intraspecific variation in the ITS-2 for the Strongylus species examined, whereas the levels of interspecific differences (13-29%) were significantly greater. Interspecific differences in the ITS-2 sequences allowed unequivocal species identification of single worms and eggs using PCR-linked restriction fragment length polymorphism. These results demonstrate the potential of the ribosomal spacers as genetic markers for species identification of single strongyle eggs from horse faeces.

  6. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Directory of Open Access Journals (Sweden)

    Jason D Thompson

    Full Text Available Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  7. Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

    Science.gov (United States)

    Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

    2012-01-01

    Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.

  8. The algorithm of random length sequences synthesis for frame synchronization of digital television systems

    Directory of Open Access Journals (Sweden)

    Аndriy V. Sadchenko

    2015-12-01

    Full Text Available Digital television systems need to ensure that all digital signals processing operations are performed simultaneously and consistently. Frame synchronization dictated by the need to match phases of transmitter and receiver so that it would be possible to identify the start of a frame. As a frame synchronization signals are often used long length binary sequence with good aperiodic autocorrelation function. Aim: This work is dedicated to the development of the algorithm of random length sequences synthesis. Materials and Methods: The paper provides a comparative analysis of the known sequences, which can be used at present as synchronization ones, revealed their advantages and disadvantages. This work proposes the algorithm for the synthesis of binary synchronization sequences of random length with good autocorrelation properties based on noise generator with a uniform distribution law of probabilities. A "white noise" semiconductor generator is proposed to use as the initial material for the synthesis of binary sequences with desired properties. Results: The statistical analysis of the initial implementations of the "white noise" and synthesized sequences for frame synchronization of digital television is conducted. The comparative analysis of the synthesized sequences with known ones was carried out. The results show the benefits of obtained sequences in compare with known ones. The performed simulations confirm the obtained results. Conclusions: Thus, the search algorithm of binary synchronization sequences with desired autocorrelation properties received. According to this algorithm, the sequence can be longer in length and without length limitations. The received sync sequence can be used for frame synchronization in modern digital communication systems that will increase their efficiency and noise immunity.

  9. Cloning and sequence analysis of cDNA coding for rat nucleolar protein C23

    International Nuclear Information System (INIS)

    Ghaffari, S.H.; Olson, M.O.J.

    1986-01-01

    Using synthetic oligonucleotides as primers and probes, the authors have isolated and sequenced cDNA clones encoding protein C23, a putative nucleolus organizer protein. Poly(A + ) RNA was isolated from rat Novikoff hepatoma cells and enriched in C23 mRNA by sucrose density gradient ultracentrifugation. Two deoxyoligonuleotides, a 48- and a 27-mer, were synthesized on the basis of amino acid sequence from the C-terminal half of protein C23 and cDNA sequence data from CHO cell protein. The 48-mer was used a primer for synthesis of cDNA which was then inserted into plasmid pUC9. Transformed bacterial colonies were screened by hybridization with 32 P labeled 27-mer. Two clones among 5000 gave a strong positive signal. Plasmid DNAs from these clones were purified and characterized by blotting and nucleotide sequence analysis. The length of C23 mRNA was estimated to be 3200 bases in a northern blot analysis. The sequence of a 267 b.p. insert shows high homology with the CHO cDNA with only 9 nucleotide differences and an identical amino acid sequence. These studies indicate that this region of the protein is highly conserved

  10. Hibiscus latent Fort Pierce virus in Brazil and synthesis of its biologically active full-length cDNA clone.

    Science.gov (United States)

    Gao, Ruimin; Niu, Shengniao; Dai, Weifang; Kitajima, Elliot; Wong, Sek-Man

    2016-10-01

    A Brazilian isolate of Hibiscus latent Fort Pierce virus (HLFPV-BR) was firstly found in a hibiscus plant in Limeira, SP, Brazil. RACE PCR was carried out to obtain the full-length sequences of HLFPV-BR which is 6453 nucleotides and has more than 99.15 % of complete genomic RNA nucleotide sequence identity with that of HLFPV Japanese isolate. The genomic structure of HLFPV-BR is similar to other tobamoviruses. It includes a 5' untranslated region (UTR), followed by open reading frames encoding for a 128-kDa protein and a 188-kDa readthrough protein, a 38-kDa movement protein, 18-kDa coat protein, and a 3' UTR. Interestingly, the unique feature of poly(A) tract is also found within its 3'-UTR. Furthermore, from the total RNA extracted from the local lesions of HLFPV-BR-infected Chenopodium quinoa leaves, a biologically active, full-length cDNA clone encompassing the genome of HLFPV-BR was amplified and placed adjacent to a T7 RNA polymerase promoter. The capped in vitro transcripts from the cloned cDNA were infectious when mechanically inoculated into C. quinoa and Nicotiana benthamiana plants. This is the first report of the presence of an isolate of HLFPV in Brazil and the successful synthesis of a biologically active HLFPV-BR full-length cDNA clone.

  11. Chaos game representation (CGR)-walk model for DNA sequences

    International Nuclear Information System (INIS)

    Jie, Gao; Zhen-Yuan, Xu

    2009-01-01

    Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to determine the coordinates of their positions in a continuous space. This distribution of positions has two features: one is unique, and the other is source sequence that can be recovered from the coordinates so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model is proposed based on CGR coordinates for the DNA sequences. The CGR coordinates are converted into a time series, and a long-memory ARFIMA (p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into the DNA sequence analysis. This model is applied to simulating real CGR-walk sequence data of ten genomic sequences. Remarkably long-range correlations are uncovered in the data, and the results from these models are reasonably fitted with those from the ARFIMA (p, d, q) model. (cross-disciplinary physics and related areas of science and technology)

  12. WebPrInSeS: automated full-length clone sequence identification and verification using high-throughput sequencing data.

    Science.gov (United States)

    Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart

    2010-07-01

    High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.

  13. Mitochondrial DNA sequence-based phylogenetic relationship ...

    Indian Academy of Sciences (India)

    cophaga ranges from 0.037–0.106 and 0.049–0.207 for COI and ND5 genes, respectively (tables 2 and 3). Analysis of genetic distance on the basis of sequence difference for both the mitochondrial genes shows very little genetic difference. The discrepancy in the phylogenetic trees based on individ- ual genes may be due ...

  14. Novel DNA sequence detection method based on fluorescence energy transfer

    International Nuclear Information System (INIS)

    Kobayashi, S.; Tamiya, E.; Karube, I.

    1987-01-01

    Recently the detection of specific DNA sequence, DNA analysis, has been becoming more important for diagnosis of viral genomes causing infections disease and human sequences related to inherited disorders. These methods typically involve electrophoresis, the immobilization of DNA on a solid support, hybridization to a complementary probe, the detection using labeled with /sup 32/P or nonisotopically with a biotin-avidin-enzyme system, and so on. These techniques are highly effective, but they are very time-consuming and expensive. A principle of fluorescene energy transfer is that the light energy from an excited donor (fluorophore) is transferred to an acceptor (fluorophore), if the acceptor exists in the vicinity of the donor and the excitation spectrum of donor overlaps the emission spectrum of acceptor. In this study, the fluorescence energy transfer was applied to the detection of specific DNA sequence using the hybridization method. The analyte, single-stranded DNA labeled with the donor fluorophore is hybridized to a probe DNA labeled with the acceptor. Because of the complementary DNA duplex formation, two fluorophores became to be closed to each other, and the fluorescence energy transfer was occurred

  15. Management of High-Throughput DNA Sequencing Projects: Alpheus.

    Science.gov (United States)

    Miller, Neil A; Kingsmore, Stephen F; Farmer, Andrew; Langley, Raymond J; Mudge, Joann; Crow, John A; Gonzalez, Alvaro J; Schilkey, Faye D; Kim, Ryan J; van Velkinburgh, Jennifer; May, Gregory D; Black, C Forrest; Myers, M Kathy; Utsey, John P; Frost, Nicholas S; Sugarbaker, David J; Bueno, Raphael; Gullans, Stephen R; Baxter, Susan M; Day, Steve W; Retzel, Ernest F

    2008-12-26

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystem's SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.

  16. DNA-PK dependent targeting of DNA-ends to a protein complex assembled on matrix attachment region DNA sequences

    International Nuclear Information System (INIS)

    Mauldin, S.K.; Getts, R.C.; Perez, M.L.; DiRienzo, S.; Stamato, T.D.

    2003-01-01

    Full text: We find that nuclear protein extracts from mammalian cells contain an activity that allows DNA ends to associate with circular pUC18 plasmid DNA. This activity requires the catalytic subunit of DNA-PK (DNA-PKcs) and Ku since it was not observed in mutants lacking Ku or DNA-PKcs but was observed when purified Ku/DNA-PKcs was added to these mutant extracts. Competition experiments between pUC18 and pUC18 plasmids containing various nuclear matrix attachment region (MAR) sequences suggest that DNA ends preferentially associate with plasmids containing MAR DNA sequences. At a 1:5 mass ratio of MAR to pUC18, approximately equal amounts of DNA end binding to the two plasmids were observed, while at a 1:1 ratio no pUC18 end-binding was observed. Calculation of relative binding activities indicates that DNA-end binding activities to MAR sequences was 7 to 21 fold higher than pUC18. Western analysis of proteins bound to pUC18 and MAR plasmids indicates that XRCC4, DNA ligase IV, scaffold attachment factor A, topoisomerase II, and poly(ADP-ribose) polymerase preferentially associate with the MAR plasmid in the absence or presence of DNA ends. In contrast, Ku and DNA-PKcs were found on the MAR plasmid only in the presence of DNA ends. After electroporation of a 32P-labeled DNA probe into human cells and cell fractionation, 87% of the total intercellular radioactivity remained in nuclei after a 0.5M NaCl extraction suggesting the probe was strongly bound in the nucleus. The above observations raise the possibility that DNA-PK targets DNA-ends to a repair and/or DNA damage signaling complex which is assembled on MAR sites in the nucleus

  17. Applications of statistical physics and information theory to the analysis of DNA sequences

    Science.gov (United States)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question if there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search for such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.

  18. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    Directory of Open Access Journals (Sweden)

    Baldwin Stephen A

    2011-03-01

    Full Text Available Abstract Background Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  19. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities.

    Science.gov (United States)

    Troshin, Peter V; Postis, Vincent Lg; Ashworth, Denise; Baldwin, Stephen A; McPherson, Michael J; Barton, Geoffrey J

    2011-03-07

    Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.

  20. Dialects of the DNA uptake sequence in Neisseriaceae.

    Directory of Open Access Journals (Sweden)

    Stephan A Frye

    2013-04-01

    Full Text Available In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS, which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS-dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5'-CTG-3' is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS-dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic

  1. Dialects of the DNA Uptake Sequence in Neisseriaceae

    Science.gov (United States)

    Frye, Stephan A.; Nilsen, Mariann; Tønjum, Tone; Ambur, Ole Herman

    2013-01-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS–dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5′-CTG-3′ is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS–dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic transformation

  2. On Sequence Lengths of Some Special External Exclusive OR Type LFSR Structures – Study and Analysis

    Directory of Open Access Journals (Sweden)

    A Ahmad

    2014-12-01

    Full Text Available The study of the length of pseudo-random binary sequences generated by Linear- Feedback Shift Registers (LFSRs plays an important role in the design approaches of built-in selftest, cryptosystems, and other applications. However, certain LFSR structures might not be appropriate in some situations. Given that determining the length of generated pseudo-random binary sequence is a complex task, therefore, before using an LFSR structure, it is essential to investigate the length and the properties of the sequence. This paper investigates some conditions and LFSR’s structures, which restrict the pseudo-random binary sequences’ generation to a certain fixed length. The outcomes of this paper are presented in the form of theorems, simulations, and analyses. We believe that these outcomes are of great importance to the designers of built-in self-test equipment, cryptosystems, and other applications such as radar, CDMA, error correction, and Monte Carlo simulation.

  3. Mitochondrial DNA sequence evolution in the Arctoidea.

    OpenAIRE

    Zhang, Y P; Ryder, O A

    1993-01-01

    Some taxa in the superfamily Arctoidea, such as the giant panda and the lesser panda, have presented puzzles to taxonomists. In the present study, approximately 397 bases of the cytochrome b gene, 364 bases of the 12S rRNA gene, and 74 bases of the tRNA(Thr) and tRNA(Pro) genes from the giant panda, lesser panda, kinkajou, raccoon, coatimundi, and all species of the Ursidae were sequenced. The high transition/transversion ratios in cytochrome b and RNA genes prior to saturation suggest that t...

  4. Noninvasive prenatal paternity testing (NIPAT) through maternal plasma DNA sequencing

    DEFF Research Database (Denmark)

    Jiang, Haojun; Xie, Yifan; Li, Xuchao

    2016-01-01

    developed a noninvasive prenatal paternity testing (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influence factors (minor allele frequency (MAF), the number of total SNP, fetal fraction and effective sequencing depth) and designed three different selective SNP panels......Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have been already used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The frequently used technologies were PCR followed by capillary electrophoresis and SNP typing array, respectively. Here, we...... paternity test using STR multiplex system. Our study here proved that the maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, which may provide an alternative in forensic application in the future....

  5. Statistical assignment of DNA sequences using Bayesian phylogenetics

    DEFF Research Database (Denmark)

    Terkelsen, Kasper Munch; Boomsma, Wouter Krogh; Huelsenbeck, John P.

    2008-01-01

    We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data...... that the method outperforms Blast searches as a measure of confidence and can help eliminate 80% of all false assignment based on best Blast hit. However, the most important advance of the method is that it provides statistically meaningful measures of confidence. We apply the method to a re......-analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are comprised of a combination of both Neanderthal and modern human DNA....

  6. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main bias: the guanine-cytosine (GC) content and 3-bp (base pairs) periodicity of DNA sequence. We perform the test constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in 3-bp periodicity neither in the variable GC content.

  7. Optimal choice of word length when comparing two Markov sequences using a χ 2-statistic.

    Science.gov (United States)

    Bai, Xin; Tang, Kujin; Ren, Jie; Waterman, Michael; Sun, Fengzhu

    2017-10-03

    Alignment-free sequence comparison using counts of word patterns (grams, k-tuples) has become an active research topic due to the large amount of sequence data from the new sequencing technologies. Genome sequences are frequently modelled by Markov chains and the likelihood ratio test or the corresponding approximate χ 2 -statistic has been suggested to compare two sequences. However, it is not known how to best choose the word length k in such studies. We develop an optimal strategy to choose k by maximizing the statistical power of detecting differences between two sequences. Let the orders of the Markov chains for the two sequences be r 1 and r 2 , respectively. We show through both simulations and theoretical studies that the optimal k= max(r 1 ,r 2 )+1 for both long sequences and next generation sequencing (NGS) read data. The orders of the Markov chains may be unknown and several methods have been developed to estimate the orders of Markov chains based on both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used. It is shown that the power loss is minimal for some of the estimators of the orders of Markov chains. Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.

  8. Effect of genomic long-range correlations on DNA persistence length: from theory to single molecule experiments.

    Science.gov (United States)

    Moukhtar, Julien; Faivre-Moskalenko, Cendrine; Milani, Pascale; Audit, Benjamin; Vaillant, Cedric; Fontaine, Emeline; Mongelard, Fabien; Lavorel, Guillaume; St-Jean, Philippe; Bouvet, Philippe; Argoul, Françoise; Arneodo, Alain

    2010-04-22

    Sequence dependency of DNA intrinsic bending properties has been emphasized as a possible key ingredient to in vivo chromatin organization. We use atomic force microscopy (AFM) in air and liquid to image intrinsically straight (synthetic), uncorrelated (hepatitis C RNA virus) and persistent long-range correlated (human) DNA fragments in various ionic conditions such that the molecules freely equilibrate on the mica surface before being captured in a particular conformation. 2D thermodynamic equilibrium is experimentally verified by a detailed statistical analysis of the Gaussian nature of the DNA bend angle fluctuations. We show that the worm-like chain (WLC) model, commonly used to describe the average conformation of long semiflexible polymers, reproduces remarkably well the persistence length estimates for the first two molecules as consistently obtained from (i) mean square end-to-end distance measurement and (ii) mean projection of the end-to-end vector on the initial orientation. Whatever the operating conditions (air or liquid, concentration of metal cations Mg(2+) and/or Ni(2+)), the persistence length found for the uncorrelated viral DNA underestimates the value obtained for the straight DNA. We show that this systematic difference is the signature of the presence of an uncorrelated structural intrinsic disorder in the hepatitis C virus (HCV) DNA fragment that superimposes on local curvatures induced by thermal fluctuations and that only the entropic disorder depends upon experimental conditions. In contrast, the WLC model fails to describe the human DNA conformations. We use a mean-field extension of the WLC model to account for the presence of long-range correlations (LRC) in the intrinsic curvature disorder of human genomic DNA: the stronger the LRC, the smaller the persistence length. The comparison of AFM imaging of human DNA with LRC DNA simulations confirms that the rather small mean square end-to-end distance observed, particularly for G

  9. Statistical properties and fractals of nucleotide clusters in DNA sequences

    International Nuclear Information System (INIS)

    Sun Tingting; Zhang Linxi; Chen Jin; Jiang Zhouting

    2004-01-01

    Statistical properties of nucleotide clusters in DNA sequences and their fractals are investigated in this paper. The average size of nucleotide clusters in non-coding sequence is larger than that in coding sequence. We investigate the cluster-size distribution P(S) for human chromosomes 21 and 22, and the results are different from previous works. The cluster-size distribution P(S 1 +S 2 ) with the total size of sequential Pu-cluster and Py-cluster S 1 +S 2 is studied. We observe that P(S 1 +S 2 ) follows an exponential decay both in coding and non-coding sequences. However, we get different results for human chromosomes 21 and 22. The probability distribution P(S 1 ,S 2 ) of nucleotide clusters with the size of sequential Pu-cluster and Py-cluster S 1 and S 2 respectively, is also examined. In the meantime, some of the linear correlations are obtained in the double logarithmic plots of the fluctuation F(l) versus nucleotide cluster distance l along the DNA chain. The power spectrums of nucleotide clusters are also discussed, and it is concluded that the curves are flat and hardly changed and the 1/3 frequency is neither observed in coding sequence nor in non-coding sequence. These investigations can provide some insights into the nucleotide clusters of DNA sequences

  10. Complete mitochondrial DNA sequence of the Eastern keelback mullet Liza affinis.

    Science.gov (United States)

    Gong, Xiaoling; Zhu, Wenjia; Bao, Baolong

    2016-05-01

    Eastern keelback mullet (Liza affinis) inhabits inlet waters and estuaries of rivers. In this paper, we initially determined the complete mitochondrial genome of Liza affinis. The entire mtDNA sequence is 16,831 bp in length, including 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes and 1 putative control region. Its order and numbers of genes are similar to most bony fishes.

  11. [Sequence of the ITS region of nuclear ribosomal DNA(nrDNA) in Xinjiang wild Dianthus and its phylogenetic relationship].

    Science.gov (United States)

    Zhang, Lu; Cai, You-Ming; Zhuge, Qiang; Zou, Hui-Yu; Huang, Min-Ren

    2002-06-01

    Xinjiang is a center of distribution and differentiation of genus Dianthus in China, and has a great deal of species resources. The sequences of ITS region (including ITS-1, 5.8S rDNA and ITS-2) of nuclear ribosomal DNA from 8 species of genus Dianthus wildly distributed in Xinjiang were determined by direct sequencing of PCR products. The result showed that the size of the ITS of Dianthus is from 617 to 621 bp, and the length variation is only 4 bp. There are very high homogeneous (97.6%-99.8%) sequences between species, and about 80% homogeneous sequences between genus Dianthus and outgroup. The sequences of ITS in genus Dianthus are relatively conservative. In general, there are more conversion than transition in the variation sites among genus Dianthus. The conversion rates are relatively high, and the ratios of conversion/transition are 1.0-3.0. On the basis of phylogenetic analysis of nucleotide sequences the species of Dianthus in China would be divided into three sections. There is a distant relationship between sect. Barbulatum Williams and sect. Dianthus and between sect. Barbulatum Williams and sect. Fimbriatum Williams, and there is a close relationship between sect. Dianthus and sect. Fimbriatum Williams. From the phylogenetic tree of ITS it was found that the origin of sect. Dianthusis is earlier than that of sect. Fimbriatum Williams and sect. Barbulatum Williams.

  12. DNA sequence responsible for the amplification of adjacent genes.

    Science.gov (United States)

    Pasion, S G; Hartigan, J A; Kumar, V; Biswas, D K

    1987-10-01

    A 10.3-kb DNA fragment in the 5'-flanking region of the rat prolactin (rPRL) gene was isolated from F1BGH(1)2C1, a strain of rat pituitary tumor cells (GH cells) that produces prolactin in response to 5-bromodeoxyuridine (BrdU). Following transfection and integration into genomic DNA of recipient mouse L cells, this DNA induced amplification of the adjacent thymidine kinase gene from Herpes simplex virus type 1 (HSV1TK). We confirmed the ability of this "Amplicon" sequence to induce amplification of other linked or unlinked genes in DNA-mediated gene transfer studies. When transferred into the mouse L cells with the 10.3-5'rPRL gene sequence of BrdU-responsive cells, both the human growth hormone and the HSV1TK genes are amplified in response to 5-bromodeoxyuridine. This observation is substantiated by BrdU-induced amplification of the cotransferred bacterial Neo gene. Cotransfection studies reveal that the BrdU-induced amplification capability is associated with a 4-kb DNA sequence in the 5'-flanking region of the rPRL gene of BrdU-responsive cells. These results demonstrate that genes of heterologous origin, linked or unlinked, and selected or unselected, can be coamplified when located within the amplification boundary of the Amplicon sequence.

  13. Anaplasma phagocytophilum in Danish sheep: confirmation by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Thamsborg Stig M

    2009-12-01

    Full Text Available Abstract Background The presence of Anaplasma phagocytophilum, an Ixodes ricinus transmitted bacterium, was investigated in two flocks of Danish grazing lambs. Direct PCR detection was performed on DNA extracted from blood and serum with subsequent confirmation by DNA sequencing. Methods 31 samples obtained from clinically normal lambs in 2000 from Fussingø, Jutland and 12 samples from ten lambs and two ewes from a clinical outbreak at Feddet, Zealand in 2006 were included in the study. Some of the animals from Feddet had shown clinical signs of polyarthritis and general unthriftiness prior to sampling. DNA extraction was optimized from blood and serum and detection achieved by a 16S rRNA targeted PCR with verification of the product by DNA sequencing. Results Five DNA extracts were found positive by PCR, including two samples from 2000 and three from 2006. For both series of samples the product was verified as A. phagocytophilum by DNA sequencing. Conclusions A. phagocytophilum was detected by molecular methods for the first time in Danish grazing lambs during the two seasons investigated (2000 and 2006.

  14. Isolation of a sex-linked DNA sequence in cranes.

    Science.gov (United States)

    Duan, W; Fuerst, P A

    2001-01-01

    A female-specific DNA fragment (CSL-W; crane sex-linked DNA on W chromosome) was cloned from female whooping cranes (Grus americana). From the nucleotide sequence of CSL-W, a set of polymerase chain reaction (PCR) primers was identified which amplify a 227-230 bp female-specific fragment from all existing crane species and some other noncrane species. A duplicated versions of the DNA segment, which is found to have a larger size (231-235 bp) than CSL-W in both sexes, was also identified, and was designated CSL-NW (crane sex-linked DNA on non-W chromosome). The nucleotide similarity between the sequences of CSL-W and CSL-NW from whooping cranes was 86.3%. The CSL primers do not amplify any sequence from mammalian DNA, limiting the potential for contamination from human sources. Using the CSL primers in combination with a quick DNA extraction method allows the noninvasive identification of crane gender in less than 10 h. A test of the methodology was carried out on fully developed body feathers from 18 captive cranes and resulted in 100% successful identification.

  15. DNA Qualification Workflow for Next Generation Sequencing of Histopathological Samples

    Science.gov (United States)

    Simbolo, Michele; Gottardi, Marisa; Corbo, Vincenzo; Fassan, Matteo; Mafficini, Andrea; Malpeli, Giorgio; Lawlor, Rita T.; Scarpa, Aldo

    2013-01-01

    Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for

  16. DNA qualification workflow for next generation sequencing of histopathological samples.

    Directory of Open Access Journals (Sweden)

    Michele Simbolo

    Full Text Available Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF tissues, 6 formalin-fixed paraffin-embedded (FFPE tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard

  17. A putative peroxidase cDNA from turnip and analysis of the encoded protein sequence.

    Science.gov (United States)

    Romero-Gómez, S; Duarte-Vázquez, M A; García-Almendárez, B E; Mayorga-Martínez, L; Cervantes-Avilés, O; Regalado, C

    2008-12-01

    A putative peroxidase cDNA was isolated from turnip roots (Brassica napus L. var. purple top white globe) by reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE). Total RNA extracted from mature turnip roots was used as a template for RT-PCR, using a degenerated primer designed to amplify the highly conserved distal motif of plant peroxidases. The resulting partial sequence was used to design the rest of the specific primers for 5' and 3' RACE. Two cDNA fragments were purified, sequenced, and aligned with the partial sequence from RT-PCR, and a complete overlapping sequence was obtained and labeled as BbPA (Genbank Accession No. AY423440, named as podC). The full length cDNA is 1167bp long and contains a 1077bp open reading frame (ORF) encoding a 358 deduced amino acid peroxidase polypeptide. The putative peroxidase (BnPA) showed a calculated Mr of 34kDa, and isoelectric point (pI) of 4.5, with no significant identity with other reported turnip peroxidases. Sequence alignment showed that only three peroxidases have a significant identity with BnPA namely AtP29a (84%), and AtPA2 (81%) from Arabidopsis thaliana, and HRPA2 (82%) from horseradish (Armoracia rusticana). Work is in progress to clone this gene into an adequate host to study the specific role and possible biotechnological applications of this alternative peroxidase source.

  18. On the normalization of the minimum free energy of RNAs by sequence length.

    Directory of Open Access Journals (Sweden)

    Edoardo Trotta

    Full Text Available The minimum free energy (MFE of ribonucleic acids (RNAs increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size.

  19. On the normalization of the minimum free energy of RNAs by sequence length.

    Science.gov (United States)

    Trotta, Edoardo

    2014-01-01

    The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size.

  20. Compilation and analysis of Escherichia coli promoter DNA sequences.

    OpenAIRE

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequence of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter ...

  1. Evaluation of DNA bending models in their capacity to predict electrophoretic migration anomalies of satellite DNA sequences.

    Science.gov (United States)

    Matyášek, Roman; Fulneček, Jaroslav; Kovařík, Aleš

    2013-09-01

    DNA containing a sequence that generates a local curvature exhibits a pronounced retardation in electrophoretic mobility. Various theoretical models have been proposed to explain relationship between DNA structural features and migration anomaly. Here, we studied the capacity of 15 static wedge-bending models to predict electrophoretic behavior of 69 satellite monomers derived from four divergent families. All monomers exhibited retarded mobility in PAGE corresponding to retardation factors ranging 1.02-1.54. The curvature varied both within and across the groups and correlated with the number, position, and lengths of A-tracts. Two dinucleotide models provided strong correlation between gel mobility and curvature prediction; two trinucleotide models were satisfactory while remaining dinucleotide models provided intermediate results with reliable prediction for subsets of sequences only. In some cases, similarly shaped molecules exhibited relatively large differences in mobility and vice versa. Generally less accurate predictions were obtained in groups containing less homogeneous sequences possessing distinct structural features. In conclusion, relatively universal theoretical models were identified suitable for the analysis of natural sequences known to harbor relatively moderate curvature. These models could be potentially applied to genome wide studies. However, in silico predictions should be viewed in context of experimental measurement of intrinsic DNA curvature. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Gene organization in rice revealed by full-length cDNA mapping and gene expression analysis through microarray.

    Directory of Open Access Journals (Sweden)

    Kouji Satoh

    Full Text Available Rice (Oryza sativa L. is a model organism for the functional genomics of monocotyledonous plants since the genome size is considerably smaller than those of other monocotyledonous plants. Although highly accurate genome sequences of indica and japonica rice are available, additional resources such as full-length complementary DNA (FL-cDNA sequences are also indispensable for comprehensive analyses of gene structure and function. We cross-referenced 28.5K individual loci in the rice genome defined by mapping of 578K FL-cDNA clones with the 56K loci predicted in the TIGR genome assembly. Based on the annotation status and the presence of corresponding cDNA clones, genes were classified into 23K annotated expressed (AE genes, 33K annotated non-expressed (ANE genes, and 5.5K non-annotated expressed (NAE genes. We developed a 60mer oligo-array for analysis of gene expression from each locus. Analysis of gene structures and expression levels revealed that the general features of gene structure and expression of NAE and ANE genes were considerably different from those of AE genes. The results also suggested that the cloning efficiency of rice FL-cDNA is associated with the transcription activity of the corresponding genetic locus, although other factors may also have an effect. Comparison of the coverage of FL-cDNA among gene families suggested that FL-cDNA from genes encoding rice- or eukaryote-specific domains, and those involved in regulatory functions were difficult to produce in bacterial cells. Collectively, these results indicate that rice genes can be divided into distinct groups based on transcription activity and gene structure, and that the coverage bias of FL-cDNA clones exists due to the incompatibility of certain eukaryotic genes in bacteria.

  3. Polyfluorophore Labels on DNA: Dramatic Sequence Dependence of Quenching

    Science.gov (United States)

    Teo, Yin Nah; Wilson, James N.

    2010-01-01

    We describe studies carried out in the DNA context to test how a common fluorescence quencher, dabcyl, interacts with oligodeoxynu-cleoside fluorophores (ODFs)—a system of stacked, electronically interacting fluorophores built on a DNA scaffold. We tested twenty different tetrameric ODF sequences containing varied combinations and orderings of pyrene (Y), benzopyrene (B), perylene (E), dimethylaminostilbene (D), and spacer (S) monomers conjugated to the 3′ end of a DNA oligomer. Hybridization of this probe sequence to a dabcyl-labeled complementary strand resulted in strong quenching of fluorescence in 85% of the twenty ODF sequences. The high efficiency of quenching was also established by their large Stern–Volmer constants (KSV) of between 2.1 × 104 and 4.3 × 105M−1, measured with a free dabcyl quencher. Interestingly, quenching of ODFs displayed strong sequence dependence. This was particularly evident in anagrams of ODF sequences; for example, the sequence BYDS had a KSV that was approximately two orders of magnitude greater than that of BSDY, which has the same dye composition. Other anagrams, for example EDSY and ESYD, also displayed different responses upon quenching by dabcyl. Analysis of spectra showed that apparent excimer and exciplex emission bands were quenched with much greater efficiency compared to monomer emission bands by at least an order of magnitude. This suggests an important role played by delocalized excited states of the π stack of fluorophores in the amplified quenching of fluorescence. PMID:19780115

  4. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    Full Text Available The sequence and structure of the complete mtDNA control region (CR of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variations among M. cephalus populations largely exceeded intraspecific polymorphisms that are generally observed in other vertebrates. The length of CR sequence varied among M. cephalus populations due to the presence of indels and variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were noticed. Finally, the star shaped tree inferred from the CR polymorphism stresses a rapid radiation worldwide, in this species. The CR still appears as a good marker for phylogeographic investigations and additional worldwide samples are warranted to further investigate the genetic structure and evolution in M. cephalus.

  5. Detecting differential DNA methylation from sequencing of bisulfite converted DNA of diverse species.

    Science.gov (United States)

    Huh, Iksoo; Wu, Xin; Park, Taesung; Yi, Soojin V

    2017-07-21

    DNA methylation is one of the most extensively studied epigenetic modifications of genomic DNA. In recent years, sequencing of bisulfite-converted DNA, particularly via next-generation sequencing technologies, has become a widely popular method to study DNA methylation. This method can be readily applied to a variety of species, dramatically expanding the scope of DNA methylation studies beyond the traditionally studied human and mouse systems. In parallel to the increasing wealth of genomic methylation profiles, many statistical tools have been developed to detect differentially methylated loci (DMLs) or differentially methylated regions (DMRs) between biological conditions. We discuss and summarize several key properties of currently available tools to detect DMLs and DMRs from sequencing of bisulfite-converted DNA. However, the majority of the statistical tools developed for DML/DMR analyses have been validated using only mammalian data sets, and less priority has been placed on the analyses of invertebrate or plant DNA methylation data. We demonstrate that genomic methylation profiles of non-mammalian species are often highly distinct from those of mammalian species using examples of honey bees and humans. We then discuss how such differences in data properties may affect statistical analyses. Based on these differences, we provide three specific recommendations to improve the power and accuracy of DML and DMR analyses of invertebrate data when using currently available statistical tools. These considerations should facilitate systematic and robust analyses of DNA methylation from diverse species, thus advancing our understanding of DNA methylation. © The Author 2017. Published by Oxford University Press.

  6. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney; Kodzius, Rimantas; Tan, Yue Ying; Tay, Alice; Tay, Boon-Hui; Venkatesh, Byrappa

    2012-01-01

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole

  7. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney

    2012-10-08

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the \\'oligo-capping\\' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5\\'-ESTs and 41,317 3\\'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for

  8. The cDNA sequence of a neutral horseradish peroxidase.

    Science.gov (United States)

    Bartonek-Roxå, E; Eriksson, H; Mattiasson, B

    1991-02-16

    A cDNA clone encoding a horseradish (Armoracia rusticana) peroxidase has been isolated and characterized. The cDNA contains 1378 nucleotides excluding the poly(A) tail and the deduced protein contains 327 amino acids which includes a 28 amino acid leader sequence. The predicted amino acid sequence is nine amino acids shorter than the major isoenzyme belonging to the horseradish peroxidase C group (HRP-C) and the sequence shows 53.7% identity with this isoenzyme. The described clone encodes nine cysteines of which eight correspond well with the cysteines found in HRP-C. Five potential N-glycosylation sites with the general sequence Asn-X-Thr/Ser are present in the deduced sequence. Compared to the earlier described HRP-C this is three glycosylation sites less. The shorter sequence and fewer N-glycosylation sites give the native isoenzyme a molecular weight of several thousands less than the horseradish peroxidase C isoenzymes. Comparison with the net charge value of HRP-C indicates that the described cDNA clone encodes a peroxidase which has either the same or a slightly less basic pI value, depending on whether the encoded protein is N-terminally blocked or not. This excludes the possibility that HRP-n could belong to either the HRP-A, -D or -E groups. The low sequence identity (53.7%) with HRP-C indicates that the described clone does not belong to the HRP-C isoenzyme group and comparison of the total amino acid composition with the HRP-B group does not place the described clone within this isoenzyme group. Our conclusion is that the described cDNA clone encodes a neutral horseradish peroxidase which belongs to a new, not earlier described, horseradish peroxidase group.

  9. RNA-DNA sequence differences spell genetic code ambiguities

    DEFF Research Database (Denmark)

    Bentin, Thomas; Nielsen, Michael L

    2013-01-01

    A recent paper in Science by Li et al. 2011(1) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression but the study has been criticized. ...

  10. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    RPS16 of eukaryote is a component of the 40S small ribosomal subunit encoded by RPS16 gene and is also a homolog of prokaryotic RPS9. The cDNA and genomic sequence of RPS16 was cloned successfully for the first time from the Giant Panda (Ailuropoda melanoleuca) using reverse transcription-polymerase chain ...

  11. DNA sequence and prokaryotic expression analysis of vitellogenin ...

    African Journals Online (AJOL)

    In this study, the DNA sequence of vitellogenin from Antheraea pernyi (Ap-Vg) was identified and its functional domain (30-740 aa, Ap-Vg-1) was expressed in Escherichia coli BL21 (DE3) cells. The recombinant Ap-Vg-1 proteins were purified and used for antibody preparation. The results showed that the intact DNA ...

  12. (Brassicaceae) based on nuclear ribosomal ITS DNA sequences

    Indian Academy of Sciences (India)

    Home; Journals; Journal of Genetics; Volume 93; Issue 2. Phylogeny and biogeography of Alyssum (Brassicaceae) based on nuclear ribosomal ITS DNA sequences. Yan Li Yan Kong Zhe Zhang Yanqiang Yin Bin Liu Guanghui Lv Xiyong Wang. Research Article Volume 93 Issue 2 August 2014 pp 313-323 ...

  13. Aptaligner: automated software for aligning pseudorandom DNA X-aptamers from next-generation sequencing data.

    Science.gov (United States)

    Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E

    2014-06-10

    Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.

  14. High Performance Systolic Array Core Architecture Design for DNA Sequencer

    Directory of Open Access Journals (Sweden)

    Saiful Nurdin Dayana

    2018-01-01

    Full Text Available This paper presents a high performance systolic array (SA core architecture design for Deoxyribonucleic Acid (DNA sequencer. The core implements the affine gap penalty score Smith-Waterman (SW algorithm. This time-consuming local alignment algorithm guarantees optimal alignment between DNA sequences, but it requires quadratic computation time when performed on standard desktop computers. The use of linear SA decreases the time complexity from quadratic to linear. In addition, with the exponential growth of DNA databases, the SA architecture is used to overcome the timing issue. In this work, the SW algorithm has been captured using Verilog Hardware Description Language (HDL and simulated using Xilinx ISIM simulator. The proposed design has been implemented in Xilinx Virtex -6 Field Programmable Gate Array (FPGA and improved in the core area by 90% reduction.

  15. Sequence heterogeneity accelerates protein search for targets on DNA

    International Nuclear Information System (INIS)

    Shvets, Alexey A.; Kolomeisky, Anatoly B.

    2015-01-01

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome

  16. Sequence heterogeneity accelerates protein search for targets on DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shvets, Alexey A.; Kolomeisky, Anatoly B., E-mail: tolya@rice.edu [Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005 (United States)

    2015-12-28

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome.

  17. DNA watermarks in non-coding regulatory sequences

    Directory of Open Access Journals (Sweden)

    Pyka Martin

    2009-07-01

    Full Text Available Abstract Background DNA watermarks can be applied to identify the unauthorized use of genetically modified organisms. It has been shown that coding regions can be used to encrypt information into living organisms by using the DNA-Crypt algorithm. Yet, if the sequence of interest presents a non-coding DNA sequence, either the function of a resulting functional RNA molecule or a regulatory sequence, such as a promoter, could be affected. For our studies we used the small cytoplasmic RNA 1 in yeast and the lac promoter region of Escherichia coli. Findings The lac promoter was deactivated by the integrated watermark. In addition, the RNA molecules displayed altered configurations after introducing a watermark, but surprisingly were functionally intact, which has been verified by analyzing the growth characteristics of both wild type and watermarked scR1 transformed yeast cells. In a third approach we introduced a second overlapping watermark into the lac promoter, which did not affect the promoter activity. Conclusion Even though the watermarked RNA and one of the watermarked promoters did not show any significant differences compared to the wild type RNA and wild type promoter region, respectively, it cannot be generalized that other RNA molecules or regulatory sequences behave accordingly. Therefore, we do not recommend integrating watermark sequences into regulatory regions.

  18. Cloning and sequencing of full-length cDNAs of RNA1 and RNA2 of a Tomato black ring virus isolate from Poland.

    Science.gov (United States)

    Jończyk, M; Le Gall, O; Pałucha, A; Borodynko, N; Pospieszny, H

    2004-04-01

    Full-length cDNA clones corresponding to the RNA1 and RNA2 of the Polish isolate MJ of Tomato black ring virus (TBRV, genus Nepovirus) were obtained using a direct recombination strategy in yeast, and their complete nucleotide sequences were established. RNA1 is 7358 nucleotides and RNA2 is 4633 nucleotides in length, excluding the poly(A) tails. Both RNAs contain a single open reading frame encoding polyproteins of 254 kDa and 149 kDa for RNA1 and RNA2 respectively. Putative cleavage sites were identified, and the relationships between TBRV and related nepoviruses were studied by sequence comparison.

  19. DNA copy number, including telomeres and mitochondria, assayed using next-generation sequencing

    Directory of Open Access Journals (Sweden)

    Jackson Stuart

    2010-04-01

    Full Text Available Abstract Background DNA copy number variations occur within populations and aberrations can cause disease. We sought to develop an improved lab-automatable, cost-efficient, accurate platform to profile DNA copy number. Results We developed a sequencing-based assay of nuclear, mitochondrial, and telomeric DNA copy number that draws on the unbiased nature of next-generation sequencing and incorporates techniques developed for RNA expression profiling. To demonstrate this platform, we assayed UMC-11 cells using 5 million 33 nt reads and found tremendous copy number variation, including regions of single and homogeneous deletions and amplifications to 29 copies; 5 times more mitochondria and 4 times less telomeric sequence than a pool of non-diseased, blood-derived DNA; and that UMC-11 was derived from a male individual. Conclusion The described assay outputs absolute copy number, outputs an error estimate (p-value, and is more accurate than array-based platforms at high copy number. The platform enables profiling of mitochondrial levels and telomeric length. The assay is lab-automatable and has a genomic resolution and cost that are tunable based on the number of sequence reads.

  20. Noise Attenuation Estimation for Maximum Length Sequences in Deconvolution Process of Auditory Evoked Potentials

    Directory of Open Access Journals (Sweden)

    Xian Peng

    2017-01-01

    Full Text Available The use of maximum length sequence (m-sequence has been found beneficial for recovering both linear and nonlinear components at rapid stimulation. Since m-sequence is fully characterized by a primitive polynomial of different orders, the selection of polynomial order can be problematic in practice. Usually, the m-sequence is repetitively delivered in a looped fashion. Ensemble averaging is carried out as the first step and followed by the cross-correlation analysis to deconvolve linear/nonlinear responses. According to the classical noise reduction property based on additive noise model, theoretical equations have been derived in measuring noise attenuation ratios (NARs after the averaging and correlation processes in the present study. A computer simulation experiment was conducted to test the derived equations, and a nonlinear deconvolution experiment was also conducted using order 7 and 9 m-sequences to address this issue with real data. Both theoretical and experimental results show that the NAR is essentially independent of the m-sequence order and is decided by the total length of valid data, as well as stimulation rate. The present study offers a guideline for m-sequence selections, which can be used to estimate required recording time and signal-to-noise ratio in designing m-sequence experiments.

  1. VoSeq: a voucher and DNA sequence web application.

    Directory of Open Access Journals (Sweden)

    Carlos Peña

    Full Text Available There is an ever growing number of molecular phylogenetic studies published, due to, in part, the advent of new techniques that allow cheap and quick DNA sequencing. Hence, the demand for relational databases with which to manage and annotate the amassing DNA sequences, genes, voucher specimens and associated biological data is increasing. In addition, a user-friendly interface is necessary for easy integration and management of the data stored in the database back-end. Available databases allow management of a wide variety of biological data. However, most database systems are not specifically constructed with the aim of being an organizational tool for researchers working in phylogenetic inference. We here report a new software facilitating easy management of voucher and sequence data, consisting of a relational database as back-end for a graphic user interface accessed via a web browser. The application, VoSeq, includes tools for creating molecular datasets of DNA or amino acid sequences ready to be used in commonly used phylogenetic software such as RAxML, TNT, MrBayes and PAUP, as well as for creating tables ready for publishing. It also has inbuilt BLAST capabilities against all DNA sequences stored in VoSeq as well as sequences in NCBI GenBank. By using mash-ups and calls to web services, VoSeq allows easy integration with public services such as Yahoo! Maps, Flickr, Encyclopedia of Life (EOL and GBIF (by generating data-dumps that can be processed with GBIF's Integrated Publishing Toolkit.

  2. Probing DNA in nanopores via tunneling: from sequencing to ``quantum'' analogies

    Science.gov (United States)

    di Ventra, Massimiliano

    2012-02-01

    Fast and low-cost DNA sequencing methods would revolutionize medicine: a person could have his/her full genome sequenced so that drugs could be tailored to his/her specific illnesses; doctors could know in advance patients' likelihood to develop a given ailment; cures to major diseases could be found faster [1]. However, this goal of ``personalized medicine'' is hampered today by the high cost and slow speed of DNA sequencing methods. In this talk, I will discuss the sequencing protocol we suggest which requires the measurement of the distributions of transverse currents during the translocation of single-stranded DNA into nanopores [2-5]. I will support our conclusions with a combination of molecular dynamics simulations coupled to quantum mechanical calculations of electrical current in experimentally realizable systems [2-5]. I will also discuss recent experiments that support these theoretical predictions. In addition, I will show how this relatively unexplored area of research at the interface between solids, liquids, and biomolecules at the nanometer length scale is a fertile ground to study quantum phenomena that have a classical counterpart, such as ionic quasi-particles, ionic ``quantized'' conductance [6,7] and Coulomb blockade [8]. Work supported in part by NIH. [4pt] [1] M. Zwolak, M. Di Ventra, Physical Approaches to DNA Sequencing and Detection, Rev. Mod. Phys. 80, 141 (2008).[0pt] [2] M. Zwolak and M. Di Ventra, Electronic signature of DNA nucleotides via transverse transport, Nano Lett. 5, 421 (2005).[0pt] [3] J. Lagerqvist, M. Zwolak, and M. Di Ventra, Fast DNA sequencing via transverse electronic transport, Nano Lett. 6, 779 (2006).[0pt] [4] J. Lagerqvist, M. Zwolak, and M. Di Ventra, Influence of the environment and probes on rapid DNA sequencing via transverse electronic transport, Biophys. J. 93, 2384 (2007).[0pt] [5] M. Krems, M. Zwolak, Y.V. Pershin, and M. Di Ventra, Effect of noise on DNA sequencing via transverse electronic transport

  3. C-terminal low-complexity sequence repeats of Mycobacterium smegmatis Ku modulate DNA binding.

    Science.gov (United States)

    Kushwaha, Ambuj K; Grove, Anne

    2013-01-24

    Ku protein is an integral component of the NHEJ (non-homologous end-joining) pathway of DSB (double-strand break) repair. Both eukaryotic and prokaryotic Ku homologues have been characterized and shown to bind DNA ends. A unique feature of Mycobacterium smegmatis Ku is its basic C-terminal tail that contains several lysine-rich low-complexity PAKKA repeats that are absent from homologues encoded by obligate parasitic mycobacteria. Such PAKKA repeats are also characteristic of mycobacterial Hlp (histone-like protein) for which they have been shown to confer the ability to appose DNA ends. Unexpectedly, removal of the lysine-rich extension enhances DNA-binding affinity, but an interaction between DNA and the PAKKA repeats is indicated by the observation that only full-length Ku forms multiple complexes with a short stem-loop-containing DNA previously designed to accommodate only one Ku dimer. The C-terminal extension promotes DNA end-joining by T4 DNA ligase, suggesting that the PAKKA repeats also contribute to efficient end-joining. We suggest that low-complexity lysine-rich sequences have evolved repeatedly to modulate the function of unrelated DNA-binding proteins.

  4. Human β satellite DNA: Genomic organization and sequence definition of a class of highly repetitive tandem DNA

    International Nuclear Information System (INIS)

    Waye, J.S.; Willard, H.F.

    1989-01-01

    The authors describe a class of human repetitive DNA, called β satellite, that, at a most fundamental level, exists as tandem arrays of diverged ∼68-base-pair monomer repeat units. The monomer units are organized as distinct subsets, each characterized by a multimeric higher-order repeat unit that is tandemly reiterated and represents a recent unit of amplification. They have cloned, characterized, and determined the sequence of two β satellite higher-order repeat units: one located on chromosome 9, the other on the acrocentric chromosomes (13, 14, 15, 21, and 22) and perhaps other sites in the genome. Analysis by pulsed-field gel electrophoresis reveals that these tandem arrays are localized in large domains that are marked by restriction fragment length polymorphisms. In total, β-satellite sequences comprise several million base pairs of DNA in the human genome. Analysis of this DNA family should permit insights into the nature of chromosome-specific and nonspecific modes of satellite DNA evolution and provide useful tools for probing the molecular organization and concerted evolution of the acrocentric chromosomes

  5. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from a cosmic database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing

  6. High-Throughput DNA sequencing of ancient wood.

    Science.gov (United States)

    Wagner, Stefanie; Lagane, Frédéric; Seguin-Orlando, Andaine; Schubert, Mikkel; Leroy, Thibault; Guichoux, Erwan; Chancerel, Emilie; Bech-Hebelstrup, Inger; Bernard, Vincent; Billard, Cyrille; Billaud, Yves; Bolliger, Matthias; Croutsch, Christophe; Čufar, Katarina; Eynaud, Frédérique; Heussner, Karl Uwe; Köninger, Joachim; Langenegger, Fabien; Leroy, Frédéric; Lima, Christine; Martinelli, Nicoletta; Momber, Garry; Billamboz, André; Nelle, Oliver; Palomo, Antoni; Piqué, Raquel; Ramstein, Marianne; Schweichel, Roswitha; Stäuble, Harald; Tegel, Willy; Terradas, Xavier; Verdin, Florence; Plomion, Christophe; Kremer, Antoine; Orlando, Ludovic

    2018-03-01

    Reconstructing the colonization and demographic dynamics that gave rise to extant forests is essential to forecasts of forest responses to environmental changes. Classical approaches to map how population of trees changed through space and time largely rely on pollen distribution patterns, with only a limited number of studies exploiting DNA molecules preserved in wooden tree archaeological and subfossil remains. Here, we advance such analyses by applying high-throughput (HTS) DNA sequencing to wood archaeological and subfossil material for the first time, using a comprehensive sample of 167 European white oak waterlogged remains spanning a large temporal (from 550 to 9,800 years) and geographical range across Europe. The successful characterization of the endogenous DNA and exogenous microbial DNA of 140 (~83%) samples helped the identification of environmental conditions favouring long-term DNA preservation in wood remains, and started to unveil the first trends in the DNA decay process in wood material. Additionally, the maternally inherited chloroplast haplotypes of 21 samples from three periods of forest human-induced use (Neolithic, Bronze Age and Middle Ages) were found to be consistent with those of modern populations growing in the same geographic areas. Our work paves the way for further studies aiming at using ancient DNA preserved in wood to reconstruct the micro-evolutionary response of trees to climate change and human forest management. © 2018 John Wiley & Sons Ltd.

  7. Micropatterning stretched and aligned DNA for sequence-specific nanolithography

    Science.gov (United States)

    Petit, Cecilia Anna Paulette

    Techniques for fabricating nanostructured materials can be categorized as either "top-down" or "bottom-up". Top-down techniques use lithography and contact printing to create patterned surfaces and microfluidic channels that can corral and organize nanoscale structures, such as molecules and nanorods in contrast; bottom-up techniques use self-assembly or molecular recognition to direct the organization of materials. A central goal in nanotechnology is the integration of bottom-up and top-down assembly strategies for materials development, device design; and process integration. With this goal in mind, we have developed strategies that will allow this integration by using DNA as a template for nanofabrication; two top-down approaches allow the placement of these templates, while the bottom-up technique uses the specific sequence of bases to pattern materials along each strand of DNA. Our first top-down approach, termed combing of molecules in microchannels (COMMIC), produces microscopic patterns of stretched and aligned molecules of DNA on surfaces. This process consists of passing an air-water interface over end adsorbed molecules inside microfabricated channels. The geometry of the microchannel directs the placement of the DNA molecules, while the geometry of the airwater interface directs the local orientation and curvature of the molecules. We developed another top-down strategy for creating micropatterns of stretched and aligned DNA using surface chemistry. Because DNA stretching occurs on hydrophobic surfaces, this technique uses photolithography to pattern vinyl-terminated silanes on glass When these surface-, are immersed in DNA solution, molecules adhere preferentially to the silanized areas. This approach has also proven useful in patterning protein for cell adhesion studies. Finally, we describe the use of these stretched and aligned molecules of DNA as templates for the subsequent bottom-up construction of hetero-structures through hybridization

  8. Effects of Circular DNA Length on Transfection Efficiency by Electroporation into HeLa Cells.

    Science.gov (United States)

    Hornstein, Benjamin D; Roman, Dany; Arévalo-Soliz, Lirio M; Engevik, Melinda A; Zechiedrich, Lynn

    2016-01-01

    The ability to produce extremely small and circular supercoiled vectors has opened new territory for improving non-viral gene therapy vectors. In this work, we compared transfection of supercoiled DNA vectors ranging from 383 to 4,548 bp, each encoding shRNA against GFP under control of the H1 promoter. We assessed knockdown of GFP by electroporation into HeLa cells. All of our vectors entered cells in comparable numbers when electroporated with equal moles of DNA. Despite similar cell entry, we found length-dependent differences in how efficiently the vectors knocked down GFP. As vector length increased up to 1,869 bp, GFP knockdown efficiency per mole of transfected DNA increased. From 1,869 to 4,257 bp, GFP knockdown efficiency per mole was steady, then decreased with increasing vector length. In comparing GFP knockdown with equal masses of vectors, we found that the shorter vectors transfect more efficiently per nanogram of DNA transfected. Our results rule out cell entry and DNA mass as determining factors for gene knockdown efficiency via electroporation. The length-dependent effects we have uncovered are likely explained by differences in nuclear translocation or transcription. These data add an important step towards clinical applications of non-viral vector delivery.

  9. The influence of DNA sequence on epigenome-induced pathologies

    Directory of Open Access Journals (Sweden)

    Meagher Richard B

    2012-07-01

    Full Text Available Abstract Clear cause-and-effect relationships are commonly established between genotype and the inherited risk of acquiring human and plant diseases and aberrant phenotypes. By contrast, few such cause-and-effect relationships are established linking a chromatin structure (that is, the epitype with the transgenerational risk of acquiring a disease or abnormal phenotype. It is not entirely clear how epitypes are inherited from parent to offspring as populations evolve, even though epigenetics is proposed to be fundamental to evolution and the likelihood of acquiring many diseases. This article explores the hypothesis that, for transgenerationally inherited chromatin structures, “genotype predisposes epitype”, and that epitype functions as a modifier of gene expression within the classical central dogma of molecular biology. Evidence for the causal contribution of genotype to inherited epitypes and epigenetic risk comes primarily from two different kinds of studies discussed herein. The first and direct method of research proceeds by the examination of the transgenerational inheritance of epitype and the penetrance of phenotype among genetically related individuals. The second approach identifies epitypes that are duplicated (as DNA sequences are duplicated and evolutionarily conserved among repeated patterns in the DNA sequence. The body of this article summarizes particularly robust examples of these studies from humans, mice, Arabidopsis, and other organisms. The bulk of the data from both areas of research support the hypothesis that genotypes predispose the likelihood of displaying various epitypes, but for only a few classes of epitype. This analysis suggests that renewed efforts are needed in identifying polymorphic DNA sequences that determine variable nucleosome positioning and DNA methylation as the primary cause of inherited epigenome-induced pathologies. By contrast, there is very little evidence that DNA sequence directly

  10. Design of Tail-Clamp Peptide Nucleic Acid Tethered with Azobenzene Linker for Sequence-Specific Detection of Homopurine DNA

    Directory of Open Access Journals (Sweden)

    Shinjiro Sawada

    2017-10-01

    Full Text Available DNA carries genetic information in its sequence of bases. Synthetic oligonucleotides that can sequence-specifically recognize a target gene sequence are a useful tool for regulating gene expression or detecting target genes. Among the many synthetic oligonucleotides, tail-clamp peptide nucleic acid (TC-PNA offers advantages since it has two homopyrimidine PNA strands connected via a flexible ethylene glycol-type linker that can recognize complementary homopurine sequences via Watson-Crick and Hoogsteen base pairings and form thermally-stable PNA/PNA/DNA triplex structures. Here, we synthesized a series of TC-PNAs that can possess different lengths of azobenzene-containing linkers and studied their binding behaviours to homopurine single-stranded DNA. Introduction of azobenzene at the N-terminus amine of PNA increased the thermal stability of PNA-DNA duplexes. Further extension of the homopyrimidine PNA strand at the N-terminus of PNA-AZO further increased the binding stability of the PNA/DNA/PNA triplex to the target homopurine sequence; however, it induced TC-PNA/DNA/TC-PNA complex formation. Among these TC-PNAs, 9W5H-C4-AZO consisting of nine Watson-Crick bases and five Hoogsteen bases tethered with a beta-alanine conjugated azobenzene linker gave a stable 1:1 TC-PNA/ssDNA complex and exhibited good mismatch recognition. Our design for TC-PNA-AZO can be utilized for detecting homopurine sequences in various genes.

  11. Pericentric satellite DNA sequences in Pipistrellus pipistrellus (Vespertilionidae; Chiroptera).

    Science.gov (United States)

    Barragán, M J L; Martínez, S; Marchal, J A; Fernández, R; Bullejos, M; Díaz de la Guardia, R; Sánchez, A

    2003-09-01

    This paper reports the molecular and cytogenetic characterization of a HindIII family of satellite DNA in the bat species Pipistrellus pipistrellus. This satellite is organized in tandem repeats of 418 bp monomer units, and represents approximately 3% of the whole genome. The consensus sequence from five cloned monomer units has an A-T content of 62.20%. We have found differences in the ladder pattern of bands between two populations of the same species. These differences are probably because of the absence of the target sites for the HindIII enzyme in most monomer units of one population, but not in the other. Fluorescent in situ hybridization (FISH) localized the satellite DNA in the pericentromeric regions of all autosomes and the X chromosome, but it was absent from the Y chromosome. Digestion of genomic DNAs with HpaII and its isoschizomer MspI demonstrated that these repetitive DNA sequences are not methylated. Other bat species were tested for the presence of this repetitive DNA. It was absent in five Vespertilionidae and one Rhinolophidae species, indicating that it could be a species/genus specific, repetitive DNA family.

  12. DNA Duplex Length and Salt Concentration Dependence of Enthalpy−Entropy Compensation Parameters for DNA Melting

    KAUST Repository

    Starikov, E. B.; Nordén, Bengt

    2009-01-01

    of the DNA and their water-counterion surrounding. With this in mind, to reinterpret the most recent results of calorimetric experiments on DNA oligomers of such a kind, the recent enthalpy-entropy compensation theory has been used. It is demonstrated

  13. Early Lyme disease with spirochetemia - diagnosed by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Jones William

    2010-11-01

    Full Text Available Abstract Background A sensitive and analytically specific nucleic acid amplification test (NAAT is valuable in confirming the diagnosis of early Lyme disease at the stage of spirochetemia. Findings Venous blood drawn from patients with clinical presentations of Lyme disease was tested for the standard 2-tier screen and Western Blot serology assay for Lyme disease, and also by a nested polymerase chain reaction (PCR for B. burgdorferi sensu lato 16S ribosomal DNA. The PCR amplicon was sequenced for B. burgdorferi genomic DNA validation. A total of 130 patients visiting emergency room (ER or Walk-in clinic (WALKIN, and 333 patients referred through the private physicians' offices were studied. While 5.4% of the ER/WALKIN patients showed DNA evidence of spirochetemia, none (0% of the patients referred from private physicians' offices were DNA-positive. In contrast, while 8.4% of the patients referred from private physicians' offices were positive for the 2-tier Lyme serology assay, only 1.5% of the ER/WALKIN patients were positive for this antibody test. The 2-tier serology assay missed 85.7% of the cases of early Lyme disease with spirochetemia. The latter diagnosis was confirmed by DNA sequencing. Conclusion Nested PCR followed by automated DNA sequencing is a valuable supplement to the standard 2-tier antibody assay in the diagnosis of early Lyme disease with spirochetemia. The best time to test for Lyme spirochetemia is when the patients living in the Lyme disease endemic areas develop unexplained symptoms or clinical manifestations that are consistent with Lyme disease early in the course of their illness.

  14. Length and sequence dependence in the association of Huntingtin protein with lipid membranes

    Science.gov (United States)

    Jawahery, Sudi; Nagarajan, Anu; Matysiak, Silvina

    2013-03-01

    There is a fundamental gap in our understanding of how aggregates of mutant Huntingtin protein (htt) with overextended polyglutamine (polyQ) sequences gain the toxic properties that cause Huntington's disease (HD). Experimental studies have shown that the most important step associated with toxicity is the binding of mutant htt aggregates to lipid membranes. Studies have also shown that flanking amino acid sequences around the polyQ sequence directly affect interactions with the lipid bilayer, and that polyQ sequences of greater than 35 glutamine repeats in htt are a characteristic of HD. The key steps that determine how flanking sequences and polyQ length affect the structure of lipid bilayers remain unknown. In this study, we use atomistic molecular dynamics simulations to study the interactions between lipid membranes of varying compositions and polyQ peptides of varying lengths and flanking sequences. We find that overextended polyQ interactions do cause deformation in model membranes, and that the flanking sequences do play a role in intensifying this deformation by altering the shape of the affected regions.

  15. Spectral sum rules and search for periodicities in DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.

    2011-01-01

    Periodic patterns play the important regulatory and structural roles in genomic DNA sequences. Commonly, the underlying periodicities should be understood in a broad statistical sense, since the corresponding periodic patterns have been strongly distorted by the random point mutations and insertions/deletions during molecular evolution. The latent periodicities in DNA sequences can be efficiently displayed by Fourier transform. The criteria of significance for observed periodicities are obtained via the comparison versus the counterpart characteristics of the reference random sequences. We show that the restrictions imposed on the significance criteria by the rigorous spectral sum rules can be rationally described with De Finetti distribution. This distribution provides the convenient intermediate asymptotic form between Rayleigh distribution and exact combinatoric theory. - Highlights: → We study the significance criteria for latent periodicities in DNA sequences. → The constraints imposed by sum rules can be described with De Finetti distribution. → It is intermediate between Rayleigh distribution and exact combinatoric theory. → Theory is applicable to the study of correlations between different periodicities. → The approach can be generalized to the arbitrary discrete Fourier transform.

  16. Study of force induced melting of dsDNA as a function of length and conformation

    International Nuclear Information System (INIS)

    Danilowicz, Claudia; Hatch, Kristi; Conover, Alyson; Gunaratne, Ruwan; Coljee, Vincent; Prentiss, Mara; Ducas, Theodore

    2010-01-01

    We measure the constant force required to melt double-stranded (ds) DNA as a function of length for lengths from 12 to 100 000 base pairs, where the force is applied to the 3'3' or 5'5' ends of the dsDNA. Molecules with 32 base pairs or fewer melt before overstretching. For these short molecules, the melting force is independent of the ends to which the force is applied and the shear force as a function of length is well described by de Gennes theory with a de Gennes length of less than 10 bp. Molecules with lengths of 500 base pairs or more overstretch before melting. For these long molecules, the melting force depends on the ends to which the force is applied. The melting force as a function of length increases even when the length exceeds 1000 bp, where the length dependence is inconsistent with de Gennes theory. Finally, we expand de Gennes melting theory to 3'5' pulling and compare the predictions with experimental results.

  17. Phylogenetic relationships of the Gomphales based on nuc-25S-rDNA, mit-12S-rDNA, and mit-atp6-DNA combined sequences

    Science.gov (United States)

    Admir J. Giachini; Kentaro Hosaka; Eduardo Nouhra; Joseph Spatafora; James M. Trappe

    2010-01-01

    Phylogenetic relationships among Geastrales, Gomphales, Hysterangiales, and Phallales were estimated via combined sequences: nuclear large subunit ribosomal DNA (nuc-25S-rDNA), mitochondrial small subunit ribosomal DNA (mit-12S-rDNA), and mitochondrial atp6 DNA (mit-atp6-DNA). Eighty-one taxa comprising 19 genera and 58 species...

  18. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, Natasha V. (Okemos, MI); Broekaert, Willem F. (Dilbeek, BE); Chua, Nam-Hai (Scarsdale, NY); Kush, Anil (New York, NY)

    1993-02-16

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a pu GOVERNMENT RIGHTS This application was funded under Department of Energy Contract DE-AC02-76ER01338. The U.S. Government has certain rights under this application and any patent issuing thereon.

  19. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

    Science.gov (United States)

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-11-23

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.

  20. cDNA sequences of two apolipoproteins from lamprey

    International Nuclear Information System (INIS)

    Pontes, M.; Xu, X.; Graham, D.; Riley, M.; Doolittle, R.F.

    1987-01-01

    The messages for two small but abundant apolipoproteins found in lamprey blood plasma were cloned with the aid of oligonucleotide probes based on amino-terminal sequences. In both cases, numerous clones were identified in a lamprey liver cDNA library, consistent with the great abundance of these proteins in lamprey blood. One of the cDNAs (LAL1) has a coding region of 105 amino acids that corresponds to a 21-residue signal peptide, a putative 8-residue propeptide, and the 76-residue mature protein found in blood. The other cDNA (LAL2) codes for a total of 191 residues, the first 23 of which constitute a signal peptide. The two proteins, which occur in the high-density lipoprotein fraction of ultracentrifuged plasma, have amino acid compositions similar to those of apolipoproteins found in mammalian blood; computer analysis indicates that the sequences are largely helix-permissive. When the sequences were searched against an amino acid sequence data base, rat apolipoprotein IV was the best matching candidate in both cases. Although a reasonable alignment can be made with that sequence and LAL1, definitive assignment of the two lamprey proteins to typical mammalian classes cannot be made at this point

  1. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...... in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....

  2. Digital Droplet Multiple Displacement Amplification (ddMDA for Whole Genome Sequencing of Limited DNA Samples.

    Directory of Open Access Journals (Sweden)

    Minsoung Rhee

    Full Text Available Multiple displacement amplification (MDA is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent of ~3/1000 of an E. coli genome per droplet, ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology.

  3. cDNA sequences of two inducible T-cell genes

    Energy Technology Data Exchange (ETDEWEB)

    Kwon, B.S. (Indiana Univ. School of Medicine, Indianapolis (USA) Guthrie Research Institute, Sayre, PA (USA)); Weissman, S.M. (Yale Univ., New Haven, CT (USA))

    1989-03-01

    The authors have previously described a set of human T-lymphocyte-specific cDNA clones isolated by a modified differential screening procedure. Apparent full-length cDNAs containing the sequences of 14 of the 16 initial isolates were sequenced and were found to represent five different species of mRNA; three of the five species were identical to previously reported cDNA sequences of preproenkephalin, T-cell-replacing factor, and a serine esterase, respectively. The other two species, 4-1BB and L2G25B, were inducible sequences found in mRNA from both a cytolytic T-lymphocyte and a helper T-lymphocyte clone and were not previously described in T-cell mRNA; these mRNA sequences encode peptides of 256 and 92 amino acids, respectively. Both peptides contain putative leader sequences. The protein encoded by 4-1BB also has a potential membrane anchor segment and other features also seen in known receptor proteins.

  4. Taxonomy and phylogeny of the genus citrus based on the nuclear ribosomal dna its region sequence

    International Nuclear Information System (INIS)

    Sun, Y.L.

    2015-01-01

    The genus Citrus (Aurantioideae, Rutaceae) is the sole source of the citrus fruits of commerce showing high economic values. In this study, the taxonomy and phylogeny of Citrus species is evaluated using sequence analysis of the ITS region of nrDNA. This study is based on 26 plants materials belonging to 22 Citrus species having wild, domesticated, and cultivated species. Through DNA alignment of the ITS sequence, ITS1 and ITS2 regions showed relatively high variations of sequence length and nucleotide among these Citrus species. According to previous six-tribe discrimination theory by Swingle and Reece, the grouping in our ITS phylogenetic tree reconstructed by ITS sequences was not related to tribe discrimination but species discrimination. However, the molecular analysis could provide more information on citrus taxonomy. Combined with ITS sequences of other subgenera in then true citrus fruit tree group, the ITS phylogenetic tree indicated subgenera Citrus was monophyletic and nearer to Fortunella, Poncirus, and Clymenia compared to Microcitrus and Eremocitrus. Abundant sequence variations of the ITS region shown in this study would help species identification and tribe differentiation of the genus Citrus. (author)

  5. Rational Design of High-Number dsDNA Fragments Based on Thermodynamics for the Construction of Full-Length Genes in a Single Reaction.

    Science.gov (United States)

    Birla, Bhagyashree S; Chou, Hui-Hsien

    2015-01-01

    Gene synthesis is frequently used in modern molecular biology research either to create novel genes or to obtain natural genes when the synthesis approach is more flexible and reliable than cloning. DNA chemical synthesis has limits on both its length and yield, thus full-length genes have to be hierarchically constructed from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods to assemble multiple double-stranded DNA fragments. Currently, up to 12 dsDNA fragments can be assembled at once with Gibson Assembly according to its vendor. In practice, the number of dsDNA fragments that can be assembled in a single reaction are much lower. We have developed a rational design method for gene construction that allows high-number dsDNA fragments to be assembled into full-length genes in a single reaction. Using this new design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky that identifies all unique junctions in a gene where consecutive DNA fragments are specifically made to connect to each other. Our novel method is generally applicable to most gene sequences, and can improve both the efficiency and cost of gene assembly.

  6. Rational Design of High-Number dsDNA Fragments Based on Thermodynamics for the Construction of Full-Length Genes in a Single Reaction.

    Directory of Open Access Journals (Sweden)

    Bhagyashree S Birla

    Full Text Available Gene synthesis is frequently used in modern molecular biology research either to create novel genes or to obtain natural genes when the synthesis approach is more flexible and reliable than cloning. DNA chemical synthesis has limits on both its length and yield, thus full-length genes have to be hierarchically constructed from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods to assemble multiple double-stranded DNA fragments. Currently, up to 12 dsDNA fragments can be assembled at once with Gibson Assembly according to its vendor. In practice, the number of dsDNA fragments that can be assembled in a single reaction are much lower. We have developed a rational design method for gene construction that allows high-number dsDNA fragments to be assembled into full-length genes in a single reaction. Using this new design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky that identifies all unique junctions in a gene where consecutive DNA fragments are specifically made to connect to each other. Our novel method is generally applicable to most gene sequences, and can improve both the efficiency and cost of gene assembly.

  7. Development of a defined-sequence DNA system for use in DNA misrepair studies

    International Nuclear Information System (INIS)

    Sutton, S.; Tobias, C.A.

    1984-01-01

    The authors have developed a system that allows them to study cellular DNA repair processes at the molecular level. In particular, the authors are using this system to examine the consequences of a misrepair of radiation-induced DNA damage, as a function of dose. The cells being used are specially engineered haploid yeast cells. Maintained in the cells, at one copy per cell, is a cen plasmid, a plasmid that behaves like a functional chromosome. This plasmid carries a small defined sequence of DNA from the E. coli lac z gene. It is this lac z region (called the alpha region) that serves as the target for radiation damage. Two copies of the complimentary portion of the lac z gene are integrated into the yeast genome. Irradiated cells are screened for possible mutation in the alpha region by testing the cells' ability to hydrolyze xgal, a lactose substrate. The DNA of interest is then extracted from the cells, sequenced, and the sequence is compared to that of the control. Unlike the usual defined-sequence DNA systems, theirs is an in vivo system. A disadvantage is the relatively high background mutation rate. Results achieved with this system, as well as future applications, are discussed

  8. Rapid DNA sequencing by horizontal ultrathin gel electrophoresis.

    OpenAIRE

    Brumley, R L; Smith, L M

    1991-01-01

    A horizontal polyacrylamide gel electrophoresis apparatus has been developed that decreases the time required to separate the DNA fragments produced in enzymatic sequencing reactions. The configuration of this apparatus and the use of circulating coolant directly under the glass plates result in heat exchange that is approximately nine times more efficient than passive thermal transfer methods commonly used. Bubble-free gels as thin as 25 microns can be routinely cast on this device. The appl...

  9. Is photocleavage of DNA by YOYO-1 using a synchrotron radiation light source sequence dependent?

    DEFF Research Database (Denmark)

    Gilroy, Emma L.; Hoffmann, Søren Vrønning; Jones, Nykola C.

    2011-01-01

    ) throughout the irradiation period. The dependence of LD signals on DNA sequences and on time in the intense light beam was explored and quantified for single-stranded poly(dA), poly[(dA-dT)2], calf thymus DNA (ctDNA) and Micrococcus luteus DNA (mlDNA). The DNA and ligand regions of the spectrum showed...

  10. XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis

    Directory of Open Access Journals (Sweden)

    Giegerich Robert

    2005-09-01

    Full Text Available Abstract Background Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To provide full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems. Description Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contig and 4,801 singleton sequences. Using both BLASTX and FASTY comparison to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins and these have been matched to publicly available IMAGE clones when available. Each sequence has been compared to the KOG database and ~67% of the sequences have been assigned a putative functional category. Based on sequence homology to mouse and human, putative GO annotations have been determined. Conclusion The results of the analysis have been stored in a publicly available database XenDB http://bibiserv.techfak.uni-bielefeld.de/xendb/. A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate the results of various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches. Supplementary material can be found at http://bibiserv.techfak.uni-bielefeld.de/xendb/.

  11. Differential diagnosis of genetic disease by DNA restriction fragment length polymorphisms

    NARCIS (Netherlands)

    Bolhuis, P. A.; Defesche, J. C.; van der Helm, H. J.

    1987-01-01

    DNA restriction fragment length polymorphisms (RFLPs) are used for diagnosis of genetic disease in families known to be affected by specific disorders, but RFLPs can be also useful for the differential diagnosis of hereditary disease. An RFLP pattern represents the inheritance of chromosomal markers

  12. Construction and characterization of a full-length cDNA library for the wheat stripe rust pathogen (Puccinia striiformis f. sp. tritici

    Directory of Open Access Journals (Sweden)

    Chen Xianming

    2007-06-01

    Full Text Available Abstract Background Puccinia striiformis is a plant pathogenic fungus causing stripe rust, one of the most important diseases on cereal crops and grasses worldwide. However, little is know about its genome and genes involved in the biology and pathogenicity of the pathogen. We initiated the functional genomic research of the fungus by constructing a full-length cDNA and determined functions of the first group of genes by sequence comparison of cDNA clones to genes reported in other fungi. Results A full-length cDNA library, consisting of 42,240 clones with an average cDNA insert of 1.9 kb, was constructed using urediniospores of race PST-78 of P. striiformis f. sp. tritici. From 196 sequenced cDNA clones, we determined functions of 73 clones (37.2%. In addition, 36 clones (18.4% had significant homology to hypothetical proteins, 37 clones (18.9% had some homology to genes in other fungi, and the remaining 50 clones (25.5% did not produce any hits. From the 73 clones with functions, we identified 51 different genes encoding protein products that are involved in amino acid metabolism, cell defense, cell cycle, cell signaling, cell structure and growth, energy cycle, lipid and nucleotide metabolism, protein modification, ribosomal protein complex, sugar metabolism, transcription factor, transport metabolism, and virulence/infection. Conclusion The full-length cDNA library is useful in identifying functional genes of P. striiformis.

  13. Comparison of DNA Quantification Methods for Next Generation Sequencing.

    Science.gov (United States)

    Robin, Jérôme D; Ludlow, Andrew T; LaRanger, Ryan; Wright, Woodring E; Shay, Jerry W

    2016-04-06

    Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C; 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChiA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. DdPCR-Tail is comparable to qPCR and fluorometry (QuBit) and allows sensitive quantification by analysis of barcode repartition after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality.

  14. Application of synthetic DNA probes to the analysis of DNA sequence variants in man

    International Nuclear Information System (INIS)

    Wallace, R.B.; Petz, L.D.; Yam, P.Y.

    1986-01-01

    Oligonucleotide probes provide a tool to discriminate between any two alleles on the basis of hybridization. Random sampling of the genome with different oligonucleotide probes should reveal polymorphism in a certain percentage of the cases. In the hope of identifying polymorphic regions more efficiently, we chose to take advantage of the proposed hypermutability of repeated DNA sequences and the specificity of oligonucleotide hybridization. Since, under appropriate conditions, oligonucleotide probes require complete base pairing for hybridization to occur, they will only hybridize to a subset of the members of a repeat family when all members of the family are not identical. The results presented here suggest that oligonucleotide hybridization can be used to extend the genomic sequences that can be tested for the presence of RFLPs. This expands the tools available to human genetics. In addition, the results suggest that repeated DNA sequences are indeed more polymorphic than single-copy sequences. 28 references, 2 figures

  15. Sequence Dependent Electrophoretic Separations of DNA in Pluronic F127 Gels

    Science.gov (United States)

    You, Seungyong; van Winkle, David H.

    2010-03-01

    Two-dimensional (2-D) electrophoresis has successfully been used to visualize the separation of DNA fragments of the same length. We electrophorese a double-stranded DNA ladder in an Agarose gel for the first dimension and in gels of Pluronic F127 for the second dimension at room temperature. The 1000 bp band that travels together as a single band in an Agarose gel is split into two bands in Pluronic gels. The slower band follows the exponential decay trend that the other ladder constituents do. After sequencing the DNA fragments, the faster band has an apparently random sequence, while the slower band and the others have two A-tracts in each 250 bp segment. The A-tracts consist of a series of at least five adenine bases pairing with thymine bases. This result leads to the conclusion that the migration of the DNA molecules bent with A-tracts is more retarded in Pluronic gels than the wild-type of DNA molecules.

  16. Read length and repeat resolution: exploring prokaryote genomes using next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Matt J Cahill

    Full Text Available BACKGROUND: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. METHODOLOGY/PRINCIPAL FINDINGS: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. CONCLUSIONS: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

  17. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.

    2010-07-12

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.

  18. Read length and repeat resolution: Exploring prokaryote genomes using next-generation sequencing technologies

    KAUST Repository

    Cahill, Matt J.; Kö ser, Claudio U.; Ross, Nicholas E.; Archer, John A.C.

    2010-01-01

    Background: There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats. Methodology/Principal Findings: Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads. Conclusions: Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length. 2010 Cahill et al.

  19. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    Directory of Open Access Journals (Sweden)

    Chengwei Luo

    Full Text Available Next-generation sequencing (NGS is commonly used in metagenomic studies of complex microbial communities but whether or not different NGS platforms recover the same diversity from a sample and their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage correlated highly between the two platforms (R(2>0.9. Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data possessing protocols for future metagenomic studies.

  20. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    Science.gov (United States)

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities but whether or not different NGS platforms recover the same diversity from a sample and their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R(2)>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data possessing protocols for future metagenomic studies.

  1. Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications.

    Science.gov (United States)

    Kim, Young-Kyu; Park, Chong-wook; Kim, Ki-Joong

    2009-03-31

    The chloroplast DNA sequences of Megaleranthis saniculifolia, an endemic and monotypic endangered plant species, were completed in this study (GenBank FJ597983). The genome is 159,924 bp in length. It harbors a pair of IR regions consisting of 26,608 bp each. The lengths of the LSC and SSC regions are 88,326 bp and 18,382 bp, respectively. The structural organizations, gene and intron contents, gene orders, AT contents, codon usages, and transcription units of the Megaleranthis chloroplast genome are similar to those of typical land plant cp DNAs. However, the detailed features of Megaleranthis chloroplast genomes are substantially different from that of Ranunculus, which belongs to the same family, the Ranunculaceae. First, the Megaleranthis cp DNA was 4,797 bp longer than that of Ranunculus due to an expanded IR region into the SSC region and duplicated sequence elements in several spacer regions of the Megaleranthis cp genome. Second, the chloroplast genomes of Megaleranthis and Ranunculus evidence 5.6% sequence divergence in the coding regions, 8.9% sequence divergence in the intron regions, and 18.7% sequence divergence in the intergenic spacer regions, respectively. In both the coding and noncoding regions, average nucleotide substitution rates differed markedly, depending on the genome position. Our data strongly implicate the positional effects of the evolutionary modes of chloroplast genes. The genes evidencing higher levels of base substitutions also have higher incidences of indel mutations and low Ka/Ks ratios. A total of 54 simple sequence repeat loci were identified from the Megaleranthis cp genome. The existence of rich cp SSR loci in the Megaleranthis cp genome provides a rare opportunity to study the population genetic structures of this endangered species. Our phylogenetic trees based on the two independent markers, the nuclear ITS and chloroplast matK sequences, strongly support the inclusion of the Megaleranthis to the Trollius. Therefore, our

  2. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    Science.gov (United States)

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-03-26

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic DNA methylation between sperm and oocyte DNA. The methylation levels of the minor satellite sequences did not change during spermiogenesis, and were not associated with the onset of meiosis or a specific stage in sperm development.

  3. DNA-directed alkylating ligands as potential antitumor agents: sequence specificity of alkylation by intercalating aniline mustards.

    Science.gov (United States)

    Prakash, A S; Denny, W A; Gourdie, T A; Valu, K K; Woodgate, P D; Wakelin, L P

    1990-10-23

    The sequence preferences for alkylation of a series of novel parasubstituted aniline mustards linked to the DNA-intercalating chromophore 9-aminoacridine by an alkyl chain of variable length were studied by using procedures analogous to Maxam-Gilbert reactions. The compounds alkylate DNA at both guanine and adenine sites. For mustards linked to the acridine by a short alkyl chain through a para O- or S-link group, 5'-GT sequences are the most preferred sites at which N7-guanine alkylation occurs. For analogues with longer chain lengths, the preference of 5'-GT sequences diminishes in favor of N7-adenine alkylation at the complementary 5'-AC sequence. Magnesium ions are shown to selectively inhibit alkylation at the N7 of adenine (in the major groove) by these compounds but not the alkylation at the N3 of adenine (in the minor groove) by the antitumor antibiotic CC-1065. Effects of chromophore variation were also studied by using aniline mustards linked to quinazoline and sterically hindered tert-butyl-9-aminoacridine chromophores. The results demonstrate that in this series of DNA-directed mustards the noncovalent interactions of the carrier chromophores with DNA significantly modify the sequence selectivity of alkylation by the mustard. Relationships between the DNA alkylation patterns of these compounds and their biological activities are discussed.

  4. Fidelity and mutational spectrum of Pfu DNA polymerase on a human mitochondrial DNA sequence.

    Science.gov (United States)

    André, P; Kim, A; Khrapko, K; Thilly, W G

    1997-08-01

    The study of rare genetic changes in human tissues requires specialized techniques. Point mutations at fractions at or below 10(-6) must be observed to discover even the most prominent features of the point mutational spectrum. PCR permits the increase in number of mutant copies but does so at the expense of creating many additional mutations or "PCR noise". Thus, each DNA sequence studied must be characterized with regard to the DNA polymerase and conditions used to avoid interpreting a PCR-generated mutation as one arising in human tissue. The thermostable DNA polymerase derived from Pyrococcus furiosus designated Pfu has the highest fidelity of any DNA thermostable polymerase studied to date, and this property recommends it for analyses of tissue mutational spectra. Here, we apply constant denaturant capillary electrophoresis (CDCE) to separate and isolate the products of DNA amplification. This new strategy permitted direct enumeration and identification of point mutations created by Pfu DNA polymerase in a 96-bp low melting domain of a human mitochondrial sequence despite the very low mutant fractions generated in the PCR process. This sequence, containing part of the tRNA glycine and NADH dehydrogenase subunit 3 genes, is the target of our studies of mitochondrial mutagenesis in human cells and tissues. Incorrectly synthesized sequences were separated from the wild type as mutant/wild-type heteroduplexes by sequential enrichment on CDCE. An artificially constructed mutant was used as an internal standard to permit calculation of the mutant fraction. Our study found that the average error rate (mutations per base pair duplication) of Pfu was 6.5 x 10(-7), and five of its more frequent mutations (hot spots) consisted of three transversions (GC-->TA, AT-->TA, and AT-->CG), one transition (AT-->GC), and one 1-bp deletion (in an AAAAAA sequence). To achieve an even higher sensitivity, the amount of Pfu-induced mutants must be reduced.

  5. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    OpenAIRE

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-01-01

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic...

  6. Chimeric TALE recombinases with programmable DNA sequence specificity.

    Science.gov (United States)

    Mercer, Andrew C; Gaj, Thomas; Fuller, Roberta P; Barbas, Carlos F

    2012-11-01

    Site-specific recombinases are powerful tools for genome engineering. Hyperactivated variants of the resolvase/invertase family of serine recombinases function without accessory factors, and thus can be re-targeted to sequences of interest by replacing native DNA-binding domains (DBDs) with engineered zinc-finger proteins (ZFPs). However, imperfect modularity with particular domains, lack of high-affinity binding to all DNA triplets, and difficulty in construction has hindered the widespread adoption of ZFPs in unspecialized laboratories. The discovery of a novel type of DBD in transcription activator-like effector (TALE) proteins from Xanthomonas provides an alternative to ZFPs. Here we describe chimeric TALE recombinases (TALERs): engineered fusions between a hyperactivated catalytic domain from the DNA invertase Gin and an optimized TALE architecture. We use a library of incrementally truncated TALE variants to identify TALER fusions that modify DNA with efficiency and specificity comparable to zinc-finger recombinases in bacterial cells. We also show that TALERs recombine DNA in mammalian cells. The TALER architecture described herein provides a platform for insertion of customized TALE domains, thus significantly expanding the targeting capacity of engineered recombinases and their potential applications in biotechnology and medicine.

  7. Structural properties of replication origins in yeast DNA sequences

    International Nuclear Information System (INIS)

    Cao Xiaoqin; Zeng Jia; Yan Hong

    2008-01-01

    Sequence-dependent DNA flexibility is an important structural property originating from the DNA 3D structure. In this paper, we investigate the DNA flexibility of the budding yeast (S. Cerevisiae) replication origins on a genome-wide scale using flexibility parameters from two different models, the trinucleotide and the tetranucleotide models. Based on analyzing average flexibility profiles of 270 replication origins, we find that yeast replication origins are significantly rigid compared with their surrounding genomic regions. To further understand the highly distinctive property of replication origins, we compare the flexibility patterns between yeast replication origins and promoters, and find that they both contain significantly rigid DNAs. Our results suggest that DNA flexibility is an important factor that helps proteins recognize and bind the target sites in order to initiate DNA replication. Inspired by the role of the rigid region in promoters, we speculate that the rigid replication origins may facilitate binding of proteins, including the origin recognition complex (ORC), Cdc6, Cdt1 and the MCM2-7 complex

  8. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  9. Bacterial DNA Sequence Compression Models Using Artificial Neural Networks

    Directory of Open Access Journals (Sweden)

    Armando J. Pinho

    2013-08-01

    Full Text Available It is widely accepted that the advances in DNA sequencing techniques have contributed to an unprecedented growth of genomic data. This fact has increased the interest in DNA compression, not only from the information theory and biology points of view, but also from a practical perspective, since such sequences require storage resources. Several compression methods exist, and particularly, those using finite-context models (FCMs have received increasing attention, as they have been proven to effectively compress DNA sequences with low bits-per-base, as well as low encoding/decoding time-per-base. However, the amount of run-time memory required to store high-order finite-context models may become impractical, since a context-order as low as 16 requires a maximum of 17.2 x 109 memory entries. This paper presents a method to reduce such a memory requirement by using a novel application of artificial neural networks (ANN to build such probabilistic models in a compact way and shows how to use them to estimate the probabilities. Such a system was implemented, and its performance compared against state-of-the art compressors, such as XM-DNA (expert model and FCM-Mx (mixture of finite-context models , as well as with general-purpose compressors. Using a combination of order-10 FCM and ANN, similar encoding results to those of FCM, up to order-16, are obtained using only 17 megabytes of memory, whereas the latter, even employing hash-tables, uses several hundreds of megabytes.

  10. Using Synthetic Nanopores for Single-Molecule Analyses: Detecting SNPs, Trapping DNA Molecules, and the Prospects for Sequencing DNA

    Science.gov (United States)

    Dimitrov, Valentin V.

    2009-01-01

    This work focuses on studying properties of DNA molecules and DNA-protein interactions using synthetic nanopores, and it examines the prospects of sequencing DNA using synthetic nanopores. We have developed a method for discriminating between alleles that uses a synthetic nanopore to measure the binding of a restriction enzyme to DNA. There exists…

  11. Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics

    Science.gov (United States)

    Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1995-01-01

    We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) a n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, while (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.

  12. Synthesis of DNA

    Science.gov (United States)

    Mariella, Jr., Raymond P.

    2008-11-18

    A method of synthesizing a desired double-stranded DNA of a predetermined length and of a predetermined sequence. Preselected sequence segments that will complete the desired double-stranded DNA are determined. Preselected segment sequences of DNA that will be used to complete the desired double-stranded DNA are provided. The preselected segment sequences of DNA are assembled to produce the desired double-stranded DNA.

  13. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Directory of Open Access Journals (Sweden)

    Zdepski Anna

    2011-05-01

    Full Text Available Abstract Background High throughput sequencing (HTS technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR. We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.

  14. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74--79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  15. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    1999-05-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74--79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 12 figs.

  16. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, Natasha V. (Okemos, MI); Broekaert, Willem F. (Dilbeek, BE); Chua, Nam-Hai (Scarsdale, NY); Kush, Anil (New York, NY)

    1999-05-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  17. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    1995-03-21

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1,018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74--79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 11 figures.

  18. Single DNA imaging and length quantification through a mobile phone microscope

    Science.gov (United States)

    Wei, Qingshan; Luo, Wei; Chiang, Samuel; Kappel, Tara; Mejia, Crystal; Tseng, Derek; Chan, Raymond Yan L.; Yan, Eddie; Qi, Hangfei; Shabbir, Faizan; Ozkan, Haydar; Feng, Steve; Ozcan, Aydogan

    2016-03-01

    The development of sensitive optical microscopy methods for the detection of single DNA molecules has become an active research area which cultivates various promising applications including point-of-care (POC) genetic testing and diagnostics. Direct visualization of individual DNA molecules usually relies on sophisticated optical microscopes that are mostly available in well-equipped laboratories. For POC DNA testing/detection, there is an increasing need for the development of new single DNA imaging and sensing methods that are field-portable, cost-effective, and accessible for diagnostic applications in resource-limited or field-settings. For this aim, we developed a mobile-phone integrated fluorescence microscopy platform that allows imaging and sizing of single DNA molecules that are stretched on a chip. This handheld device contains an opto-mechanical attachment integrated onto a smartphone camera module, which creates a high signal-to-noise ratio dark-field imaging condition by using an oblique illumination/excitation configuration. Using this device, we demonstrated imaging of individual linearly stretched λ DNA molecules (48 kilobase-pair, kbp) over 2 mm2 field-of-view. We further developed a robust computational algorithm and a smartphone app that allowed the users to quickly quantify the length of each DNA fragment imaged using this mobile interface. The cellphone based device was tested by five different DNA samples (5, 10, 20, 40, and 48 kbp), and a sizing accuracy of <1 kbp was demonstrated for DNA strands longer than 10 kbp. This mobile DNA imaging and sizing platform can be very useful for various diagnostic applications including the detection of disease-specific genes and quantification of copy-number-variations at POC settings.

  19. Molecular Precision at Micrometer Length Scales: Hierarchical Assembly of DNA-Protein Nanostructures.

    Science.gov (United States)

    Schiffels, Daniel; Szalai, Veronika A; Liddle, J Alexander

    2017-07-25

    Robust self-assembly across length scales is a ubiquitous feature of biological systems but remains challenging for synthetic structures. Taking a cue from biology-where disparate molecules work together to produce large, functional assemblies-we demonstrate how to engineer microscale structures with nanoscale features: Our self-assembly approach begins by using DNA polymerase to controllably create double-stranded DNA (dsDNA) sections on a single-stranded template. The single-stranded DNA (ssDNA) sections are then folded into a mechanically flexible skeleton by the origami method. This process simultaneously shapes the structure at the nanoscale and directs the large-scale geometry. The DNA skeleton guides the assembly of RecA protein filaments, which provides rigidity at the micrometer scale. We use our modular design strategy to assemble tetrahedral, rectangular, and linear shapes of defined dimensions. This method enables the robust construction of complex assemblies, greatly extending the range of DNA-based self-assembly methods.

  20. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.

    Science.gov (United States)

    Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie

    2015-06-17

    High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal to noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements display unique size signatures. We show how cell-free fragments occur in clusters along the genome, localizing to nucleosomal arrays and are preferentially cleaved at linker regions by correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site show a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. This

  1. DNA Sequencing as a Tool to Monitor Marine Ecological Status

    Directory of Open Access Journals (Sweden)

    Kelly D. Goodwin

    2017-05-01

    Full Text Available Many ocean policies mandate integrated, ecosystem-based approaches to marine monitoring, driving a global need for efficient, low-cost bioindicators of marine ecological quality. Most traditional methods to assess biological quality rely on specialized expertise to provide visual identification of a limited set of specific taxonomic groups, a time-consuming process that can provide a narrow view of ecological status. In addition, microbial assemblages drive food webs but are not amenable to visual inspection and thus are largely excluded from detailed inventory. Molecular-based assessments of biodiversity and ecosystem function offer advantages over traditional methods and are increasingly being generated for a suite of taxa using a “microbes to mammals” or “barcodes to biomes” approach. Progress in these efforts coupled with continued improvements in high-throughput sequencing and bioinformatics pave the way for sequence data to be employed in formal integrated ecosystem evaluation, including food web assessments, as called for in the European Union Marine Strategy Framework Directive. DNA sequencing of bioindicators, both traditional (e.g., benthic macroinvertebrates, ichthyoplankton and emerging (e.g., microbial assemblages, fish via eDNA, promises to improve assessment of marine biological quality by increasing the breadth, depth, and throughput of information and by reducing costs and reliance on specialized taxonomic expertise.

  2. cgDNA: a software package for the prediction of sequence-dependent coarse-grain free energies of B-form DNA.

    Science.gov (United States)

    Petkevičiūtė, D; Pasi, M; Gonzalez, O; Maddocks, J H

    2014-11-10

    cgDNA is a package for the prediction of sequence-dependent configuration-space free energies for B-form DNA at the coarse-grain level of rigid bases. For a fragment of any given length and sequence, cgDNA calculates the configuration of the associated free energy minimizer, i.e. the relative positions and orientations of each base, along with a stiffness matrix, which together govern differences in free energies. The model predicts non-local (i.e. beyond base-pair step) sequence dependence of the free energy minimizer. Configurations can be input or output in either the Curves+ definition of the usual helical DNA structural variables, or as a PDB file of coordinates of base atoms. We illustrate the cgDNA package by comparing predictions of free energy minimizers from (a) the cgDNA model, (b) time-averaged atomistic molecular dynamics (or MD) simulations, and (c) NMR or X-ray experimental observation, for (i) the Dickerson-Drew dodecamer and (ii) three oligomers containing A-tracts. The cgDNA predictions are rather close to those of the MD simulations, but many orders of magnitude faster to compute. Both the cgDNA and MD predictions are in reasonable agreement with the available experimental data. Our conclusion is that cgDNA can serve as a highly efficient tool for studying structural variations in B-form DNA over a wide range of sequences. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Evidence for a Complex Mosaic Genome Pattern in a Full-length Hepatitis C Virus Sequence

    Directory of Open Access Journals (Sweden)

    R.S. Ross

    2008-01-01

    Full Text Available The genome of the hepatitis C virus (HCV exhibits a high genetic variability. This remarkable heterogeneity is mainly attributed to the gradual accumulation of mutational changes, whereas the contribution of recombination events to the evolution of HCV remains controversial so far. While performing phylogenetic analyses including a large number of sequences deposited in the GenBank, we encountered a full-length HCV sequence (AY651061 that showed evidence for inter-subtype recombination and was, therefore, subjected to a detailed analysis of its molecular structure. The obtained results indicated that AY651061 does not represent a “simple” HCV 1c isolate, but a complex 1a/1c mosaic genome, showing five putative breakpoints in the core to NS3 regions. To our knowledge, this is the first report on a mosaic HCV full- length sequence with multiple breakpoints. The molecular structure of AY651061 is reminiscent of complex homologous recombinant variants occurring among other members of the flaviviridae family, e.g. GB virus C, dengue virus, and Japanese encephalitis virus. Our finding of a mosaic HCV sequence may have important implications for many fields of current HCV research which merit careful consideration.

  4. Identification of the full-length β-actin sequence and expression profiles in the tree shrew (Tupaia belangeri).

    Science.gov (United States)

    Zheng, Yu; Yun, Chenxia; Wang, Qihui; Smith, Wanli W; Leng, Jing

    2015-02-01

    The tree shrew (Tupaia belangeri) diverges from the primate order (Primates) and is classified as a separate taxonomic group of mammals - Scandentia. It has been suggested that the tree shrew can be used as an animal model for studying human diseases; however, the genomic sequence of the tree shrew is largely unidentified. In the present study, we reported the full-length cDNA sequence of the housekeeping gene, β-actin, in the tree shrew. The amino acid sequence of β-actin in the tree shrew was compared to that of humans and other species; a simple phylogenetic relationship was discovered. Quantitative polymerase chain reaction (qPCR) and western blot analysis further demonstrated that the expression profiles of β-actin, as a general conservative housekeeping gene, in the tree shrew were similar to those in humans, although the expression levels varied among different types of tissue in the tree shrew. Our data provide evidence that the tree shrew has a close phylogenetic association with humans. These findings further enhance the potential that the tree shrew, as a species, may be used as an animal model for studying human disorders.

  5. Characterization of four species of Trichuris (Nematoda: Enoplida) by their second internal transcribed spacer ribosomal DNA sequence.

    Science.gov (United States)

    Oliveros, R; Cutillas, C; De Rojas, M; Arias, P

    2000-12-01

    Adult worms of Trichuris ovis and T. globulosa were collected from Ovis aries (sheep) and Capra hircus (goats). T. suis was isolated from Sus scrofa domestica (swine) and T. leporis was isolated from Lepus europaeus (rabbits) in Spain. Genomic DNA was isolated and a ribosomal internal transcribed spacer (ITS2) was amplified and sequenced using polymerase-chain-reaction (PCR) techniques. The ITS2 of T. ovis and T. globulosa was 407 nucleotides in length and had a GC content of about 62%. Furthermore, the ITS2 of T. suis and T. leporis was 534 and 418 nucleotides in length and had a GC content of about 64.8% and 62.4%, respectively. There was evidence of slight variation in the sequence within individuals of all species analyzed, indicating intraindividual variation in the sequence of different copies of the ribosomal DNA. Furthermore, low-level intraspecific variation was detected. Sequence analyses of ITS2 products of T. ovis and T. globulosa demonstrated no sequence difference between them. Nevertheless, differences were detected between the ITS2 sequences of T. suis, T. leporis, and T. ovis, indicating that Trichuris species can reliably be differentiated by their ITS2 sequences and PCR-linked restriction-fragment-length polymorphism (RFLP).

  6. A MapReduce Framework for DNA Sequencing Data Processing

    Directory of Open Access Journals (Sweden)

    Samy Ghoneimy

    2016-12-01

    Full Text Available Genomics and Next Generation Sequencers (NGS like Illumina Hiseq produce data in the order of ‎‎200 billion base pairs in a single one-week run for a 60x human genome coverage, which ‎requires modern high-throughput experimental technologies that can ‎only be tackled with high performance computing (HPC and specialized software algorithms called ‎‎“short read aligners”. This paper focuses on the implementation of the DNA sequencing as a set of MapReduce programs that will accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format file, which has variants for a given DNA data set. In this paper MapReduce/Hadoop along with Burrows-Wheeler Aligner (BWA, Sequence Alignment/Map (SAM ‎tools, are fully utilized to provide various utilities for manipulating alignments, including sorting, merging, indexing, ‎and generating alignments. The Map-Sort-Reduce process is designed to be suited for a Hadoop framework in ‎which each cluster is a traditional N-node Hadoop cluster to utilize all of the Hadoop features like HDFS, program ‎management and fault tolerance. The Map step performs multiple instances of the short read alignment algorithm ‎‎(BoWTie that run in parallel in Hadoop. The ordered list of the sequence reads are used as input tuples and the ‎output tuples are the alignments of the short reads. In the Reduce step many parallel instances of the Short ‎Oligonucleotide Analysis Package for SNP (SOAPsnp algorithm run in the cluster. Input tuples are sorted ‎alignments for a partition and the output tuples are SNP calls. Results are stored via HDFS, and then archived in ‎SOAPsnp format. ‎ The proposed framework enables extremely fast discovering somatic mutations, inferring population genetical ‎parameters, and performing association tests directly based on sequencing data without explicit genotyping or ‎linkage-based imputation. It also demonstrate that this method achieves comparable

  7. Retroviral DNA Sequences as a Means for Determining Ancient Diets.

    Directory of Open Access Journals (Sweden)

    Jessica I Rivera-Perez

    Full Text Available For ages, specialists from varying fields have studied the diets of the primeval inhabitants of our planet, detecting diet remains in archaeological specimens using a range of morphological and biochemical methods. As of recent, metagenomic ancient DNA studies have allowed for the comparison of the fecal and gut microbiomes associated to archaeological specimens from various regions of the world; however the complex dynamics represented in those microbial communities still remain unclear. Theoretically, similar to eukaryote DNA the presence of genes from key microbes or enzymes, as well as the presence of DNA from viruses specific to key organisms, may suggest the ingestion of specific diet components. In this study we demonstrate that ancient virus DNA obtained from coprolites also provides information reconstructing the host's diet, as inferred from sequences obtained from pre-Columbian coprolites. This depicts a novel and reliable approach to determine new components as well as validate the previously suggested diets of extinct cultures and animals. Furthermore, to our knowledge this represents the first description of the eukaryotic viral diversity found in paleofaeces belonging to pre-Columbian cultures.

  8. Mitochondrial DNA sequencing of cat hair: an informative forensic tool.

    Science.gov (United States)

    Tarditi, Christy R; Grahn, Robert A; Evans, Jeffrey J; Kurushima, Jennifer D; Lyons, Leslie A

    2011-01-01

    Approximately 81.7 million cats are in 37.5 million U.S. households. Shed fur can be criminal evidence because of transfer to victims, suspects, and/or their belongings. To improve cat hairs as forensic evidence, the mtDNA control region from single hairs, with and without root tags, was sequenced. A dataset of a 402-bp control region segment from 174 random-bred cats representing four U.S. geographic areas was generated to determine the informativeness of the mtDNA region. Thirty-two mtDNA mitotypes were observed ranging in frequencies from 0.6-27%. Four common types occurred in all populations. Low heteroplasmy, 1.7%, was determined. Unique mitotypes were found in 18 individuals, 10.3% of the population studied. The calculated discrimination power implied that 8.3 of 10 randomly selected individuals can be excluded by this region. The genetic characteristics of the region and the generated dataset support the use of this cat mtDNA region in forensic applications. 2010 American Academy of Forensic Sciences. Published 2010. This article is a U.S. Government work and is in the public domain in the U.S.A.

  9. Complete nuclear ribosomal DNA sequence amplification and molecular analyses of Bangia (Bangiales, Rhodophyta) from China

    Science.gov (United States)

    Xu, Jiajie; Jiang, Bo; Chai, Sanming; He, Yuan; Zhu, Jianyi; Shen, Zonggen; Shen, Songdong

    2016-09-01

    Filamentous Bangia, which are distributed extensively throughout the world, have simple and similar morphological characteristics. Scientists can classify these organisms using molecular markers in combination with morphology. We successfully sequenced the complete nuclear ribosomal DNA, approximately 13 kb in length, from a marine Bangia population. We further analyzed the small subunit ribosomal DNA gene (nrSSU) and the internal transcribed spacer (ITS) sequence regions along with nine other marine, and two freshwater Bangia samples from China. Pairwise distances of the nrSSU and 5.8S ribosomal DNA gene sequences show the marine samples grouping together with low divergences (00.003; 0-0.006, respectively) from each other, but high divergences (0.123-0.126; 0.198, respectively) from freshwater samples. An exception is the marine sample collected from Weihai, which shows high divergence from both other marine samples (0.063-0.065; 0.129, respectively) and the freshwater samples (0.097; 0.120, respectively). A maximum likelihood phylogenetic tree based on a combined SSU-ITS dataset with maximum likelihood method shows the samples divided into three clades, with the two marine sample clades containing Bangia spp. from North America, Europe, Asia, and Australia; and one freshwater clade, containing Bangia atropurpurea from North America and China.

  10. Exploration of methods to localize DNA sequences missing from c-locus deletions

    International Nuclear Information System (INIS)

    Albritton, L.M.; Russell, L.B.; Montgomery, C.S.

    1987-01-01

    The authors have earlier characterized a large number of radiation-induced mutations at the c locus (on Chromosome 7) through genetic analysis, including extensive complementation tests. Based on this work, they have postulated that many of these mutations are deletions of various lengths, overlapping at c (the marker used in the mutation-rate experiments that generated the mutants). It was possible to apportion these deletions among 13 complementation groups and to fit them to a linear map of 8 functional units. Collectively, the deletions extend from a point between tp and c to one between sh-1 and Hbb, i.e., a genetic distance of from 6 to 10 cM, corresponding to at least 10 4 Kb of DNA. This year, the authors completed a pilot study designed to explore methods for finding DNA sequences that map to the region covered by the various c-deletions. The general plan was to probe DNA with clones derived from Chromosome-7-enriched libraries or with sequences known (or suspected) to reside in Chromosome 7. Three methods were explored for deriving the c-region-deficient DNA: (a) from mouse-hamster somatic-cell hydrids retaining a deleted mouse Chromosome 7, but no homologue; (b) from F 1 hybrids of M. musculus domesticus (carrying a c-locus deletion) by M. spretus; and (c) from F 1 hybrids of M. domesticus stocks carrying complementing deletions

  11. Transcription blockage by homopurine DNA sequences: role of sequence composition and single-strand breaks

    Science.gov (United States)

    Belotserkovskii, Boris P.; Neil, Alexander J.; Saleh, Syed Shayon; Shin, Jane Hae Soo; Mirkin, Sergei M.; Hanawalt, Philip C.

    2013-01-01

    The ability of DNA to adopt non-canonical structures can affect transcription and has broad implications for genome functioning. We have recently reported that guanine-rich (G-rich) homopurine-homopyrimidine sequences cause significant blockage of transcription in vitro in a strictly orientation-dependent manner: when the G-rich strand serves as the non-template strand [Belotserkovskii et al. (2010) Mechanisms and implications of transcription blockage by guanine-rich DNA sequences., Proc. Natl Acad. Sci. USA, 107, 12816–12821]. We have now systematically studied the effect of the sequence composition and single-stranded breaks on this blockage. Although substitution of guanine by any other base reduced the blockage, cytosine and thymine reduced the blockage more significantly than adenine substitutions, affirming the importance of both G-richness and the homopurine-homopyrimidine character of the sequence for this effect. A single-strand break in the non-template strand adjacent to the G-rich stretch dramatically increased the blockage. Breaks in the non-template strand result in much weaker blockage signals extending downstream from the break even in the absence of the G-rich stretch. Our combined data support the notion that transcription blockage at homopurine-homopyrimidine sequences is caused by R-loop formation. PMID:23275544

  12. Automated methods for single-stranded DNA isolation and dideoxynucleotide DNA sequencing reactions on a robotic workstation

    International Nuclear Information System (INIS)

    Mardis, E.R.; Roe, B.A.

    1989-01-01

    Automated procedures have been developed for both the simultaneous isolation of 96 single-stranded M13 chimeric template DNAs in less than two hours, and for simultaneously pipetting 24 dideoxynucleotide sequencing reactions on a commercially available laboratory workstation. The DNA sequencing results obtained by either radiolabeled or fluorescent methods are consistent with the premise that automation of these portions of DNA sequencing projects will improve the reproducibility of the DNA isolation and the procedures for these normally labor-intensive steps provides an approach for rapid acquisition of large amounts of high quality, reproducible DNA sequence data

  13. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure...... that the data produced is optimal. Although much of the procedure can be followed directly from the manufacturer's protocols, the key differences lie in the library preparation steps. This chapter presents an optimized protocol for the sequencing of fossil remains and museum specimens, commonly referred...

  14. DNA interaction with platinum-based cytostatics revealed by DNA sequencing.

    Science.gov (United States)

    Smerkova, Kristyna; Vaculovic, Tomas; Vaculovicova, Marketa; Kynicky, Jindrich; Brtnicky, Martin; Eckschlager, Tomas; Stiborova, Marie; Hubalek, Jaromir; Adam, Vojtech

    2017-12-15

    The main mechanism of action of platinum-based cytostatic drugs - cisplatin, oxaliplatin and carboplatin - is the formation of DNA cross-links, which restricts the transcription due to the disability of DNA to enter the active site of the polymerase. The polymerase chain reaction (PCR) was employed as a simplified model of the amplification process in the cell nucleus. PCR with fluorescently labelled dideoxynucleotides commonly employed for DNA sequencing was used to monitor the effect of platinum-based cytostatics on DNA in terms of decrease in labeling efficiency dependent on a presence of the DNA-drug cross-link. It was found that significantly different amounts of the drugs - cisplatin (0.21 μg/mL), oxaliplatin (5.23 μg/mL), and carboplatin (71.11 μg/mL) - were required to cause the same quenching effect (50%) on the fluorescent labelling of 50 μg/mL of DNA. Moreover, it was found that even though the amounts of the drugs was applied to the reaction mixture differing by several orders of magnitude, the amount of incorporated platinum, quantified by inductively coupled plasma mass spectrometry, was in all cases at the level of tenths of μg per 5 μg of DNA. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. New scoring schema for finding motifs in DNA Sequences

    Directory of Open Access Journals (Sweden)

    Nowzari-Dalini Abbas

    2009-03-01

    Full Text Available Abstract Background Pattern discovery in DNA sequences is one of the most fundamental problems in molecular biology with important applications in finding regulatory signals and transcription factor binding sites. An important task in this problem is to search (or predict known binding sites in a new DNA sequence. For this reason, all subsequences of the given DNA sequence are scored based on an scoring function and the prediction is done by selecting the best score. By assuming no dependency between binding site base positions, most of the available tools for known binding site prediction are designed. Recently Tomovic and Oakeley investigated the statistical basis for either a claim of dependence or independence, to determine whether such a claim is generally true, and they presented a scoring function for binding site prediction based on the dependency between binding site base positions. Our primary objective is to investigate the scoring functions which can be used in known binding site prediction based on the assumption of dependency or independency in binding site base positions. Results We propose a new scoring function based on the dependency between all positions in biding site base positions. This scoring function uses joint information content and mutual information as a measure of dependency between positions in transcription factor binding site. Our method for modeling dependencies is simply an extension of position independency methods. We evaluate our new scoring function on the real data sets extracted from JASPAR and TRANSFAC data bases, and compare the obtained results with two other well known scoring functions. Conclusion The results demonstrate that the new approach improves known binding site discovery and show that the joint information content and mutual information provide a better and more general criterion to investigate the relationships between positions in the TFBS. Our scoring function is formulated by simple

  16. Sequence- and structure-dependent DNA base dynamics: Synthesis, structure, and dynamics of site and sequence specifically spin-labeled DNA

    International Nuclear Information System (INIS)

    Spaltenstein, A.; Robinson, B.H.; Hopkins, P.B.

    1989-01-01

    A nitroxide spin-labeled analogue of thymidine (1a), in which the methyl group is replaced by an acetylene-tethered nitroxide, was evaluated as a probe for structural and dynamics studies of sequence specifically spin-labeled DNA. Residue 1a was incorporated into synthetic deoxyoligonucleotides by using automated phosphite triester methods. 1 H NMR, CD, and thermal denaturation studies indicate that 1a (T) does not significantly alter the structure of 5'-d(CGCGAATT*CGCG) from that of the native dodecamer. EPR studies on monomer, single-stranded, and duplexed DNA show that 1a readily distinguishes environments of different rigidity. Comparison of the general line-shape features of the observed EPR spectra of several small duplexes (12-mer, 24-mer) with simulated EPR spectra assuming isotropic motion suggests that probe 1a monitors global tumbling of small duplexes. Increasing the length of the DNA oligomers results in significant deviation from isotropic motion, with line-shape features similar to those of calculated spectra of objects with isotropic rotational correlation times of 20-100 ns. EPR spectra of a spin-labeled GT mismatch and a T bulge in long DNAs are distinct from those of spin-labeled Watson-Crick paired DNAs, further demonstrating the value of EPR as a tool in the evaluation of local dynamic and structural features in macromolecules

  17. A Novel Low Energy Electron Microscope for DNA Sequencing and Surface Analysis

    Science.gov (United States)

    Mankos, M.; Shadman, K.; Persson, H.H.J.; N’Diaye, A.T.; Schmid, A.K.; Davis, R.W.

    2014-01-01

    Monochromatic, aberration-corrected, dual-beam low energy electron microscopy (MAD-LEEM) is a novel technique that is directed towards imaging nanostructures and surfaces with sub-nanometer resolution. The technique combines a monochromator, a mirror aberration corrector, an energy filter, and dual beam illumination in a single instrument. The monochromator reduces the energy spread of the illuminating electron beam, which significantly improves spectroscopic and spatial resolution. Simulation results predict that the novel aberration corrector design will eliminate the second rank chromatic and third and fifth order spherical aberrations, thereby improving the resolution into the sub-nanometer regime at landing energies as low as one hundred electron-Volts. The energy filter produces a beam that can extract detailed information about the chemical composition and local electronic states of non-periodic objects such as nanoparticles, interfaces, defects, and macromolecules. The dual flood illumination eliminates charging effects that are generated when a conventional LEEM is used to image insulating specimens. A potential application for MAD-LEEM is in DNA sequencing, which requires high resolution to distinguish the individual bases and high speed to reduce the cost. The MAD-LEEM approach images the DNA with low electron impact energies, which provides nucleobase contrast mechanisms without organometallic labels. Furthermore, the micron-size field of view when combined with imaging on the fly provides long read lengths, thereby reducing the demand on assembling the sequence. Experimental results from bulk specimens with immobilized single-base oligonucleotides demonstrate that base specific contrast is available with reflected, photo-emitted, and Auger electrons. Image contrast simulations of model rectangular features mimicking the individual nucleotides in a DNA strand have been developed to translate measurements of contrast on bulk DNA to the detectability of

  18. A novel low energy electron microscope for DNA sequencing and surface analysis.

    Science.gov (United States)

    Mankos, M; Shadman, K; Persson, H H J; N'Diaye, A T; Schmid, A K; Davis, R W

    2014-10-01

    Monochromatic, aberration-corrected, dual-beam low energy electron microscopy (MAD-LEEM) is a novel technique that is directed towards imaging nanostructures and surfaces with sub-nanometer resolution. The technique combines a monochromator, a mirror aberration corrector, an energy filter, and dual beam illumination in a single instrument. The monochromator reduces the energy spread of the illuminating electron beam, which significantly improves spectroscopic and spatial resolution. Simulation results predict that the novel aberration corrector design will eliminate the second rank chromatic and third and fifth order spherical aberrations, thereby improving the resolution into the sub-nanometer regime at landing energies as low as one hundred electron-Volts. The energy filter produces a beam that can extract detailed information about the chemical composition and local electronic states of non-periodic objects such as nanoparticles, interfaces, defects, and macromolecules. The dual flood illumination eliminates charging effects that are generated when a conventional LEEM is used to image insulating specimens. A potential application for MAD-LEEM is in DNA sequencing, which requires high resolution to distinguish the individual bases and high speed to reduce the cost. The MAD-LEEM approach images the DNA with low electron impact energies, which provides nucleobase contrast mechanisms without organometallic labels. Furthermore, the micron-size field of view when combined with imaging on the fly provides long read lengths, thereby reducing the demand on assembling the sequence. Experimental results from bulk specimens with immobilized single-base oligonucleotides demonstrate that base specific contrast is available with reflected, photo-emitted, and Auger electrons. Image contrast simulations of model rectangular features mimicking the individual nucleotides in a DNA strand have been developed to translate measurements of contrast on bulk DNA to the detectability of

  19. Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

    Directory of Open Access Journals (Sweden)

    Sugano Sumio

    2009-07-01

    Full Text Available Abstract Background Apicomplexan parasites are causative agents of various diseases including malaria and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexa parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, as well as to validate the importance for physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexa parasites. Results In this study, we used a total of 61,056 5'-end-single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of transcripts that had been previously unidentified. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the complete full-length transcript level. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified and confirmed by RT-PCR 140 previously unidentified transcripts found in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. Conclusion Our data indicates that the current gene models are likely to still be incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of

  20. Magnetic bead purification of labeled DNA fragments forhigh-throughput capillary electrophoresis sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Elkin, Christopher; Kapur, Hitesh; Smith, Troy; Humphries, David; Pollard, Martin; Hammon, Nancy; Hawkins, Trevor

    2001-09-15

    We have developed an automated purification method for terminator sequencing products based on a magnetic bead technology. This 384-well protocol generates labeled DNA fragments that are essentially free of contaminates for less than $0.005 per reaction. In comparison to laborious ethanol precipitation protocols, this method increases the phred20 read length by forty bases with various DNA templates such as PCR fragments, Plasmids, Cosmids and RCA products. Our method eliminates centrifugation and is compatible with both the MegaBACE 1000 and ABIPrism 3700 capillary instruments. As of September 2001, this method has produced over 1.6 million samples with 93 percent averaging 620 phred20 bases as part of Joint Genome Institutes Production Process.

  1. A pneumatic device for rapid loading of DNA sequencing gels.

    Science.gov (United States)

    Panussis, D A; Cook, M W; Rifkin, L L; Snider, J E; Strong, J T; McGrane, R M; Wilson, R K; Mardis, E R

    1998-05-01

    This work describes the design and construction of a device that facilitates the loading of DNA samples onto polyacrylamide gels for detection in the Perkin Elmer/Applied Biosystems (PE/ABI) 373 and 377 DNA sequencing instruments. The device is mounted onto the existing gel cassettes and makes the process of loading high-density gels less cumbersome while the associated time and errors are reduced. The principle of operation includes the simultaneous transfer of the entire batch of samples, in which a spring-loaded air cylinder generates positive pressure and flexible silica capillaries transfer the samples. A retractable capillary array carrier allows the delivery ends of the capillaries to be held up clear of the gel during loader attachment on the gel plates, while enabling their insertion in the gel wells once the device is securely mounted. Gel-loading devices capable of simultaneously transferring 72 samples onto the PE/ABI 373 and 377 are currently being used in our production sequencing groups while a 96-sample transfer prototype undergoes testing.

  2. Phylogenomics of Phrynosomatid Lizards: Conflicting Signals from Sequence Capture versus Restriction Site Associated DNA Sequencing

    Science.gov (United States)

    Leaché, Adam D.; Chavez, Andreas S.; Jones, Leonard N.; Grummer, Jared A.; Gottscho, Andrew D.; Linkem, Charles W.

    2015-01-01

    Sequence capture and restriction site associated DNA sequencing (RADseq) are popular methods for obtaining large numbers of loci for phylogenetic analysis. These methods are typically used to collect data at different evolutionary timescales; sequence capture is primarily used for obtaining conserved loci, whereas RADseq is designed for discovering single nucleotide polymorphisms (SNPs) suitable for population genetic or phylogeographic analyses. Phylogenetic questions that span both “recent” and “deep” timescales could benefit from either type of data, but studies that directly compare the two approaches are lacking. We compared phylogenies estimated from sequence capture and double digest RADseq (ddRADseq) data for North American phrynosomatid lizards, a species-rich and diverse group containing nine genera that began diversifying approximately 55 Ma. Sequence capture resulted in 584 loci that provided a consistent and strong phylogeny using concatenation and species tree inference. However, the phylogeny estimated from the ddRADseq data was sensitive to the bioinformatics steps used for determining homology, detecting paralogs, and filtering missing data. The topological conflicts among the SNP trees were not restricted to any particular timescale, but instead were associated with short internal branches. Species tree analysis of the largest SNP assembly, which also included the most missing data, supported a topology that matched the sequence capture tree. This preferred phylogeny provides strong support for the paraphyly of the earless lizard genera Holbrookia and Cophosaurus, suggesting that the earless morphology either evolved twice or evolved once and was subsequently lost in Callisaurus. PMID:25663487

  3. Isolation and sequence analysis of the wheat B genome subtelomeric DNA

    Directory of Open Access Journals (Sweden)

    Huneau Cecile

    2009-09-01

    Full Text Available Abstract Background Telomeric and subtelomeric regions are essential for genome stability and regular chromosome replication. In this work, we have characterized the wheat BAC (bacterial artificial chromosome clones containing Spelt1 and Spelt52 sequences, which belong to the subtelomeric repeats of the B/G genomes of wheats and Aegilops species from the section Sitopsis. Results The BAC library from Triticum aestivum cv. Renan was screened using Spelt1 and Spelt52 as probes. Nine positive clones were isolated; of them, clone 2050O8 was localized mainly to the distal parts of wheat chromosomes by in situ hybridization. The distribution of the other clones indicated the presence of different types of repetitive sequences in BACs. Use of different approaches allowed us to prove that seven of the nine isolated clones belonged to the subtelomeric chromosomal regions. Clone 2050O8 was sequenced and its sequence of 119 737 bp was annotated. It is composed of 33% transposable elements (TEs, 8.2% Spelt52 (namely, the subfamily Spelt52.2 and five non-TE-related genes. DNA transposons are predominant, making up 24.6% of the entire BAC clone, whereas retroelements account for 8.4% of the clone length. The full-length CACTA transposon Caspar covers 11 666 bp, encoding a transposase and CTG-2 proteins, and this transposon accounts for 40% of the DNA transposons. The in situ hybridization data for 2050O8 derived subclones in combination with the BLAST search against wheat mapped ESTs (expressed sequence tags suggest that clone 2050O8 is located in the terminal bin 4BL-10 (0.95-1.0. Additionally, four of the predicted 2050O8 genes showed significant homology to four putative orthologous rice genes in the distal part of rice chromosome 3S and confirm the synteny to wheat 4BL. Conclusion Satellite DNA sequences from the subtelomeric regions of diploid wheat progenitor can be used for selecting the BAC clones from the corresponding regions of hexaploid wheat

  4. Isolation and sequence analysis of the wheat B genome subtelomeric DNA.

    Science.gov (United States)

    Salina, Elena A; Sergeeva, Ekaterina M; Adonina, Irina G; Shcherban, Andrey B; Afonnikov, Dmitry A; Belcram, Harry; Huneau, Cecile; Chalhoub, Boulos

    2009-09-05

    Telomeric and subtelomeric regions are essential for genome stability and regular chromosome replication. In this work, we have characterized the wheat BAC (bacterial artificial chromosome) clones containing Spelt1 and Spelt52 sequences, which belong to the subtelomeric repeats of the B/G genomes of wheats and Aegilops species from the section Sitopsis. The BAC library from Triticum aestivum cv. Renan was screened using Spelt1 and Spelt52 as probes. Nine positive clones were isolated; of them, clone 2050O8 was localized mainly to the distal parts of wheat chromosomes by in situ hybridization. The distribution of the other clones indicated the presence of different types of repetitive sequences in BACs. Use of different approaches allowed us to prove that seven of the nine isolated clones belonged to the subtelomeric chromosomal regions. Clone 2050O8 was sequenced and its sequence of 119,737 bp was annotated. It is composed of 33% transposable elements (TEs), 8.2% Spelt52 (namely, the subfamily Spelt52.2) and five non-TE-related genes. DNA transposons are predominant, making up 24.6% of the entire BAC clone, whereas retroelements account for 8.4% of the clone length. The full-length CACTA transposon Caspar covers 11,666 bp, encoding a transposase and CTG-2 proteins, and this transposon accounts for 40% of the DNA transposons. The in situ hybridization data for 2050O8 derived subclones in combination with the BLAST search against wheat mapped ESTs (expressed sequence tags) suggest that clone 2050O8 is located in the terminal bin 4BL-10 (0.95-1.0). Additionally, four of the predicted 2050O8 genes showed significant homology to four putative orthologous rice genes in the distal part of rice chromosome 3S and confirm the synteny to wheat 4BL. Satellite DNA sequences from the subtelomeric regions of diploid wheat progenitor can be used for selecting the BAC clones from the corresponding regions of hexaploid wheat chromosomes. It has been demonstrated for the first time

  5. Effect of temperature and cycle length on microbial competition in PHB-producing sequencing batch reactor.

    Science.gov (United States)

    Jiang, Yang; Marang, Leonie; Kleerebezem, Robbert; Muyzer, Gerard; van Loosdrecht, Mark C M

    2011-05-01

    The impact of temperature and cycle length on microbial competition between polyhydroxybutyrate (PHB)-producing populations enriched in feast-famine sequencing batch reactors (SBRs) was investigated at temperatures of 20 °C and 30 °C, and in a cycle length range of 1-18 h. In this study, the microbial community structure of the PHB-producing enrichments was found to be strongly dependent on temperature, but not on cycle length. Zoogloea and Plasticicumulans acidivorans dominated the SBRs operated at 20 °C and 30 °C, respectively. Both enrichments accumulated PHB more than 75% of cell dry weight. Short-term temperature change experiments revealed that P. acidivorans was more temperature sensitive as compared with Zoogloea. This is particularly true for the PHB degradation, resulting in incomplete PHB degradation in P. acidivorans at 20 °C. Incomplete PHB degradation limited biomass growth and allowed Zoogloea to outcompete P. acidivorans. The PHB content at the end of the feast phase correlated well with the cycle length at a constant solid retention time (SRT). These results suggest that to establish enrichment with the capacity to store a high fraction of PHB, the number of cycles per SRT should be minimized independent of the temperature.

  6. DNA hybridization kinetics: zippering, internal displacement and sequence dependence.

    Science.gov (United States)

    Ouldridge, Thomas E; Sulc, Petr; Romano, Flavio; Doye, Jonathan P K; Louis, Ard A

    2013-10-01

    Although the thermodynamics of DNA hybridization is generally well established, the kinetics of this classic transition is less well understood. Providing such understanding has new urgency because DNA nanotechnology often depends critically on binding rates. Here, we explore DNA oligomer hybridization kinetics using a coarse-grained model. Strand association proceeds through a complex set of intermediate states, with successful binding events initiated by a few metastable base-pairing interactions, followed by zippering of the remaining bonds. But despite reasonably strong interstrand interactions, initial contacts frequently dissociate because typical configurations in which they form differ from typical states of similar enthalpy in the double-stranded equilibrium ensemble. Initial contacts must be stabilized by two or three base pairs before full zippering is likely, resulting in negative effective activation enthalpies. Non-Arrhenius behavior arises because the number of base pairs required for nucleation increases with temperature. In addition, we observe two alternative pathways-pseudoknot and inchworm internal displacement-through which misaligned duplexes can rearrange to form duplexes. These pathways accelerate hybridization. Our results explain why experimentally observed association rates of GC-rich oligomers are higher than rates of AT- rich equivalents, and more generally demonstrate how association rates can be modulated by sequence choice.

  7. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  8. A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties.

    Science.gov (United States)

    Pan, Gaofeng; Jiang, Limin; Tang, Jijun; Guo, Fei

    2018-02-08

    DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods-especially machine learning methods-have been developed since the high-throughout sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated under the specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k -gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to do DNA methylation prediction. Five criteria-area under the receiver operating characteristic curve (AUC), Matthew's correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity-are used to evaluate the prediction results of our method. On the benchmark dataset, we could reach 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cells datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them on the accuracy of prediction. The improvement of AUC by our method compared to other methods was at least 0.0399 . For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.

  9. A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties

    Directory of Open Access Journals (Sweden)

    Gaofeng Pan

    2018-02-01

    Full Text Available DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods—especially machine learning methods—have been developed since the high-throughout sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated under the specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to do DNA methylation prediction. Five criteria—area under the receiver operating characteristic curve (AUC, Matthew’s correlation coefficient (MCC, accuracy (ACC, sensitivity (SN, and specificity—are used to evaluate the prediction results of our method. On the benchmark dataset, we could reach 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cells datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them on the accuracy of prediction. The improvement of AUC by our method compared to other methods was at least 0.0399 . For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.

  10. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  11. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    Science.gov (United States)

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  12. Determination of cDNA and genomic DNA sequences of hevamine, a chitinase from the rubber tree Hevea brasiliensis

    NARCIS (Netherlands)

    Bokma, E; Spiering, M; Chow, KS; Mulder, PPMFA; Subroto, T; Beintema, JJ

    Hevamine is a chitinase from the rubber tree Hevea brasiliensis and belongs to the family 18 glycosyl hydrolases. This paper describes the cloning of hevamine DNA and cDNA sequences. Hevamine contains a signal peptide at the N-terminus and a putative vacuolar targeting sequence at the C-terminus

  13. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Directory of Open Access Journals (Sweden)

    Alamar Santiago

    2009-09-01

    Full Text Available Abstract Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new

  14. A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

    Science.gov (United States)

    Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

    2009-01-01

    Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an

  15. Construction of cDNA library and preliminary analysis of expressed sequence tags from Siberian tiger

    Science.gov (United States)

    Liu, Chang-Qing; Lu, Tao-Feng; Feng, Bao-Gang; Liu, Dan; Guan, Wei-Jun; Ma, Yue-Hui

    2010-01-01

    In this study we successfully constructed a full-length cDNA library from Siberian tiger, Panthera tigris altaica, the most well-known wild Animal. Total RNA was extracted from cultured Siberian tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.30×106 pfu/ml and 1.62×109 pfu/ml respectively. The proportion of recombinants from unamplified library was 90.5% and average length of exogenous inserts was 1.13 kb. A total of 282 individual ESTs with sizes ranging from 328 to 1,142bps were then analyzed the BLASTX score revealed that 53.9% of the sequences were classified as strong match, 38.6% as nominal and 7.4% as weak match. 28.0% of them were found to be related to enzyme/catalytic protein, 20.9% ESTs to metabolism, 13.1% ESTs to transport, 12.1% ESTs to signal transducer/cell communication, 9.9% ESTs to structure protein, 3.9% ESTs to immunity protein/defense metabolism, 3.2% ESTs to cell cycle, and 8.9 ESTs classified as novel genes. These results demonstrated that the reliability and representativeness of the cDNA library attained to the requirements of a standard cDNA library. This library provided a useful platform for the functional genomic research of Siberian tigers. PMID:20941376

  16. Anti-DNA antibodies: Sequencing, cloning, and expression

    Energy Technology Data Exchange (ETDEWEB)

    Barry, M.M.

    1992-01-01

    To gain some insight into the mechanism of systemic lupus erythematosus, and the interactions involved in proteins binding to DNA four anti-DNA antibodies have been investigated. Two of the antibodies, Hed 10 and Jel 242, have previously been prepared from female NZB/NZW mice which develop an autoimmune disease resembling human SLE. The remaining two antibodies, Jel 72 and Jel 318, have previously been produced via immunization of C57BL/6 mice. The isotypes of the four antibodies investigated in this thesis were determined by an enzyme-linked-immunosorbent assay. All four antibodies contained [kappa] light chains and [gamma]2a heavy chains except Jel 318 which contains a [gamma]2b heavy chain. The complete variable regions of the heavy and light chains of these four antibodies were sequenced from their respective mRNAs. The gene segments and variable gene families expressed in each antibody were identified. Analysis of the genes used in the autoimmune anti-DNA antibodies and those produced by immunization indicated no obvious differences to account for their different origins. Examination of the amino acid residues present in the complementary-determining regions of these four antibodies indicates a preference for aromatic amino acids. Jel 72 and Jel 242 contain three arginine residues in the third complementary-determining region. A single-chain Fv and the variable region of the heavy chain of Hed 10 were expressed in Escherichia coli. Expression resulted in the production of a 26,000 M[sub r] protein and a 15,000 M[sub r] protein. An immunoblot indicated that the 26,000 M[sub r] protein was the Fv for Hed 10, while the 15,000 M[sub r] protein was shown to bind poly (dT). The contribution of the heavy chain to DNA binding was assessed.

  17. Efficient generation of recombinant RNA viruses using targeted recombination-mediated mutagenesis of bacterial artificial chromosomes containing full-length cDNA

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Risager, Peter Christian; Fahnøe, Ulrik

    2013-01-01

    Background Infectious cDNA clones are a prerequisite for directed genetic manipulation of RNA viruses. Here, a strategy to facilitate manipulation and rescue of classical swine fever viruses (CSFVs) from full-length cDNAs present within bacterial artificial chromosomes (BACs) is described....... This strategy allows manipulation of viral cDNA by targeted recombination-mediated mutagenesis within bacteria. Results A new CSFV-BAC (pBeloR26) derived from the Riems vaccine strain has been constructed and subsequently modified in the E2 coding sequence, using the targeted recombination strategy to enable...

  18. Comparative d2/d3 LSU–rDNA sequence study of some Iranian ...

    African Journals Online (AJOL)

    SERVER

    2007-11-05

    Nov 5, 2007 ... segments yielded one fragment at over all sequenced isolates as 787 bp in size. The DNA sequences were aligned .... expansion segments of the 28S rDNA subunit (D2/D3. LSU-rDNA) are the ... isolated from different geographical location from tea shrubs infested roots of Guilan province, Iran (Table 1).

  19. Sequence specificity and biological consequences of drugs that bind covalently in the minor groove of DNA

    International Nuclear Information System (INIS)

    Hurley, L.H.; Needham-VanDevanter, D.R.

    1986-01-01

    DNA ligands which bind within the minor groove of DNA exhibit varying degrees of sequence selectivity. Factors which contribute to nucleotide sequence recognition by minor groove ligands have been extensively investigated. Electrostatic interactions, ligand and DNA dehydration energies, hydrophobic interactions and steric factors all play significant roles in sequence selectivity in the minor groove. Interestingly, ligand recognition of nucleotide sequence in the minor groove does not involve significant hydrogen bonding. This is in sharp contrast to cellular enzyme and protein recognition of nucleotide sequence, which is achieved in the major groove via specific hydrogen bond formation between individual bases and the ligand. The ability to read nucleotide sequence via hydrogen bonding allows precise binding of proteins to specific DNA sequences. Minor groove ligands examined to date exhibit a much lower sequence specificity, generally binding to a subset of possible sequences, rather than a single sequence. 19 refs., 7 figs

  20. The DNA sequence of the human X chromosome

    Science.gov (United States)

    Ross, Mark T.; Grafham, Darren V.; Coffey, Alison J.; Scherer, Steven; McLay, Kirsten; Muzny, Donna; Platzer, Matthias; Howell, Gareth R.; Burrows, Christine; Bird, Christine P.; Frankish, Adam; Lovell, Frances L.; Howe, Kevin L.; Ashurst, Jennifer L.; Fulton, Robert S.; Sudbrak, Ralf; Wen, Gaiping; Jones, Matthew C.; Hurles, Matthew E.; Andrews, T. Daniel; Scott, Carol E.; Searle, Stephen; Ramser, Juliane; Whittaker, Adam; Deadman, Rebecca; Carter, Nigel P.; Hunt, Sarah E.; Chen, Rui; Cree, Andrew; Gunaratne, Preethi; Havlak, Paul; Hodgson, Anne; Metzker, Michael L.; Richards, Stephen; Scott, Graham; Steffen, David; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Ainscough, Rachael; Ambrose, Kerrie D.; Ansari-Lari, M. Ali; Aradhya, Swaroop; Ashwell, Robert I. S.; Babbage, Anne K.; Bagguley, Claire L.; Ballabio, Andrea; Banerjee, Ruby; Barker, Gary E.; Barlow, Karen F.; Barrett, Ian P.; Bates, Karen N.; Beare, David M.; Beasley, Helen; Beasley, Oliver; Beck, Alfred; Bethel, Graeme; Blechschmidt, Karin; Brady, Nicola; Bray-Allen, Sarah; Bridgeman, Anne M.; Brown, Andrew J.; Brown, Mary J.; Bonnin, David; Bruford, Elspeth A.; Buhay, Christian; Burch, Paula; Burford, Deborah; Burgess, Joanne; Burrill, Wayne; Burton, John; Bye, Jackie M.; Carder, Carol; Carrel, Laura; Chako, Joseph; Chapman, Joanne C.; Chavez, Dean; Chen, Ellson; Chen, Guan; Chen, Yuan; Chen, Zhijian; Chinault, Craig; Ciccodicola, Alfredo; Clark, Sue Y.; Clarke, Graham; Clee, Chris M.; Clegg, Sheila; Clerc-Blankenburg, Kerstin; Clifford, Karen; Cobley, Vicky; Cole, Charlotte G.; Conquer, Jen S.; Corby, Nicole; Connor, Richard E.; David, Robert; Davies, Joy; Davis, Clay; Davis, John; Delgado, Oliver; DeShazo, Denise; Dhami, Pawandeep; Ding, Yan; Dinh, Huyen; Dodsworth, Steve; Draper, Heather; Dugan-Rocha, Shannon; Dunham, Andrew; Dunn, Matthew; Durbin, K. James; Dutta, Ireena; Eades, Tamsin; Ellwood, Matthew; Emery-Cohen, Alexandra; Errington, Helen; Evans, Kathryn L.; Faulkner, Louisa; Francis, Fiona; Frankland, John; Fraser, Audrey E.; Galgoczy, Petra; Gilbert, James; Gill, Rachel; Glöckner, Gernot; Gregory, Simon G.; Gribble, Susan; Griffiths, Coline; Grocock, Russell; Gu, Yanghong; Gwilliam, Rhian; Hamilton, Cerissa; Hart, Elizabeth A.; Hawes, Alicia; Heath, Paul D.; Heitmann, Katja; Hennig, Steffen; Hernandez, Judith; Hinzmann, Bernd; Ho, Sarah; Hoffs, Michael; Howden, Phillip J.; Huckle, Elizabeth J.; Hume, Jennifer; Hunt, Paul J.; Hunt, Adrienne R.; Isherwood, Judith; Jacob, Leni; Johnson, David; Jones, Sally; de Jong, Pieter J.; Joseph, Shirin S.; Keenan, Stephen; Kelly, Susan; Kershaw, Joanne K.; Khan, Ziad; Kioschis, Petra; Klages, Sven; Knights, Andrew J.; Kosiura, Anna; Kovar-Smith, Christie; Laird, Gavin K.; Langford, Cordelia; Lawlor, Stephanie; Leversha, Margaret; Lewis, Lora; Liu, Wen; Lloyd, Christine; Lloyd, David M.; Loulseged, Hermela; Loveland, Jane E.; Lovell, Jamieson D.; Lozado, Ryan; Lu, Jing; Lyne, Rachael; Ma, Jie; Maheshwari, Manjula; Matthews, Lucy H.; McDowall, Jennifer; McLaren, Stuart; McMurray, Amanda; Meidl, Patrick; Meitinger, Thomas; Milne, Sarah; Miner, George; Mistry, Shailesh L.; Morgan, Margaret; Morris, Sidney; Müller, Ines; Mullikin, James C.; Nguyen, Ngoc; Nordsiek, Gabriele; Nyakatura, Gerald; O’Dell, Christopher N.; Okwuonu, Geoffery; Palmer, Sophie; Pandian, Richard; Parker, David; Parrish, Julia; Pasternak, Shiran; Patel, Dina; Pearce, Alex V.; Pearson, Danita M.; Pelan, Sarah E.; Perez, Lesette; Porter, Keith M.; Ramsey, Yvonne; Reichwald, Kathrin; Rhodes, Susan; Ridler, Kerry A.; Schlessinger, David; Schueler, Mary G.; Sehra, Harminder K.; Shaw-Smith, Charles; Shen, Hua; Sheridan, Elizabeth M.; Shownkeen, Ratna; Skuce, Carl D.; Smith, Michelle L.; Sotheran, Elizabeth C.; Steingruber, Helen E.; Steward, Charles A.; Storey, Roy; Swann, R. Mark; Swarbreck, David; Tabor, Paul E.; Taudien, Stefan; Taylor, Tineace; Teague, Brian; Thomas, Karen; Thorpe, Andrea; Timms, Kirsten; Tracey, Alan; Trevanion, Steve; Tromans, Anthony C.; d’Urso, Michele; Verduzco, Daniel; Villasana, Donna; Waldron, Lenee; Wall, Melanie; Wang, Qiaoyan; Warren, James; Warry, Georgina L.; Wei, Xuehong; West, Anthony; Whitehead, Siobhan L.; Whiteley, Mathew N.; Wilkinson, Jane E.; Willey, David L.; Williams, Gabrielle; Williams, Leanne; Williamson, Angela; Williamson, Helen; Wilming, Laurens; Woodmansey, Rebecca L.; Wray, Paul W.; Yen, Jennifer; Zhang, Jingkun; Zhou, Jianling; Zoghbi, Huda; Zorilla, Sara; Buck, David; Reinhardt, Richard; Poustka, Annemarie; Rosenthal, André; Lehrach, Hans; Meindl, Alfons; Minx, Patrick J.; Hillier, LaDeana W.; Willard, Huntington F.; Wilson, Richard K.; Waterston, Robert H.; Rice, Catherine M.; Vaudin, Mark; Coulson, Alan; Nelson, David L.; Weinstock, George; Sulston, John E.; Durbin, Richard; Hubbard, Tim; Gibbs, Richard A.; Beck, Stephan; Rogers, Jane; Bentley, David R.

    2009-01-01

    The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence. PMID:15772651

  1. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    Science.gov (United States)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without needed bisulfite-treatment, fluorescent tag, or PCR amplification. By eliminating the error producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  2. Production of a full-length infectious GFP-tagged cDNA clone of Beet mild yellowing virus for the study of plant-polerovirus interactions.

    Science.gov (United States)

    Stevens, Mark; Viganó, Felicita

    2007-04-01

    The full-length cDNA of Beet mild yellowing virus (Broom's Barn isolate) was sequenced and cloned into the vector pLitmus 29 (pBMYV-BBfl). The sequence of BMYV-BBfl (5721 bases) shared 96% and 98% nucleotide identity with the other complete sequences of BMYV (BMYV-2ITB, France and BMYV-IPP, Germany respectively). Full-length capped RNA transcripts of pBMYV-BBfl were synthesised and found to be biologically active in Arabidopsis thaliana protoplasts following electroporation or PEG inoculation when the protoplasts were subsequently analysed using serological and molecular methods. The BMYV sequence was modified by inserting DNA that encoded the jellyfish green fluorescent protein (GFP) into the P5 gene close to its 3' end. A. thaliana protoplasts electroporated with these RNA transcripts were biologically active and up to 2% of transfected protoplasts showed GFP-specific fluorescence. The exploitation of these cDNA clones for the study of the biology of beet poleroviruses is discussed.

  3. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    Science.gov (United States)

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate the DNA sequence or user can upload the sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various application for the sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides the tools for routinely performed sequence analysis by the biological scientists such as DNA replication, reverse compliment generation, transcription, translation, sequence specific information as total number of nucleotide bases, ATGC base contents along with their respective percentages and sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides user friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.

  4. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human.

    Science.gov (United States)

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-02-16

    DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.

  5. Characterization of Satellite DNA Sequences from the Commercially Important Marine Rotifers Brachionus rotundiformis and Brachionus plicatilis.

    Science.gov (United States)

    Boehm; Gibson; Lubzens

    2000-01-01

    This study was initiated to search for species-specific and strain-specific satellite DNA sequences for which oligonucleotide primers could be designed to differentiate between various commercially important strains of the marine monogonont rotifers Brachionus rotundiformis and Brachionus plicatilis. Two unrelated, highly reiterated satellite sequences were cloned and characterized. The eight sequenced monomers from B. rotundiformis and six from B. plicatilis had low intrarepeat variability and were similar in their overall lengths, A + T compositions, and high degrees of repeated motif substructure. However, hybridizations to 19 representative strains, sequence characterizations, and GenBank searches indicated that these two satellites are morphotype-specific and population-specific, respectively, and share little homology to each other or to other characterized sequences in the database. Primer pairs designed for the B. rotundiformis satellite confirmed hybridization specificities on polymerase chain reaction and could serve as a useful molecular diagnostic tool to identify strains belonging to the SS morphotype, which are gaining widespread usage as first feeds for marine fish in commercial production.

  6. DNA fingerprinting of Mycobacterium tuberculosis: from phage typing to whole-genome sequencing.

    Science.gov (United States)

    Schürch, Anita C; van Soolingen, Dick

    2012-06-01

    Current typing methods for Mycobacterium tuberculosis complex evolved from simple phenotypic approaches like phage typing and drug susceptibility profiling to DNA-based strain typing methods, such as IS6110-restriction fragment length polymorphisms (RFLP) and variable number of tandem repeats (VNTR) typing. Examples of the usefulness of molecular typing are source case finding and epidemiological linkage of tuberculosis (TB) cases, international transmission of MDR/XDR-TB, the discrimination between endogenous reactivation and exogenous re-infection as a cause of relapses after curative treatment of tuberculosis, the evidence of multiple M. tuberculosis infections, and the disclosure of laboratory cross-contaminations. Simultaneously, phylogenetic analyses were developed based on single nucleotide polymorphisms (SNPs), genomic deletions usually referred to as regions of difference (RDs) and spoligotyping which served both strain typing and phylogenetic analysis. National and international initiatives that rely on the application of these typing methods have brought significant insight into the molecular epidemiology of tuberculosis. However, current DNA fingerprinting methods have important limitations. They can often not distinguish between genetically closely related strains and the turn-over of these markers is variable. Moreover, the suitability of most DNA typing methods for phylogenetic reconstruction is limited as they show a high propensity of convergent evolution or misinfer genetic distances. In order to fully explore the possibilities of genotyping in the molecular epidemiology of tuberculosis and to study the phylogeny of the causative bacteria reliably, the application of whole-genome sequencing (WGS) analysis for all M. tuberculosis isolates is the optimal, although currently still a costly solution. In the last years WGS for typing of pathogens has been explored and yielded important additional information on strain diversity in comparison to the

  7. Beyond DNA Sequencing in Space: Current and Future Omics Capabilities of the Biomolecule Sequencer Payload

    Science.gov (United States)

    Wallace, Sarah

    2017-01-01

    Why do we need a DNA sequencer to support the human exploration of space? (A) Operational environmental monitoring; (1) Identification of contaminating microbes, (2) Infectious disease diagnosis, (3) Reduce down mass (sample return for environmental monitoring, crew health, etc.). (B) Research; (1) Human, (2) Animal, (3) Microbes/Cell lines, (4) Plant. (C) Med Ops; (1) Response to countermeasures, (2) Radiation, (3) Real-time analysis can influence medical intervention. (C) Support astrobiology science investigations; (1) Technology superiorly suited to in situ nucleic acid-based life detection, (2) Functional testing for integration into robotics for extraplanetary exploration mission.

  8. Biomolecule Sequencer: Next-Generation DNA Sequencing Technology for In-Flight Environmental Monitoring, Research, and Beyond

    Science.gov (United States)

    Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.

    2016-01-01

    On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially-available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA, will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment the first time DNA has ever been sequenced in space and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human

  9. Mechanism of sequence-specific template binding by the DNA primase of bacteriophage T7

    KAUST Repository

    Lee, Seung-Joo; Zhu, Bin; Hamdan, Samir; Richardson, Charles C.

    2010-01-01

    DNA primases catalyze the synthesis of the oligoribonucleotides required for the initiation of lagging strand DNA synthesis. Biochemical studies have elucidated the mechanism for the sequence-specific synthesis of primers. However, the physical

  10. Assessing the fidelity of ancient DNA sequences amplified from nuclear genes

    DEFF Research Database (Denmark)

    Binladen, Jonas; Wiuf, Carsten Henrik; Gilbert, M. Thomas P.

    2006-01-01

    To date, the field of ancient DNA has relied almost exclusively on mitochondrial DNA (mtDNA) sequences. However, a number of recent studies have reported the successful recovery of ancient nuclear DNA (nuDNA) sequences, thereby allowing the characterization of genetic loci directly involved...... in phenotypic traits of extinct taxa. It is well documented that postmortem damage in ancient mtDNA can lead to the generation of artifactual sequences. However, as yet no one has thoroughly investigated the damage spectrum in ancient nuDNA. By comparing clone sequences from 23 fossil specimens, recovered from...... adenine), respectively. Type 2 transitions are by far the most dominant and increase relative to those of type 1 with damage load. The results suggest that the deamination of cytosine (and 5-methyl cytosine) to uracil (and thymine) is the main cause of miscoding lesions in both ancient mtDNA and nu...

  11. Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.

    Science.gov (United States)

    Casals, Ferran; Anglada, Roger; Bonet, Núria; Rasal, Raquel; van der Gaag, Kristiaan J; Hoogenboom, Jerry; Solé-Morata, Neus; Comas, David; Calafell, Francesc

    2017-09-01

    We have genotyped the 58 STRs (27 autosomal, 24 Y-STRs and 7 X-STRs) and 94 autosomal SNPs in Illumina ForenSeq™ Primer Mix A in 88 Spanish Roma (Gypsy) samples and 143 Catalans. Since this platform is based in massive parallel sequencing, we have used simple R scripts to uncover the sequence variation in the repeat region. Thus, we have found, across 58 STRs, 541 length-based alleles, which, after considering repeat-sequence variation, became 804 different alleles. All loci in both populations were in Hardy-Weinberg equilibrium. F ST between both populations was 0.0178 for autosomal SNPs, 0.0146 for autosomal STRs, 0.0101 for X-STRs and 0.1866 for Y-STRs. Combined a priori statistics showed quite large; for instance, pooling all the autosomal loci, the a priori probabilities of discriminating a suspect become 1-(2.3×10 -70 ) and 1-(5.9×10 -73 ), for Roma and Catalans respectively, and the chances of excluding a false father in a trio are 1-(2.6×10 -20 ) and 1-(2.0×10 -21 ). Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA.

    Science.gov (United States)

    Marine, Rachel; Polson, Shawn W; Ravel, Jacques; Hatfull, Graham; Russell, Daniel; Sullivan, Matthew; Syed, Fraz; Dumas, Michael; Wommack, K Eric

    2011-11-01

    Construction of DNA fragment libraries for next-generation sequencing can prove challenging, especially for samples with low DNA yield. Protocols devised to circumvent the problems associated with low starting quantities of DNA can result in amplification biases that skew the distribution of genomes in metagenomic data. Moreover, sample throughput can be slow, as current library construction techniques are time-consuming. This study evaluated Nextera, a new transposon-based method that is designed for quick production of DNA fragment libraries from a small quantity of DNA. The sequence read distribution across nine phage genomes in a mock viral assemblage met predictions for six of the least-abundant phages; however, the rank order of the most abundant phages differed slightly from predictions. De novo genome assemblies from Nextera libraries provided long contigs spanning over half of the phage genome; in four cases where full-length genome sequences were available for comparison, consensus sequences were found to match over 99% of the genome with near-perfect identity. Analysis of areas of low and high sequence coverage within phage genomes indicated that GC content may influence coverage of sequences from Nextera libraries. Comparisons of phage genomes prepared using both Nextera and a standard 454 FLX Titanium library preparation protocol suggested that the coverage biases according to GC content observed within the Nextera libraries were largely attributable to bias in the Nextera protocol rather than to the 454 sequencing technology. Nevertheless, given suitable sequence coverage, the Nextera protocol produced high-quality data for genomic studies. For metagenomics analyses, effects of GC amplification bias would need to be considered; however, the library preparation standardization that Nextera provides should benefit comparative metagenomic analyses.

  13. Complete chloroplast genome and 45S nrDNA sequences of the medicinal plant species Glycyrrhiza glabra and Glycyrrhiza uralensis.

    Science.gov (United States)

    Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun

    2017-10-06

    Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.

  14. Sequence Capture versus Restriction Site Associated DNA Sequencing for Shallow Systematics.

    Science.gov (United States)

    Harvey, Michael G; Smith, Brian Tilston; Glenn, Travis C; Faircloth, Brant C; Brumfield, Robb T

    2016-09-01

    Sequence capture and restriction site associated DNA sequencing (RAD-Seq) are two genomic enrichment strategies for applying next-generation sequencing technologies to systematics studies. At shallow timescales, such as within species, RAD-Seq has been widely adopted among researchers, although there has been little discussion of the potential limitations and benefits of RAD-Seq and sequence capture. We discuss a series of issues that may impact the utility of sequence capture and RAD-Seq data for shallow systematics in non-model species. We review prior studies that used both methods, and investigate differences between the methods by re-analyzing existing RAD-Seq and sequence capture data sets from a Neotropical bird (Xenops minutus). We suggest that the strengths of RAD-Seq data sets for shallow systematics are the wide dispersion of markers across the genome, the relative ease and cost of laboratory work, the deep coverage and read overlap at recovered loci, and the high overall information that results. Sequence capture's benefits include flexibility and repeatability in the genomic regions targeted, success using low-quality samples, more straightforward read orthology assessment, and higher per-locus information content. The utility of a method in systematics, however, rests not only on its performance within a study, but on the comparability of data sets and inferences with those of prior work. In RAD-Seq data sets, comparability is compromised by low overlap of orthologous markers across species and the sensitivity of genetic diversity in a data set to an interaction between the level of natural heterozygosity in the samples examined and the parameters used for orthology assessment. In contrast, sequence capture of conserved genomic regions permits interrogation of the same loci across divergent species, which is preferable for maintaining comparability among data sets and studies for the purpose of drawing general conclusions about the impact of

  15. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects.

    Science.gov (United States)

    Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen

    2015-04-15

    In order to develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent the DNA sequences. To facilitate the studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of kmers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated based on both the built-in and user-defined properties via using repDNA. The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Dibenzotetraaza[14]annulene-adenine conjugate recognizes complementary poly dT among ss-DNA/ss-RNA sequences.

    Science.gov (United States)

    Radić Stojković, Marijana; Škugor, Marko; Tomić, Sanja; Grabar, Marina; Smrečki, Vilko; Dudek, Łukasz; Grolik, Jarosław; Eilmes, Julita; Piantanida, Ivo

    2013-06-28

    Among three novel DBTAA derivatives only the DBTAA-propyl-adenine conjugate showed recognition of the consecutive oligo dT sequence by increased affinity and specific induced chirooptical response in comparison to other single stranded RNA and DNA; whereby of particular importance is the up until now unique efficient differentiation between dT and rU. At variance, its close analogue DBTAA-hexyl-adenine did not reveal any selectivity between ss-DNA/RNA pointing out the important role of steric factors (linker length); moreover non-selectivity of the reference compound (, lacking adenine) stressed the importance of adenine interactions in the selectivity.

  17. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    Science.gov (United States)

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    To analyze and detect the whole genome sequence of human mitochondrial DNA (mtDNA) by Ion Torrent PGM™ platform and to study the differences of mtDNA sequence in different tissues. Samples were collected from 6 unrelated individuals by forensic postmortem examination, including chest blood, hair, costicartilage, nail, skeletal muscle and oral epithelium. Amplification of whole genome sequence of mtDNA was performed by 4 pairs of primer. Libraries were constructed with Ion Shear™ Plus Reagents kit and Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed using Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions on HVⅠ region. The whole genome sequence of mtDNA from all samples were amplified successfully. Six unrelated individuals belonged to 6 different haplotypes. Different tissues in one individual had heteroplasmy difference. The heteroplasmy positions and the mutation positions on HVⅠ region were verified by Sanger sequencing. After a consistency check by the Kappa method, it was found that the results of mtDNA sequence had a high consistency in different tissues. The testing method used in present study for sequencing the whole genome sequence of human mtDNA can detect the heteroplasmy difference in different tissues, which have good consistency. The results provide guidance for the further applications of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine

  18. Comparison of base composition analysis and Sanger sequencing of mitochondrial DNA for four U.S. population groups.

    Science.gov (United States)

    Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M

    2014-01-01

    A set of 711 samples from four U.S. population groups was analyzed using a novel mass spectrometry based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing derived data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays. Published by Elsevier Ireland Ltd.

  19. RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

    DEFF Research Database (Denmark)

    Wernersson, Rasmus; Pedersen, Anders Gorm

    2003-01-01

    The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit...... proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA...

  20. A novel low energy electron microscope for DNA sequencing and surface analysis

    Energy Technology Data Exchange (ETDEWEB)

    Mankos, M., E-mail: marian@electronoptica.com [Electron Optica Inc., 1000 Elwell Court #110, Palo Alto, CA 94303 (United States); Shadman, K. [Electron Optica Inc., 1000 Elwell Court #110, Palo Alto, CA 94303 (United States); Persson, H.H.J. [Stanford Genome Technology Center, Stanford University School of Medicine, 855 California Avenue, Palo Alto, CA 94304 (United States); N’Diaye, A.T. [Electron Optica Inc., 1000 Elwell Court #110, Palo Alto, CA 94303 (United States); NCEM, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720 (United States); Schmid, A.K. [NCEM, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720 (United States); Davis, R.W. [Stanford Genome Technology Center, Stanford University School of Medicine, 855 California Avenue, Palo Alto, CA 94304 (United States)

    2014-10-15

    Monochromatic, aberration-corrected, dual-beam low energy electron microscopy (MAD-LEEM) is a novel technique that is directed towards imaging nanostructures and surfaces with sub-nanometer resolution. The technique combines a monochromator, a mirror aberration corrector, an energy filter, and dual beam illumination in a single instrument. The monochromator reduces the energy spread of the illuminating electron beam, which significantly improves spectroscopic and spatial resolution. Simulation results predict that the novel aberration corrector design will eliminate the second rank chromatic and third and fifth order spherical aberrations, thereby improving the resolution into the sub-nanometer regime at landing energies as low as one hundred electron-Volts. The energy filter produces a beam that can extract detailed information about the chemical composition and local electronic states of non-periodic objects such as nanoparticles, interfaces, defects, and macromolecules. The dual flood illumination eliminates charging effects that are generated when a conventional LEEM is used to image insulating specimens. A potential application for MAD-LEEM is in DNA sequencing, which requires high resolution to distinguish the individual bases and high speed to reduce the cost. The MAD-LEEM approach images the DNA with low electron impact energies, which provides nucleobase contrast mechanisms without organometallic labels. Furthermore, the micron-size field of view when combined with imaging on the fly provides long read lengths, thereby reducing the demand on assembling the sequence. Experimental results from bulk specimens with immobilized single-base oligonucleotides demonstrate that base specific contrast is available with reflected, photo-emitted, and Auger electrons. Image contrast simulations of model rectangular features mimicking the individual nucleotides in a DNA strand have been developed to translate measurements of contrast on bulk DNA to the detectability of

  1. Integrative annotation of 21,037 human genes validated by full-length cDNA clones.

    Directory of Open Access Journals (Sweden)

    Tadashi Imanishi

    2004-06-01

    Full Text Available The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/. It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs, identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA

  2. Endophytic bacterial diversity in grapevine (Vitis vinifera L.) leaves described by 16S rRNA gene sequence analysis and length heterogeneity-PCR.

    Science.gov (United States)

    Bulgari, Daniela; Casati, Paola; Brusetti, Lorenzo; Quaglino, Fabio; Brasca, Milena; Daffonchio, Daniele; Bianco, Piero Attilio

    2009-08-01

    Diversity of bacterial endophytes associated with grapevine leaf tissues was analyzed by cultivation and cultivation-independent methods. In order to identify bacterial endophytes directly from metagenome, a protocol for bacteria enrichment and DNA extraction was optimized. Sequence analysis of 16S rRNA gene libraries underscored five diverse Operational Taxonomic Units (OTUs), showing best sequence matches with gamma-Proteobacteria, family Enterobacteriaceae, with a dominance of the genus Pantoea. Bacteria isolation through cultivation revealed the presence of six OTUs, showing best sequence matches with Actinobacteria, genus Curtobacterium, and with Firmicutes genera Bacillus and Enterococcus. Length Heterogeneity-PCR (LH-PCR) electrophoretic peaks from single bacterial clones were used to setup a database representing the bacterial endophytes identified in association with grapevine tissues. Analysis of healthy and phytoplasma-infected grapevine plants showed that LH-PCR could be a useful complementary tool for examining the diversity of bacterial endophytes especially for diversity survey on a large number of samples.

  3. Transformation of Cowpea Vigna unguiculata with a Full-Length DNA Copy of Cowpea Mosaic Virus M-RNA

    NARCIS (Netherlands)

    Hille, Jacques; Goldbach, Rob

    1987-01-01

    A full-length DNA copy of the M-RNA of cowpea mosaic virus (CPMV), supplied with either the 35S promoter from cauliflower mosaic virus (CaMV) or the nopaline synthase promoter from Agrobacterium tumefaciens, was introduced into the T-DNA region of a Ti-plasmid-derived gene vector and transferred to

  4. Mapped DNA probes from Ioblolly pine can be used for restriction fragment length polymorphism mapping in other conifers

    Science.gov (United States)

    M.R. Ahuja; M.E. Devey; A.T. Groover; K.D. Jermstad; D.B Neale

    1994-01-01

    A high-density genetic map based on restriction fragment length polymorphisms (RFLPs) is being constructed for loblolly pine (Pinus taeda L.). Consequently, a large number of DNA probes from loblolly pine are potentially available for use in other species. We have used some of these DNA probes to detect RFLPs in 12 conifers and an angiosperm....

  5. Sequencing of megabase plus DNA by hybridization: Method development ENT. Final technical progress report

    Energy Technology Data Exchange (ETDEWEB)

    Crkvenjakov, R.; Drmanac, R.

    1991-01-31

    Sequencing by hybridization (SBH) is the only sequencing method based on the experimental determination of the content of oligonucleotide sequences. The data acquisition relies on the natural process of base pairing. It is possible to determine the content of complementary oligosequences in the target DNA by the process of hybridization with oligonucleotide probes of known sequences.

  6. Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine

    Science.gov (United States)

    Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson

    2011-01-01

    Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...

  7. DNA fingerprinting of Mycobacterium leprae strains using variable number tandem repeat (VNTR) - fragment length analysis (FLA).

    Science.gov (United States)

    Jensen, Ronald W; Rivest, Jason; Li, Wei; Vissa, Varalakshmi

    2011-07-15

    presence of the desired DNA segments, and then submitted for fluorescent fragment length analysis (FLA) using capillary electrophoresis. DNA from armadillo passaged bacteria with a known number of repeat copies for each locus is used as a positive control. The FLA chromatograms are then examined using Peak Scanner software and fragment length is converted to number of VNTR copies (allele). Finally, the VNTR haplotypes are analyzed for patterns, and when combined with patient clinical data can be used to track distribution of strain types.

  8. A 28,000 Years Old Cro-Magnon mtDNA Sequence Differs from All Potentially Contaminating Modern Sequences

    Science.gov (United States)

    Caramelli, David; Milani, Lucio; Vai, Stefania; Modi, Alessandra; Pecchioli, Elena; Girardi, Matteo; Pilli, Elena; Lari, Martina; Lippi, Barbara; Ronchitelli, Annamaria; Mallegni, Francesco; Casoli, Antonella; Bertorelle, Giorgio; Barbujani, Guido

    2008-01-01

    Background DNA sequences from ancient speciments may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts on the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal) and early modern (Cro-Magnoid) Europeans. Methodology/Principal Findings We typed the mitochondrial DNA (mtDNA) hypervariable region I in a 28,000 years old Cro-Magnoid individual from the Paglicci cave, in Italy (Paglicci 23) and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence, and cannot possibly reflect contamination because it differs from all potentially contaminating modern sequences. Conclusions/Significance: The Paglicci 23 individual carried a mtDNA sequence that is still common in Europe, and which radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to get insight for the first time into the nuclear genes of early modern Europeans. PMID:18628960

  9. A 28,000 years old Cro-Magnon mtDNA sequence differs from all potentially contaminating modern sequences.

    Directory of Open Access Journals (Sweden)

    David Caramelli

    Full Text Available BACKGROUND: DNA sequences from ancient specimens may in fact result from undetected contamination of the ancient specimens by modern DNA, and the problem is particularly challenging in studies of human fossils. Doubts on the authenticity of the available sequences have so far hampered genetic comparisons between anatomically archaic (Neandertal and early modern (Cro-Magnoid Europeans. METHODOLOGY/PRINCIPAL FINDINGS: We typed the mitochondrial DNA (mtDNA hypervariable region I in a 28,000 years old Cro-Magnoid individual from the Paglicci cave, in Italy (Paglicci 23 and in all the people who had contact with the sample since its discovery in 2003. The Paglicci 23 sequence, determined through the analysis of 152 clones, is the Cambridge reference sequence, and cannot possibly reflect contamination because it differs from all potentially contaminating modern sequences. CONCLUSIONS/SIGNIFICANCE: The Paglicci 23 individual carried a mtDNA sequence that is still common in Europe, and which radically differs from those of the almost contemporary Neandertals, demonstrating a genealogical continuity across 28,000 years, from Cro-Magnoid to modern Europeans. Because all potential sources of modern DNA contamination are known, the Paglicci 23 sample will offer a unique opportunity to get insight for the first time into the nuclear genes of early modern Europeans.

  10. Sequence analysis of mitochondrial DNA hypervariable region III of ...

    African Journals Online (AJOL)

    Aghomotsegin

    2015-07-01

    Jul 1, 2015 ... population genetics research, studies based on mitochondrial DNA (mtDNA) and Y-chromosome DNA are an excellent way of illustrating population structure .... avoid landing investigators into serious situations of medical genetic privacy and ethnics, especially for. mtDNA coding area whose mutation often ...

  11. Protein and DNA sequence determinants of thermophilic adaptation.

    Directory of Open Access Journals (Sweden)

    Konstantin B Zeldovich

    2007-01-01

    Full Text Available There have been considerable attempts in the past to relate phenotypic trait--habitat temperature of organisms--to their genotypes, most importantly compositions of their genomes and proteomes. However, despite accumulation of anecdotal evidence, an exact and conclusive relationship between the former and the latter has been elusive. We present an exhaustive study of the relationship between amino acid composition of proteomes, nucleotide composition of DNA, and optimal growth temperature (OGT of prokaryotes. Based on 204 complete proteomes of archaea and bacteria spanning the temperature range from -10 degrees C to 110 degrees C, we performed an exhaustive enumeration of all possible sets of amino acids and found a set of amino acids whose total fraction in a proteome is correlated, to a remarkable extent, with the OGT. The universal set is Ile, Val, Tyr, Trp, Arg, Glu, Leu (IVYWREL, and the correlation coefficient is as high as 0.93. We also found that the G + C content in 204 complete genomes does not exhibit a significant correlation with OGT (R = -0.10. On the other hand, the fraction of A + G in coding DNA is correlated with temperature, to a considerable extent, due to codon patterns of IVYWREL amino acids. Further, we found strong and independent correlation between OGT and the frequency with which pairs of A and G nucleotides appear as nearest neighbors in genome sequences. This adaptation is achieved via codon bias. These findings present a direct link between principles of proteins structure and stability and evolutionary mechanisms of thermophylic adaptation. On the nucleotide level, the analysis provides an example of how nature utilizes codon bias for evolutionary adaptation to extreme conditions. Together these results provide a complete picture of how compositions of proteomes and genomes in prokaryotes adjust to the extreme conditions of the environment.

  12. A leaf sequencing algorithm to enlarge treatment field length in IMRT

    International Nuclear Information System (INIS)

    Xia Ping; Hwang, Andrew B.; Verhey, Lynn J.

    2002-01-01

    With MLC-based IMRT, the maximum usable field size is often smaller than the maximum field size for conventional treatments. This is due to the constraints of the overtravel distances of MLC leaves and/or jaws. Using a new leaf sequencing algorithm, the usable IMRT field length (perpendicular to the MLC motion) can be mostly made equal to the full length of the MLC field without violating the upper jaw overtravel limit. For any given intensity pattern, a criterion was proposed to assess whether an intensity pattern can be delivered without violation of the jaw position constraints. If the criterion is met, the new algorithm will consider the jaw position constraints during the segmentation for the step and shoot delivery method. The strategy employed by the algorithm is to connect the intensity elements outside the jaw overtravel limits with those inside the jaw overtravel limits. Several methods were used to establish these connections during segmentation by modifying a previously published algorithm (areal algorithm), including changing the intensity level, alternating the leaf-sequencing direction, or limiting the segment field size. The algorithm was tested with 1000 random intensity patterns with dimensions of 21x27 cm2, 800 intensity patterns with higher intensity outside the jaw overtravel limit, and three different types of clinical treatment plans that were undeliverable using a segmentation method from a commercial treatment planning system. The new algorithm achieved a success rate of 100% with these test patterns. For the 1000 random patterns, the new algorithm yields a similar average number of segments of 36.9±2.9 in comparison to 36.6±1.3 when using the areal algorithm. For the 800 patterns with higher intensities outside the jaw overtravel limits, the new algorithm results in an increase of 25% in the average number of segments compared to the areal algorithm. However, the areal algorithm fails to create deliverable segments for 90% of these

  13. Complete sequence analysis of 18S rDNA based on genomic DNA extraction from individual Demodex mites (Acari: Demodicidae).

    Science.gov (United States)

    Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang

    2012-05-01

    The study for the first time attempted to accomplish 18S ribosomal DNA (rDNA) complete sequence amplification and analysis for three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) based on gDNA extraction from individual mites. The mites were treated by DNA Release Additive and Hot Start II DNA Polymerase so as to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield reached the highest at 1 mite, tending to descend with the increase of mite number. The individual mite gDNA was successfully used for 18S rDNA fragment (about 900 bp) amplification examination. The alignments of 18S rDNA complete sequences of individual mite samples and those of pooled mite samples ( ≥ 1000mites/sample) showed over 97% identities for each species, indicating that the gDNA extracted from a single individual mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic tree could not effectively identify the three Demodex species, largely due to the differentiation in the D. canis isolates. It can be concluded that the individual Demodex mite gDNA can satisfy the molecular study of Demodex. 18S rDNA complete sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the ascertainment of the types of Demodex mites parasitizing in dogs. Copyright © 2012 Elsevier Inc. All rights reserved.

  14. Complete sequences of the mitochondrial DNA of the wild Gracilariopsis lemaneiformis and two mutagenic cultivated breeds (Gracilariaceae, Rhodophyta.

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    Full Text Available The complete mitochondrial DNA (mtDNA of Gracilariopsis lemaneiformis was sequenced (25883 bp and mapped to a circular model. The A+T composition was 72.5%. Forty six genes and two potentially functional open reading frames were identified. They include 24 protein-coding genes, 2 rRNA genes, 20 tRNA genes and 2 ORFs (orf60, orf142. There is considerable sequence synteny across the five red algal mtDNAs falling into Florideophyceae including Gr. lemaneiformis in this study and previously sequenced species. A long stem-loop and a hairpin structure were identified in intergenic regions of mt genome of Gr. lemaneiformis, which are believed to be involved with transcription and replication. In addition, the mtDNAs of two mutagenic cultivated breeds ("981" and "07-2" were also sequenced. Compared with the mtDNA of wild Gr. lemaneiformis, the genome size and gene length and order of three strains were completely identical except nine base mutations including eight in the protein-coding genes and one in the tRNA gene. None of the base mutations caused frameshift or a premature stop codon in the mtDNA genes. Phylogenetic analyses based on mitochondrial protein-coding genes and rRNA genes demonstrated Gracilariopsis andersonii had closer phylogenetic relationship with its parasite Gracilariophila oryzoides than Gracilariopsis lemaneiformis which was from the same genus of Gracilariopsis.

  15. Complete sequences of the mitochondrial DNA of the wild Gracilariopsis lemaneiformis and two mutagenic cultivated breeds (Gracilariaceae, Rhodophyta).

    Science.gov (United States)

    Zhang, Lei; Wang, Xumin; Qian, Hao; Chi, Shan; Liu, Cui; Liu, Tao

    2012-01-01

    The complete mitochondrial DNA (mtDNA) of Gracilariopsis lemaneiformis was sequenced (25883 bp) and mapped to a circular model. The A+T composition was 72.5%. Forty six genes and two potentially functional open reading frames were identified. They include 24 protein-coding genes, 2 rRNA genes, 20 tRNA genes and 2 ORFs (orf60, orf142). There is considerable sequence synteny across the five red algal mtDNAs falling into Florideophyceae including Gr. lemaneiformis in this study and previously sequenced species. A long stem-loop and a hairpin structure were identified in intergenic regions of mt genome of Gr. lemaneiformis, which are believed to be involved with transcription and replication. In addition, the mtDNAs of two mutagenic cultivated breeds ("981" and "07-2") were also sequenced. Compared with the mtDNA of wild Gr. lemaneiformis, the genome size and gene length and order of three strains were completely identical except nine base mutations including eight in the protein-coding genes and one in the tRNA gene. None of the base mutations caused frameshift or a premature stop codon in the mtDNA genes. Phylogenetic analyses based on mitochondrial protein-coding genes and rRNA genes demonstrated Gracilariopsis andersonii had closer phylogenetic relationship with its parasite Gracilariophila oryzoides than Gracilariopsis lemaneiformis which was from the same genus of Gracilariopsis.

  16. High-throughput sequencing of 16S rRNA gene amplicons: effects of extraction procedure, primer length and annealing temperature.

    Science.gov (United States)

    Sergeant, Martin J; Constantinidou, Chrystala; Cogan, Tristan; Penn, Charles W; Pallen, Mark J

    2012-01-01

    The analysis of 16S-rDNA sequences to assess the bacterial community composition of a sample is a widely used technique that has increased with the advent of high throughput sequencing. Although considerable effort has been devoted to identifying the most informative region of the 16S gene and the optimal informatics procedures to process the data, little attention has been paid to the PCR step, in particular annealing temperature and primer length. To address this, amplicons derived from 16S-rDNA were generated from chicken caecal content DNA using different annealing temperatures, primers and different DNA extraction procedures. The amplicons were pyrosequenced to determine the optimal protocols for capture of maximum bacterial diversity from a chicken caecal sample. Even at very low annealing temperatures there was little effect on the community structure, although the abundance of some OTUs such as Bifidobacterium increased. Using shorter primers did not reveal any novel OTUs but did change the community profile obtained. Mechanical disruption of the sample by bead beating had a significant effect on the results obtained, as did repeated freezing and thawing. In conclusion, existing primers and standard annealing temperatures captured as much diversity as lower annealing temperatures and shorter primers.

  17. Sequence-based separation of single-stranded DNA using nucleotides in capillary electrophoresis: focus on phosphate.

    Science.gov (United States)

    Zhang, Xueru; McGown, Linda B

    2013-06-01

    DNA analysis has widespread applicability in biology, medicine, biotechnology, and forensics. DNA separation by length is readily achieved using sieving gels in electrophoresis. Separation by sequence is less simple, generally requiring adequate differences in native or induced conformation or differences in thermal or chemical stability of the strands that are hybridized prior to measurement. We previously demonstrated separation of four single-stranded DNA 76-mers that differ by only a few A-G substitutions based solely on sequence using guanosine-5'-monophosphate (GMP) in the running buffer. We attributed separation to the unique self-assembly of GMP to form higher order structures. Here, we examine an expanded set of 76-mers designed to probe the mechanism of the separation and effects of experimental conditions. We were surprised to find that other ribonucleotides achieved the similar separation to GMP, and that some separation was achieved using sodium phosphate instead of GMP. Potassium phosphate achieved almost as good separations as the ribonucleotides. This suggests that the separation medium provides a physicochemical environment for the DNA that effects strand migration in a sequence-selective manner. Further investigation is needed to determine whether the mechanism involves specific interactions between the phosphates and the DNA strands or is a result of other properties of the separation medium. Phosphate generally has been avoided in DNA separations by capillary gel electrophoresis because its high ionic strength exacerbates Joule heating. Our results suggest that phosphate compounds should be examined for separation of DNA based on sequence. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. DNA breaks and repair in interstitial telomere sequences: Influence of chromatin structure

    International Nuclear Information System (INIS)

    Revaud, D.

    2009-06-01

    Interstitial Telomeric Sequences (ITS) are over-involved in spontaneous and radiationinduced chromosome aberrations in chinese hamster cells. We have performed a study to investigate the origin of their instability, spontaneously or after low doses irradiation. Our results demonstrate that ITS have a particular chromatin structure: short nucleotide repeat length, less compaction of the 30 nm chromatin fiber, presence of G-quadruplex structures. These features would modulate breaks production and would favour the recruitment of alternative DNA repair mechanisms, which are prone to produce chromosome aberrations. These pathways could be at the origin of chromosome aberrations in ITS whereas NHEJ and HR Double Strand Break repair pathways are rather required for a correct repair in these regions. (author)

  19. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution

    NARCIS (Netherlands)

    Falconer, Ester; Hills, Mark; Naumann, Ulrike; Poon, Steven S. S.; Chavez, Elizabeth A.; Sanders, Ashley D.; Zhao, Yongjun; Hirst, Martin; Lansdorp, Peter M.

    DNA rearrangements such as sister chromatid exchanges (SCEs) are sensitive indicators of genomic stress and instability, but they are typically masked by single-cell sequencing techniques. We developed Strand-seq to independently sequence parental DNA template strands from single cells, making it

  20. The DNA sequence and biology of human chromosome 19

    Energy Technology Data Exchange (ETDEWEB)

    Grimwood, J; Gordon, L A; Olsen, A; Terry, A; Schmutz, J; Lamerdin, J; Hellsten, U; Goodstein, D; Couronne, O; Tran-Gyamfi, M

    2004-04-06

    Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high GC content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in Mendelian disorders, including familial hypercholesterolemia and insulin-resistant diabetes. Nearly one quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

  1. Phylogenetic analysis of Gossypium L. using restriction fragment length polymorphism of repeated sequences.

    Science.gov (United States)

    Zhang, Meiping; Rong, Ying; Lee, Mi-Kyung; Zhang, Yang; Stelly, David M; Zhang, Hong-Bin

    2015-10-01

    Cotton is the world's leading textile fiber crop and is also grown as a bioenergy and food crop. Knowledge of the phylogeny of closely related species and the genome origin and evolution of polyploid species is significant for advanced genomics research and breeding. We have reconstructed the phylogeny of the cotton genus, Gossypium L., and deciphered the genome origin and evolution of its five polyploid species by restriction fragment analysis of repeated sequences. Nuclear DNA of 84 accessions representing 35 species and all eight genomes of the genus were analyzed. The phylogenetic tree of the genus was reconstructed using the parsimony method on 1033 polymorphic repeated sequence restriction fragments. The genome origin of its polyploids was determined by calculating the diploid-polyploid restriction fragment correspondence (RFC). The tree is consistent with the morphological classification, genome designation and geographic distribution of the species at subgenus, section and subsection levels. Gossypium lobatum (D7) was unambiguously shown to have the highest RFC with the D-subgenomes of all five polyploids of the genus, while the common ancestor of Gossypium herbaceum (A1) and Gossypium arboreum (A2) likely contributed to the A-subgenomes of the polyploids. These results provide a comprehensive phylogenetic tree of the cotton genus and new insights into the genome origin and evolution of its polyploid species. The results also further demonstrate a simple, rapid and inexpensive method suitable for phylogenetic analysis of closely related species, especially congeneric species, and the inference of genome origin of polyploids that constitute over 70 % of flowering plants.

  2. Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

    Directory of Open Access Journals (Sweden)

    Rachel Caldwell

    2015-01-01

    Full Text Available There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length.

  3. High Interlaboratory Reprocucibility of DNA Sequence-based Typing of Bacteria in a Multicenter Study

    DEFF Research Database (Denmark)

    Sousa, MA de; Boye, Kit; Lencastre, H de

    2006-01-01

    Current DNA amplification-based typing methods for bacterial pathogens often lack interlaboratory reproducibility. In this international study, DNA sequence-based typing of the Staphylococcus aureus protein A gene (spa, 110 to 422 bp) showed 100% intra- and interlaboratory reproducibility without...... extensive harmonization of protocols for 30 blind-coded S. aureus DNA samples sent to 10 laboratories. Specialized software for automated sequence analysis ensured a common typing nomenclature....

  4. Targeted DNA Methylation Analysis by High Throughput Sequencing in Porcine Peri-attachment Embryos

    OpenAIRE

    MORRILL, Benson H.; COX, Lindsay; WARD, Anika; HEYWOOD, Sierra; PRATHER, Randall S.; ISOM, S. Clay

    2013-01-01

    Abstract The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx plat...

  5. Elasticity of short DNA molecules: theory and experiment for contour lengths of 0.6-7 microm.

    Science.gov (United States)

    Seol, Yeonee; Li, Jinyu; Nelson, Philip C; Perkins, Thomas T; Betterton, M D

    2007-12-15

    The wormlike chain (WLC) model currently provides the best description of double-stranded DNA elasticity for micron-sized molecules. This theory requires two intrinsic material parameters-the contour length L and the persistence length p. We measured and then analyzed the elasticity of double-stranded DNA as a function of L (632 nm-7.03 microm) using the classic solution to the WLC model. When the elasticity data were analyzed using this solution, the resulting fitted value for the persistence length p(wlc) depended on L; even for moderately long DNA molecules (L = 1300 nm), this apparent persistence length was 10% smaller than its limiting value for long DNA. Because p is a material parameter, and cannot depend on length, we sought a new solution to the WLC model, which we call the "finite wormlike chain (FWLC)," to account for effects not considered in the classic solution. Specifically we accounted for the finite chain length, the chain-end boundary conditions, and the bead rotational fluctuations inherent in optical trapping assays where beads are used to apply the force. After incorporating these corrections, we used our FWLC solution to generate force-extension curves, and then fit those curves with the classic WLC solution, as done in the standard experimental analysis. These results qualitatively reproduced the apparent dependence of p(wlc) on L seen in experimental data when analyzed with the classic WLC solution. Directly fitting experimental data to the FWLC solution reduces the apparent dependence of p(fwlc) on L by a factor of 3. Thus, the FWLC solution provides a significantly improved theoretical framework in which to analyze single-molecule experiments over a broad range of experimentally accessible DNA lengths, including both short (a few hundred nanometers in contour length) and very long (microns in contour length) molecules.

  6. Sequence analysis of mitochondrial DNA hypervariable region III of ...

    African Journals Online (AJOL)

    The aims of this research were to study mitochondrial DNA hypervariable region III and establish the degree of variation characteristic of a fragment. The mitochondrial DNA (mtDNA) is a small circular genome located within the mitochondria in the cytoplasm of the cell and a smaller 1.2 kb pair fragment, called the control ...

  7. cDNA sequencing improves the detection of P53 missense mutations in colorectal cancer

    International Nuclear Information System (INIS)

    Szybka, Malgorzata; Kordek, Radzislaw; Zakrzewska, Magdalena; Rieske, Piotr; Pasz-Walczak, Grazyna; Kulczycka-Wojdala, Dominika; Zawlik, Izabela; Stawski, Robert; Jesionek-Kupnicka, Dorota; Liberski, Pawel P

    2009-01-01

    Recently published data showed discrepancies beteween P53 cDNA and DNA sequencing in glioblastomas. We hypothesised that similar discrepancies may be observed in other human cancers. To this end, we analyzed 23 colorectal cancers for P53 mutations and gene expression using both DNA and cDNA sequencing, real-time PCR and immunohistochemistry. We found P53 gene mutations in 16 cases (15 missense and 1 nonsense). Two of the 15 cases with missense mutations showed alterations based only on cDNA, and not DNA sequencing. Moreover, in 6 of the 15 cases with a cDNA mutation those mutations were difficult to detect in the DNA sequencing, so the results of DNA analysis alone could be misinterpreted if the cDNA sequencing results had not also been available. In all those 15 cases, we observed a higher ratio of the mutated to the wild type template by cDNA analysis, but not by the DNA analysis. Interestingly, a similar overexpression of P53 mRNA was present in samples with and without P53 mutations. In terms of colorectal cancer, those discrepancies might be explained under three conditions: 1, overexpression of mutated P53 mRNA in cancer cells as compared with normal cells; 2, a higher content of cells without P53 mutation (normal cells and cells showing K-RAS and/or APC but not P53 mutation) in samples presenting P53 mutation; 3, heterozygous or hemizygous mutations of P53 gene. Additionally, for heterozygous mutations unknown mechanism(s) causing selective overproduction of mutated allele should also be considered. Our data offer new clues for studying discrepancy in P53 cDNA and DNA sequencing analysis

  8. Functional role of a highly repetitive DNA sequence in anchorage of the mouse genome.

    Science.gov (United States)

    Neuer-Nitsche, B; Lu, X N; Werner, D

    1988-09-12

    The major portion of the eukaryotic genome consists of various categories of repetitive DNA sequences which have been studied with respect to their base compositions, organizations, copy numbers, transcription and species specificities; their biological roles, however, are still unclear. A novel quality of a highly repetitive mouse DNA sequence is described which points to a functional role: All copies (approximately 50,000 per haploid genome) of this DNA sequence reside on genomic Alu I DNA fragments each associated with nuclear polypeptides that are not released from DNA by proteinase K, SDS and phenol extraction. By this quality the repetitive DNA sequence is classified as a member of the sub-set of DNA sequences involved in tight DNA-polypeptide complexes which have been previously shown to be components of the subnuclear structure termed 'nuclear matrix'. From these results it has to be concluded that the repetitive DNA sequence characterized in this report represents or comprises a signal for a large number of site specific attachment points of the mouse genome in the nuclear matrix.

  9. Sequencing historical specimens: successful preparation of small specimens with low amounts of degraded DNA.

    Science.gov (United States)

    Sproul, John S; Maddison, David R

    2017-11-01

    Despite advances that allow DNA sequencing of old museum specimens, sequencing small-bodied, historical specimens can be challenging and unreliable as many contain only small amounts of fragmented DNA. Dependable methods to sequence such specimens are especially critical if the specimens are unique. We attempt to sequence small-bodied (3-6 mm) historical specimens (including nomenclatural types) of beetles that have been housed, dried, in museums for 58-159 years, and for which few or no suitable replacement specimens exist. To better understand ideal approaches of sample preparation and produce preparation guidelines, we compared different library preparation protocols using low amounts of input DNA (1-10 ng). We also explored low-cost optimizations designed to improve library preparation efficiency and sequencing success of historical specimens with minimal DNA, such as enzymatic repair of DNA. We report successful sample preparation and sequencing for all historical specimens despite our low-input DNA approach. We provide a list of guidelines related to DNA repair, bead handling, reducing adapter dimers and library amplification. We present these guidelines to facilitate more economical use of valuable DNA and enable more consistent results in projects that aim to sequence challenging, irreplaceable historical specimens. © 2017 John Wiley & Sons Ltd.

  10. Sequence-Dependent Diastereospecific and Diastereodivergent Crosslinking of DNA by Decarbamoylmitomycin C.

    Science.gov (United States)

    Aguilar, William; Paz, Manuel M; Vargas, Anayatzinc; Clement, Cristina C; Cheng, Shu-Yuan; Champeil, Elise

    2018-04-20

    Mitomycin C (MC), a potent antitumor drug, and decarbamoylmitomycin C (DMC), a derivative lacking the carbamoyl group, form highly cytotoxic DNA interstrand crosslinks. The major interstrand crosslink formed by DMC is the C1'' epimer of the major crosslink formed by MC. The molecular basis for the stereochemical configuration exhibited by DMC was investigated using biomimetic synthesis. The formation of DNA-DNA crosslinks by DMC is diastereospecific and diastereodivergent: Only the 1''S-diastereomer of the initially formed monoadduct can form crosslinks at GpC sequences, and only the 1''R-diastereomer of the monoadduct can form crosslinks at CpG sequences. We also show that CpG and GpC sequences react with divergent diastereoselectivity in the first alkylation step: 1"S stereochemistry is favored at GpC sequences and 1''R stereochemistry is favored at CpG sequences. Therefore, the first alkylation step results, at each sequence, in the selective formation of the diastereomer able to generate an interstrand DNA-DNA crosslink after the "second arm" alkylation. Examination of the known DNA adduct pattern obtained after treatment of cancer cell cultures with DMC indicates that the GpC sequence is the major target for the formation of DNA-DNA crosslinks in vivo by this drug. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Cloning, sequencing, and expression of dnaK-operon proteins from the thermophilic bacterium Thermus thermophilus.

    Science.gov (United States)

    Osipiuk, J; Joachimiak, A

    1997-09-12

    We propose that the dnaK operon of Thermus thermophilus HB8 is composed of three functionally linked genes: dnaK, grpE, and dnaJ. The dnaK and dnaJ gene products are most closely related to their cyanobacterial homologs. The DnaK protein sequence places T. thermophilus in the plastid Hsp70 subfamily. In contrast, the grpE translated sequence is most similar to GrpE from Clostridium acetobutylicum, a Gram-positive anaerobic bacterium. A single promoter region, with homology to the Escherichia coli consensus promoter sequences recognized by the sigma70 and sigma32 transcription factors, precedes the postulated operon. This promoter is heat-shock inducible. The dnaK mRNA level increased more than 30 times upon 10 min of heat shock (from 70 degrees C to 85 degrees C). A strong transcription terminating sequence was found between the dnaK and grpE genes. The individual genes were cloned into pET expression vectors and the thermophilic proteins were overproduced at high levels in E. coli and purified to homogeneity. The recombinant T. thermophilus DnaK protein was shown to have a weak ATP-hydrolytic activity, with an optimum at 90 degrees C. The ATPase was stimulated by the presence of GrpE and DnaJ. Another open reading frame, coding for ClpB heat-shock protein, was found downstream of the dnaK operon.

  12. Analysis of T-DNA/Host-Plant DNA Junction Sequences in Single-Copy Transgenic Barley Lines

    Directory of Open Access Journals (Sweden)

    Joanne G. Bartlett

    2014-01-01

    Full Text Available Sequencing across the junction between an integrated transfer DNA (T-DNA and a host plant genome provides two important pieces of information. The junctions themselves provide information regarding the proportion of T-DNA which has integrated into the host plant genome, whilst the transgene flanking sequences can be used to study the local genetic environment of the integrated transgene. In addition, this information is important in the safety assessment of GM crops and essential for GM traceability. In this study, a detailed analysis was carried out on the right-border T-DNA junction sequences of single-copy independent transgenic barley lines. T-DNA truncations at the right-border were found to be relatively common and affected 33.3% of the lines. In addition, 14.3% of lines had rearranged construct sequence after the right border break-point. An in depth analysis of the host-plant flanking sequences revealed that a significant proportion of the T-DNAs integrated into or close to known repetitive elements. However, this integration into repetitive DNA did not have a negative effect on transgene expression.

  13. Allelic sequence variations in the hypervariable region of a T-cell receptor β chain: Correlation with restriction fragment length polymorphism in human families and populations

    International Nuclear Information System (INIS)

    Robinson, M.A.

    1989-01-01

    Direct sequence analysis of the human T-cell antigen receptor (TCR) V β1 variable gene identified a single base-pair allelic variation (C/G) located within the coding region. This change results in substitution of a histidine (CAC) for a glutamine (CAG) at position 48 of the TCR β chain, a position predicted to be in the TCR antigen binding site. The V β1 polymorphism was found by DNA sequence analysis of V β1 genes from seven unrelated individuals; V β1 genes were amplified by the polymerase chain reaction, the amplified fragments were cloned into M13 phage vectors, and sequences were determined. To determined the inheritance patterns of the V β1 substitution and to test correlation with V β1 restriction fragment length polymorphism detected with Pvu II and Taq I, allele-specific oligonucleotides were constructed and used to characterize amplified DNA samples. Seventy unrelated individuals and six families were tested for both restriction fragment length polymorphism and for the V β1 substitution. The correlation was also tested using amplified, size-selected, Pvu II- and Taq I-digested DNA samples from heterozygotes. Pvu II allele 1 (61/70) and Taq I allele 1 (66/70) were found to be correlated with the substitution giving rise to a histidine at position 48. Because there are exceptions to the correlation, the use of specific probes to characterize allelic forms of TCR variable genes will provide important tools for studies of basic TCR genetics and disease associations

  14. Two dimensional molecular electronics spectroscopy for molecular fingerprinting, DNA sequencing, and cancerous DNA recognition.

    Science.gov (United States)

    Rajan, Arunkumar Chitteth; Rezapour, Mohammad Reza; Yun, Jeonghun; Cho, Yeonchoo; Cho, Woo Jong; Min, Seung Kyu; Lee, Geunsik; Kim, Kwang S

    2014-02-25

    Laser-driven molecular spectroscopy of low spatial resolution is widely used, while electronic current-driven molecular spectroscopy of atomic scale resolution has been limited because currents provide only minimal information. However, electron transmission of a graphene nanoribbon on which a molecule is adsorbed shows molecular fingerprints of Fano resonances, i.e., characteristic features of frontier orbitals and conformations of physisorbed molecules. Utilizing these resonance profiles, here we demonstrate two-dimensional molecular electronics spectroscopy (2D MES). The differential conductance with respect to bias and gate voltages not only distinguishes different types of nucleobases for DNA sequencing but also recognizes methylated nucleobases which could be related to cancerous cell growth. This 2D MES could open an exciting field to recognize single molecule signatures at atomic resolution. The advantages of the 2D MES over the one-dimensional (1D) current analysis can be comparable to those of 2D NMR over 1D NMR analysis.

  15. Cytogenetic Analysis of Populus trichocarpa - Ribosomal DNA, Telomere Repeat Sequence, and Marker-selected BACs

    Science.gov (United States)

    M.N. lslam-Faridi; C.D. Nelson; S.P. DiFazio; L.E. Gunter; G.A. Tuskan

    2009-01-01

    The 185-285 rDNA and 55 rDNA loci in Populus trichocarpa were localized using fluorescent in situ hybridization (FISH). Two 185-285 rDNA sites and one 55 rDNA site were identified and located at the ends of 3 different chromosomes. FISH signals from the Arabidopsis-type telomere repeat sequence were observed at the distal ends of each chromosome. Six BAC clones...

  16. cDNA sequence of human transforming gene hst and identification of the coding sequence required for transforming activity

    International Nuclear Information System (INIS)

    Taira, M.; Yoshida, T.; Miyagawa, K.; Sakamoto, H.; Terada, M.; Sugimura, T.

    1987-01-01

    The hst gene was originally identified as a transforming gene in DNAs from human stomach cancers and from a noncancerous portion of stomach mucosa by DNA-mediated transfection assay using NIH3T3 cells. cDNA clones of hst were isolated from the cDNA library constructed from poly(A) + RNA of a secondary transformant induced by the DNA from a stomach cancer. The sequence analysis of the hst cDNA revealed the presence of two open reading frames. When this cDNA was inserted into an expression vector containing the simian virus 40 promoter, it efficiently induced the transformation of NIH3T3 cells upon transfection. It was found that one of the reading frames, which coded for 206 amino acids, was responsible for the transforming activity

  17. Real sequence effects on the search dynamics of transcription factors on DNA

    DEFF Research Database (Denmark)

    Bauer, Maximilian; Rasmussen, Emil S.; Lomholt, Michael A.

    2015-01-01

    Recent experiments show that transcription factors (TFs) indeed use the facilitated diffusion mechanism to locate their target sequences on DNA in living bacteria cells: TFs alternate between sliding motion along DNA and relocation events through the cytoplasm. From simulations and theoretical...... analysis we study the TF-sliding motion for a large section of the DNA-sequence of a common E. coli strain, based on the two-state TF-model with a fast-sliding search state and a recognition state enabling target detection. For the probability to detect the target before dissociating from DNA the TF...... on the underlying nucleotide sequence is varied. A moderate dependence maximises the capability to distinguish between the main operator and similar sequences. Moreover, these auxiliary operators serve as starting points for DNA looping with the main operator, yielding a spectrum of target detection times spanning...

  18. [Cloning and sequencing of KIR2DL1 framework gene cDNA and identification of a novel allele].

    Science.gov (United States)

    Sun, Ge; Wang, Chang; Zhen, Jianxin; Zhang, Guobin; Xu, Yunping; Deng, Zhihui

    2016-10-01

    To develop an assay for cDNA cloning and haplotype sequencing of KIR2DL1 framework gene and determine the genotype of an ethnic Han from southern China. Total RNA was isolated from peripheral blood sample, and complementary DNA (cDNA) transcript was synthesized by RT-PCR. The entire coding sequence of the KIR2DL1 framework gene was amplified with a pair of KIR2DL1-specific PCR primers. The PCR products with a length of approximately 1.2 kb were then subjected to cloning and haplotype sequencing. A specific target fragment of the KIR2DL1 framework gene was obtained. Following allele separation, a wild-type KIR2DL1*00302 allele and a novel variant allele, KIR2DL1*031, were identified. Sequence alignment with KIR2DL1 alleles from the IPD-KIR Database showed that the novel allele KIR2DL1*031 has differed from the closest allele KIR2DL1*00302 by a non-synonymous mutation at CDS nt 188A>G (codon 42 GAG>GGG) in exon 4, which has caused an amino acid change Glu42Gly. The sequence of the novel allele KIR2DL1*031 was submitted to GenBank under the accession number KP025960 and to the IPD-KIR Database under the submission number IWS40001982. A name KIR2DL1*031 has been officially assigned by the World Health Organization (WHO) Nomenclature Committee. An assay for cDNA cloning and haplotype sequencing of KIR2DL1 has been established, which has a broad applications in KIR studies at allelic level.

  19. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors.

    Science.gov (United States)

    Adalsteinsson, Viktor A; Ha, Gavin; Freeman, Samuel S; Choudhury, Atish D; Stover, Daniel G; Parsons, Heather A; Gydush, Gregory; Reed, Sarah C; Rotem, Denisse; Rhoades, Justin; Loginov, Denis; Livitz, Dimitri; Rosebrock, Daniel; Leshchiner, Ignaty; Kim, Jaegil; Stewart, Chip; Rosenberg, Mara; Francis, Joshua M; Zhang, Cheng-Zhong; Cohen, Ofir; Oh, Coyin; Ding, Huiming; Polak, Paz; Lloyd, Max; Mahmud, Sairah; Helvie, Karla; Merrill, Margaret S; Santiago, Rebecca A; O'Connor, Edward P; Jeong, Seong H; Leeson, Rachel; Barry, Rachel M; Kramkowski, Joseph F; Zhang, Zhenwei; Polacek, Laura; Lohr, Jens G; Schleicher, Molly; Lipscomb, Emily; Saltzman, Andrea; Oliver, Nelly M; Marini, Lori; Waks, Adrienne G; Harshman, Lauren C; Tolaney, Sara M; Van Allen, Eliezer M; Winer, Eric P; Lin, Nancy U; Nakabayashi, Mari; Taplin, Mary-Ellen; Johannessen, Cory M; Garraway, Levi A; Golub, Todd R; Boehm, Jesse S; Wagle, Nikhil; Getz, Gad; Love, J Christopher; Meyerson, Matthew

    2017-11-06

    Whole-exome sequencing of cell-free DNA (cfDNA) could enable comprehensive profiling of tumors from blood but the genome-wide concordance between cfDNA and tumor biopsies is uncertain. Here we report ichorCNA, software that quantifies tumor content in cfDNA from 0.1× coverage whole-genome sequencing data without prior knowledge of tumor mutations. We apply ichorCNA to 1439 blood samples from 520 patients with metastatic prostate or breast cancers. In the earliest tested sample for each patient, 34% of patients have ≥10% tumor-derived cfDNA, sufficient for standard coverage whole-exome sequencing. Using whole-exome sequencing, we validate the concordance of clonal somatic mutations (88%), copy number alterations (80%), mutational signatures, and neoantigens between cfDNA and matched tumor biopsies from 41 patients with ≥10% cfDNA tumor content. In summary, we provide methods to identify patients eligible for comprehensive cfDNA profiling, revealing its applicability to many patients, and demonstrate high concordance of cfDNA and metastatic tumor whole-exome sequencing.

  20. Targeting and tracing of specific DNA sequences with dTALEs in living cells

    Science.gov (United States)

    Thanisch, Katharina; Schneider, Katrin; Morbitzer, Robert; Solovei, Irina; Lahaye, Thomas; Bultmann, Sebastian; Leonhardt, Heinrich

    2014-01-01

    Epigenetic regulation of gene expression involves, besides DNA and histone modifications, the relative positioning of DNA sequences within the nucleus. To trace specific DNA sequences in living cells, we used programmable sequence-specific DNA binding of designer transcription activator-like effectors (dTALEs). We designed a recombinant dTALE (msTALE) with variable repeat domains to specifically bind a 19-bp target sequence of major satellite DNA. The msTALE was fused with green fluorescent protein (GFP) and stably expressed in mouse embryonic stem cells. Hybridization with a major satellite probe (3D-fluorescent in situ hybridization) and co-staining for known cellular structures confirmed in vivo binding of the GFP-msTALE to major satellite DNA present at nuclear chromocenters. Dual tracing of major satellite DNA and the replication machinery throughout S-phase showed co-localization during mid to late S-phase, directly demonstrating the late replication timing of major satellite DNA. Fluorescence bleaching experiments indicated a relatively stable but still dynamic binding, with mean residence times in the range of minutes. Fluorescently labeled dTALEs open new perspectives to target and trace DNA sequences and to monitor dynamic changes in subnuclear positioning as well as interactions with functional nuclear structures during cell cycle progression and cellular differentiation. PMID:24371265

  1. Targeting and tracing of specific DNA sequences with dTALEs in living cells.

    Science.gov (United States)

    Thanisch, Katharina; Schneider, Katrin; Morbitzer, Robert; Solovei, Irina; Lahaye, Thomas; Bultmann, Sebastian; Leonhardt, Heinrich

    2014-04-01

    Epigenetic regulation of gene expression involves, besides DNA and histone modifications, the relative positioning of DNA sequences within the nucleus. To trace specific DNA sequences in living cells, we used programmable sequence-specific DNA binding of designer transcription activator-like effectors (dTALEs). We designed a recombinant dTALE (msTALE) with variable repeat domains to specifically bind a 19-bp target sequence of major satellite DNA. The msTALE was fused with green fluorescent protein (GFP) and stably expressed in mouse embryonic stem cells. Hybridization with a major satellite probe (3D-fluorescent in situ hybridization) and co-staining for known cellular structures confirmed in vivo binding of the GFP-msTALE to major satellite DNA present at nuclear chromocenters. Dual tracing of major satellite DNA and the replication machinery throughout S-phase showed co-localization during mid to late S-phase, directly demonstrating the late replication timing of major satellite DNA. Fluorescence bleaching experiments indicated a relatively stable but still dynamic binding, with mean residence times in the range of minutes. Fluorescently labeled dTALEs open new perspectives to target and trace DNA sequences and to monitor dynamic changes in subnuclear positioning as well as interactions with functional nuclear structures during cell cycle progression and cellular differentiation.

  2. MotifMark: Finding Regulatory Motifs in DNA Sequences

    OpenAIRE

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L.; Wang, May D.

    2017-01-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity be...

  3. Sequence analysis of the canine mitochondrial DNA control region from shed hair samples in criminal investigations.

    Science.gov (United States)

    Berger, C; Berger, B; Parson, W

    2012-01-01

    In recent years, evidence from domestic dogs has increasingly been analyzed by forensic DNA testing. Especially, canine hairs have proved most suitable and practical due to the high rate of hair transfer occurring between dogs and humans. Starting with the description of a contamination-free sample handling procedure, we give a detailed workflow for sequencing hypervariable segments (HVS) of the mtDNA control region from canine evidence. After the hair material is lysed and the DNA extracted by Phenol/Chloroform, the amplification and sequencing strategy comprises the HVS I and II of the canine control region and is optimized for DNA of medium-to-low quality and quantity. The sequencing procedure is based on the Sanger Big-dye deoxy-terminator method and the separation of the sequencing reaction products is performed on a conventional multicolor fluorescence detection capillary electrophoresis platform. Finally, software-aided base calling and sequence interpretation are addressed exemplarily.

  4. Molecular Cloning and Characterization of Full-Length cDNA of Calmodulin Gene from Pacific Oyster Crassostrea gigas.

    Science.gov (United States)

    Li, Xing-Xia; Yu, Wen-Chao; Cai, Zhong-Qiang; He, Cheng; Wei, Na; Wang, Xiao-Tong; Yue, Xi-Qing

    2016-01-01

    The shell of the pearl oyster ( Pinctada fucata ) mainly comprises aragonite whereas that of the Pacific oyster ( Crassostrea gigas ) is mainly calcite, thereby suggesting the different mechanisms of shell formation between above two mollusks. Calmodulin (CaM) is an important gene for regulating the uptake, transport, and secretion of calcium during the process of shell formation in pearl oyster. It is interesting to characterize the CaM in oysters, which could facilitate the understanding of the different shell formation mechanisms among mollusks. We cloned the full-length cDNA of Pacific oyster CaM (cgCaM) and found that the cgCaM ORF encoded a peptide of 113 amino acids containing three EF-hand calcium-binding domains, its expression level was highest in the mantle, hinting that the cgCaM gene is probably involved in shell formation of Pacific oyster, and the common ancestor of Gastropoda and Bivalvia may possess at least three CaM genes. We also found that the numbers of some EF hand family members in highly calcified species were higher than those in lowly calcified species and the numbers of these motifs in oyster genome were the highest among the mollusk species with whole genome sequence, further hinting the correlation between CaM and biomineralization.

  5. Molecular Cloning and Characterization of Full-Length cDNA of Calmodulin Gene from Pacific Oyster Crassostrea gigas

    Directory of Open Access Journals (Sweden)

    Xing-Xia Li

    2016-01-01

    Full Text Available The shell of the pearl oyster (Pinctada fucata mainly comprises aragonite whereas that of the Pacific oyster (Crassostrea gigas is mainly calcite, thereby suggesting the different mechanisms of shell formation between above two mollusks. Calmodulin (CaM is an important gene for regulating the uptake, transport, and secretion of calcium during the process of shell formation in pearl oyster. It is interesting to characterize the CaM in oysters, which could facilitate the understanding of the different shell formation mechanisms among mollusks. We cloned the full-length cDNA of Pacific oyster CaM (cgCaM and found that the cgCaM ORF encoded a peptide of 113 amino acids containing three EF-hand calcium-binding domains, its expression level was highest in the mantle, hinting that the cgCaM gene is probably involved in shell formation of Pacific oyster, and the common ancestor of Gastropoda and Bivalvia may possess at least three CaM genes. We also found that the numbers of some EF hand family members in highly calcified species were higher than those in lowly calcified species and the numbers of these motifs in oyster genome were the highest among the mollusk species with whole genome sequence, further hinting the correlation between CaM and biomineralization.

  6. A duplex DNA-gold nanoparticle probe composed as a colorimetric biosensor for sequence-specific DNA-binding proteins.

    Science.gov (United States)

    Ahn, Junho; Choi, Yeonweon; Lee, Ae-Ree; Lee, Joon-Hwa; Jung, Jong Hwa

    2016-03-21

    Using duplex DNA-AuNP aggregates, a sequence-specific DNA-binding protein, SQUAMOSA Promoter-binding-Like protein 12 (SPL-12), was directly determined by SPL-12-duplex DNA interaction-based colorimetric actions of DNA-Au assemblies. In order to prepare duplex DNA-Au aggregates, thiol-modified DNA 1 and DNA 2 were attached onto the surface of AuNPs, respectively, by the salt-aging method and then the DNA-attached AuNPs were mixed. Duplex-DNA-Au aggregates having the average size of 160 nm diameter and the maximum absorption at 529 nm were able to recognize SPL-12 and reached the equivalent state by the addition of ∼30 equivalents of SPL-12 accompanying a color change from red to blue with a red shift of the maximum absorption at 570 nm. As a result, the aggregation size grew to about 247 nm. Also, at higher temperatures of the mixture of duplex-DNA-Au aggregate solution and SPL-12, the equivalent state was reached rapidly. On the contrary, in the control experiment using Bovine Serum Albumin (BSA), no absorption band shift of duplex-DNA-Au aggregates was observed.

  7. Ecological niche modelling and nDNA sequencing support a new, morphologically cryptic beetle species unveiled by DNA barcoding.

    Science.gov (United States)

    Hawlitschek, Oliver; Porch, Nick; Hendrich, Lars; Balke, Michael

    2011-02-09

    DNA sequencing techniques used to estimate biodiversity, such as DNA barcoding, may reveal cryptic species. However, disagreements between barcoding and morphological data have already led to controversy. Species delimitation should therefore not be based on mtDNA alone. Here, we explore the use of nDNA and bioclimatic modelling in a new species of aquatic beetle revealed by mtDNA sequence data. The aquatic beetle fauna of Australia is characterised by high degrees of endemism, including local radiations such as the genus Antiporus. Antiporus femoralis was previously considered to exist in two disjunct, but morphologically indistinguishable populations in south-western and south-eastern Australia. We constructed a phylogeny of Antiporus and detected a deep split between these populations. Diagnostic characters from the highly variable nuclear protein encoding arginine kinase gene confirmed the presence of two isolated populations. We then used ecological niche modelling to examine the climatic niche characteristics of the two populations. All results support the status of the two populations as distinct species. We describe the south-western species as Antiporus occidentalis sp.n. In addition to nDNA sequence data and extended use of mitochondrial sequences, ecological niche modelling has great potential for delineating morphologically cryptic species.

  8. Comparative molecular analysis of Herbaspirillum strains by RAPD, RFLP, and 16S rDNA sequencing

    Directory of Open Access Journals (Sweden)

    Soares-Ramos Juliana R.L.

    2003-01-01

    Full Text Available Herbaspirillum spp. are endophytic diazotrophic bacteria associated with important agricultural crops. In this work, we analyzed six strains of H. seropedicae (Z78, M2, ZA69, ZA95, Z152, and Z67 and one strain of H. rubrisubalbicans (M4 by restriction fragment length polymorphism (RFLP using HindIII or DraI restriction endonucleases, random amplified polymorphic DNA (RAPD, and partial sequencing of 16S rDNA. The results of these analyses ascribed the strains studied to three distinct groups: group I, consisting of M2 and M4; group II, of ZA69; and group III, of ZA95, Z78, Z67, and Z152. RAPD fingerprinting showed a higher variability than the other methods, and each strain had a unique electrophoretic pattern with five of the six primers used. Interestingly, H. seropedicae M2 was found by all analyses to be genetically very close to H. rubrisubalbicans M4. Our results show that RAPD can distinguish between all Herbaspirillum strains tested.

  9. The Statistical Segment Length of DNA: Opportunities for Biomechanical Modeling in Polymer Physics and Next-Generation Genomics.

    Science.gov (United States)

    Dorfman, Kevin D

    2018-02-01

    The development of bright bisintercalating dyes for deoxyribonucleic acid (DNA) in the 1990s, most notably YOYO-1, revolutionized the field of polymer physics in the ensuing years. These dyes, in conjunction with modern molecular biology techniques, permit the facile observation of polymer dynamics via fluorescence microscopy and thus direct tests of different theories of polymer dynamics. At the same time, they have played a key role in advancing an emerging next-generation method known as genome mapping in nanochannels. The effect of intercalation on the bending energy of DNA as embodied by a change in its statistical segment length (or, alternatively, its persistence length) has been the subject of significant controversy. The precise value of the statistical segment length is critical for the proper interpretation of polymer physics experiments and controls the phenomena underlying the aforementioned genomics technology. In this perspective, we briefly review the model of DNA as a wormlike chain and a trio of methods (light scattering, optical or magnetic tweezers, and atomic force microscopy (AFM)) that have been used to determine the statistical segment length of DNA. We then outline the disagreement in the literature over the role of bisintercalation on the bending energy of DNA, and how a multiscale biomechanical approach could provide an important model for this scientifically and technologically relevant problem.

  10. cDNA encoding a polypeptide including a hevein sequence

    Energy Technology Data Exchange (ETDEWEB)

    Raikhel, N.V.; Broekaert, W.F.; Namhai Chua; Kush, A.

    1993-02-16

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1,018 nucleotides long and includes an open reading frame of 204 amino acids.

  11. Bisulfite sequencing reveals that Aspergillus flavus holds a hollow in DNA methylation.

    Directory of Open Access Journals (Sweden)

    Si-Yang Liu

    Full Text Available Aspergillus flavus first gained scientific attention for its production of aflatoxin. The underlying regulation of aflatoxin biosynthesis has been serving as a theoretical model for biosynthesis of other microbial secondary metabolites. Nevertheless, for several decades, the DNA methylation status, one of the important epigenomic modifications involved in gene regulation, in A. flavus remains to be controversial. Here, we applied bisulfite sequencing in conjunction with a biological replicate strategy to investigate the DNA methylation profiling of A. flavus genome. Both the bisulfite sequencing data and the methylome comparisons with other fungi confirm that the DNA methylation level of this fungus is negligible. Further investigation into the DNA methyltransferase of Aspergillus uncovers its close relationship with RID-like enzymes as well as its divergence with the methyltransferase of species with validated DNA methylation. The lack of repeat contents of the A. flavus' genome and the high RIP-index of the small amount of remanent repeat potentially support our speculation that DNA methylation may be absent in A. flavus or that it may possess de novo DNA methylation which occurs very transiently during the obscure sexual stage of this fungal species. This work contributes to our understanding on the DNA methylation status of A. flavus, as well as reinforces our views on the DNA methylation in fungal species. In addition, our strategy of applying bisulfite sequencing to DNA methylation detection in species with low DNA methylation may serve as a reference for later scientific investigations in other hypomethylated species.

  12. Analysis of Litopenaeus vannamei transcriptome using the next-generation DNA sequencing technique.

    Directory of Open Access Journals (Sweden)

    Chaozheng Li

    Full Text Available BACKGROUND: Pacific white shrimp (Litopenaeus vannamei, the major species of farmed shrimps in the world, has been attracting extensive studies, which require more and more genome background knowledge. The now available transcriptome data of L. vannamei are insufficient for research requirements, and have not been adequately assembled and annotated. METHODOLOGY/PRINCIPAL FINDINGS: This is the first study that used a next-generation high-throughput DNA sequencing technique, the Solexa/Illumina GA II method, to analyze the transcriptome from whole bodies of L. vannamei larvae. More than 2.4 Gb of raw data were generated, and 109,169 unigenes with a mean length of 396 bp were assembled using the SOAP denovo software. 73,505 unigenes (>200 bp with good quality sequences were selected and subjected to annotation analysis, among which 37.80% can be matched in NCBI Nr database, 37.3% matched in Swissprot, and 44.1% matched in TrEMBL. Using BLAST and BLAST2Go softwares, 11,153 unigenes were classified into 25 Clusters of Orthologous Groups of proteins (COG categories, 8171 unigenes were assigned into 51 Gene ontology (GO functional groups, and 18,154 unigenes were divided into 220 Kyoto Encyclopedia of Genes and Genomes (KEGG pathways. To primarily verify part of the results of assembly and annotations, 12 assembled unigenes that are homologous to many embryo development-related genes were chosen and subjected to RT-PCR for electrophoresis and Sanger sequencing analyses, and to real-time PCR for expression profile analyses during embryo development. CONCLUSIONS/SIGNIFICANCE: The L. vannamei transcriptome analyzed using the next-generation sequencing technique enriches the information of L. vannamei genes, which will facilitate our understanding of the genome background of crustaceans, and promote the studies on L. vannamei.

  13. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains.

    Science.gov (United States)

    Zhao, Xiaoyan; Sze, Sing-Hoi

    2011-05-01

    One strategy to identify transcription factor binding sites is through motif finding in upstream DNA sequences of potentially co-regulated genes. Despite extensive efforts, none of the existing algorithms perform very well. We consider a string representation that allows arbitrary ignored positions within the nonconserved portion of single motifs, and use O(2(l)) Markov chains to model the background distributions of motifs of length l while skipping these positions within each Markov chain. By focusing initially on positions that have fixed nucleotides to define core occurrences, we develop an algorithm to identify motifs of moderate lengths. We compare the performance of our algorithm to other motif finding algorithms on a few benchmark data sets, and show that significant improvement in accuracy can be obtained when the sites are sufficiently conserved within a given sample, while comparable performance is obtained when the site conservation rate is low. A software program (PosMotif ) and detailed results are available online at http://faculty.cse.tamu.edu/shsze/posmotif.

  14. Molecular cloning and characterization of the full-length cDNA encoding the tree shrew (tupaia belangeri) CD28

    Science.gov (United States)

    Huang, Xiaoyan; Yan, Yan; Wang, Sha; Wang, Qinying; Shi, Jian; Shao, Zhanshe; Dai, Jiejie

    2017-11-01

    CD28 is one of the most important co-stimulatory molecules expressed by naive and primed T cells. The tree shrews (Tupaia belangeri), as an ideal animal model for analyzing mechanism of human diseases receiving extensive attentions, demands essential research tools, in particular in the study of cellular markers and monoclonal antibodies for immunological studies. However, little is known about tree shrew CD28 (tsCD28) until now. In this study, a 663 bp of the full-length CD28 cDNA, encoding a polypeptide of 220 amino acids was cloned from tree shrew spleen lymphocytes. The nucleotide sequence of the tsCD28 showed 85%, 76%, and 75% similarities with human, rat, and mouse, respectively, which showed the affinity relationship between tree shrew and human is much closer than between human and rodents. The open reading frame (ORF) sequence of tsCD28 gene was predicted to be in correspondence with the signal sequence, immunoglobulin variable-like (IgV) domain, transmembrane domain and cytoplasmic tail, respectively.We also analyzed its molecular characteristics with other mammals by using biology software such as Clustal W 2.0 and so forth. Our results showed that tsCD28 contained many features conserved in CD28 genes from other mammals, including conserved signal peptide and glycosylation sites, and several residues responsible for binding to the CD28R, and the tsCD28 amino acid sequence were found a close genetic relationship with human and monkey. The crystal structure and surface charge revealed most regions of tree shrew CD28 molecule surface charges are similar as human. However, compared with human CD28 (hCD28) regions, in some areas, the surface positive charge of tsCD28 was less than hCD28, which may affect antibody binding. The present study is the first report of cloning and characterization of CD28 in tree shrew. This study provides a theoretical basis for the further study the structure and function of tree shrew CD28 and utilize tree shrew as an effective

  15. Sequence of a cDNA encoding turtle high mobility group 1 protein.

    Science.gov (United States)

    Zheng, Jifang; Hu, Bi; Wu, Duansheng

    2005-07-01

    In order to understand sequence information about turtle HMG1 gene, a cDNA encoding HMG1 protein of the Chinese soft-shell turtle (Pelodiscus sinensis) was amplified by RT-PCR from kidney total RNA, and was cloned, sequenced and analyzed. The results revealed that the open reading frame (ORF) of turtle HMG1 cDNA is 606 bp long. The ORF codifies 202 amino acid residues, from which two DNA-binding domains and one polyacidic region are derived. The DNA-binding domains share higher amino acid identity with homologues sequences of chicken (96.5%) and mammalian (74%) than homologues sequence of rainbow trout (67%). The polyacidic region shows 84.6% amino acid homology with the equivalent region of chicken HMG1 cDNA. Turtle HMG1 protein contains 3 Cys residues located at completely conserved positions. Conservation in sequence and structure suggests that the functions of turtle HMG1 cDNA may be highly conserved during evolution. To our knowledge, this is the first report of HMG1 cDNA sequence in any reptilian.

  16. Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies.

    Directory of Open Access Journals (Sweden)

    2005-10-01

    Full Text Available With a draft genome-sequence assembly for the chimpanzee available, it is now possible to perform genome-wide analyses to identify, at a submicroscopic level, structural rearrangements that have occurred between chimpanzees and humans. The goal of this study was to investigate chromosomal regions that are inverted between the chimpanzee and human genomes. Using the net alignments for the builds of the human and chimpanzee genome assemblies, we identified a total of 1,576 putative regions of inverted orientation, covering more than 154 mega-bases of DNA. The DNA segments are distributed throughout the genome and range from 23 base pairs to 62 mega-bases in length. For the 66 inversions more than 25 kilobases (kb in length, 75% were flanked on one or both sides by (often unrelated segmental duplications. Using PCR and fluorescence in situ hybridization we experimentally validated 23 of 27 (85% semi-randomly chosen regions; the largest novel inversion confirmed was 4.3 mega-bases at human Chromosome 7p14. Gorilla was used as an out-group to assign ancestral status to the variants. All experimentally validated inversion regions were then assayed against a panel of human samples and three of the 23 (13% regions were found to be polymorphic in the human genome. These polymorphic inversions include 730 kb (at 7p22, 13 kb (at 7q11, and 1 kb (at 16q24 fragments with a 5%, 30%, and 48% minor allele frequency, respectively. Our results suggest that inversions are an important source of variation in primate genome evolution. The finding of at least three novel inversion polymorphisms in humans indicates this type of structural variation may be a more common feature of our genome than previously realized.

  17. Genotypic characterization of Salmonella by multilocus sequence typing, pulsed-field gel electrophoresis and amplified fragment length polymorphism

    DEFF Research Database (Denmark)

    Torpdahl, Mia; Skov, Marianne N.; Sandvang, Dorthe

    2005-01-01

    subspecies enterica isolates. A total of 25 serotypes were investigated that had been isolated from humans or veterinary sources in Denmark between 1995 and 2001. All isolates were genotyped by multilocus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE) and amplified fragment length...

  18. Human pro. cap alpha. 1(III) collagen: cDNA sequence for the 3' end

    Energy Technology Data Exchange (ETDEWEB)

    Mankoo, B S; Dalgleish, R

    1988-03-25

    The authors have previously isolated two overlapping cDNA clones, pIII-21 and pIII-33, which encode the C-terminal end of human type III procollagen. They now present the sequence of 2520 bases encoded in these cDNAs which overlaps other previously published sequences for the same gene. The sequence presented differs from previously published sequences at five positions.

  19. Complete DNA sequence of the mitochondrial genome of the treehopper Leptobelus gazella (Membracoidea: Hemiptera).

    Science.gov (United States)

    Zhao, Xing; Liang, Ai-Ping

    2016-09-01

    The first complete DNA sequence of the mitochondrial genome (mitogenome) of Leptobelus gazelle (Membracoidea: Hemiptera) is determined in this study. The circular molecule is 16,007 bp in its full length, which encodes a set of 37 genes, including 13 proteins, 2 ribosomal RNAs, 22 transfer RNAs, and contains an A + T-rich region (CR). The gene numbers, content, and organization of L. gazelle are similar to other typical metazoan mitogenomes. Twelve of the 13 PCGs are initiated with ATR methionine or ATT isoleucine codons, except the atp8 gene that uses the ATC isoleucine as start signal. Ten of the 13 PCGs have complete termination codons, either TAA (nine genes) or TAG (cytb). The remaining 3 PCGs (cox1, cox2 and nad5) have incomplete termination codons T (AA). All of the 22 tRNAs can be folded in the form of a typical clover-leaf structure. The complete mitogenome sequence data of L. gazelle is useful for the phylogenetic and biogeographic studies of the Membracoidea and Hemiptera.

  20. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    Science.gov (United States)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.

  1. Knotting dynamics of DNA chains of different length confined in nanochannels

    International Nuclear Information System (INIS)

    Suma, Antonio; Micheletti, Cristian; Orlandini, Enzo

    2015-01-01

    Langevin dynamics simulations are used to characterize the typical mechanisms governing the spontaneous tying, untying and the dynamical evolution of knots in coarse-grained models of DNA chains confined in nanochannels. In particular we focus on how these mechanisms depend on the chain contour length, L c , at a fixed channel width D = 56 nm corresponding to the onset of the Odijk scaling regime where chain backfoldings and hence knots are disfavoured but not suppressed altogether. We find that the lifetime of knots grows significantly with L c , while that of unknots varies to a lesser extent. The underlying kinetic mechanisms are clarified by analysing the evolution of the knot position along the chain. At the considered confinement, in fact, knots are typically tied by local backfoldings of the chain termini where they are eventually untied after a stochastic motion along the chain. Consequently, the lifetime of unknots is mostly controlled by backfoldings events at the chain ends, which is largely independent of L c . The lifetime of knots, instead, increases significantly with L c because knots can, on average, travel farther along the chain before being untied. The observed interplay of knots and unknots lifetimes underpins the growth of the equilibrium knotting probability of longer and longer chains at fixed channel confinement. (paper)

  2. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Directory of Open Access Journals (Sweden)

    Soichi Inagaki

    Full Text Available Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  3. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Science.gov (United States)

    Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  4. Sequence of a cloned cDNA encoding human ribosomal protein S11

    Energy Technology Data Exchange (ETDEWEB)

    Lott, J B; Mackie, G A

    1988-02-11

    The authors have isolated a cloned cDNA that encodes human ribosomal protein (rp) S11 by screening a human fibroblast cDNA library with a labelled 204 bp DNA fragment encompassing residues 212-416 of pRS11, a rat rp Sll cDNA clone. The human rp S11 cloned cDNA consists of 15 residues of the 5' leader, the entire coding sequence and all 51 residues of the 3' untranslated region. The predicted amino acid sequence of 158 residues is identical to rat rpS11. The nucleotide sequence in the coding region differs, however, from that in rat in the first position in two codons and in the third position in 44 codons.

  5. Plant DNA sequences from feces: potential means for assessing diets of wild primates.

    Science.gov (United States)

    Bradley, Brenda J; Stiller, Mathias; Doran-Sheehy, Diane M; Harris, Tara; Chapman, Colin A; Vigilant, Linda; Poinar, Hendrik

    2007-06-01

    Analyses of plant DNA in feces provides a promising, yet largely unexplored, means of documenting the diets of elusive primates. Here we demonstrate the promise and pitfalls of this approach using DNA extracted from fecal samples of wild western gorillas (Gorilla gorilla) and black and white colobus monkeys (Colobus guereza). From these DNA extracts we amplified, cloned, and sequenced small segments of chloroplast DNA (part of the rbcL gene) and plant nuclear DNA (ITS-2). The obtained sequences were compared to sequences generated from known plant samples and to those in GenBank to identify plant taxa in the feces. With further optimization, this method could provide a basic evaluation of minimum primate dietary diversity even when knowledge of local flora is limited. This approach may find application in studies characterizing the diets of poorly-known, unhabituated primate species or assaying consumer-resource relationships in an ecosystem. (c) 2007 Wiley-Liss, Inc.

  6. Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing

    NARCIS (Netherlands)

    Hovestadt, Volker; Jones, David T. W.; Picelli, Simone; Wang, Wei; Kool, Marcel; Northcott, Paul A.; Sultan, Marc; Stachurski, Katharina; Ryzhova, Marina; Warnatz, Hans-Jörg; Ralser, Meryem; Brun, Sonja; Bunt, Jens; Jäger, Natalie; Kleinheinz, Kortine; Erkek, Serap; Weber, Ursula D.; Bartholomae, Cynthia C.; von Kalle, Christof; Lawerenz, Chris; Eils, Jürgen; Koster, Jan; Versteeg, Rogier; Milde, Till; Witt, Olaf; Schmidt, Sabine; Wolf, Stephan; Pietsch, Torsten; Rutkowski, Stefan; Scheurlen, Wolfram; Taylor, Michael D.; Brors, Benedikt; Felsberg, Jörg; Reifenberger, Guido; Borkhardt, Arndt; Lehrach, Hans; Wechsler-Reya, Robert J.; Eils, Roland; Yaspo, Marie-Laure; Landgraf, Pablo; Korshunov, Andrey; Zapatka, Marc; Radlwimmer, Bernhard; Pfister, Stefan M.; Lichter, Peter

    2014-01-01

    Epigenetic alterations, that is, disruption of DNA methylation and chromatin architecture, are now acknowledged as a universal feature of tumorigenesis. Medulloblastoma, a clinically challenging, malignant childhood brain tumour, is no exception. Despite much progress from recent genomics studies,

  7. Designing universal primers for the isolation of DNA sequences encoding Proanthocyanidins biosynthetic enzymes in Crataegus aronia

    Directory of Open Access Journals (Sweden)

    Zuiter Afnan

    2012-08-01

    Full Text Available Abstract Background Hawthorn is the common name of all plant species in the genus Crataegus, which belongs to the Rosaceae family. Crataegus are considered useful medicinal plants because of their high content of proanthocyanidins (PAs and other related compounds. To improve PAs production in Crataegus tissues, the sequences of genes encoding PAs biosynthetic enzymes are required. Findings Different bioinformatics tools, including BLAST, multiple sequence alignment and alignment PCR analysis were used to design primers suitable for the amplification of DNA fragments from 10 candidate genes encoding enzymes involved in PAs biosynthesis in C. aronia. DNA sequencing results proved the utility of the designed primers. The primers were used successfully to amplify DNA fragments of different PAs biosynthesis genes in different Rosaceae plants. Conclusion To the best of our knowledge, this is the first use of the alignment PCR approach to isolate DNA sequences encoding PAs biosynthetic enzymes in Rosaceae plants.

  8. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    Science.gov (United States)

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.

  9. Sequence-specific RNA Photocleavage by Single-stranded DNA in Presence of Riboflavin

    Science.gov (United States)

    Zhao, Yongyun; Chen, Gangyi; Yuan, Yi; Li, Na; Dong, Juan; Huang, Xin; Cui, Xin; Tang, Zhuo

    2015-10-01

    Constant efforts have been made to develop new method to realize sequence-specific RNA degradation, which could cause inhibition of the expression of targeted gene. Herein, by using an unmodified short DNA oligonucleotide for sequence recognition and endogenic small molecue, vitamin B2 (riboflavin) as photosensitizer, we report a simple strategy to realize the sequence-specific photocleavage of targeted RNA. The DNA strand is complimentary to the target sequence to form DNA/RNA duplex containing a G•U wobble in the middle. The cleavage reaction goes through oxidative elimination mechanism at the nucleoside downstream of U of the G•U wobble in duplex to obtain unnatural RNA terminal, and the whole process is under tight control by using light as switch, which means the cleavage could be carried out according to specific spatial and temporal requirements. The biocompatibility of this method makes the DNA strand in combination with riboflavin a promising molecular tool for RNA manipulation.

  10. Molecular-Sized DNA or RNA Sequencing Machine | NCI Technology Transfer Center | TTC

    Science.gov (United States)

    The National Cancer Institute's Gene Regulation and Chromosome Biology Laboratory is seeking statements of capability or interest from parties interested in collaborative research to co-develop a molecular-sized DNA or RNA sequencing machine.

  11. Structural basis for sequence-specific recognition of DNA by TAL effectors

    KAUST Repository

    Deng, Dong; Yan, Chuangye; Pan, Xiaojing; Mahfouz, Magdy M.; Wang, Jiawei; Zhu, Jiankang; Shi, Yi Gong; Yan, Nieng

    2012-01-01

    TAL (transcription activator-like) effectors, secreted by phytopathogenic bacteria, recognize host DNA sequences through a central domain of tandem repeats. Each repeat comprises 33 to 35 conserved amino acids and targets a specific base pair

  12. Efficiency of ITS Sequences for DNA Barcoding in Passiflora (Passifloraceae

    Directory of Open Access Journals (Sweden)

    Giovanna Câmara Giudicelli

    2015-04-01

    Full Text Available DNA barcoding is a technique for discriminating and identifying species using short, variable, and standardized DNA regions. Here, we tested for the first time the performance of plastid and nuclear regions as DNA barcodes in Passiflora. This genus is a largely variable, with more than 900 species of high ecological, commercial, and ornamental importance. We analyzed 1034 accessions of 222 species representing the four subgenera of Passiflora and evaluated the effectiveness of five plastid regions and three nuclear datasets currently employed as DNA barcodes in plants using barcoding gap, applied similarity-, and tree-based methods. The plastid regions were able to identify less than 45% of species, whereas the nuclear datasets were efficient for more than 50% using “best match” and “best close match” methods of TaxonDNA software. All subgenera presented higher interspecific pairwise distances and did not fully overlap with the intraspecific distance, and similarity-based methods showed better results than tree-based methods. The nuclear ribosomal internal transcribed spacer 1 (ITS1 region presented a higher discrimination power than the other datasets and also showed other desirable characteristics as a DNA barcode for this genus. Therefore, we suggest that this region should be used as a starting point to identify Passiflora species.

  13. Underwound DNA under Tension: Structure, Elasticity, and Sequence-Dependent Behaviors

    Science.gov (United States)

    Sheinin, Maxim Y.; Forth, Scott; Marko, John F.; Wang, Michelle D.

    2011-09-01

    DNA melting under torsion plays an important role in a wide variety of cellular processes. In the present Letter, we have investigated DNA melting at the single-molecule level using an angular optical trap. By directly measuring force, extension, torque, and angle of DNA, we determined the structural and elastic parameters of torsionally melted DNA. Our data reveal that under moderate forces, the melted DNA assumes a left-handed structure as opposed to an open bubble conformation and is highly torsionally compliant. We have also discovered that at low forces melted DNA properties are highly dependent on DNA sequence. These results provide a more comprehensive picture of the global DNA force-torque phase diagram.

  14. Description of a New Planktonic Mixotrophic Dinoflagellate Paragymnodinium shiwhaense n. gen., n. sp from the Coastal Waters off Western Korea: Morphology, Pigments, and Ribosomal DNA Gene Sequence

    DEFF Research Database (Denmark)

    Kang, Nam Seon; Jeong, Hae Jin; Moestrup, Øjvind

    2010-01-01

    The mixotrophic dinoflagellate Paragymnodinium shiwhaense n. gen., n. sp. is described from living cells and from cells prepared by light, scanning electron, and transmission electron microscopy. In addition, sequences of the small subunit (SSU) and large subunit (LSU) rDNA and photosynthetic...... extension-like furrow. The cingulum is as wide as 0.2-0.3 x cell length and displaced by 0.2-0.3 x cell length. Cell length and width of live cells fed Amphidinium carterae were 8.4-19.3 and 6.1-16.0 mu m, respectively. Paragymnodinium shiwhaense does not have a nuclear envelope chamber nor a nuclear...... fibrous connective (NFC). Cells contain chloroplasts, nematocysts, trichocysts, and peduncle, though eyespots, pyrenoids, and pusules are absent. The main accessory pigment is peridinin. The sequence of the SSU rDNA of this dinoflagellate (GenBank AM408889) is 4% different from that of Gymnodinium...

  15. A next generation semiconductor based sequencing approach for the identification of meat species in DNA mixtures.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available The identification of the species of origin of meat and meat products is an important issue to prevent and detect frauds that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor based sequencing technology (Ion Torrent Personal Genome Machine for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different couples of universal primers that amplify 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides of which 29,109,688 with Q20 (87.43% in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. Error rate calculated after confirmation of the obtained sequences by Sanger sequencing ranged from 0.0003 to 0.02 for the different species. Correlation about the number of reads per species between different libraries was high for mammalian species (0.97 and lower for avian species (0.70. PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low level of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures.

  16. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution

    OpenAIRE

    Falconer, Ester; Hills, Mark; Naumann, Ulrike; Poon, Steven S. S.; Chavez, Elizabeth A.; Sanders, Ashley D.; Zhao, Yongjun; Hirst, Martin; Lansdorp, Peter M.

    2012-01-01

    DNA rearrangements such as sister chromatid exchanges (SCEs) are sensitive indicators of genomic stress and instability, but they are typically masked by single-cell sequencing techniques. We developed Strand-seq to independently sequence parental DNA template strands from single cells, making it possible to map SCEs at orders-of-magnitude greater resolution than was previously possible. On average, murine embryonic stem (mES) cells exhibit eight SCEs, which are detected at a resolution of up...

  17. The use of permanganate as a sequencing reagent for identification of 5-methylcytosine residues in DNA.

    OpenAIRE

    Fritzsche, E; Hayatsu, H; Igloi, G L; Iida, S; Kössel, H

    1987-01-01

    The use of permanganate as a reagent for DNA sequencing by chemical degradation has been studied with respect to its specificity for 5-methylcytosine residues. At weakly acidic pH and room temperature, 0.2 mM potassium permanganate reacts preferentially with thymine, 5-methylcytosine, and to a lesser extent with purine residues, while cytosine remains essentially intact. Permanganate oxidation is, therefore, a suitable DNA sequencing reaction for positive discrimination between 5-methylcytosi...

  18. Solving the Curriculum Sequencing Problem with DNA Computing Approach

    Science.gov (United States)

    Debbah, Amina; Ben Ali, Yamina Mohamed

    2014-01-01

    In the e-learning systems, a learning path is known as a sequence of learning materials linked to each others to help learners achieving their learning goals. As it is impossible to have the same learning path that suits different learners, the Curriculum Sequencing problem (CS) consists of the generation of a personalized learning path for each…

  19. Molecular cloning and expression of full-length DNA copies of the genomic RNAs of cowpea mosaic virus

    NARCIS (Netherlands)

    Vos, P.

    1987-01-01

    The experiments described in this thesis were designed to unravel various aspects of the mechanism of gene expression of cowpea mosaic virus (CPMV). For this purpose full-length DNA copies of both genomic RNAs of CPMV were constructed. Using powerful invitro

  20. Research on Image Encryption Based on DNA Sequence and Chaos Theory

    Science.gov (United States)

    Tian Zhang, Tian; Yan, Shan Jun; Gu, Cheng Yan; Ren, Ran; Liao, Kai Xin

    2018-04-01

    Nowadays encryption is a common technique to protect image data from unauthorized access. In recent years, many scientists have proposed various encryption algorithms based on DNA sequence to provide a new idea for the design of image encryption algorithm. Therefore, a new method of image encryption based on DNA computing technology is proposed in this paper, whose original image is encrypted by DNA coding and 1-D logistic chaotic mapping. First, the algorithm uses two modules as the encryption key. The first module uses the real DNA sequence, and the second module is made by one-dimensional logistic chaos mapping. Secondly, the algorithm uses DNA complementary rules to encode original image, and uses the key and DNA computing technology to compute each pixel value of the original image, so as to realize the encryption of the whole image. Simulation results show that the algorithm has good encryption effect and security.

  1. Detection of herpes simplex virus-specific DNA sequences in latently infected mice and in humans.

    Science.gov (United States)

    Efstathiou, S; Minson, A C; Field, H J; Anderson, J R; Wildy, P

    1986-02-01

    Herpes simplex virus-specific DNA sequences have been detected by Southern hybridization analysis in both central and peripheral nervous system tissues of latently infected mice. We have detected virus-specific sequences corresponding to the junction fragment but not the genomic termini, an observation first made by Rock and Fraser (Nature [London] 302:523-525, 1983). This "endless" herpes simplex virus DNA is both qualitatively and quantitatively stable in mouse neural tissue analyzed over a 4-month period. In addition, examination of DNA extracted from human trigeminal ganglia has shown herpes simplex virus DNA to be present in an "endless" form similar to that found in the mouse model system. Further restriction enzyme analysis of latently infected mouse brainstem and human trigeminal DNA has shown that this "endless" herpes simplex virus DNA is present in all four isomeric configurations.

  2. [Replication of Streptomyces plasmids: the DNA nucleotide sequence of plasmid pSB 24.2].

    Science.gov (United States)

    Bolotin, A P; Sorokin, A V; Aleksandrov, N N; Danilenko, V N; Kozlov, Iu I

    1985-11-01

    The nucleotide sequence of DNA in plasmid pSB 24.2, a natural deletion derivative of plasmid pSB 24.1 isolated from S. cyanogenus was studied. The plasmid amounted by its size to 3706 nucleotide pairs. The G-C composition was equal to 73 per cent. The analysis of the DNA structure in plasmid pSB 24.2 revealed the protein-encoding sequence of DNA, the continuity of which was significant for replication of the plasmid containing more than 1300 nucleotide pairs. The analysis also revealed two A-T-rich areas of DNA, the G-C composition of which was less than 55 per cent and a DNA area with a branched pin structure. The results may be of value in investigation of plasmid replication in actinomycetes and experimental cloning of DNA with this plasmid as a vector.

  3. cDNA cloning, sequence analysis, and chromosomal localization of the gene for human carnitine palmitoyltransferase

    International Nuclear Information System (INIS)

    Finocchiaro, G.; Taroni, F.; Martin, A.L.; Colombo, I.; Tarelli, G.T.; DiDonato, S.; Rocchi, M.

    1991-01-01

    The authors have cloned and sequenced a cDNA encoding human liver carnitine palmitoyltransferase an inner mitochondrial membrane enzyme that plays a major role in the fatty acid oxidation pathway. Mixed oligonucleotide primers whose sequences were deduced from one tryptic peptide obtained from purified CPTase were used in a polymerase chain reaction, allowing the amplification of a 0.12-kilobase fragment of human genomic DNA encoding such a peptide. A 60-base-pair (bp) oligonucleotide synthesized on the basis of the sequence from this fragment was used for the screening of a cDNA library from human liver and hybridized to a cDNA insert of 2255 bp. This cDNA contains an open reading frame of 1974 bp that encodes a protein of 658 amino acid residues including 25 residues of an NH 2 -terminal leader peptide. The assignment of this open reading frame to human liver CPTase is confirmed by matches to seven different amino acid sequences of tryptic peptides derived from pure human CPTase and by the 82.2% homology with the amino acid sequence of rat CPTase. The NH 2 -terminal region of CPTase contains a leucine-proline motif that is shared by carnitine acetyl- and octanoyltransferases and by choline acetyltransferase. The gene encoding CPTase was assigned to human chromosome 1, region 1q12-1pter, by hybridization of CPTase cDNA with a DNA panel of 19 human-hanster somatic cell hybrids

  4. Torque measurements reveal sequence-specific cooperative transitions in supercoiled DNA

    Science.gov (United States)

    Oberstrass, Florian C.; Fernandes, Louis E.; Bryant, Zev

    2012-01-01

    B-DNA becomes unstable under superhelical stress and is able to adopt a wide range of alternative conformations including strand-separated DNA and Z-DNA. Localized sequence-dependent structural transitions are important for the regulation of biological processes such as DNA replication and transcription. To directly probe the effect of sequence on structural transitions driven by torque, we have measured the torsional response of a panel of DNA sequences using single molecule assays that employ nanosphere rotational probes to achieve high torque resolution. The responses of Z-forming d(pGpC)n sequences match our predictions based on a theoretical treatment of cooperative transitions in helical polymers. “Bubble” templates containing 50–100 bp mismatch regions show cooperative structural transitions similar to B-DNA, although less torque is required to disrupt strand–strand interactions. Our mechanical measurements, including direct characterization of the torsional rigidity of strand-separated DNA, establish a framework for quantitative predictions of the complex torsional response of arbitrary sequences in their biological context. PMID:22474350

  5. Use of Non-Normalized, Non-Amplified cDNA for 454-Based RNA Sequencing of Fleshy Melon Fruit

    Directory of Open Access Journals (Sweden)

    Vitaly Portnoy

    2011-03-01

    Full Text Available The melon ( L. fruit is an important crop and model system for the genomic study of both fleshy fruit development and the Cucurbitaceae family. To obtain an accurate representation of the melon fruit transcriptome based on expressed sequence tag (EST abundance in 454-pyrosequencing data, we prepared double-stranded complementary DNA (cDNA of melon without the usual amplification and normalization steps. A purification step was also included to eliminate small fragments. Complementary DNAs were obtained from 14 individual fruit libraries derived from two genotypes, separated into flesh and peel tissues, and sampled throughout fruit development. Pyrosequencing was performed using Genome Sequencer FLX (GS FLX technology, resulting in 1,215,359 reads, with mean length of >200 nucleotides. The global digital expression data was validated by comparative reverse transcription quantitative real-time polymerase chain reaction (RT-qPCR of 40 selected genes and expression patterns were similar for the two methods. The results indicate that high-quality, nonbiased cDNA for next-generation sequencing can be prepared from mature, fleshy fruit, which are notorious for difficulties in ribonucleic acid (RNA preparation.

  6. Minding the gap: Frequency of indels in mtDNA control region sequence data and influence on population genetic analyses

    Science.gov (United States)

    Pearce, J.M.

    2006-01-01

    Insertions and deletions (indels) result in sequences of various lengths when homologous gene regions are compared among individuals or species. Although indels are typically phylogenetically informative, occurrence and incorporation of these characters as gaps in intraspecific population genetic data sets are rarely discussed. Moreover, the impact of gaps on estimates of fixation indices, such as FST, has not been reviewed. Here, I summarize the occurrence and population genetic signal of indels among 60 published studies that involved alignments of multiple sequences from the mitochondrial DNA (mtDNA) control region of vertebrate taxa. Among 30 studies observing indels, an average of 12% of both variable and parsimony-informative sites were composed of these sites. There was no consistent trend between levels of population differentiation and the number of gap characters in a data block. Across all studies, the average influence on estimates of ??ST was small, explaining only an additional 1.8% of among population variance (range 0.0-8.0%). Studies most likely to observe an increase in ??ST with the inclusion of gap characters were those with control region DNA appears small, dependent upon total number of variable sites in the data block, and related to species-specific characteristics and the spatial distribution of mtDNA lineages that contain indels. ?? 2006 Blackwell Publishing Ltd.

  7. DNA cross-linking by dehydromonocrotaline lacks apparent base sequence preference.

    Science.gov (United States)

    Rieben, W Kurt; Coulombe, Roger A

    2004-12-01

    Pyrrolizidine alkaloids (PAs) are ubiquitous plant toxins, many of which, upon oxidation by hepatic mixed-function oxidases, become reactive bifunctional pyrrolic electrophiles that form DNA-DNA and DNA-protein cross-links. The anti-mitotic, toxic, and carcinogenic action of PAs is thought to be caused, at least in part, by these cross-links. We wished to determine whether the activated PA pyrrole dehydromonocrotaline (DHMO) exhibits base sequence preferences when cross-linked to a set of model duplex poly A-T 14-mer oligonucleotides with varying internal and/or end 5'-d(CG), 5'-d(GC), 5'-d(TA), 5'-d(CGCG), or 5'-d(GCGC) sequences. DHMO-DNA cross-links were assessed by electrophoretic mobility shift assay (EMSA) of 32P endlabeled oligonucleotides and by HPLC analysis of cross-linked DNAs enzymatically digested to their constituent deoxynucleosides. The degree of DNA cross-links depended upon the concentration of the pyrrole, but not on the base sequence of the oligonucleotide target. Likewise, HPLC chromatograms of cross-linked and digested DNAs showed no discernible sequence preference for any nucleotide. Added glutathione, tyrosine, cysteine, and aspartic acid, but not phenylalanine, threonine, serine, lysine, or methionine competed with DNA as alternate nucleophiles for cross-linking by DHMO. From these data it appears that DHMO exhibits no strong base preference when forming cross-links with DNA, and that some cellular nucleophiles can inhibit DNA cross-link formation.

  8. Mitochondrial DNA sequence variation in Finnish patients with matrilineal diabetes mellitus

    Directory of Open Access Journals (Sweden)

    Soini Heidi K

    2012-07-01

    Full Text Available Abstract Background The genetic background of type 2 diabetes is complex involving contribution by both nuclear and mitochondrial genes. There is an excess of maternal inheritance in patients with type 2 diabetes and, furthermore, diabetes is a common symptom in patients with mutations in mitochondrial DNA (mtDNA. Polymorphisms in mtDNA have been reported to act as risk factors in several complex diseases. Findings We examined the nucleotide variation in complete mtDNA sequences of 64 Finnish patients with matrilineal diabetes. We used conformation sensitive gel electrophoresis and sequencing to detect sequence variation. We analysed the pathogenic potential of nonsynonymous variants detected in the sequences and examined the role of the m.16189 T>C variant. Controls consisted of non-diabetic subjects ascertained in the same population. The frequency of mtDNA haplogroup V was 3-fold higher in patients with diabetes. Patients harboured many nonsynonymous mtDNA substitutions that were predicted to be possibly or probably damaging. Furthermore, a novel m.13762 T>G in MTND5 leading to p.Ser476Ala and several rare mtDNA variants were found. Haplogroup H1b harbouring m.16189 T > C and m.3010 G > A was found to be more frequent in patients with diabetes than in controls. Conclusions Mildly deleterious nonsynonymous mtDNA variants and rare population-specific haplotypes constitute genetic risk factors for maternally inherited diabetes.

  9. DNA sequences from the quagga, an extinct member of the horse family.

    Science.gov (United States)

    Higuchi, R; Bowman, B; Freiberger, M; Ryder, O A; Wilson, A C

    To determine whether DNA survives and can be recovered from the remains of extinct creatures, we have examined dried muscle from a museum specimen of the quagga, a zebra-like species (Equus quagga) that became extinct in 1883 (ref. 1). We report that DNA was extracted from this tissue in amounts approaching 1% of that expected from fresh muscle, and that the DNA was of relatively low molecular weight. Among the many clones obtained from the quagga DNA, two containing pieces of mitochondrial DNA (mtDNA) were sequenced. These sequences, comprising 229 nucleotide pairs, differ by 12 base substitutions from the corresponding sequences of mtDNA from a mountain zebra, an extant member of the genus Equus. The number, nature and locations of the substitutions imply that there has been little or no postmortem modification of the quagga DNA sequences, and that the two species had a common ancestor 3-4 Myr ago, consistent with fossil evidence concerning the age of the genus Equus.

  10. Complete DNA sequence of the linear mitochondrial genome of the pathogenic yeast Candida parapsilosis

    DEFF Research Database (Denmark)

    Nosek, J.; Novotna, M.; Hlavatovicova, Z.

    2004-01-01

    The complete sequence of the mitochondrial DNA of the opportunistic yeast pathogen Candida parapsilosis was determined. The mitochondrial genome is represented by linear DNA molecules terminating with tandem repeats of a 738-bp unit. The number of repeats varies, thus generating a population...

  11. Nucleotide sequence determination of the region in adenovirus 5 DNA involved in cell transformation

    International Nuclear Information System (INIS)

    Maat, J.

    1978-01-01

    A description is given of investigations into the primary structure of the transforming region of adenovirus type 5 DNA. The phenomenon of cell transformation is discussed in general terms and the principles of a number of fairly recent techniques, which have been in use for DNA sequence determination since 1975 are dealt with. A few of the author's own techniques are described which deal both with nucleotide sequence analysis and with the determination of DNA cleavage sites of restriction endonucleases. The results are given of the mapping of cleavage sites in the HpaI-E fragment of adenovirus DNA of HpaII, HaeIII, AluI, HinfI and TaqI and of the determination of the nucleotide sequence in the transforming region of adenovirus type 5 DNA. The results of the sequence determination of the Ad5 HindIII-G fragment are discussed in relation with the investigation on the transforming proteins isolated from in vitro and in vivo synthesizing systems. Labelling procedures of DNA are described including the exonuclease III/DNA polymerase 1 method and TA polynucleotide kinase labelling of DNA fragments. (Auth.)

  12. DNA barcode sequencing from old type specimens as a tool in taxonomy: a case study in the diverse genus Eois (Lepidoptera: Geometridae.

    Directory of Open Access Journals (Sweden)

    Patrick Strutzenberger

    Full Text Available In this study we report on the sequencing of the COI barcode region from 96 historical specimens (92 type specimens +4 non-types of Eois. Eois is a diverse clade of tropical geometrid moths and is the target of a number of ongoing studies on life-histories, phylogeny, co-evolution with host plants or parasitoids, and diversity patterns across temporal and spatial dimensions. The unequivocal application of valid names is crucial for all aspects of biodiversity research as well as monitoring and conservation efforts. The availability of barcodes from historical type specimens has the potential to facilitate the much-needed acceleration of species description. We performed non-destructive DNA extraction on the abdomens of Eois specimens between 79 and 157 years of age. We used six primer combinations (recovering between 109 and 130 bp each to target the full-length barcode sequence of each specimen. We were able to obtain sequences for 91 of 96 specimens (success rate 94.8%. Sequence length ranged from 121 bp to full barcode sequences (658 bp, the average sequence length was ~500 bp. We detected a moderately strong and statistically significant negative correlation between specimen age and total sequence length, which is in agreement with expectations. The abdomen proved to be an exceedingly valuable source of DNA in old specimens of Lepidoptera. Barcode sequences obtained in this study are currently being used in an effort towards a step-wise taxonomic revision of Eois. We encourage that DNA barcodes obtained from types specimens should be included in all species descriptions and revisions whenever feasible.

  13. Protocols for 16S rDNA Array Analyses of Microbial Communities by Sequence-Specific Labeling of DNA Probes

    Directory of Open Access Journals (Sweden)

    Knut Rudi

    2003-01-01

    Full Text Available Analyses of complex microbial communities are becoming increasingly important. Bottlenecks in these analyses, however, are the tools to actually describe the biodiversity. Novel protocols for DNA array-based analyses of microbial communities are presented. In these protocols, the specificity obtained by sequence-specific labeling of DNA probes is combined with the possibility of detecting several different probes simultaneously by DNA array hybridization. The gene encoding 16S ribosomal RNA was chosen as the target in these analyses. This gene contains both universally conserved regions and regions with relatively high variability. The universally conserved regions are used for PCR amplification primers, while the variable regions are used for the specific probes. Protocols are presented for DNA purification, probe construction, probe labeling, and DNA array hybridizations.

  14. Ray Wu as Fifth Business: Deconstructing collective memory in the history of DNA sequencing.

    Science.gov (United States)

    Onaga, Lisa A

    2014-06-01

    The concept of 'Fifth Business' is used to analyze a minority standpoint and bring serious attention to the role of scientists who play a galvanizing role in a science but for multiple reasons appear less prominently in more common recounts of any particular development. Biochemist Ray Wu (1928-2008) published a DNA sequencing experiment in March 1970 using DNA polymerase catalysis and specific nucleotide labeling, both of which are foundational to general sequencing methods today. The scant mention of Wu's work from textbooks, research articles, and other accounts of DNA sequencing calls into question how scientific collective memory forms. This alternative history seeks to understand why a key figure in nucleic acid sequence analysis has remained less visibly connected or peripheral to solidifying narratives about the history of DNA sequencing. The study resists predictable dismissals of Wu's work in order to seriously examine the formation of his nucleic acid sequence analysis research program and how he shared his knowledge of sequencing during a period of rapid advancement in the field. An analysis of Wu's work on sequencing the cohesive ends of lambda bacteriophage in the 1960s and 1970s exemplifies how a variety of individuals and groups attempted to develop protocol for sequencing the order of nucleotide base pairs comprising DNA. This historical examination of the sociality of scientific research suggests a way to understand how Wu and others contributed to the very collective memory of DNA sequencing that Wu eventually tried to repair. The study of Wu, who was a Chinese immigrant to the United States, provides a foundation for further critical scholarship on the heterogeneous histories of Asian American bioscientists, the sociality of their scientific works, and how the resulting knowledge produced is preserved, if not evenly, in a scientific field's collective memory. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. GENETIC POLYMORPHISM IN GYMNODINIUM GALATHEANUM CHLOROPLAST DNA SEQUENCES AND DEVELOPMENT OF A MOLECULAR DETECTION ASSAY. (R827084)

    Science.gov (United States)

    Nuclear and chloroplast-encoded small subunit ribosomal DNA sequences were obtainedfrom several strains of the toxic dinoflagellate Gymnodinium galatheanum. Phylogenetic analyses andcomparison of sequences indicate that the chloroplast sequences show a higher degree of se...

  16. Sequence-specific DNA alkylation by tandem Py-Im polyamide conjugates.

    Science.gov (United States)

    Taylor, Rhys Dylan; Kawamoto, Yusuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

    2014-09-01

    Tandem N-methylpyrrole-N-methylimidazole (Py-Im) polyamides with good sequence-specific DNA-alkylating activities have been designed and synthesized. Three alkylating tandem Py-Im polyamides with different linkers, which each contained the same moiety for the recognition of a 10 bp DNA sequence, were evaluated for their reactivity and selectivity by DNA alkylation, using high-resolution denaturing gel electrophoresis. All three conjugates displayed high reactivities for the target sequence. In particular, polyamide 1, which contained a β-alanine linker, displayed the most-selective sequence-specific alkylation towards the target 10 bp DNA sequence. The tandem Py-Im polyamide conjugates displayed greater sequence-specific DNA alkylation than conventional hairpin Py-Im polyamide conjugates (4 and 5). For further research, the design of tandem Py-Im polyamide conjugates could play an important role in targeting specific gene sequences. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Visual artificial grammar learning by rhesus macaques (Macaca mulatta): exploring the role of grammar complexity and sequence length.

    Science.gov (United States)

    Heimbauer, Lisa A; Conway, Christopher M; Christiansen, Morten H; Beran, Michael J; Owren, Michael J

    2018-03-01

    Humans and nonhuman primates can learn about the organization of stimuli in the environment using implicit sequential pattern learning capabilities. However, most previous artificial grammar learning studies with nonhuman primates have involved relatively simple grammars and short input sequences. The goal in the current experiments was to assess the learning capabilities of monkeys on an artificial grammar-learning task that was more complex than most others previously used with nonhumans. Three experiments were conducted using a joystick-based, symmetrical-response serial reaction time task in which two monkeys were exposed to grammar-generated sequences at sequence lengths of four in Experiment 1, six in Experiment 2, and eight in Experiment 3. Over time, the monkeys came to respond faster to the sequences generated from the artificial grammar compared to random versions. In a subsequent generalization phase, subjects generalized their knowledge to novel sequences, responding significantly faster to novel instances of sequences produced using the familiar grammar compared to those constructed using an unfamiliar grammar. These results reveal that rhesus monkeys can learn and generalize the statistical structure inherent in an artificial grammar that is as complex as some used with humans, for sequences up to eight items long. These findings are discussed in relation to whether or not rhesus macaques and other primate species possess implicit sequence learning abilities that are similar to those that humans draw upon to learn natural language grammar.

  18. Budding yeast cDNA sequencing project: S03036-05_I15 [Budding yeast cDNA sequencing project

    Lifescience Database Archive (English)

    Full Text Available EST - Link to UCSC Genome Browser - Sequence >S03036-05_I15.phd NNNTNNTNNNNCNCTCACATANAAGACGGANNAGNNNGCTGGGC...CAATGCGTTCCATATGCG AAAATTCTTGGNCAATGTATTCTCTAGCAATCTNTNCTTTTGTACANTCGGAGGNTTNTC ATGNTCCTTTCATANATTATANAAANNG

  19. Application of high-throughput DNA sequencing in phytopathology.

    Science.gov (United States)

    Studholme, David J; Glover, Rachel H; Boonham, Neil

    2011-01-01

    The new sequencing technologies are already making a big impact in academic research on medically important microbes and may soon revolutionize diagnostics, epidemiology, and infection control. Plant pathology also stands to gain from exploiting these opportunities. This manuscript reviews some applications of these high-throughput sequencing methods that are relevant to phytopathology, with emphasis on the associated computational and bioinformatics challenges and their solutions. Second-generation sequencing technologies have recently been exploited in genomics of both prokaryotic and eukaryotic plant pathogens. They are also proving to be useful in diagnostics, especially with respect to viruses. Copyright © 2011 by Annual Reviews. All rights reserved.

  20. Molecular profiling of microbial communities from contaminated sources: Use of subtractive cloning methods and rDNA spacer sequences. 1998 annual progress report

    International Nuclear Information System (INIS)

    Robb, F.T.

    1998-01-01

    'The major objective of the research is to provide appropriate sequences and to assemble a high-density DNA array of oligonucleotides that can be used for rapid profiling of microbial populations from polluted areas. The sequences to be assigned to the DNA array are chosen from from cloned genomic DNA sequences (the ribosomal operon, described below) from groundwater at DOE sites containing organic solvents. The sites, Hanford Nuclear Plant and Lawrence Livermore Site 300, have well characterized pollutant histories, which have been provided by the collaborators. At this mid-point of the project, over 60 unique sequence classes of intergenic spacer region have been identified from the first sample site. The use of these sequences as hybridization probes, and their frequency of occurrence, allow a clear distinction between bacterial communities before and after remediation by acetate/nitrate pumping. The authors have developed the hybridization conditions for identifying PCR products in a 96 well format, a versatile alignment and visualization program (acronym: MALIGN) developed by Dr. Dennis Maeder, has been used to align the ISRs, which are variable in length and sometimes in position of the tRNAs. Finally, in collaboration with Dr. W. Chen and Dr. J. Zhou at ORNL, they have significant evidence that mass spectrometer analysis can be used to determine the lengths of PCR amplified intergenic spacer DNA.'

  1. RT-PCR and sequence analysis of the full-length fusion protein of Canine Distemper Virus from domestic dogs.

    Science.gov (United States)

    Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José

    2016-02-01

    During 2007-2014, 84 out of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV), as analyzed by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of them (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the Fusion (F) protein of fifteen isolates was sequenced and compared with that of those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis using the F gene full-length sequences grouped all the Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal-peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One extra putative N-glycosylation site was identified in the F gene of CDV Argentinean strains with respect to the vaccine strain. The present report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and contributes to the knowledge of molecular epidemiology of CDV, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Tomato protoplast DNA transformation : physical linkage and recombination of exogenous DNA sequences

    NARCIS (Netherlands)

    Jongsma, Maarten; Koornneef, Maarten; Zabel, Pim; Hille, Jacques

    1987-01-01

    Tomato protoplasts have been transformed with plasmid DNA's, containing a chimeric kanamycin resistance gene and putative tomato origins of replication. A calcium phosphate-DNA mediated transformation procedure was employed in combination with either polyethylene glycol or polyvinyl alcohol. There

  3. Development of three full-length infectious cDNA clones of distinct brassica yellows virus genotypes for agrobacterium-mediated inoculation.

    Science.gov (United States)

    Zhang, Xiao-Yan; Dong, Shu-Wei; Xiang, Hai-Ying; Chen, Xiang-Ru; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2015-02-02

    Brassica yellows virus is a newly identified species in the genus of Polerovirus within the family Luteoviridae. Brassica yellows virus (BrYV) is prevalently distributed throughout Mainland China and South Korea, is an important virus infecting cruciferous crops. Based on six BrYV genomic sequences of isolates from oilseed rape, rutabaga, radish, and cabbage, three genotypes, BrYV-A, BrYV-B, and BrYV-C, exist, which mainly differ in the 5' terminal half of the genome. BrYV is an aphid-transmitted and phloem-limited virus. The use of infectious cDNA clones is an alternative means of infecting plants that allows reverse genetic studies to be performed. In this study, full-length cDNA clones of BrYV-A, recombinant BrYV5B3A, and BrYV-C were constructed under control of the cauliflower mosaic virus 35S promoter. An agrobacterium-mediated inoculation system of Nicotiana benthamiana was developed using these cDNA clones. Three days after infiltration with full-length BrYV cDNA clones, necrotic symptoms were observed in the inoculated leaves of N. benthamiana; however, no obvious symptoms appeared in the upper leaves. Reverse transcription-PCR (RT-PCR) and western blot detection of samples from the upper leaves showed that the maximum infection efficiency of BrYVs could reach 100%. The infectivity of the BrYV-A, BrYV-5B3A, and BrYV-C cDNA clones was further confirmed by northern hybridization. The system developed here will be useful for further studies of BrYV, such as host range, pathogenicity, viral gene functions, and plant-virus-vector interactions, and especially for discerning the differences among the three genotypes. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Utility of 16S rDNA Sequencing for Identification of Rare Pathogenic Bacteria.

    Science.gov (United States)

    Loong, Shih Keng; Khor, Chee Sieng; Jafar, Faizatul Lela; AbuBakar, Sazaly

    2016-11-01

    Phenotypic identification systems are established methods for laboratory identification of bacteria causing human infections. Here, the utility of phenotypic identification systems was compared against 16S rDNA identification method on clinical isolates obtained during a 5-year study period, with special emphasis on isolates that gave unsatisfactory identification. One hundred and eighty-seven clinical bacteria isolates were tested with commercial phenotypic identification systems and 16S rDNA sequencing. Isolate identities determined using phenotypic identification systems and 16S rDNA sequencing were compared for similarity at genus and species level, with 16S rDNA sequencing as the reference method. Phenotypic identification systems identified ~46% (86/187) of the isolates with identity similar to that identified using 16S rDNA sequencing. Approximately 39% (73/187) and ~15% (28/187) of the isolates showed different genus identity and could not be identified using the phenotypic identification systems, respectively. Both methods succeeded in determining the species identities of 55 isolates; however, only ~69% (38/55) of the isolates matched at species level. 16S rDNA sequencing could not determine the species of ~20% (37/187) of the isolates. The 16S rDNA sequencing is a useful method over the phenotypic identification systems for the identification of rare and difficult to identify bacteria species. The 16S rDNA sequencing method, however, does have limitation for species-level identification of some bacteria highlighting the need for better bacterial pathogen identification tools. © 2016 Wiley Periodicals, Inc.

  5. Genetic alterations of hepatocellular carcinoma by random amplified polymorphic DNA analysis and cloning sequencing of tumor differential DNA fragment

    Science.gov (United States)

    Xian, Zhi-Hong; Cong, Wen-Ming; Zhang, Shu-Hui; Wu, Meng-Chao

    2005-01-01

    AIM: To study the genetic alterations and their association with clinicopathological characteristics of hepatocellular carcinoma (HCC), and to find the tumor related DNA fragments. METHODS: DNA isolated from tumors and corresponding noncancerous liver tissues of 56 HCC patients was amplified by random amplified polymorphic DNA (RAPD) with 10 random 10-mer arbitrary primers. The RAPD bands showing obvious differences in tumor tissue DNA corresponding to that of normal tissue were separated, purified, cloned and sequenced. DNA sequences were analyzed and compared with GenBank data. RESULTS: A total of 56 cases of HCC were demonstrated to have genetic alterations, which were detected by at least one primer. The detestability of genetic alterations ranged from 20% to 70% in each case, and 17.9% to 50% in each primer. Serum HBV infection, tumor size, histological grade, tumor capsule, as well as tumor intrahepatic metastasis, might be correlated with genetic alterations on certain primers. A band with a higher intensity of 480 bp or so amplified fragments in tumor DNA relative to normal DNA could be seen in 27 of 56 tumor samples using primer 4. Sequence analysis of these fragments showed 91% homology with Homo sapiens double homeobox protein DUX10 gene. CONCLUSION: Genetic alterations are a frequent event in HCC, and tumor related DNA fragments have been found in this study, which may be associated with hepatocarcin-ogenesis. RAPD is an effective method for the identification and analysis of genetic alterations in HCC, and may provide new information for further evaluating the molecular mechanism of hepatocarcinogenesis. PMID:15996039

  6. DNA sequence evolution in fast evolving mitochondrial DNA nad1 exons in Geraniaceae and Plantaginaceae

    NARCIS (Netherlands)

    Bakker, F.T.; Breman, F.; Merckx, V.

    2006-01-01

    Previously, nucleotide substitution rates in mitochondrial DNA of Geraniaceae and Plantaginaceae have been shown to be exceptionally high compared with other angiosperm mtDNA lineages. It has also been shown that mtDNA introns were lost in Geraniaceae and Plantaginaceae. In this study we compile 127

  7. 2D-dynamic representation of DNA sequences as a graphical tool in bioinformatics

    Science.gov (United States)

    Bielińska-Wa̧Ż, D.; Wa̧Ż, P.

    2016-10-01

    2D-dynamic representation of DNA sequences is briefly reviewed. Some new examples of 2D-dynamic graphs which are the graphical tool of the method are shown. Using the examples of the complete genome sequences of the Zika virus it is shown that the present method can be applied for the study of the evolution of viral genomes.

  8. Sequence-specific protection of duplex DNA against restriction and methylation enzymes by pseudocomplementary PNAs

    DEFF Research Database (Denmark)

    Izvolsky, K I; Demidov, V V; Nielsen, P E

    2000-01-01

    I restriction endonuclease and dam methylase. The pcPNA-assisted protection against enzymatic methylation is more efficient when the PNA-binding site embodies the methylase-recognition site rather than overlaps it. We conclude that pcPNAs may provide the robust tools allowing to sequence-specifically manipulate...... DNA duplexes in a virtually sequence-unrestricted manner....

  9. ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.

    Science.gov (United States)

    Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia

    2017-12-01

    Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities are shown at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlaying peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny and runs under R platform for an easy exploitation. As a part of the interface, we added the automatic ends clipping option, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc and performed both manual and automatic correction. The comparison between the two methods was carried out. As a result, we note that our program, ABI base recall, accomplishes good correction with a high accuracy. Indeed, it increases the rate of identity and coverage and minimizes the number of mismatches and gaps, hence it provides solution to sequencing ambiguities and saves biologists' time and labor.

  10. Cloning, sequencing and expression of a novel xylanase cDNA from ...

    African Journals Online (AJOL)

    A strain SH 2016, capable of producing xylanase, was isolated and identified as Aspergillus awamori, based on its physiological and biochemical characteristics as well as its ITS rDNA gene sequence analysis. A xylanase gene of 591 bp was cloned from this newly isolated A. awamori and the ORF sequence predicted a ...

  11. Studies of base pair sequence effects on DNA solvation based on all

    Indian Academy of Sciences (India)

    Detailed analyses of the sequence-dependent solvation and ion atmosphere of DNA are presented based on molecular dynamics (MD) simulations on all the 136 unique tetranucleotide steps obtained by the ABC consortium using the AMBER suite of programs. Significant sequence effects on solvation and ion localization ...

  12. Open source tools to exploit DNA sequence data from livestock species

    Science.gov (United States)

    Next-Generation Sequencing (NGS) is a recent technological development that allows researchers to rapidly determine the DNA sequence of an individual. The decrease in cost of NGS has brought the technology into the realm of practical applications in livestock genomics, where it can be used to genera...

  13. Noninvasive prenatal diagnosis of fetal trisomy 18 and trisomy 13 by maternal plasma DNA sequencing.

    NARCIS (Netherlands)

    Chen, E.Z.; Chiu, R.W.; Sun, H; Akolekar, R.; Chan, K.C.; Leung, T.Y.; Jiang, P.; Zheng, Y.W.; Lun, F.M.; Chan, L.Y.; Jin, Y.; Go, A.T.; Lau, E.T; To, W.W.; Leung, W.C.; Tang, R.Y.; Au-Yeung, S.K.; Lam, H.; Kung, Y.Y.; Zhang, X.; Vugt, J.M.G. van; Minekawa, R.; Tang, M.H.; Wang, J.; Oudejans, C.B.; Lau, T.K.; Nicolaides, K.H.; Lo, Y.M.

    2011-01-01

    Massively parallel sequencing of DNA molecules in the plasma of pregnant women has been shown to allow accurate and noninvasive prenatal detection of fetal trisomy 21. However, whether the sequencing approach is as accurate for the noninvasive prenatal diagnosis of trisomy 13 and 18 is unclear due

  14. True single-molecule DNA sequencing of a pleistocene horse bone

    DEFF Research Database (Denmark)

    Orlando, Ludovic Antoine Alexandre; Ginolhac, Aurélien; Raghavan, Maanasa

    2011-01-01

    -preserved Pleistocene horse bone using the Helicos HeliScope and Illumina GAIIx platforms, respectively. We find that the percentage of endogenous DNA sequences derived from the horse is higher among the Helicos data than Illumina data. This result indicates that the molecular biology tools used to generate sequencing...

  15. Sequence Dependent Interactions Between DNA and Single-Walled Carbon Nanotubes

    Science.gov (United States)

    Roxbury, Daniel

    It is known that single-stranded DNA adopts a helical wrap around a single-walled carbon nanotube (SWCNT), forming a water-dispersible hybrid molecule. The ability to sort mixtures of SWCNTs based on chirality (electronic species) has recently been demonstrated using special short DNA sequences that recognize certain matching SWCNTs of specific chirality. This thesis investigates the intricacies of DNA-SWCNT sequence-specific interactions through both experimental and molecular simulation studies. The DNA-SWCNT binding strengths were experimentally quantified by studying the kinetics of DNA replacement by a surfactant on the surface of particular SWCNTs. Recognition ability was found to correlate strongly with measured binding strength, e.g. DNA sequence (TAT)4 was found to bind 20 times stronger to the (6,5)-SWCNT than sequence (TAT)4T. Next, using replica exchange molecular dynamics (REMD) simulations, equilibrium structures formed by (a) single-strands and (b) multiple-strands of 12-mer oligonucleotides adsorbed on various SWCNTs were explored. A number of structural motifs were discovered in which the DNA strand wraps around the SWCNT and 'stitches' to itself via hydrogen bonding. Great variability among equilibrium structures was observed and shown to be directly influenced by DNA sequence and SWCNT type. For example, the (6,5)-SWCNT DNA recognition sequence, (TAT)4, was found to wrap in a tight single-stranded right-handed helical conformation. In contrast, DNA sequence T12 forms a beta-barrel left-handed structure on the same SWCNT. These are the first theoretical indications that DNA-based SWCNT selectivity can arise on a molecular level. In a biomedical collaboration with the Mayo Clinic, pathways for DNA-SWCNT internalization into healthy human endothelial cells were explored. Through absorbance spectroscopy, TEM imaging, and confocal fluorescence microscopy, we showed that intracellular concentrations of SWCNTs far exceeded those of the incubation

  16. Identification of two invasive Cacopsylla chinensis (Hemiptera: Psyllidae) lineages based on two mitochondrial sequences and restriction fragment length polymorphism of cytochrome oxidase I amplicon.

    Science.gov (United States)

    Lee, Hsien-Chung; Yang, Man-Miao; Yeh, Wen-Bin

    2008-08-01

    The occurrence of pear decline, a disease found in some pear (Pyrus spp.) orchards of Taiwan in recent years, is accompanied by an outbreak of Cacopsylla chinensis (Yang & Li). Two major morphological forms (summer and winter forms) with a variety of intermediate body color and two phylogenetic lineages of this psyllid have been described. The work herein used sequences of mitochondrial cytochrome oxidase I (COI) and 16S rDNA regions to delineate the genetic differentiation of this color-variable insect and to elucidate their relationship. Sequence divergence and phylogenetic analysis have shown that C. chinensis individuals could be divided into two lineages with 3.3 and 2.3% divergence of COI and 16S rDNA, respectively. All specimens from China were found to belong to lineage I. Restriction fragment length polymorphism analysis of COI with restriction enzymes AcuI, AseI, BccI, and FokI on 263 specimens of six populations from Taiwan produced two digestion patterns, which are in agreement with the two lineages described above. Both patterns could be found in each population, with most individuals belonging to lineage I and 5-21% of the individuals belonging to lineage II. Because these two lineages included summer as well as winter morphological forms, the lineage differentiation is apparently not related to morphological characters of this psyllid. Because the invasive records are not in favor of a sympatric differentiation, this psyllid is more likely introduced as different populations from countries in temperate regions.

  17. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    PRECIOUS

    2009-11-02

    Nov 2, 2009 ... basic machinery of protein synthesis and regulation, but also in various ... The genomic DNA was isolated from Giant Panda muscle tissue according to the ... for 45 s, 72°C for 2 min in the first cycle and the anneal temperature deceased 0.2°C ..... edition, Cold Spring Harbor aboratory Press. Cold Spring ...

  18. DNA fingerprinting based on simple sequence repeat (SSR ...

    African Journals Online (AJOL)

    New varieties of sugarcane are protected using morphological descriptors, which have limitations in identifying morphologically similar cultivars. Development of a reliable DNA fingerprint system for identification of new varieties would contribute greatly to the breeding of these species. Microsatellite markers are tools with ...

  19. MotifMark: Finding regulatory motifs in DNA sequences.

    Science.gov (United States)

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  20. Potential of DNA sequences to identify zoanthids (Cnidaria: Zoantharia).

    Science.gov (United States)

    Sinniger, Frederic; Reimer, James D; Pawlowski, Jan

    2008-12-01

    The order Zoantharia is known for its chaotic taxonomy and difficult morphological identification. One method that potentially could help for examining such troublesome taxa is DNA barcoding, which identifies species using standard molecular markers. The mitochondrial cytochrome oxidase subunit I (COI) has been utilized to great success in groups such as birds and insects; however, its applicability in many other groups is controversial. Recently, some studies have suggested that barcoding is not applicable to anthozoans. Here, we examine the use of COI and mitochondrial 16S ribosomal DNA for zoanthid identification. Despite the absence of a clear barcoding gap, our results show that for most of 54 zoanthid samples, both markers could separate samples to the species, or species group, level, particularly when easily accessible ecological or distributional data were included. Additionally, we have used the short V5 region of mt 16S rDNA to identify eight old (13 to 50 years old) museum samples. We discuss advantages and disadvantages of COI and mt 16S rDNA as barcodes for Zoantharia, and recommend that either one or both of these markers be considered for zoanthid identification in the future.