WorldWideScience

Sample records for genomic dna sequences

  1. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  2. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  3. Genomic signal processing for DNA sequence clustering.

    Science.gov (United States)

    Mendizabal-Ruiz, Gerardo; Román-Godínez, Israel; Torres-Ramos, Sulema; Salido-Ruiz, Ricardo A; Vélez-Pérez, Hugo; Morales, J Alejandro

    2018-01-01

    Genomic signal processing (GSP) methods which convert DNA data to numerical values have recently been proposed, which would offer the opportunity of employing existing digital signal processing methods for genomic data. One of the most used methods for exploring data is cluster analysis which refers to the unsupervised classification of patterns in data. In this paper, we propose a novel approach for performing cluster analysis of DNA sequences that is based on the use of GSP methods and the K-means algorithm. We also propose a visualization method that facilitates the easy inspection and analysis of the results and possible hidden behaviors. Our results support the feasibility of employing the proposed method to find and easily visualize interesting features of sets of DNA data.

  4. Next Generation DNA Sequencing and the Future of Genomic Medicine

    OpenAIRE

    Anderson, Matthew W.; Schrijver, Iris

    2010-01-01

    In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpreta...

  5. An automated annotation tool for genomic DNA sequences using

    Indian Academy of Sciences (India)

    Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated ...

  6. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.

  7. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.; Lobzin, V.V.

    2004-01-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with the model examples and analysis of intervening exon-intron segments in the protein-coding regions

  8. Order and correlations in genomic DNA sequences. The spectral approach

    International Nuclear Information System (INIS)

    Lobzin, Vasilii V; Chechetkin, Vladimir R

    2000-01-01

    The structural analysis of genomic DNA sequences is discussed in the framework of the spectral approach, which is sufficiently universal due to the reciprocal correspondence and mutual complementarity of Fourier transform length scales. The spectral characteristics of random sequences of the same nucleotide composition possess the property of self-averaging for relatively short sequences of length M≥100-300. Comparison with the characteristics of random sequences determines the statistical significance of the structural features observed. Apart from traditional applications to the search for hidden periodicities, spectral methods are also efficient in studying mutual correlations in DNA sequences. By combining spectra for structure factors and correlation functions, not only integral correlations can be estimated but also their origin identified. Using the structural spectral entropy approach, the regularity of a sequence can be quantitatively assessed. A brief introduction to the problem is also presented and other major methods of DNA sequence analysis described. (reviews of topical problems)

  9. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within...... in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....

  10. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    RPS16 of eukaryote is a component of the 40S small ribosomal subunit encoded by RPS16 gene and is also a homolog of prokaryotic RPS9. The cDNA and genomic sequence of RPS16 was cloned successfully for the first time from the Giant Panda (Ailuropoda melanoleuca) using reverse transcription-polymerase chain ...

  11. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian eWu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362Mb, 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of chloroplast genome.

  12. Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine

    Science.gov (United States)

    Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson

    2011-01-01

    Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...

  13. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    International Nuclear Information System (INIS)

    Arneodo, Alain; Vaillant, Cedric; Audit, Benjamin; Argoul, Francoise; D'Aubenton-Carafa, Yves; Thermes, Claude

    2011-01-01

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is actually a wealth of structural and dynamical information to learn in the primary DNA sequence. In this review, we show that when using concepts, methodologies, numerical and experimental techniques coming from statistical mechanics and nonlinear physics combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.

  14. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    Science.gov (United States)

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    To analyze and detect the whole genome sequence of human mitochondrial DNA (mtDNA) by Ion Torrent PGM™ platform and to study the differences of mtDNA sequence in different tissues. Samples were collected from 6 unrelated individuals by forensic postmortem examination, including chest blood, hair, costicartilage, nail, skeletal muscle and oral epithelium. Amplification of whole genome sequence of mtDNA was performed by 4 pairs of primer. Libraries were constructed with Ion Shear™ Plus Reagents kit and Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed using Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions on HVⅠ region. The whole genome sequence of mtDNA from all samples were amplified successfully. Six unrelated individuals belonged to 6 different haplotypes. Different tissues in one individual had heteroplasmy difference. The heteroplasmy positions and the mutation positions on HVⅠ region were verified by Sanger sequencing. After a consistency check by the Kappa method, it was found that the results of mtDNA sequence had a high consistency in different tissues. The testing method used in present study for sequencing the whole genome sequence of human mtDNA can detect the heteroplasmy difference in different tissues, which have good consistency. The results provide guidance for the further applications of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine

  15. cDNA, genomic sequence cloning and overexpression of ribosomal ...

    African Journals Online (AJOL)

    PRECIOUS

    2009-11-02

    Nov 2, 2009 ... basic machinery of protein synthesis and regulation, but also in various ... The genomic DNA was isolated from Giant Panda muscle tissue according to the ... for 45 s, 72°C for 2 min in the first cycle and the anneal temperature deceased 0.2°C ..... edition, Cold Spring Harbor aboratory Press. Cold Spring ...

  16. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution

    NARCIS (Netherlands)

    Falconer, Ester; Hills, Mark; Naumann, Ulrike; Poon, Steven S. S.; Chavez, Elizabeth A.; Sanders, Ashley D.; Zhao, Yongjun; Hirst, Martin; Lansdorp, Peter M.

    DNA rearrangements such as sister chromatid exchanges (SCEs) are sensitive indicators of genomic stress and instability, but they are typically masked by single-cell sequencing techniques. We developed Strand-seq to independently sequence parental DNA template strands from single cells, making it

  17. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Directory of Open Access Journals (Sweden)

    Zdepski Anna

    2011-05-01

    Full Text Available Abstract Background High throughput sequencing (HTS technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR. We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.

  18. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human.

    Science.gov (United States)

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-02-16

    DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.

  19. Functional role of a highly repetitive DNA sequence in anchorage of the mouse genome.

    Science.gov (United States)

    Neuer-Nitsche, B; Lu, X N; Werner, D

    1988-09-12

    The major portion of the eukaryotic genome consists of various categories of repetitive DNA sequences which have been studied with respect to their base compositions, organizations, copy numbers, transcription and species specificities; their biological roles, however, are still unclear. A novel quality of a highly repetitive mouse DNA sequence is described which points to a functional role: All copies (approximately 50,000 per haploid genome) of this DNA sequence reside on genomic Alu I DNA fragments each associated with nuclear polypeptides that are not released from DNA by proteinase K, SDS and phenol extraction. By this quality the repetitive DNA sequence is classified as a member of the sub-set of DNA sequences involved in tight DNA-polypeptide complexes which have been previously shown to be components of the subnuclear structure termed 'nuclear matrix'. From these results it has to be concluded that the repetitive DNA sequence characterized in this report represents or comprises a signal for a large number of site specific attachment points of the mouse genome in the nuclear matrix.

  20. Purification of High Molecular Weight Genomic DNA from Powdery Mildew for Long-Read Sequencing.

    Science.gov (United States)

    Feehan, Joanna M; Scheibel, Katherine E; Bourras, Salim; Underwood, William; Keller, Beat; Somerville, Shauna C

    2017-03-31

    The powdery mildew fungi are a group of economically important fungal plant pathogens. Relatively little is known about the molecular biology and genetics of these pathogens, in part due to a lack of well-developed genetic and genomic resources. These organisms have large, repetitive genomes, which have made genome sequencing and assembly prohibitively difficult. Here, we describe methods for the collection, extraction, purification and quality control assessment of high molecular weight genomic DNA from one powdery mildew species, Golovinomyces cichoracearum. The protocol described includes mechanical disruption of spores followed by an optimized phenol/chloroform genomic DNA extraction. A typical yield was 7 µg DNA per 150 mg conidia. The genomic DNA that is isolated using this procedure is suitable for long-read sequencing (i.e., > 48.5 kbp). Quality control measures to ensure the size, yield, and purity of the genomic DNA are also described in this method. Sequencing of the genomic DNA of the quality described here will allow for the assembly and comparison of multiple powdery mildew genomes, which in turn will lead to a better understanding and improved control of this agricultural pathogen.

  1. High-throughput sequencing of three Lemnoideae (duckweeds chloroplast genomes from total DNA.

    Directory of Open Access Journals (Sweden)

    Wenqin Wang

    Full Text Available BACKGROUND: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. METHODS: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. CONCLUSIONS: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.

  2. Complete DNA sequence of the linear mitochondrial genome of the pathogenic yeast Candida parapsilosis

    DEFF Research Database (Denmark)

    Nosek, J.; Novotna, M.; Hlavatovicova, Z.

    2004-01-01

    The complete sequence of the mitochondrial DNA of the opportunistic yeast pathogen Candida parapsilosis was determined. The mitochondrial genome is represented by linear DNA molecules terminating with tandem repeats of a 738-bp unit. The number of repeats varies, thus generating a population...

  3. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    Science.gov (United States)

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.

  4. Genome dynamics of short oligonucleotides: the example of bacterial DNA uptake enhancing sequences.

    Directory of Open Access Journals (Sweden)

    Mohammed Bakkali

    Full Text Available Among the many bacteria naturally competent for transformation by DNA uptake-a phenomenon with significant clinical and financial implications- Pasteurellaceae and Neisseriaceae species preferentially take up DNA containing specific short sequences. The genomic overrepresentation of these DNA uptake enhancing sequences (DUES causes preferential uptake of conspecific DNA, but the function(s behind this overrepresentation and its evolution are still a matter for discovery. Here I analyze DUES genome dynamics and evolution and test the validity of the results to other selectively constrained oligonucleotides. I use statistical methods and computer simulations to examine DUESs accumulation in Haemophilus influenzae and Neisseria gonorrhoeae genomes. I analyze DUESs sequence and nucleotide frequencies, as well as those of all their mismatched forms, and prove the dependence of DUESs genomic overrepresentation on their preferential uptake by quantifying and correlating both characteristics. I then argue that mutation, uptake bias, and weak selection against DUESs in less constrained parts of the genome combined are sufficient enough to cause DUESs accumulation in susceptible parts of the genome with no need for other DUES function. The distribution of overrepresentation values across sequences with different mismatch loads compared to the DUES suggests a gradual yet not linear molecular drive of DNA sequences depending on their similarity to the DUES. Other genomically overrepresented sequences, both pro- and eukaryotic, show similar distribution of frequencies suggesting that the molecular drive reported above applies to other frequent oligonucleotides. Rare oligonucleotides, however, seem to be gradually drawn to genomic underrepresentation, thus, suggesting a molecular drag. To my knowledge this work provides the first clear evidence of the gradual evolution of selectively constrained oligonucleotides, including repeated, palindromic and protein

  5. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution

    OpenAIRE

    Falconer, Ester; Hills, Mark; Naumann, Ulrike; Poon, Steven S. S.; Chavez, Elizabeth A.; Sanders, Ashley D.; Zhao, Yongjun; Hirst, Martin; Lansdorp, Peter M.

    2012-01-01

    DNA rearrangements such as sister chromatid exchanges (SCEs) are sensitive indicators of genomic stress and instability, but they are typically masked by single-cell sequencing techniques. We developed Strand-seq to independently sequence parental DNA template strands from single cells, making it possible to map SCEs at orders-of-magnitude greater resolution than was previously possible. On average, murine embryonic stem (mES) cells exhibit eight SCEs, which are detected at a resolution of up...

  6. Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

    Science.gov (United States)

    Shi, Jinming

    In this study, cytosine methylation on CCGG site and genomic DNA sequence changes of adult leaves of rice after seeds space flight were detected by methylation-sensitive amplification polymorphism (MSAP) and Amplified fragment length polymorphism (AFLP) technique respectively. Rice seeds were planted in the trial field after 4 days space flight on the shenzhou-6 Spaceship of China. Adult leaves of space-treated rice including 8 plants chosen randomly and 2 plants with phenotypic mutation were used for AFLP and MSAP analysis. Polymorphism of both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequency of the on-ground controls, space-treated plants and mutants are 1.3%, 3.1% and 11% respectively. For AFLP analysis, the average polymorphic frequencies are 1.4%, 2.9%and 8%respectively. Total 27 and 22 polymorphic fragments were cloned sequenced from MSAP and AFLP analysis respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. For the 22 polymorphic fragments from AFLP analysis, no one shows homology to mRNA sequence and eight fragments show homology to repeat region or retrotransposon sequence. These results suggest that although both genomic DNA sequence and cytosine methylation status can be effected by space flight, the genomic region homology to the fragments from genome DNA and cytosine methylation analysis were different.

  7. DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms.

    Science.gov (United States)

    Panova, Marina; Aronsson, Henrik; Cameron, R Andrew; Dahl, Peter; Godhe, Anna; Lind, Ulrika; Ortega-Martinez, Olga; Pereyra, Ricardo; Tesson, Sylvie V M; Wrange, Anna-Lisa; Blomberg, Anders; Johannesson, Kerstin

    2016-01-01

    The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the earths' different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.

  8. Rhipicephalus microplus dataset of nonredundant raw sequence reads from 454 GS FLX sequencing of Cot-selected (Cot = 660) genomic DNA

    Science.gov (United States)

    A reassociation kinetics-based approach was used to reduce the complexity of genomic DNA from the Deutsch laboratory strain of the cattle tick, Rhipicephalus microplus, to facilitate genome sequencing. Selected genomic DNA (Cot value = 660) was sequenced using 454 GS FLX technology, resulting in 356...

  9. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    Directory of Open Access Journals (Sweden)

    Tran Duc

    2010-05-01

    Full Text Available Abstract Background Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the

  10. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Abstract Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  11. DNA Data Bank of Japan at work on genome sequence data.

    Science.gov (United States)

    Tateno, Y; Fukami-Kobayashi, K; Miyazaki, S; Sugawara, H; Gojobori, T

    1998-01-01

    We at the DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) have recently begun receiving, processing and releasing EST and genome sequence data submitted by various Japanese genome projects. The data include those for human, Arabidopsis thaliana, rice, nematode, Synechocystis sp. and Escherichia coli. Since the quantity of data is very large, we organized teams to conduct preliminary discussions with project teams about data submission and handling for release to the public. We also developed a mass submission tool to cope with a large quantity of data. In addition, to provide genome data on WWW, we developed a genome information system using Java. This system (http://mol.genes.nig.ac.jp/ecoli/) can in theory be used for any genome sequence data. These activities will facilitate processing of large quantities of EST and genome data.

  12. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    Full Text Available The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  13. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  14. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    Science.gov (United States)

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619

  15. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    Science.gov (United States)

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of

  16. cDNA, genomic cloning and sequence analysis of ribosomal protein ...

    African Journals Online (AJOL)

    enoh

    2012-03-13

    Mar 13, 2012 ... cDNA and the genomic sequence of RPS4X were cloned successfully from ... S4 genes plays a role in Turner syndrome; however, this ..... Project of Educational Committee of Sichuan Province ... Molecular biology of the cell.

  17. cDNA, genomic sequence cloning and analysis of the ribosomal ...

    African Journals Online (AJOL)

    Ribosomal protein L37A (RPL37A) is a component of 60S large ribosomal subunit encoded by the RPL37A gene, which belongs to the family of ribosomal L37AE proteins, located in the cytoplasm. The complementary deoxyribonucleic acid (cDNA) and the genomic sequence of RPL37A were cloned successfully from giant ...

  18. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr......The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...

  19. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique.

    Science.gov (United States)

    Chechetkin, V R; Lobzin, V V

    2017-08-07

    Using state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools leaded to the significant progress in detailing chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on the chromosome folding and the large-scale genome organization. Could a part of information on the chromosome folding be obtained directly from underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to the replication and transcription and can also be used for the assessment of temperature or supercoiling effects on the chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, the study of large-scale genome organization for bacteriophage PHIX174 and bacterium Caulobacter crescentus was also added. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge on the chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob

    2016-01-01

    Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can...... be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject...... we analysed a neonatal DBS sample and corresponding adult whole-blood (WB) reference sample. Different DNA sample types were prepared for each of the subjects. Pilot 1: wgaDNA of 2x3.2mm neonatal DBSs (DBS_2x3.2) and raw DNA extract of the WB reference sample (WB_ref). Pilot 2: DBS_2x3.2, WB...

  1. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin

    2016-08-01

    Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next generation sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Detection of genomic variation by selection of a 9 mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Full Text Available Detection of the rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in order to understand genomic and phenotypic variability. We have interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb and 7 (1.1 Mb from an individual from the International HapMap Project (NA12872. We have optimized a method of genomic selection for high throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. 83% of SNPs in the targeted region had at least 4-fold sequence coverage and 54% at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes are 91.3% concordant in regions with coverage > or = 4-fold, and 97.9% concordant in regions with coverage > or = 15-fold. About 81% of the SNPs recovered with both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20 fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  3. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data.

    Science.gov (United States)

    Al-Nakeeb, Kosai; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-11-21

    Whole-genome sequencing (WGS) projects provide short read nucleotide sequences from nuclear and possibly organelle DNA depending on the source of origin. Mitochondrial DNA is present in animals and fungi, while plants contain DNA from both mitochondria and chloroplasts. Current techniques for separating organelle reads from nuclear reads in WGS data require full reference or partial seed sequences for assembling. Norgal (de Novo ORGAneLle extractor) avoids this requirement by identifying a high frequency subset of k-mers that are predominantly of mitochondrial origin and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences in the range from 98.5 to 99.5%. We also assembled the chloroplasts of grape vines and cucumbers using Norgal together with seed-based de novo assemblers. Norgal is a pipeline that can extract and assemble full or partial mitochondrial and chloroplast genomes from WGS short reads without prior knowledge. The program is available at: https://bitbucket.org/kosaidtu/norgal .

  4. Haematobia irritans dataset of raw sequence reads from Illumina and Pac Bio sequencing of genomic DNA

    Science.gov (United States)

    The genome of the horn fly, Haematobia irritans, was sequenced using Illumina- and Pac Bio-based protocols. Following quality filtering, the raw reads have been deposited at NCBI under the BioProject and BioSample accession numbers PRJNA30967 and SAMN07830356, respectively. The Illumina reads are un...

  5. Draft genome sequence of an elite Dura palm and whole-genome patterns of DNA variation in oil palm.

    Science.gov (United States)

    Jin, Jingjing; Lee, May; Bai, Bin; Sun, Yanwei; Qu, Jing; Rahmadsyah; Alfiko, Yuzer; Lim, Chin Huat; Suwanto, Antonius; Sugiharti, Maria; Wong, Limsoon; Ye, Jian; Chua, Nam-Hai; Yue, Gen Hua

    2016-12-01

    Oil palm is the world's leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm, and re-sequenced 17 palm genomes. The assemble genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  6. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments.

    Science.gov (United States)

    Dabney, Jesse; Knapp, Michael; Glocke, Isabelle; Gansauge, Marie-Theres; Weihmann, Antje; Nickel, Birgit; Valdiosera, Cristina; García, Nuria; Pääbo, Svante; Arsuaga, Juan-Luis; Meyer, Matthias

    2013-09-24

    Although an inverse relationship is expected in ancient DNA samples between the number of surviving DNA fragments and their length, ancient DNA sequencing libraries are strikingly deficient in molecules shorter than 40 bp. We find that a loss of short molecules can occur during DNA extraction and present an improved silica-based extraction protocol that enables their efficient retrieval. In combination with single-stranded DNA library preparation, this method enabled us to reconstruct the mitochondrial genome sequence from a Middle Pleistocene cave bear (Ursus deningeri) bone excavated at Sima de los Huesos in the Sierra de Atapuerca, Spain. Phylogenetic reconstructions indicate that the U. deningeri sequence forms an early diverging sister lineage to all Western European Late Pleistocene cave bears. Our results prove that authentic ancient DNA can be preserved for hundreds of thousand years outside of permafrost. Moreover, the techniques presented enable the retrieval of phylogenetically informative sequences from samples in which virtually all DNA is diminished to fragments shorter than 50 bp.

  7. Shotgun Bisulfite Sequencing of the Betula platyphylla Genome Reveals the Tree’s DNA Methylation Patterning

    Directory of Open Access Journals (Sweden)

    Chang Su

    2014-12-01

    Full Text Available DNA methylation plays a critical role in the regulation of gene expression. Most studies of DNA methylation have been performed in herbaceous plants, and little is known about the methylation patterns in tree genomes. In the present study, we generated a map of methylated cytosines at single base pair resolution for Betula platyphylla (white birch by bisulfite sequencing combined with transcriptomics to analyze DNA methylation and its effects on gene expression. We obtained a detailed view of the function of DNA methylation sequence composition and distribution in the genome of B. platyphylla. There are 34,460 genes in the whole genome of birch, and 31,297 genes are methylated. Conservatively, we estimated that 14.29% of genomic cytosines are methylcytosines in birch. Among the methylation sites, the CHH context accounts for 48.86%, and is the largest proportion. Combined transcriptome and methylation analysis showed that the genes with moderate methylation levels had higher expression levels than genes with high and low methylation. In addition, methylated genes are highly enriched for the GO subcategories of binding activities, catalytic activities, cellular processes, response to stimulus and cell death, suggesting that methylation mediates these pathways in birch trees.

  8. Alignment of Escherichia coli K12 DNA sequences to a genomic restriction map.

    Science.gov (United States)

    Rudd, K E; Miller, W; Ostell, J; Benson, D A

    1990-01-25

    We use the extensive published information describing the genome of Escherichia coli and new restriction map alignment software to align DNA sequence, genetic, and physical maps. Restriction map alignment software is used which considers restriction maps as strings analogous to DNA or protein sequences except that two values, enzyme name and DNA base address, are associated with each position on the string. The resulting alignments reveal a nearly linear relationship between the physical and genetic maps of the E. coli chromosome. Physical map comparisons with the 1976, 1980, and 1983 genetic maps demonstrate a better fit with the more recent maps. The results of these alignments are genomic kilobase coordinates, orientation and rank of the alignment that best fits the genetic data. A statistical measure based on extreme value distribution is applied to the alignments. Additional computer analyses allow us to estimate the accuracy of the published E. coli genomic restriction map, simulate rearrangements of the bacterial chromosome, and search for repetitive DNA. The procedures we used are general enough to be applicable to other genome mapping projects.

  9. Digital Droplet Multiple Displacement Amplification (ddMDA for Whole Genome Sequencing of Limited DNA Samples.

    Directory of Open Access Journals (Sweden)

    Minsoung Rhee

    Full Text Available Multiple displacement amplification (MDA is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent of ~3/1000 of an E. coli genome per droplet, ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology.

  10. Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

    Science.gov (United States)

    Raghav, Sunil Kumar; Deplancke, Bart

    2012-01-01

    Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.

  11. DNA sequence explains seemingly disordered methylation levels in partially methylated domains of Mammalian genomes.

    Directory of Open Access Journals (Sweden)

    Dimos Gaidatzis

    2014-02-01

    Full Text Available For the most part metazoan genomes are highly methylated and harbor only small regions with low or absent methylation. In contrast, partially methylated domains (PMDs, recently discovered in a variety of cell lines and tissues, do not fit this paradigm as they show partial methylation for large portions (20%-40% of the genome. While in PMDs methylation levels are reduced on average, we found that at single CpG resolution, they show extensive variability along the genome outside of CpG islands and DNase I hypersensitive sites (DHS. Methylation levels range from 0% to 100% in a roughly uniform fashion with only little similarity between neighboring CpGs. A comparison of various PMD-containing methylomes showed that these seemingly disordered states of methylation are strongly conserved across cell types for virtually every PMD. Comparative sequence analysis suggests that DNA sequence is a major determinant of these methylation states. This is further substantiated by a purely sequence based model which can predict 31% (R(2 of the variation in methylation. The model revealed CpG density as the main driving feature promoting methylation, opposite to what has been shown for CpG islands, followed by various dinucleotides immediately flanking the CpG and a minor contribution from sequence preferences reflecting nucleosome positioning. Taken together we provide a reinterpretation for the nucleotide-specific methylation levels observed in PMDs, demonstrate their conservation across tissues and suggest that they are mainly determined by specific DNA sequence features.

  12. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene

    International Nuclear Information System (INIS)

    Heilbronn, T.; Jahn, G.; Buerkle, A.; Freese, U.K.; Fleckenstein, B.; Zur Hausen, H.

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSF-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at T/sub m/ - 25/degrees/C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Esptein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein

  13. Ligation bias in illumina next-generation DNA libraries: implications for sequencing ancient genomes.

    Directory of Open Access Journals (Sweden)

    Andaine Seguin-Orlando

    Full Text Available Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products, resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries, and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs are biased against DNA templates starting with thymine residues, contrarily to blunt-end adapter ligation. We observe the same bias on fresh DNA extracts sheared on Bioruptor, Covaris and nebulizers. This contradicts previous reports suggesting that this bias could originate from the methods used for shearing DNA. This also suggests that AT-overhang adapter ligation efficiency is affected in a sequence-dependent manner and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared following AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines and therefore deaminated cytosines. This results in particular nucleotide misincorporation damage patterns, deviating from the signature generally expected for authenticating ancient sequence data. Consequently, we show that models adequate for estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries.

  14. Genomic organization and dynamics of repetitive DNA sequences in representatives of three Fagaceae genera.

    Science.gov (United States)

    Alves, Sofia; Ribeiro, Teresa; Inácio, Vera; Rocheta, Margarida; Morais-Cecílio, Leonor

    2012-05-01

    Oaks, chestnuts, and beeches are economically important species of the Fagaceae. To understand the relationship between these members of this family, a deep knowledge of their genome composition and organization is needed. In this work, we have isolated and characterized several AFLP fragments obtained from Quercus rotundifolia Lam. through homology searches in available databases. Genomic polymorphisms involving some of these sequences were evaluated in two species of Quercus, one of Castanea, and one of Fagus with specific primers. Comparative FISH analysis with generated sequences was performed in interphase nuclei of the four species, and the co-immunolocalization of 5-methylcytosine was also studied. Some of the sequences isolated proved to be genus-specific, while others were present in all the genera. Retroelements, either gypsy-like of the Tat/Athila clade or copia-like, are well represented, and most are dispersed in euchromatic regions of these species with no DNA methylation associated, pointing to an interspersed arrangement of these retroelements with potential gene-rich regions. A particular gypsy-sequence is dispersed in oaks and chestnut nuclei, but its confinement to chromocenters in beech evidences genome restructuring events during evolution of Fagaceae. Several sequences generated in this study proved to be good tools to comparatively study Fagaceae genome organization.

  15. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA

    DEFF Research Database (Denmark)

    Alquezar-Planas, David E; Fordyce, Sarah Louise

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure...... that the data produced is optimal. Although much of the procedure can be followed directly from the manufacturer's protocols, the key differences lie in the library preparation steps. This chapter presents an optimized protocol for the sequencing of fossil remains and museum specimens, commonly referred...

  16. DNA Breaks and End Resection Measured Genome-wide by End Sequencing.

    Science.gov (United States)

    Canela, Andres; Sridharan, Sriram; Sciascia, Nicholas; Tubbs, Anthony; Meltzer, Paul; Sleckman, Barry P; Nussenzweig, André

    2016-09-01

    DNA double-strand breaks (DSBs) arise during physiological transcription, DNA replication, and antigen receptor diversification. Mistargeting or misprocessing of DSBs can result in pathological structural variation and mutation. Here we describe a sensitive method (END-seq) to monitor DNA end resection and DSBs genome-wide at base-pair resolution in vivo. We utilized END-seq to determine the frequency and spectrum of restriction-enzyme-, zinc-finger-nuclease-, and RAG-induced DSBs. Beyond sequence preference, chromatin features dictate the repertoire of these genome-modifying enzymes. END-seq can detect at least one DSB per cell among 10,000 cells not harboring DSBs, and we estimate that up to one out of 60 cells contains off-target RAG cleavage. In addition to site-specific cleavage, we detect DSBs distributed over extended regions during immunoglobulin class-switch recombination. Thus, END-seq provides a snapshot of DNA ends genome-wide, which can be utilized for understanding genome-editing specificities and the influence of chromatin on DSB pathway choice. Published by Elsevier Inc.

  17. Direct DNA Extraction from Mycobacterium tuberculosis Frozen Stocks as a Reculture-Independent Approach to Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Zallet, J; Lillebaek, T

    2015-01-01

    Culturing before DNA extraction represents a major time-consuming step in whole-genome sequencing of slow-growing bacteria, such as Mycobacterium tuberculosis. We report a workflow to extract DNA from frozen isolates without reculturing. Prepared libraries and sequence data were comparable...... with results from recultured aliquots of the same stocks....

  18. Locus Reference Genomic sequences: An improved basis for describing human DNA variants

    KAUST Repository

    Dalgleish, Raymond; Flicek, Paul; Cunningham, Fiona; Astashyn, Alex; Tully, Raymond E; Proctor, Glenn; Chen, Yuan; McLaren, William M; Larsson, Pontus; Vaughan, Brendan W; Bé roud, Christophe; Dobson, Glen; Lehvä slaiho, Heikki; Taschner, Peter EM; den Dunnen, Johan T; Devereau, Andrew; Birney, Ewan; Brookes, Anthony J; Maglott, Donna R

    2010-01-01

    As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specifi c purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-fi le record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)- approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants aff ecting human health. Further information can be found on the LRG web site (http://www.lrg-sequence.org). 2010 Dalgleish et al.; licensee BioMed Central Ltd.

  19. Locus Reference Genomic sequences: An improved basis for describing human DNA variants

    KAUST Repository

    Dalgleish, Raymond

    2010-04-15

    As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specifi c purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-fi le record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)- approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants aff ecting human health. Further information can be found on the LRG web site (http://www.lrg-sequence.org). 2010 Dalgleish et al.; licensee BioMed Central Ltd.

  20. Human β satellite DNA: Genomic organization and sequence definition of a class of highly repetitive tandem DNA

    International Nuclear Information System (INIS)

    Waye, J.S.; Willard, H.F.

    1989-01-01

    The authors describe a class of human repetitive DNA, called β satellite, that, at a most fundamental level, exists as tandem arrays of diverged ∼68-base-pair monomer repeat units. The monomer units are organized as distinct subsets, each characterized by a multimeric higher-order repeat unit that is tandemly reiterated and represents a recent unit of amplification. They have cloned, characterized, and determined the sequence of two β satellite higher-order repeat units: one located on chromosome 9, the other on the acrocentric chromosomes (13, 14, 15, 21, and 22) and perhaps other sites in the genome. Analysis by pulsed-field gel electrophoresis reveals that these tandem arrays are localized in large domains that are marked by restriction fragment length polymorphisms. In total, β-satellite sequences comprise several million base pairs of DNA in the human genome. Analysis of this DNA family should permit insights into the nature of chromosome-specific and nonspecific modes of satellite DNA evolution and provide useful tools for probing the molecular organization and concerted evolution of the acrocentric chromosomes

  1. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.

    2011-11-15

    Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of the gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for 12 poly(A) motif variants. The Author(s) 2011. Published by Oxford University Press. All rights reserved.

  2. DNA fingerprinting of Mycobacterium tuberculosis: from phage typing to whole-genome sequencing.

    Science.gov (United States)

    Schürch, Anita C; van Soolingen, Dick

    2012-06-01

    Current typing methods for Mycobacterium tuberculosis complex evolved from simple phenotypic approaches like phage typing and drug susceptibility profiling to DNA-based strain typing methods, such as IS6110-restriction fragment length polymorphisms (RFLP) and variable number of tandem repeats (VNTR) typing. Examples of the usefulness of molecular typing are source case finding and epidemiological linkage of tuberculosis (TB) cases, international transmission of MDR/XDR-TB, the discrimination between endogenous reactivation and exogenous re-infection as a cause of relapses after curative treatment of tuberculosis, the evidence of multiple M. tuberculosis infections, and the disclosure of laboratory cross-contaminations. Simultaneously, phylogenetic analyses were developed based on single nucleotide polymorphisms (SNPs), genomic deletions usually referred to as regions of difference (RDs) and spoligotyping which served both strain typing and phylogenetic analysis. National and international initiatives that rely on the application of these typing methods have brought significant insight into the molecular epidemiology of tuberculosis. However, current DNA fingerprinting methods have important limitations. They can often not distinguish between genetically closely related strains and the turn-over of these markers is variable. Moreover, the suitability of most DNA typing methods for phylogenetic reconstruction is limited as they show a high propensity of convergent evolution or misinfer genetic distances. In order to fully explore the possibilities of genotyping in the molecular epidemiology of tuberculosis and to study the phylogeny of the causative bacteria reliably, the application of whole-genome sequencing (WGS) analysis for all M. tuberculosis isolates is the optimal, although currently still a costly solution. In the last years WGS for typing of pathogens has been explored and yielded important additional information on strain diversity in comparison to the

  3. Assessing genetic diversity among Brettanomyces yeasts by DNA fingerprinting and whole-genome sequencing.

    Science.gov (United States)

    Crauwels, Sam; Zhu, Bo; Steensels, Jan; Busschaert, Pieter; De Samblanx, Gorik; Marchal, Kathleen; Willems, Kris A; Verstrepen, Kevin J; Lievens, Bart

    2014-07-01

    Brettanomyces yeasts, with the species Brettanomyces (Dekkera) bruxellensis being the most important one, are generally reported to be spoilage yeasts in the beer and wine industry due to the production of phenolic off flavors. However, B. bruxellensis is also known to be a beneficial contributor in certain fermentation processes, such as the production of certain specialty beers. Nevertheless, despite its economic importance, Brettanomyces yeasts remain poorly understood at the genetic and genomic levels. In this study, the genetic relationship between more than 50 Brettanomyces strains from all presently known species and from several sources was studied using a combination of DNA fingerprinting techniques. This revealed an intriguing correlation between the B. bruxellensis fingerprints and the respective isolation source. To further explore this relationship, we sequenced a (beneficial) beer isolate of B. bruxellensis (VIB X9085; ST05.12/22) and compared its genome sequence with the genome sequences of two wine spoilage strains (AWRI 1499 and CBS 2499). ST05.12/22 was found to be substantially different from both wine strains, especially at the level of single nucleotide polymorphisms (SNPs). In addition, there were major differences in the genome structures between the strains investigated, including the presence of large duplications and deletions. Gene content analysis revealed the presence of 20 genes which were present in both wine strains but absent in the beer strain, including many genes involved in carbon and nitrogen metabolism, and vice versa, no genes that were missing in both AWRI 1499 and CBS 2499 were found in ST05.12/22. Together, this study provides tools to discriminate Brettanomyces strains and provides a first glimpse at the genetic diversity and genome plasticity of B. bruxellensis. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  4. DNA immunoprecipitation semiconductor sequencing (DIP-SC-seq) as a rapid method to generate genome wide epigenetic signatures

    OpenAIRE

    Thomson, John P.; Fawkes, Angie; Ottaviano, Raffaele; Hunter, Jennifer M.; Shukla, Ruchi; Mjoseng, Heidi K.; Clark, Richard; Coutts, Audrey; Murphy, Lee; Meehan, Richard R.

    2015-01-01

    Modification of DNA resulting in 5-methylcytosine (5 mC) or 5-hydroxymethylcytosine (5hmC) has been shown to influence the local chromatin environment and affect transcription. Although recent advances in next generation sequencing technology allow researchers to map epigenetic modifications across the genome, such experiments are often time-consuming and cost prohibitive. Here we present a rapid and cost effective method of generating genome wide DNA modification maps utilising commercially ...

  5. Isolation and sequence analysis of the wheat B genome subtelomeric DNA

    Directory of Open Access Journals (Sweden)

    Huneau Cecile

    2009-09-01

    Full Text Available Abstract Background Telomeric and subtelomeric regions are essential for genome stability and regular chromosome replication. In this work, we have characterized the wheat BAC (bacterial artificial chromosome clones containing Spelt1 and Spelt52 sequences, which belong to the subtelomeric repeats of the B/G genomes of wheats and Aegilops species from the section Sitopsis. Results The BAC library from Triticum aestivum cv. Renan was screened using Spelt1 and Spelt52 as probes. Nine positive clones were isolated; of them, clone 2050O8 was localized mainly to the distal parts of wheat chromosomes by in situ hybridization. The distribution of the other clones indicated the presence of different types of repetitive sequences in BACs. Use of different approaches allowed us to prove that seven of the nine isolated clones belonged to the subtelomeric chromosomal regions. Clone 2050O8 was sequenced and its sequence of 119 737 bp was annotated. It is composed of 33% transposable elements (TEs, 8.2% Spelt52 (namely, the subfamily Spelt52.2 and five non-TE-related genes. DNA transposons are predominant, making up 24.6% of the entire BAC clone, whereas retroelements account for 8.4% of the clone length. The full-length CACTA transposon Caspar covers 11 666 bp, encoding a transposase and CTG-2 proteins, and this transposon accounts for 40% of the DNA transposons. The in situ hybridization data for 2050O8 derived subclones in combination with the BLAST search against wheat mapped ESTs (expressed sequence tags suggest that clone 2050O8 is located in the terminal bin 4BL-10 (0.95-1.0. Additionally, four of the predicted 2050O8 genes showed significant homology to four putative orthologous rice genes in the distal part of rice chromosome 3S and confirm the synteny to wheat 4BL. Conclusion Satellite DNA sequences from the subtelomeric regions of diploid wheat progenitor can be used for selecting the BAC clones from the corresponding regions of hexaploid wheat

  6. Isolation and sequence analysis of the wheat B genome subtelomeric DNA.

    Science.gov (United States)

    Salina, Elena A; Sergeeva, Ekaterina M; Adonina, Irina G; Shcherban, Andrey B; Afonnikov, Dmitry A; Belcram, Harry; Huneau, Cecile; Chalhoub, Boulos

    2009-09-05

    Telomeric and subtelomeric regions are essential for genome stability and regular chromosome replication. In this work, we have characterized the wheat BAC (bacterial artificial chromosome) clones containing Spelt1 and Spelt52 sequences, which belong to the subtelomeric repeats of the B/G genomes of wheats and Aegilops species from the section Sitopsis. The BAC library from Triticum aestivum cv. Renan was screened using Spelt1 and Spelt52 as probes. Nine positive clones were isolated; of them, clone 2050O8 was localized mainly to the distal parts of wheat chromosomes by in situ hybridization. The distribution of the other clones indicated the presence of different types of repetitive sequences in BACs. Use of different approaches allowed us to prove that seven of the nine isolated clones belonged to the subtelomeric chromosomal regions. Clone 2050O8 was sequenced and its sequence of 119,737 bp was annotated. It is composed of 33% transposable elements (TEs), 8.2% Spelt52 (namely, the subfamily Spelt52.2) and five non-TE-related genes. DNA transposons are predominant, making up 24.6% of the entire BAC clone, whereas retroelements account for 8.4% of the clone length. The full-length CACTA transposon Caspar covers 11,666 bp, encoding a transposase and CTG-2 proteins, and this transposon accounts for 40% of the DNA transposons. The in situ hybridization data for 2050O8 derived subclones in combination with the BLAST search against wheat mapped ESTs (expressed sequence tags) suggest that clone 2050O8 is located in the terminal bin 4BL-10 (0.95-1.0). Additionally, four of the predicted 2050O8 genes showed significant homology to four putative orthologous rice genes in the distal part of rice chromosome 3S and confirm the synteny to wheat 4BL. Satellite DNA sequences from the subtelomeric regions of diploid wheat progenitor can be used for selecting the BAC clones from the corresponding regions of hexaploid wheat chromosomes. It has been demonstrated for the first time

  7. Using physicochemical and compositional characteristics of DNA sequence for prediction of genomic signals

    KAUST Repository

    Mulamba, Pierre Abraham

    2014-12-01

    The challenge in finding genes in eukaryotic organisms using computational methods is an ongoing problem in the biology. Based on various genomic signals found in eukaryotic genomes, this problem can be divided into many different sub­‐problems such as identification of transcription start sites, translation initiation sites, splice sites, poly (A) signals, etc. Each sub-­problem deals with a particular type of genomic signals and various computational methods are used to solve each sub-­problem. Aggregating information from all these individual sub-­problems can lead to a complete annotation of a gene and its component signals. The fundamental principle of most of these computational methods is the mapping principle – building an input-­output model for the prediction of a particular genomic signal based on a set of known input signals and their corresponding output signal. The type of input signals used to build the model is an essential element in most of these computational methods. The common factor of most of these methods is that they are mainly based on the statistical analysis of the basic nucleotide sequence string composition. 4 Our study is based on a novel approach to predict genomic signals in which uniquely generated structural profiles that combine compressed physicochemical properties with topological and compositional properties of DNA sequences are used to develop machine learning predictive models. The compression of the physicochemical properties is made using principal component analysis transformation. Our ideas are evaluated through prediction models of canonical splice sites using support vector machine models. We demonstrate across several species that the proposed methodology has resulted in the most accurate splice site predictors that are publicly available or described. We believe that the approach in this study is quite general and has various applications in other biological modeling problems.

  8. DNA Barcoding: Amplification and sequence analysis of rbcl and matK genome regions in three divergent plant species

    Directory of Open Access Journals (Sweden)

    Javed Iqbal Wattoo

    2016-11-01

    Full Text Available Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matk+rbcl of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families; Solanaceae, Euphorbiaceae and Fabaceae respectively. Biological sequence homology and divergence of amplified sequences was studied using Basic Local Alignment Tool (BLAST. Results: Both primers (matk+rbcl showed good amplification in three species. The sequenced regions reveled conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in medicinal industry to establish DNA based identification of different medicinal plant species to monitor adulteration.

  9. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  10. Genomic DNA sequences from mastodon and woolly mammoth reveal deep speciation of forest and savanna elephants.

    Directory of Open Access Journals (Sweden)

    Nadin Rohland

    2010-12-01

    Full Text Available To elucidate the history of living and extinct elephantids, we generated 39,763 bp of aligned nuclear DNA sequence across 375 loci for African savanna elephant, African forest elephant, Asian elephant, the extinct American mastodon, and the woolly mammoth. Our data establish that the Asian elephant is the closest living relative of the extinct mammoth in the nuclear genome, extending previous findings from mitochondrial DNA analyses. We also find that savanna and forest elephants, which some have argued are the same species, are as or more divergent in the nuclear genome as mammoths and Asian elephants, which are considered to be distinct genera, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants. Finally, we document a much larger effective population size in forest elephants compared with the other elephantid taxa, likely reflecting species differences in ancient geographic structure and range and differences in life history traits such as variance in male reproductive success.

  11. Long span DNA paired-end-tag (DNA-PET sequencing strategy for the interrogation of genomic structural mutations and fusion-point-guided reconstruction of amplicons.

    Directory of Open Access Journals (Sweden)

    Fei Yao

    Full Text Available Structural variations (SVs contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10-20 kb and compared their characteristics with short insert (1 kb libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.

  12. The Paramecium germline genome provides a niche for intragenic parasitic DNA: evolutionary dynamics of internal eliminated sequences.

    Science.gov (United States)

    Arnaiz, Olivier; Mathy, Nathalie; Baudry, Céline; Malinsky, Sophie; Aury, Jean-Marc; Denby Wilkes, Cyril; Garnier, Olivier; Labadie, Karine; Lauderdale, Benjamin E; Le Mouël, Anne; Marmignon, Antoine; Nowacki, Mariusz; Poulain, Julie; Prajer, Malgorzata; Wincker, Patrick; Meyer, Eric; Duharcourt, Sandra; Duret, Laurent; Bétermier, Mireille; Sperling, Linda

    2012-01-01

    Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of -45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a -10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a fraction of the

  13. The Paramecium germline genome provides a niche for intragenic parasitic DNA: evolutionary dynamics of internal eliminated sequences.

    Directory of Open Access Journals (Sweden)

    Olivier Arnaiz

    Full Text Available Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of -45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a -10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a

  14. Re-exploration of U's Triangle Brassica Species Based on Chloroplast Genomes and 45S nrDNA Sequences.

    Science.gov (United States)

    Kim, Chang-Kug; Seol, Young-Joo; Perumal, Sampath; Lee, Jonghoon; Waminal, Nomar Espinosa; Jayakodi, Murukarthick; Lee, Sang-Choon; Jin, Seungwoo; Choi, Beom-Soon; Yu, Yeisoo; Ko, Ho-Cheol; Choi, Ji-Weon; Ryu, Kyoung-Yul; Sohn, Seong-Han; Parkin, Isobel; Yang, Tae-Jin

    2018-05-09

    The concept of U's triangle, which revealed the importance of polyploidization in plant genome evolution, described natural allopolyploidization events in Brassica using three diploids [B. rapa (A genome), B. nigra (B), and B. oleracea (C)] and derived allotetraploids [B. juncea (AB genome), B. napus (AC), and B. carinata (BC)]. However, comprehensive understanding of Brassica genome evolution has not been fully achieved. Here, we performed low-coverage (2-6×) whole-genome sequencing of 28 accessions of Brassica as well as of Raphanus sativus [R genome] to explore the evolution of six Brassica species based on chloroplast genome and ribosomal DNA variations. Our phylogenomic analyses led to two main conclusions. (1) Intra-species-level chloroplast genome variations are low in the three allotetraploids (2~7 SNPs), but rich and variable in each diploid species (7~193 SNPs). (2) Three allotetraploids maintain two 45SnrDNA types derived from both ancestral species with maternal dominance. Furthermore, this study sheds light on the maternal origin of the AC chloroplast genome. Overall, this study clarifies the genetic relationships of U's triangle species based on a comprehensive genomics approach and provides important genomic resources for correlative and evolutionary studies.

  15. Complete DNA sequence of the mitochondrial genome of the treehopper Leptobelus gazella (Membracoidea: Hemiptera).

    Science.gov (United States)

    Zhao, Xing; Liang, Ai-Ping

    2016-09-01

    The first complete DNA sequence of the mitochondrial genome (mitogenome) of Leptobelus gazelle (Membracoidea: Hemiptera) is determined in this study. The circular molecule is 16,007 bp in its full length, which encodes a set of 37 genes, including 13 proteins, 2 ribosomal RNAs, 22 transfer RNAs, and contains an A + T-rich region (CR). The gene numbers, content, and organization of L. gazelle are similar to other typical metazoan mitogenomes. Twelve of the 13 PCGs are initiated with ATR methionine or ATT isoleucine codons, except the atp8 gene that uses the ATC isoleucine as start signal. Ten of the 13 PCGs have complete termination codons, either TAA (nine genes) or TAG (cytb). The remaining 3 PCGs (cox1, cox2 and nad5) have incomplete termination codons T (AA). All of the 22 tRNAs can be folded in the form of a typical clover-leaf structure. The complete mitogenome sequence data of L. gazelle is useful for the phylogenetic and biogeographic studies of the Membracoidea and Hemiptera.

  16. Determination of cDNA and genomic DNA sequences of hevamine, a chitinase from the rubber tree Hevea brasiliensis

    NARCIS (Netherlands)

    Bokma, E; Spiering, M; Chow, KS; Mulder, PPMFA; Subroto, T; Beintema, JJ

    Hevamine is a chitinase from the rubber tree Hevea brasiliensis and belongs to the family 18 glycosyl hydrolases. This paper describes the cloning of hevamine DNA and cDNA sequences. Hevamine contains a signal peptide at the N-terminus and a putative vacuolar targeting sequence at the C-terminus

  17. Complete chloroplast genome and 45S nrDNA sequences of the medicinal plant species Glycyrrhiza glabra and Glycyrrhiza uralensis.

    Science.gov (United States)

    Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun

    2017-10-06

    Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.

  18. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

    Directory of Open Access Journals (Sweden)

    Maley Carlo C

    2008-10-01

    Full Text Available Abstract Background Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12 genomes. Virtually all possible (> 98% 12 bp oligomers appear in vertebrate genomes while 98% to D. melanogaster (12–17 bp, C. elegans (11–17 bp, A. thaliana (11–17 bp, S. cerevisiae (10–16 bp and E. coli (9–15 bp. Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. Conclusion Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect

  19. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

    Science.gov (United States)

    Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

    2008-01-01

    Background Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while 98% to < 2% of possible oligomers in D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. Conclusion Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to

  20. Proceedings of the relevance of mass spectrometry to DNA sequence determination: Research needs for the Human Genome Program

    Energy Technology Data Exchange (ETDEWEB)

    Edmonds, C.G.; Smith, R.D. (Pacific Northwest Lab., Richland, WA (USA)); Smith, L.M. (Wisconsin Univ., Madison, WI (USA))

    1990-11-01

    A workshop was sponsored for the US Department of Energy (DOE), Office of Health and Environmental Research by Pacific Northwest Laboratory, April 4--5, 1990, in Seattle, Washington, to examine the potential role of mass spectrometry in the joint DOE/National Institutes of Health (NIH) Human Genome Program. The workshop was occasioned by recent developments in mass spectrometry that are providing new levels for selectivity, sensitivity, and, in particular, new methods of ionization appropriate for large biopolymers such as DNA. During discussions, three general mass spectrometric approaches to the determination of DNA sequence were considered: (1) the mass spectrometric detection of isotopic labels from DNA sequencing mixtures separated using gel electrophoresis, (2) the direct mass spectrometric analysis from direct ionization of unfractionated sequencing mixtures where the measured mass of the constituents functions to identify and order the base sequence (replacing separation by gel electrophoresis), and (3) an approach in which a single highly charged molecular ion of a large DNA segment produced is rapidly sequenced in an ion cyclotron resonance ion trap. The consensus of the workshop was that, on the basis of the new developments, mass spectrometry has the potential to provide the substantial increases in sequencing speed required for the Human Genome Program. 66 refs., 3 tabs.

  1. Inspecting Targeted Deep Sequencing of Whole Genome Amplified DNA Versus Fresh DNA for Somatic Mutation Detection: A Genetic Study in Myelodysplastic Syndrome Patients.

    Science.gov (United States)

    Palomo, Laura; Fuster-Tormo, Francisco; Alvira, Daniel; Ademà, Vera; Armengol, María Pilar; Gómez-Marzo, Paula; de Haro, Nuri; Mallo, Mar; Xicoy, Blanca; Zamora, Lurdes; Solé, Francesc

    2017-08-01

    Whole genome amplification (WGA) has become an invaluable method for preserving limited samples of precious stock material and has been used during the past years as an alternative tool to increase the amount of DNA before library preparation for next-generation sequencing. Myelodysplastic syndromes (MDS) are a group of clonal hematopoietic stem cell disorders characterized by presenting somatic mutations in several myeloid-related genes. In this work, targeted deep sequencing has been performed on four paired fresh DNA and WGA DNA samples from bone marrow of MDS patients, to assess the feasibility of using WGA DNA for detecting somatic mutations. The results of this study highlighted that, in general, the sequencing and alignment statistics of fresh DNA and WGA DNA samples were similar. However, after variant calling and when considering variants detected at all frequencies, there was a high level of discordance between fresh DNA and WGA DNA (overall, a higher number of variants was detected in WGA DNA). After proper filtering, a total of three somatic mutations were detected in the cohort. All somatic mutations detected in fresh DNA were also identified in WGA DNA and validated by whole exome sequencing.

  2. Norgal: Extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

    DEFF Research Database (Denmark)

    Al-Nakeeb, Kosai Ali Ahmed; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-01-01

    and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences...

  3. Whole-genome amplified DNA from stored dried blood spots is reliable in high resolution melting curve and sequencing analysis

    DEFF Research Database (Denmark)

    Winkel, Bo G; Hollegaard, Mads V; Olesen, Morten S

    2011-01-01

    BACKGROUND: The use of dried blood spots (DBS) samples in genomic workup has been limited by the relative low amounts of genomic DNA (gDNA) they contain. It remains to be proven that whole genome amplified DNA (wgaDNA) from stored DBS samples, constitutes a reliable alternative to gDNA.We wanted...

  4. Appraising the relevance of DNA copy number loss and gain in prostate cancer using whole genome DNA sequence data.

    Directory of Open Access Journals (Sweden)

    Niedzica Camacho

    2017-09-01

    Full Text Available A variety of models have been proposed to explain regions of recurrent somatic copy number alteration (SCNA in human cancer. Our study employs Whole Genome DNA Sequence (WGS data from tumor samples (n = 103 to comprehensively assess the role of the Knudson two hit genetic model in SCNA generation in prostate cancer. 64 recurrent regions of loss and gain were detected, of which 28 were novel, including regions of loss with more than 15% frequency at Chr4p15.2-p15.1 (15.53%, Chr6q27 (16.50% and Chr18q12.3 (17.48%. Comprehensive mutation screens of genes, lincRNA encoding sequences, control regions and conserved domains within SCNAs demonstrated that a two-hit genetic model was supported in only a minor proportion of recurrent SCNA losses examined (15/40. We found that recurrent breakpoints and regions of inversion often occur within Knudson model SCNAs, leading to the identification of ZNF292 as a target gene for the deletion at 6q14.3-q15 and NKX3.1 as a two-hit target at 8p21.3-p21.2. The importance of alterations of lincRNA sequences was illustrated by the identification of a novel mutational hotspot at the KCCAT42, FENDRR, CAT1886 and STCAT2 loci at the 16q23.1-q24.3 loss. Our data confirm that the burden of SCNAs is predictive of biochemical recurrence, define nine individual regions that are associated with relapse, and highlight the possible importance of ion channel and G-protein coupled-receptor (GPCR pathways in cancer development. We concluded that a two-hit genetic model accounts for about one third of SCNA indicating that mechanisms, such haploinsufficiency and epigenetic inactivation, account for the remaining SCNA losses.

  5. Detecting DNA double-stranded breaks in mammalian genomes by linear amplification-mediated high-throughput genome-wide translocation sequencing.

    Science.gov (United States)

    Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L

    2016-05-01

    Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina Miseq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement with a turnaround time of <1 week.

  6. Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq).

    Science.gov (United States)

    Langley, Alexander R; Gräf, Stefan; Smith, James C; Krude, Torsten

    2016-12-01

    Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Directory of Open Access Journals (Sweden)

    Sarah M Hykin

    Full Text Available For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles, attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp. We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens

  8. Fixing Formalin: A Method to Recover Genomic-Scale DNA Sequence Data from Formalin-Fixed Museum Specimens Using High-Throughput Sequencing.

    Science.gov (United States)

    Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A

    2015-01-01

    For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for

  9. Complete sequence analysis of 18S rDNA based on genomic DNA extraction from individual Demodex mites (Acari: Demodicidae).

    Science.gov (United States)

    Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang

    2012-05-01

    The study for the first time attempted to accomplish 18S ribosomal DNA (rDNA) complete sequence amplification and analysis for three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) based on gDNA extraction from individual mites. The mites were treated by DNA Release Additive and Hot Start II DNA Polymerase so as to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield reached the highest at 1 mite, tending to descend with the increase of mite number. The individual mite gDNA was successfully used for 18S rDNA fragment (about 900 bp) amplification examination. The alignments of 18S rDNA complete sequences of individual mite samples and those of pooled mite samples ( ≥ 1000mites/sample) showed over 97% identities for each species, indicating that the gDNA extracted from a single individual mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic tree could not effectively identify the three Demodex species, largely due to the differentiation in the D. canis isolates. It can be concluded that the individual Demodex mite gDNA can satisfy the molecular study of Demodex. 18S rDNA complete sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the ascertainment of the types of Demodex mites parasitizing in dogs. Copyright © 2012 Elsevier Inc. All rights reserved.

  10. Next-Generation Sequencing Reveals the Impact of Repetitive DNA Across Phylogenetically Closely Related Genomes of Orobanchaceae

    Science.gov (United States)

    Piednoël, Mathieu; Aberer, Andre J.; Schneeweiss, Gerald M.; Macas, Jiri; Novak, Petr; Gundlach, Heidrun; Temsch, Eva M.; Renner, Susanne S.

    2013-01-01

    We used next-generation sequencing to characterize the genomes of nine species of Orobanchaceae of known phylogenetic relationships, different life forms, and including a polyploid species. The study species are the autotrophic, nonparasitic Lindenbergia philippensis, the hemiparasitic Schwalbea americana, and seven nonphotosynthetic parasitic species of Orobanche (Orobanche crenata, Orobanche cumana, Orobanche gracilis (tetraploid), and Orobanche pancicii) and Phelipanche (Phelipanche lavandulacea, Phelipanche purpurea, and Phelipanche ramosa). Ty3/Gypsy elements comprise 1.93%–28.34% of the nine genomes and Ty1/Copia elements comprise 8.09%–22.83%. When compared with L. philippensis and S. americana, the nonphotosynthetic species contain higher proportions of repetitive DNA sequences, perhaps reflecting relaxed selection on genome size in parasitic organisms. Among the parasitic species, those in the genus Orobanche have smaller genomes but higher proportions of repetitive DNA than those in Phelipanche, mostly due to a diversification of repeats and an accumulation of Ty3/Gypsy elements. Genome downsizing in the tetraploid O. gracilis probably led to sequence loss across most repeat types. PMID:22723303

  11. Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-01-01

    Full Text Available A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713. Genome-to-Genome Distance (GGDC showed high similarity to Pseudoalteromonas haloplanktis (X67024. The generated unique Quick Response (QR codes indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR showed the number of bases concentrated in the area. Guanine residues were highest in number followed by cytosine. Frequency of Chaos Game Representation (FCGR indicated that CC and GG blocks have higher frequency in the sequence from the evaluated marine bacterium strains. Maximum GC content for the marine bacterium strains ranged 53-54%. The use of QR codes, CGR, FCGR, and GC dataset helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates using MEGA6 software. Principal Component Analysis (PCA was carried out using EMBL-EBI MUSCLE program. Thus, generated genomic data are of great assistance for hierarchical classification in Bacterial Systematics which combined with phenotypic features represents a basic procedure for a polyphasic approach on unambiguous bacterial isolate taxonomic classification.

  12. Detection of short repeated genomic sequences on metaphase chromosomes using padlock probes and target primed rolling circle DNA synthesis

    Directory of Open Access Journals (Sweden)

    Stougaard Magnus

    2007-11-01

    Full Text Available Abstract Background In situ detection of short sequence elements in genomic DNA requires short probes with high molecular resolution and powerful specific signal amplification. Padlock probes can differentiate single base variations. Ligated padlock probes can be amplified in situ by rolling circle DNA synthesis and detected by fluorescence microscopy, thus enhancing PRINS type reactions, where localized DNA synthesis reports on the position of hybridization targets, to potentially reveal the binding of single oligonucleotide-size probe molecules. Such a system has been presented for the detection of mitochondrial DNA in fixed cells, whereas attempts to apply rolling circle detection to metaphase chromosomes have previously failed, according to the literature. Methods Synchronized cultured cells were fixed with methanol/acetic acid to prepare chromosome spreads in teflon-coated diagnostic well-slides. Apart from the slide format and the chromosome spreading everything was done essentially according to standard protocols. Hybridization targets were detected in situ with padlock probes, which were ligated and amplified using target primed rolling circle DNA synthesis, and detected by fluorescence labeling. Results An optimized protocol for the spreading of condensed metaphase chromosomes in teflon-coated diagnostic well-slides was developed. Applying this protocol we generated specimens for target primed rolling circle DNA synthesis of padlock probes recognizing a 40 nucleotide sequence in the male specific repetitive satellite I sequence (DYZ1 on the Y-chromosome and a 32 nucleotide sequence in the repetitive kringle IV domain in the apolipoprotein(a gene positioned on the long arm of chromosome 6. These targets were detected with good efficiency, but the efficiency on other target sites was unsatisfactory. Conclusion Our aim was to test the applicability of the method used on mitochondrial DNA to the analysis of nuclear genomes, in particular as

  13. Human Genome Research: Decoding DNA

    Science.gov (United States)

    dropdown arrow Site Map A-Z Index Menu Synopsis Human Genome Research: Decoding DNA Resources with of the DNA double helix during April 2003. James D. Watson, Francis Crick, and Maurice Wilkins were company Celera announced the completion of a "working draft" reference DNA sequence of the human

  14. Characterization of the dsDNA prophage sequences in the genome of Neisseria gonorrhoeae and visualization of productive bacteriophage

    Directory of Open Access Journals (Sweden)

    Maugel Timothy K

    2007-07-01

    Full Text Available Abstract Background Bioinformatic analysis of the genome sequence of Neisseria gonorrhoeae revealed the presence of nine probable prophage islands. The distribution, conservation and function of many of these sequences, and their ability to produce bacteriophage particles are unknown. Results Our analysis of the genomic sequence of FA1090 identified five genomic regions (NgoΦ1 – 5 that are related to dsDNA lysogenic phage. The genetic content of the dsDNA prophage sequences were examined in detail and found to contain blocks of genes encoding for proteins homologous to proteins responsible for phage DNA replication, structural proteins and proteins responsible for phage assembly. The DNA sequences from NgoΦ1, NgoΦ2 and NgoΦ3 contain some significant regions of identity. A unique region of NgoΦ2 showed very high similarity with the Pseudomonas aeruginosa generalized transducing phage F116. Comparative analysis at the nucleotide and protein levels suggests that the sequences of NgoΦ1 and NgoΦ2 encode functionally active phages, while NgoΦ3, NgoΦ4 and NgoΦ5 encode incomplete genomes. Expression of the NgoΦ1 and NgoΦ2 repressors in Escherichia coli inhibit the growth of E. coli and the propagation of phage λ. The NgoΦ2 repressor was able to inhibit transcription of N. gonorrhoeae genes and Haemophilus influenzae HP1 phage promoters. The holin gene of NgoΦ1 (identical to that encoded by NgoΦ2, when expressed in E. coli, could serve as substitute for the phage λ s gene. We were able to detect the presence of the DNA derived from NgoΦ1 in the cultures of N. gonorrhoeae. Electron microscopy analysis of culture supernatants revealed the presence of multiple forms of bacteriophage particles. Conclusion These data suggest that the genes similar to dsDNA lysogenic phage present in the gonococcus are generally conserved in this pathogen and that they are able to regulate the expression of other neisserial genes. Since phage particles were

  15. Genomic organization and developmental fate of adjacent repeated sequences in a foldback DNA clone of Tetrahymena thermophila

    International Nuclear Information System (INIS)

    Tschunko, A.H.; Loechel, R.H.; McLaren, N.C.; Allen, S.L.

    1987-01-01

    DNA sequence elimination and rearrangement occurs during the development of somatic cell lineages of eukaryotes and was first discovered over a century ago. However, the significance and mechanism of chromatin elimination are not understood. DNA elimination also occurs during the development of the somatic macronucleus from the germinal micronucleus in unicellular ciliated protozoa such as Tetrahymena thermophila. In this study foldback DNA from the micronucleus was used as a probe to isolate ten clones. All of those tested (4/4) contained sequences that were repetitive in the micronucleus and rearranged in the macronucleus. Inverted repeated sequences were present in one clone. This clone, pTtFBl, was subjected to a detailed analysis of its developmental fate. Subregions were subcloned and used as probes against Southern blots of micronuclear and macronuclear DNA. DNA was labeled with [ 33 P]-labeled dATP. The authors found that all subregions defined repeated sequence families in the micronuclear genome. A minimum of four different families was defined, two of which are retained in the macronucleus and two of which are completely eliminated. The inverted repeat family is retained with little rearrangement. Two of the families, defined by subregions that do not contain parts of the inverted repeat are totally eliminated during macronuclear development-and contain open reading frames. The significance of retained inverted repeats to the process of elimination is discussed

  16. Genome-wide DNA methylation maps in follicular lymphoma cells determined by methylation-enriched bisulfite sequencing.

    Directory of Open Access Journals (Sweden)

    Jeong-Hyeon Choi

    Full Text Available BACKGROUND: Follicular lymphoma (FL is a form of non-Hodgkin's lymphoma (NHL that arises from germinal center (GC B-cells. Despite the significant advances in immunotherapy, FL is still not curable. Beyond transcriptional profiling and genomics datasets, there currently is no epigenome-scale dataset or integrative biology approach that can adequately model this disease and therefore identify novel mechanisms and targets for successful prevention and treatment of FL. METHODOLOGY/PRINCIPAL FINDINGS: We performed methylation-enriched genome-wide bisulfite sequencing of FL cells and normal CD19(+ B-cells using 454 sequencing technology. The methylated DNA fragments were enriched with methyl-binding proteins, treated with bisulfite, and sequenced using the Roche-454 GS FLX sequencer. The total number of bases covered in the human genome was 18.2 and 49.3 million including 726,003 and 1.3 million CpGs in FL and CD19(+ B-cells, respectively. 11,971 and 7,882 methylated regions of interest (MRIs were identified respectively. The genome-wide distribution of these MRIs displayed significant differences between FL and normal B-cells. A reverse trend in the distribution of MRIs between the promoter and the gene body was observed in FL and CD19(+ B-cells. The MRIs identified in FL cells also correlated well with transcriptomic data and ChIP-on-Chip analyses of genome-wide histone modifications such as tri-methyl-H3K27, and tri-methyl-H3K4, indicating a concerted epigenetic alteration in FL cells. CONCLUSIONS/SIGNIFICANCE: This study is the first to provide a large scale and comprehensive analysis of the DNA methylation sequence composition and distribution in the FL epigenome. These integrated approaches have led to the discovery of novel and frequent targets of aberrant epigenetic alterations. The genome-wide bisulfite sequencing approach developed here can be a useful tool for profiling DNA methylation in clinical samples.

  17. Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains

    Directory of Open Access Journals (Sweden)

    Bharti Arvind K

    2008-12-01

    Full Text Available Abstract Background Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR and methylation spanning linker libraries (MSLL. These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. Results A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig, while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%. These two techniques were compared with other gene-enrichment methods, and shown to be complementary. Conclusion MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. Although the types and natures of

  18. Repair of oxidative DNA base damage in the host genome influences the HIV integration site sequence preference.

    Directory of Open Access Journals (Sweden)

    Geoffrey R Bennett

    Full Text Available Host base excision repair (BER proteins that repair oxidative damage enhance HIV infection. These proteins include the oxidative DNA damage glycosylases 8-oxo-guanine DNA glycosylase (OGG1 and mutY homolog (MYH as well as DNA polymerase beta (Polβ. While deletion of oxidative BER genes leads to decreased HIV infection and integration efficiency, the mechanism remains unknown. One hypothesis is that BER proteins repair the DNA gapped integration intermediate. An alternative hypothesis considers that the most common oxidative DNA base damages occur on guanines. The subtle consensus sequence preference at HIV integration sites includes multiple G:C base pairs surrounding the points of joining. These observations suggest a role for oxidative BER during integration targeting at the nucleotide level. We examined the hypothesis that BER repairs a gapped integration intermediate by measuring HIV infection efficiency in Polβ null cell lines complemented with active site point mutants of Polβ. A DNA synthesis defective mutant, but not a 5'dRP lyase mutant, rescued HIV infection efficiency to wild type levels; this suggested Polβ DNA synthesis activity is not necessary while 5'dRP lyase activity is required for efficient HIV infection. An alternate hypothesis that BER events in the host genome influence HIV integration site selection was examined by sequencing integration sites in OGG1 and MYH null cells. In the absence of these 8-oxo-guanine specific glycosylases the chromatin elements of HIV integration site selection remain the same as in wild type cells. However, the HIV integration site sequence preference at G:C base pairs is altered at several positions in OGG1 and MYH null cells. Inefficient HIV infection in the absence of oxidative BER proteins does not appear related to repair of the gapped integration intermediate; instead oxidative damage repair may participate in HIV integration site preference at the sequence level.

  19. Complete DNA sequences of the plastid genomes of two parasitic flowering plant species, Cuscuta reflexa and Cuscuta gronovii.

    Science.gov (United States)

    Funk, Helena T; Berg, Sabine; Krupinska, Karin; Maier, Uwe G; Krause, Kirsten

    2007-08-22

    The holoparasitic plant genus Cuscuta comprises species with photosynthetic capacity and functional chloroplasts as well as achlorophyllous and intermediate forms with restricted photosynthetic activity and degenerated chloroplasts. Previous data indicated significant differences with respect to the plastid genome coding capacity in different Cuscuta species that could correlate with their photosynthetic activity. In order to shed light on the molecular changes accompanying the parasitic lifestyle, we sequenced the plastid chromosomes of the two species Cuscuta reflexa and Cuscuta gronovii. Both species are capable of performing photosynthesis, albeit with varying efficiencies. Together with the plastid genome of Epifagus virginiana, an achlorophyllous parasitic plant whose plastid genome has been sequenced, these species represent a series of progression towards total dependency on the host plant, ranging from reduced levels of photosynthesis in C. reflexa to a restricted photosynthetic activity and degenerated chloroplasts in C. gronovii to an achlorophyllous state in E. virginiana. The newly sequenced plastid genomes of C. reflexa and C. gronovii reveal that the chromosome structures are generally very similar to that of non-parasitic plants, although a number of species-specific insertions, deletions (indels) and sequence inversions were identified. However, we observed a gradual adaptation of the plastid genome to the different degrees of parasitism. The changes are particularly evident in C. gronovii and include (a) the parallel losses of genes for the subunits of the plastid-encoded RNA polymerase and the corresponding promoters from the plastid genome, (b) the first documented loss of the gene for a putative splicing factor, MatK, from the plastid genome and (c) a significant reduction of RNA editing. Overall, the comparative genomic analysis of plastid DNA from parasitic plants indicates a bias towards a simplification of the plastid gene expression

  20. Complete DNA sequences of the plastid genomes of two parasitic flowering plant species, Cuscuta reflexa and Cuscuta gronovii

    Directory of Open Access Journals (Sweden)

    Maier Uwe G

    2007-08-01

    Full Text Available Abstract Background The holoparasitic plant genus Cuscuta comprises species with photosynthetic capacity and functional chloroplasts as well as achlorophyllous and intermediate forms with restricted photosynthetic activity and degenerated chloroplasts. Previous data indicated significant differences with respect to the plastid genome coding capacity in different Cuscuta species that could correlate with their photosynthetic activity. In order to shed light on the molecular changes accompanying the parasitic lifestyle, we sequenced the plastid chromosomes of the two species Cuscuta reflexa and Cuscuta gronovii. Both species are capable of performing photosynthesis, albeit with varying efficiencies. Together with the plastid genome of Epifagus virginiana, an achlorophyllous parasitic plant whose plastid genome has been sequenced, these species represent a series of progression towards total dependency on the host plant, ranging from reduced levels of photosynthesis in C. reflexa to a restricted photosynthetic activity and degenerated chloroplasts in C. gronovii to an achlorophyllous state in E. virginiana. Results The newly sequenced plastid genomes of C. reflexa and C. gronovii reveal that the chromosome structures are generally very similar to that of non-parasitic plants, although a number of species-specific insertions, deletions (indels and sequence inversions were identified. However, we observed a gradual adaptation of the plastid genome to the different degrees of parasitism. The changes are particularly evident in C. gronovii and include (a the parallel losses of genes for the subunits of the plastid-encoded RNA polymerase and the corresponding promoters from the plastid genome, (b the first documented loss of the gene for a putative splicing factor, MatK, from the plastid genome and (c a significant reduction of RNA editing. Conclusion Overall, the comparative genomic analysis of plastid DNA from parasitic plants indicates a bias towards

  1. Toward allotetraploid cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA sequence information

    Science.gov (United States)

    2012-01-01

    Background Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will largely accelerate the process of whole-genome assembly in cotton. Results In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 ESTs clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles in fiber quality of these genes. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. Conclusion This study will serve as a valuable genomic resource

  2. Alternative splicing of human elastin mRNA indicated by sequence analysis of cloned genomic and complementary DNA

    International Nuclear Information System (INIS)

    Indik, Z.; Yeh, H.; Ornstein-goldstein, N.; Sheppard, P.; Anderson, N.; Rosenbloom, J.C.; Peltonen, L.; Rosenbloom, J.

    1987-01-01

    Poly(A) + RNA, isolated from a single 7-mo fetal human aorta, was used to synthesize cDNA by the RNase H method, and the cDNA was inserted into λgt10. Recombinant phage containing elastin sequences were identified by hybridization with cloned, exon-containing fragments of the human elastin gene. Three clones containing inserts of 3.3, 2.7, and 2.3 kilobases were selected for further analysis. Three overlapping clones containing 17.8 kilobases of the human elastin gene were also isolated from genomic libraries. Complete sequence analysis of the six clones demonstrated that: (i) the cDNA encompassed the entire translated portion of the mRNA encoding 786 amino acids, including several unusual hydrophilic amino acid sequences not previously identified in porcine tropoelastin, (ii) exons encoding either hydrophobic or crosslinking domains in the protein alternated in the gene, and (iii) a great abundance of Alu repetitive sequences occurred throughout the introns. The data also indicated substantial alternative splicing of the mRNA. These results suggest the potential for significant variation in the precise molecular structure of the elastic fiber in the human population

  3. Interspecies hybridization on DNA resequencing microarrays: efficiency of sequence recovery and accuracy of SNP detection in human, ape, and codfish mitochondrial DNA genomes sequenced on a human-specific MitoChip

    Directory of Open Access Journals (Sweden)

    Carr Steven M

    2007-09-01

    Full Text Available Abstract Background Iterative DNA "resequencing" on oligonucleotide microarrays offers a high-throughput method to measure intraspecific biodiversity, one that is especially suited to SNP-dense gene regions such as vertebrate mitochondrial (mtDNA genomes. However, costs of single-species design and microarray fabrication are prohibitive. A cost-effective, multi-species strategy is to hybridize experimental DNAs from diverse species to a common microarray that is tiled with oligonucleotide sets from multiple, homologous reference genomes. Such a strategy requires that cross-hybridization between the experimental DNAs and reference oligos from the different species not interfere with the accurate recovery of species-specific data. To determine the pattern and limits of such interspecific hybridization, we compared the efficiency of sequence recovery and accuracy of SNP identification by a 15,452-base human-specific microarray challenged with human, chimpanzee, gorilla, and codfish mtDNA genomes. Results In the human genome, 99.67% of the sequence was recovered with 100.0% accuracy. Accuracy of SNP identification declines log-linearly with sequence divergence from the reference, from 0.067 to 0.247 errors per SNP in the chimpanzee and gorilla genomes, respectively. Efficiency of sequence recovery declines with the increase of the number of interspecific SNPs in the 25b interval tiled by the reference oligonucleotides. In the gorilla genome, which differs from the human reference by 10%, and in which 46% of these 25b regions contain 3 or more SNP differences from the reference, only 88% of the sequence is recoverable. In the codfish genome, which differs from the reference by > 30%, less than 4% of the sequence is recoverable, in short islands ≥ 12b that are conserved between primates and fish. Conclusion Experimental DNAs bind inefficiently to homologous reference oligonucleotide sets on a re-sequencing microarray when their sequences differ by

  4. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features

    Directory of Open Access Journals (Sweden)

    Aaron Sievers

    2017-04-01

    Full Text Available In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we successfully checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distribution of k-mers was developed and used. The results were analyzed by including positional information based on visualizations as genomic maps and by applying basic vector correlation methods. This analysis was concentrated on small word lengths (1 ≤ k ≤ 4 on relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae formerly identified by independent, mostly alignment-based methods, were confirmed. Correlations between the k-mer content and several genes in these genomes have been found, showing similarities between classified and unclassified viruses, which may be potentially useful for further taxonomic research. Furthermore, unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs, which are probably of major biological function, are found and described. Using the chromosomes of a chimpanzee and human that are currently known, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST and the

  5. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Directory of Open Access Journals (Sweden)

    Mark R Wilson

    Full Text Available Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We used whole genome sequencing (WGS to Salmonella subspecies enterica serotype Tennessee (S. Tennessee to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana, which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs, suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that isolates from several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts

  6. Novel genomes and genome constitutions identified by GISH and 5S rDNA and knotted1 genomic sequences in the genus Setaria.

    Science.gov (United States)

    Zhao, Meicheng; Zhi, Hui; Doust, Andrew N; Li, Wei; Wang, Yongfang; Li, Haiquan; Jia, Guanqing; Wang, Yongqiang; Zhang, Ning; Diao, Xianmin

    2013-04-11

    The Setaria genus is increasingly of interest to researchers, as its two species, S. viridis and S. italica, are being developed as models for understanding C4 photosynthesis and plant functional genomics. The genome constitution of Setaria species has been studied in the diploid species S. viridis, S. adhaerans and S. grisebachii, where three genomes A, B and C were identified respectively. Two allotetraploid species, S. verticillata and S. faberi, were found to have AABB genomes, and one autotetraploid species, S. queenslandica, with an AAAA genome, has also been identified. The genomes and genome constitutions of most other species remain unknown, even though it was thought there are approximately 125 species in the genus distributed world-wide. GISH was performed to detect the genome constitutions of Eurasia species of S. glauca, S. plicata, and S. arenaria, with the known A, B and C genomes as probes. No or very poor hybridization signal was detected indicating that their genomes are different from those already described. GISH was also performed reciprocally between S. glauca, S. plicata, and S. arenaria genomes, but no hybridization signals between each other were found. The two sets of chromosomes of S. lachnea both hybridized strong signals with only the known C genome of S. grisebachii. Chromosomes of Qing 9, an accession formerly considered as S. viridis, hybridized strong signal only to B genome of S. adherans. Phylogenetic trees constructed with 5S rDNA and knotted1 markers, clearly classify the samples in this study into six clusters, matching the GISH results, and suggesting that the F genome of S. arenaria is basal in the genus. Three novel genomes in the Setaria genus were identified and designated as genome D (S. glauca), E (S. plicata) and F (S. arenaria) respectively. The genome constitution of tetraploid S. lachnea is putatively CCC'C'. Qing 9 is a B genome species indigenous to China and is hypothesized to be a newly identified species. The

  7. Characterization of new Schistosoma mansoni microsatellite loci in sequences obtained from public DNA databases and microsatellite enriched genomic libraries

    Directory of Open Access Journals (Sweden)

    Rodrigues NB

    2002-01-01

    Full Text Available In the last decade microsatellites have become one of the most useful genetic markers used in a large number of organisms due to their abundance and high level of polymorphism. Microsatellites have been used for individual identification, paternity tests, forensic studies and population genetics. Data on microsatellite abundance comes preferentially from microsatellite enriched libraries and DNA sequence databases. We have conducted a search in GenBank of more than 16,000 Schistosoma mansoni ESTs and 42,000 BAC sequences. In addition, we obtained 300 sequences from CA and AT microsatellite enriched genomic libraries. The sequences were searched for simple repeats using the RepeatMasker software. Of 16,022 ESTs, we detected 481 (3% sequences that contained 622 microsatellites (434 perfect, 164 imperfect and 24 compounds. Of the 481 ESTs, 194 were grouped in 63 clusters containing 2 to 15 ESTs per cluster. Polymorphisms were observed in 16 clusters. The 287 remaining ESTs were orphan sequences. Of the 42,017 BAC end sequences, 1,598 (3.8% contained microsatellites (2,335 perfect, 287 imperfect and 79 compounds. The 1,598 BAC end sequences 80 were grouped into 17 clusters containing 3 to 17 BAC end sequences per cluster. Microsatellites were present in 67 out of 300 sequences from microsatellite enriched libraries (55 perfect, 38 imperfect and 15 compounds. From all of the observed loci 55 were selected for having the longest perfect repeats and flanking regions that allowed the design of primers for PCR amplification. Additionally we describe two new polymorphic microsatellite loci.

  8. DNA Sequencing by Capillary Electrophoresis

    Science.gov (United States)

    Karger, Barry L.; Guttman, Andras

    2009-01-01

    Sequencing of human and other genomes has been at the center of interest in the biomedical field over the past several decades and is now leading toward an era of personalized medicine. During this time, DNA sequencing methods have evolved from the labor intensive slab gel electrophoresis, through automated multicapillary electrophoresis systems using fluorophore labeling with multispectral imaging, to the “next generation” technologies of cyclic array, hybridization based, nanopore and single molecule sequencing. Deciphering the genetic blueprint and follow-up confirmatory sequencing of Homo sapiens and other genomes was only possible by the advent of modern sequencing technologies that was a result of step by step advances with a contribution of academics, medical personnel and instrument companies. While next generation sequencing is moving ahead at break-neck speed, the multicapillary electrophoretic systems played an essential role in the sequencing of the Human Genome, the foundation of the field of genomics. In this prospective, we wish to overview the role of capillary electrophoresis in DNA sequencing based in part of several of our articles in this journal. PMID:19517496

  9. SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences.

    Science.gov (United States)

    Inda, Márcia A; van Batenburg, Marinus F; Roos, Marco; Belloum, Adam S Z; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H C; Breit, Timo M

    2008-08-08

    Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. Thus, SigWin-detector provides the proof-of-principle for the modular e-Science based concept

  10. Performance Evaluation of NIPT in Detection of Chromosomal Copy Number Variants Using Low-Coverage Whole-Genome Sequencing of Plasma DNA

    DEFF Research Database (Denmark)

    Liu, Hongtai; Gao, Ya; Hu, Zhiyang

    2016-01-01

    , including 33 CNVs samples and 886 normal samples from September 1, 2011 to May 31, 2013, were enrolled in this study. The samples were randomly rearranged and blindly sequenced by low-coverage (about 7M reads) whole-genome sequencing of plasma DNA. Fetal CNVs were detected by Fetal Copy-number Analysis...

  11. DNA sequencing conference, 2

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, R.M. [Georgetown Univ., Kennedy Inst. of Ethics, Washington, DC (United States); Venter, J.C. [National Inst. of Neurological Disorders and Strokes, Bethesda, MD (United States); Gilbert, W. [Harvard Univ., Cambridge, MA (United States); Mulligan, J. [Stanford Univ., CA (United States); Mansfield, B.K. [Oak Ridge National Lab., TN (United States)

    1991-06-19

    This conference focused on DNA sequencing, genetic linkage mapping, physical mapping, informatics and bioethics. Several were used to study this sequencing and mapping. This article also discusses computer hardware and software aiding in the mapping of genes.

  12. Reconstruction of putative DNA virus from endogenous rice tungro bacilliform virus-like sequences in the rice genome: implications for integration and evolution

    Directory of Open Access Journals (Sweden)

    Kishima Yuji

    2004-10-01

    Full Text Available Abstract Background Plant genomes contain various kinds of repetitive sequences such as transposable elements, microsatellites, tandem repeats and virus-like sequences. Most of them, with the exception of virus-like sequences, do not allow us to trace their origins nor to follow the process of their integration into the host genome. Recent discoveries of virus-like sequences in plant genomes led us to set the objective of elucidating the origin of the repetitive sequences. Endogenous rice tungro bacilliform virus (RTBV-like sequences (ERTBVs have been found throughout the rice genome. Here, we reconstructed putative virus structures from RTBV-like sequences in the rice genome and characterized to understand evolutionary implication, integration manner and involvements of endogenous virus segments in the corresponding disease response. Results We have collected ERTBVs from the rice genomes. They contain rearranged structures and no intact ORFs. The identified ERTBV segments were shown to be phylogenetically divided into three clusters. For each phylogenetic cluster, we were able to make a consensus alignment for a circular virus-like structure carrying two complete ORFs. Comparisons of DNA and amino acid sequences suggested the closely relationship between ERTBV and RTBV. The Oryza AA-genome species vary in the ERTBV copy number. The species carrying low-copy-number of ERTBV segments have been reported to be extremely susceptible to RTBV. The DNA methylation state of the ERTBV sequences was correlated with their copy number in the genome. Conclusions These ERTBV segments are unlikely to have functional potential as a virus. However, these sequences facilitate to establish putative virus that provided information underlying virus integration and evolutionary relationship with existing virus. Comparison of ERTBV among the Oryza AA-genome species allowed us to speculate a possible role of endogenous virus segments against its related disease.

  13. Next generation DNA sequencing technology delivers valuable genetic markers for the genomic orphan legume species, Bituminaria bituminosa

    Directory of Open Access Journals (Sweden)

    Pazos-Navarro María

    2011-12-01

    Full Text Available Abstract Background Bituminaria bituminosa is a perennial legume species from the Canary Islands and Mediterranean region that has potential as a drought-tolerant pasture species and as a source of pharmaceutical compounds. Three botanical varieties have previously been identified in this species: albomarginata, bituminosa and crassiuscula. B. bituminosa can be considered a genomic 'orphan' species with very few genomic resources available. New DNA sequencing technologies provide an opportunity to develop high quality molecular markers for such orphan species. Results 432,306 mRNA molecules were sampled from a leaf transcriptome of a single B. bituminosa plant using Roche 454 pyrosequencing, resulting in an average read length of 345 bp (149.1 Mbp in total. Sequences were assembled into 3,838 isotigs/contigs representing putatively unique gene transcripts. Gene ontology descriptors were identified for 3,419 sequences. Raw sequence reads containing simple sequence repeat (SSR motifs were identified, and 240 primer pairs flanking these motifs were designed. Of 87 primer pairs developed this way, 75 (86.2% successfully amplified primarily single fragments by PCR. Fragment analysis using 20 primer pairs in 79 accessions of B. bituminosa detected 130 alleles at 21 SSR loci. Genetic diversity analyses confirmed that variation at these SSR loci accurately reflected known taxonomic relationships in original collections of B. bituminosa and provided additional evidence that a division of the botanical variety bituminosa into two according to geographical origin (Mediterranean region and Canary Islands may be appropriate. Evidence of cross-pollination was also found between botanical varieties within a B. bituminosa breeding programme. Conclusions B. bituminosa can no longer be considered a genomic orphan species, having now a large (albeit incomplete repertoire of expressed gene sequences that can serve as a resource for future genetic studies. This

  14. Pooled-DNA sequencing identifies genomic regions of selection in Nigerian isolates of Plasmodium falciparum.

    Science.gov (United States)

    Oyebola, Kolapo M; Idowu, Emmanuel T; Olukosi, Yetunde A; Awolola, Taiwo S; Amambua-Ngwa, Alfred

    2017-06-29

    The burden of falciparum malaria is especially high in sub-Saharan Africa. Differences in pressure from host immunity and antimalarial drugs lead to adaptive changes responsible for high level of genetic variations within and between the parasite populations. Population-specific genetic studies to survey for genes under positive or balancing selection resulting from drug pressure or host immunity will allow for refinement of interventions. We performed a pooled sequencing (pool-seq) of the genomes of 100 Plasmodium falciparum isolates from Nigeria. We explored allele-frequency based neutrality test (Tajima's D) and integrated haplotype score (iHS) to identify genes under selection. Fourteen shared iHS regions that had at least 2 SNPs with a score > 2.5 were identified. These regions code for genes that were likely to have been under strong directional selection. Two of these genes were the chloroquine resistance transporter (CRT) on chromosome 7 and the multidrug resistance 1 (MDR1) on chromosome 5. There was a weak signature of selection in the dihydrofolate reductase (DHFR) gene on chromosome 4 and MDR5 genes on chromosome 13, with only 2 and 3 SNPs respectively identified within the iHS window. We observed strong selection pressure attributable to continued chloroquine and sulfadoxine-pyrimethamine use despite their official proscription for the treatment of uncomplicated malaria. There was also a major selective sweep on chromosome 6 which had 32 SNPs within the shared iHS region. Tajima's D of circumsporozoite protein (CSP), erythrocyte-binding antigen (EBA-175), merozoite surface proteins - MSP3 and MSP7, merozoite surface protein duffy binding-like (MSPDBL2) and serine repeat antigen (SERA-5) were 1.38, 1.29, 0.73, 0.84 and 0.21, respectively. We have demonstrated the use of pool-seq to understand genomic patterns of selection and variability in P. falciparum from Nigeria, which bears the highest burden of infections. This investigation identified known

  15. cDNA, genomic sequence cloning and overexpression of ribosomal protein S25 gene (RPS25) from the Giant Panda.

    Science.gov (United States)

    Hao, Yan-Zhe; Hou, Wan-Ru; Hou, Yi-Ling; Du, Yu-Jie; Zhang, Tian; Peng, Zheng-Song

    2009-11-01

    RPS25 is a component of the 40S small ribosomal subunit encoded by RPS25 gene, which is specific to eukaryotes. Studies in reference to RPS25 gene from animals were handful. The Giant Panda (Ailuropoda melanoleuca), known as a "living fossil", are increasingly concerned by the world community. Studies on RPS25 of the Giant Panda could provide scientific data for inquiring into the hereditary traits of the gene and formulating the protective strategy for the Giant Panda. The cDNA of the RPS25 cloned from Giant Panda is 436 bp in size, containing an open reading frame of 378 bp encoding 125 amino acids. The length of the genomic sequence is 1,992 bp, which was found to possess four exons and three introns. Alignment analysis indicated that the nucleotide sequence of the coding sequence shows a high homology to those of Homo sapiens, Bos taurus, Mus musculus and Rattus norvegicus as determined by Blast analysis, 92.6, 94.4, 89.2 and 91.5%, respectively. Primary structure analysis revealed that the molecular weight of the putative RPS25 protein is 13.7421 kDa with a theoretical pI 10.12. Topology prediction showed there is one N-glycosylation site, one cAMP and cGMP-dependent protein kinase phosphorylation site, two Protein kinase C phosphorylation sites and one Tyrosine kinase phosphorylation site in the RPS25 protein of the Giant Panda. The RPS25 gene was overexpressed in E. coli BL21 and Western Blotting of the RPS25 protein was also done. The results indicated that the RPS25 gene can be really expressed in E. coli and the RPS25 protein fusioned with the N-terminally his-tagged form gave rise to the accumulation of an expected 17.4 kDa polypeptide. The cDNA and the genomic sequence of RPS25 were cloned successfully for the first time from the Giant Panda using RT-PCR technology and Touchdown-PCR, respectively, which were both sequenced and analyzed preliminarily; then the cDNA of the RPS25 gene was overexpressed in E. coli BL21 and immunoblotted, which is the first

  16. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  17. DNA sequence modeling based on context trees

    NARCIS (Netherlands)

    Kusters, C.J.; Ignatenko, T.; Roland, J.; Horlin, F.

    2015-01-01

    Genomic sequences contain instructions for protein and cell production. Therefore understanding and identification of biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling of DNA sequences in its turn can help to better understand and identify such

  18. Development of a Method to Implement Whole-Genome Bisulfite Sequencing of cfDNA from Cancer Patients and a Mouse Tumor Model

    Directory of Open Access Journals (Sweden)

    Elaine C. Maggi

    2018-01-01

    Full Text Available The goal of this study was to develop a method for whole genome cell-free DNA (cfDNA methylation analysis in humans and mice with the ultimate goal to facilitate the identification of tumor derived DNA methylation changes in the blood. Plasma or serum from patients with pancreatic neuroendocrine tumors or lung cancer, and plasma from a murine model of pancreatic adenocarcinoma was used to develop a protocol for cfDNA isolation, library preparation and whole-genome bisulfite sequencing of ultra low quantities of cfDNA, including tumor-specific DNA. The protocol developed produced high quality libraries consistently generating a conversion rate >98% that will be applicable for the analysis of human and mouse plasma or serum to detect tumor-derived changes in DNA methylation.

  19. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, JamesR.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of {approx}1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  20. Next-Generation Sequencing of Genomic DNA Fragments Bound to a Transcription Factor in Vitro Reveals Its Regulatory Potential

    Directory of Open Access Journals (Sweden)

    Yukio Kurihara

    2014-12-01

    Full Text Available Several transcription factors (TFs coordinate to regulate expression of specific genes at the transcriptional level. In Arabidopsis thaliana it is estimated that approximately 10% of all genes encode TFs or TF-like proteins. It is important to identify target genes that are directly regulated by TFs in order to understand the complete picture of a plant’s transcriptome profile. Here, we investigate the role of the LONG HYPOCOTYL5 (HY5 transcription factor that acts as a regulator of photomorphogenesis. We used an in vitro genomic DNA binding assay coupled with immunoprecipitation and next-generation sequencing (gDB-seq instead of the in vivo chromatin immunoprecipitation (ChIP-based methods. The results demonstrate that the HY5-binding motif predicted here was similar to the motif reported previously and that in vitro HY5-binding loci largely overlapped with the HY5-targeted candidate genes identified in previous ChIP-chip analysis. By combining these results with microarray analysis, we identified hundreds of HY5-binding genes that were differentially expressed in hy5. We also observed delayed induction of some transcripts of HY5-binding genes in hy5 mutants in response to blue-light exposure after dark treatment. Thus, an in vitro gDNA-binding assay coupled with sequencing is a convenient and powerful method to bridge the gap between identifying TF binding potential and establishing function.

  1. G-quadruplex DNA sequences are evolutionarily conserved and associated with distinct genomic features in Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    John A Capra

    2010-07-01

    Full Text Available G-quadruplex DNA is a four-stranded DNA structure formed by non-Watson-Crick base pairing between stacked sets of four guanines. Many possible functions have been proposed for this structure, but its in vivo role in the cell is still largely unresolved. We carried out a genome-wide survey of the evolutionary conservation of regions with the potential to form G-quadruplex DNA structures (G4 DNA motifs across seven yeast species. We found that G4 DNA motifs were significantly more conserved than expected by chance, and the nucleotide-level conservation patterns suggested that the motif conservation was the result of the formation of G4 DNA structures. We characterized the association of conserved and non-conserved G4 DNA motifs in Saccharomyces cerevisiae with more than 40 known genome features and gene classes. Our comprehensive, integrated evolutionary and functional analysis confirmed the previously observed associations of G4 DNA motifs with promoter regions and the rDNA, and it identified several previously unrecognized associations of G4 DNA motifs with genomic features, such as mitotic and meiotic double-strand break sites (DSBs. Conserved G4 DNA motifs maintained strong associations with promoters and the rDNA, but not with DSBs. We also performed the first analysis of G4 DNA motifs in the mitochondria, and surprisingly found a tenfold higher concentration of the motifs in the AT-rich yeast mitochondrial DNA than in nuclear DNA. The evolutionary conservation of the G4 DNA motif and its association with specific genome features supports the hypothesis that G4 DNA has in vivo functions that are under evolutionary constraint.

  2. New sequence-based data on the relative DNA contents of chromosomes in the normal male and female human diploid genomes for radiation molecular cytogenetics

    Directory of Open Access Journals (Sweden)

    Repin Mikhail V

    2009-06-01

    Full Text Available Abstract Background The objective of this work is to obtain the correct relative DNA contents of chromosomes in the normal male and female human diploid genomes for the use at FISH analysis of radiation-induced chromosome aberrations. Results The relative DNA contents of chromosomes in the male and female human diploid genomes have been calculated from the publicly available international Human Genome Project data. New sequence-based data on the relative DNA contents of human chromosomes were compared with the data recommended by the International Atomic Energy Agency in 2001. The differences in the values of the relative DNA contents of chromosomes obtained by using different approaches for 15 human chromosomes, mainly for large chromosomes, were below 2%. For the chromosomes 13, 17, 20 and 22 the differences were above 5%. Conclusion New sequence-based data on the relative DNA contents of chromosomes in the normal male and female human diploid genomes were obtained. This approach, based on the genome sequence, can be recommended for the use in radiation molecular cytogenetics.

  3. Intermittency as a universal characteristic of the complete chromosome DNA sequences of eukaryotes: From protozoa to human genomes

    Science.gov (United States)

    Rybalko, S.; Larionov, S.; Poptsova, M.; Loskutov, A.

    2011-10-01

    Large-scale dynamical properties of complete chromosome DNA sequences of eukaryotes are considered. Using the proposed deterministic models with intermittency and symbolic dynamics we describe a wide spectrum of large-scale patterns inherent in these sequences, such as segmental duplications, tandem repeats, and other complex sequence structures. It is shown that the recently discovered gene number balance on the strands is not of a random nature, and certain subsystems of a complete chromosome DNA sequence exhibit the properties of deterministic chaos.

  4. Purification of high molecular weight genomic DNA from powdery mildew for long-read sequencing

    Science.gov (United States)

    The powdery mildew fungi are a group of economically important fungal plant pathogens. Relatively little is known about the molecular biology and genetics of these pathogens, in part due to a lack of well-developed genetic and genomic resources. These organisms have large, repetitive genomes, which ...

  5. SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences

    Directory of Open Access Journals (Sweden)

    Wibisono Adianto

    2008-08-01

    Full Text Available Abstract Background Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Findings Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic features in a (DNA sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. Conclusion As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. Thus, SigWin-detector provides the

  6. Sequencing of whole plastid genomes and nuclear ribosomal DNA of Diospyros species (Ebenaceae) endemic to New Caledonia: many species, little divergence.

    Science.gov (United States)

    Turner, Barbara; Paun, Ovidiu; Munzinger, Jérôme; Chase, Mark W; Samuel, Rosabelle

    2016-06-01

    Some plant groups, especially on islands, have been shaped by strong ancestral bottlenecks and rapid, recent radiation of phenotypic characters. Single molecular markers are often not informative enough for phylogenetic reconstruction in such plant groups. Whole plastid genomes and nuclear ribosomal DNA (nrDNA) are viewed by many researchers as sources of information for phylogenetic reconstruction of groups in which expected levels of divergence in standard markers are low. Here we evaluate the usefulness of these data types to resolve phylogenetic relationships among closely related Diospyros species. Twenty-two closely related Diospyros species from New Caledonia were investigated using whole plastid genomes and nrDNA data from low-coverage next-generation sequencing (NGS). Phylogenetic trees were inferred using maximum parsimony, maximum likelihood and Bayesian inference on separate plastid and nrDNA and combined matrices. The plastid and nrDNA sequences were, singly and together, unable to provide well supported phylogenetic relationships among the closely related New Caledonian Diospyros species. In the nrDNA, a 6-fold greater percentage of parsimony-informative characters compared with plastid DNA was found, but the total number of informative sites was greater for the much larger plastid DNA genomes. Combining the plastid and nuclear data improved resolution. Plastid results showed a trend towards geographical clustering of accessions rather than following taxonomic species. In plant groups in which multiple plastid markers are not sufficiently informative, an investigation at the level of the entire plastid genome may also not be sufficient for detailed phylogenetic reconstruction. Sequencing of complete plastid genomes and nrDNA repeats seems to clarify some relationships among the New Caledonian Diospyros species, but the higher percentage of parsimony-informative characters in nrDNA compared with plastid DNA did not help to resolve the phylogenetic tree

  7. Gomphid DNA sequence data

    Data.gov (United States)

    U.S. Environmental Protection Agency — DNA sequence data for several genetic loci. This dataset is not publicly accessible because: It's already publicly available on GenBank. It can be accessed through...

  8. A Rapid and Reproducible Genomic DNA Extraction Protocol for Sequence-Based Identification of Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and Green Algae

    OpenAIRE

    Farkhondeh Saba; Moslem Papizadeh; Javad Khansha; Mahshid Sedghi; Mehrnoosh Rasooli; Mohammad Ali Amoozegar; Mohammad Reza Soudi; Seyed Abolhassan Shahzadeh Fazeli

    2016-01-01

    Background:  Sequence-based identification of various microorganisms including Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and green algae necessitates an efficient and reproducible genome extraction procedure though which a pure template DNA is yielded and it can be used in polymerase chain reactions (PCR). Considering the fact that DNA extraction from these microorganisms is time consuming and laborious, we developed and standardized a safe, rapid and inexpensive miniprep protocol. Me...

  9. Integrated view of genome structure and sequence of a single DNA molecule in a nanofluidic device

    DEFF Research Database (Denmark)

    Marie, Rodolphe; Pedersen, Jonas Nyvold; L. V. Bauer, David

    2013-01-01

    as well as unique structural variation. Following its mapping, a molecule of interest was rescued fromthe chip;amplified and localized to a chromosome by FISH; and interrogated down to 1-bp resolution with a commercial sequencer, thereby reconciling haplotype-phased chromosome substructure with sequence....

  10. Isolation and sequence characterization of DNA-A genome of a new begomovirus strain associated with severe leaf curling symptoms of Jatropha curcas L.

    KAUST Repository

    Chauhan, Sushma

    2018-04-22

    Begomoviruses belong to the family Geminiviridae are associated with several disease symptoms, such as mosaic and leaf curling in Jatropha curcas. The molecular characterization of these viral strains will help in developing management strategies to control the disease. In this study, J. curcas that was infected with begomovirus and showed acute leaf curling symptoms were identified. DNA-A segment from pathogenic viral strain was isolated and sequenced. The sequenced genome was assembled and characterized in detail. The full-length DNA-A sequence was covered by primer walking. The genome sequence showed the general organization of DNA-A from begomovirus by the distribution of ORFs in both viral and anti-viral strands. The genome size ranged from 2844 bp–2852 bp. Three strains with minor nucleotide variations were identified, and a phylogenetic analysis was performed by comparing the DNA-A segments from other reported begomovirus isolates. The maximum sequence similarity was observed with Euphorbia yellow mosaic virus (FN435995). In the phylogenetic tree, no clustering was observed with previously reported begomovirus strains isolated from J. curcas host. The strains isolated in this study belong to new begomoviral strain that elicits symptoms of leaf curling in J. curcas. The results indicate that the probable origin of the strains is from Jatropha mosaic virus infecting J. gassypifolia. The strains isolated in this study are referred as Jatropha curcas leaf curl India virus (JCLCIV) based on the major symptoms exhibited by host J. curcas.

  11. Effect of Wortmannin on the repair profiles of DNA double-strand breaks in the whole genome and in interstitial telomeric sequences of Chinese hamster cells

    International Nuclear Information System (INIS)

    Losada, Raquel; Rivero, Maria Teresa; Slijepcevic, Predrag; Goyanes, Vicente; Fernandez, Jose Luis

    2005-01-01

    The DNA breakage detection-fluorescence in situ hybridization (DBD-FISH) procedure was applied to analyze the effect of Wortmannin (WM) in the rejoining kinetics of ionizing radiation-induced DNA double-strand breaks (DSBs) in the whole genome and in the long interstitial telomeric repeat sequence (ITRS) blocks from Chinese hamster cell lines. The results indicate that the ITRS blocks from wild-type Chinese hamster cell lines, CHO9 and V79B, exhibit a slower initial rejoining rate of ionizing radiation-induced DSBs than the genome overall. Neither Rad51C nor the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs) activities, involved in homologous recombination (HR) and in non-homologous end-joining (NHEJ) pathways of DSB repair respectively, influenced the rejoining kinetics within ITRS in contrast to DNA sequences in the whole genome. Nevertheless, DSB removal rate within ITRS was decreased in the absence of Ku86 activity, though at a lower affectation level than in the whole genome, thus homogenizing both rejoining kinetics rates. WM treatment slowed down the DSB rejoining kinetics rate in ITRS, this effect being more pronounced in the whole genome, resulting in a similar pattern to that of the Ku86 deficient cells. In fact, no WM effect was detected in the Ku86 deficient Chinese hamster cells, so probably WM does not add further impairment in DSB rejoining than that resulted as a consequence of absence of Ku activity. The same slowing effect was also observed after treatment of Rad51C and DNA-PKcs defective hamster cells by WM, suggesting that: (1) there is no potentiation of the HR when the NHEJ is impaired by WM, either in the whole genome or in the ITRS, and (2) that this impairment may probably involve more targets than DNA-PKcs. These results suggest that there is an intragenomic heterogeneity in DSB repair, as well as in the effect of WM on this process

  12. Repetitive DNA in the pea (Pisum sativum L. genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    Directory of Open Access Journals (Sweden)

    Navrátilová Alice

    2007-11-01

    Full Text Available Abstract Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum. Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data

  13. Dragon polya spotter: Predictor of poly(A) motifs within human genomic DNA sequences

    KAUST Repository

    Kalkatawi, Manal M.; Rangkuti, Farania; Schramm, Michael C.; Jankovic, Boris R.; Kamau, Allan; Chowdhary, Rajesh; Archer, John A.C.; Bajic, Vladimir B.

    2011-01-01

    . These models are trained to recognize 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity

  14. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    Science.gov (United States)

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could

  15. Complete DNA sequence of the linear mitochondrial genome of the pathogenic yeast Candida parapsilosis

    Czech Academy of Sciences Publication Activity Database

    Nosek, J.; Novotná, Marcela; Hlavaticová, Z.; Ussery, D. W.; Fajkus, Jiří; Tomáška, L.

    2004-01-01

    Roč. 272, č. 2 (2004), s. 173-180 ISSN 1617-4615 Grant - others:Howard Hughes Medical Institute(US) 55000327; VEGA MŠ SR(SK) 1/9153/02; VEGA MŠ SR(SK) 1/0006/03; APVT(SK) 20-003902; Fogarty International NIH(US) 1-R03-TW05654-01 Institutional research plan: CEZ:AV0Z5004920 Keywords : Candida parapsilosis * linear mitochondrial DNA * telomeric circles (t-circles) Subject RIV: BO - Biophysics Impact factor: 2.371, year: 2004

  16. Foundations for a syntatic pattern recognition system for genomic DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Searles, D.B.

    1993-03-01

    The goal of the proposed work is the creation of a software system that will perform sophisticated pattern recognition and related functions at a level of abstraction and with expressive power beyond current general-purpose pattern-matching systems for biological sequences; and with a more uniform language, environment, and graphical user interface, and with greater flexibility, extensibility, embeddability, and ability to incorporate other algorithms, than current special-purpose analytic software.

  17. Isolation and characterization of 5S rDNA sequences in catfishes genome (Heptapteridae and Pseudopimelodidae): perspectives for rDNA studies in fish by C0t method.

    Science.gov (United States)

    Gouveia, Juceli Gonzalez; Wolf, Ivan Rodrigo; de Moraes-Manécolo, Vivian Patrícia Oliveira; Bardella, Vanessa Belline; Ferracin, Lara Munique; Giuliano-Caetano, Lucia; da Rosa, Renata; Dias, Ana Lúcia

    2016-12-01

    Sequences of 5S ribosomal RNA (rRNA) are extensively used in fish cytogenomic studies, once they have a flexible organization at the chromosomal level, showing inter- and intra-specific variation in number and position in karyotypes. Sequences from the genome of Imparfinis schubarti (Heptapteridae) were isolated, aiming to understand the organization of 5S rDNA families in the fish genome. The isolation of 5S rDNA from the genome of I. schubarti was carried out by reassociation kinetics (C 0 t) and PCR amplification. The obtained sequences were cloned for the construction of a micro-library. The obtained clones were sequenced and hybridized in I. schubarti and Microglanis cottoides (Pseudopimelodidae) for chromosome mapping. An analysis of the sequence alignments with other fish groups was accomplished. Both methods were effective when using 5S rDNA for hybridization in I. schubarti genome. However, the C 0 t method enabled the use of a complete 5S rRNA gene, which was also successful in the hybridization of M. cottoides. Nevertheless, this gene was obtained only partially by PCR. The hybridization results and sequence analyses showed that intact 5S regions are more appropriate for the probe operation, due to conserved structure and motifs. This study contributes to a better understanding of the organization of multigene families in catfish's genomes.

  18. Genome-wide DNA polymorphism in the indica rice varieties RGD-7S and Taifeng B as revealed by whole genome re-sequencing.

    Science.gov (United States)

    Fu, Chong-Yun; Liu, Wu-Ge; Liu, Di-Lin; Li, Ji-Hua; Zhu, Man-Shan; Liao, Yi-Long; Liu, Zhen-Rong; Zeng, Xue-Qin; Wang, Feng

    2016-03-01

    Next-generation sequencing technologies provide opportunities to further understand genetic variation, even within closely related cultivars. We performed whole genome resequencing of two elite indica rice varieties, RGD-7S and Taifeng B, whose F1 progeny showed hybrid weakness and hybrid vigor when grown in the early- and late-cropping seasons, respectively. Approximately 150 million 100-bp pair-end reads were generated, which covered ∼86% of the rice (Oryza sativa L. japonica 'Nipponbare') reference genome. A total of 2,758,740 polymorphic sites including 2,408,845 SNPs and 349,895 InDels were detected in RGD-7S and Taifeng B, respectively. Applying stringent parameters, we identified 961,791 SNPs and 46,640 InDels between RGD-7S and Taifeng B (RGD-7S/Taifeng B). The density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for RGD-7S/Taifeng B. Copy number variations (CNVs) were also investigated. In RGD-7S, 1989 of 2727 CNVs were overlapped in 218 genes, and 1231 of 2010 CNVs were annotated in 175 genes in Taifeng B. In addition, we verified a subset of InDels in the interval of hybrid weakness genes, Hw3 and Hw4, and obtained some polymorphic InDel markers, which will provide a sound foundation for cloning hybrid weakness genes. Analysis of genomic variations will also contribute to understanding the genetic basis of hybrid weakness and heterosis.

  19. Entropic fluctuations in DNA sequences

    Science.gov (United States)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows to extract meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.

  20. Mining olive genome through library sequencing and bioinformatics ...

    African Journals Online (AJOL)

    As one of the initial steps of olive (Olea europaea L.) genome analysis, a small insert genomic DNA library was constructed (digesting olive genomic DNA with SmaI and cloning the digestion products into pUC19 vector) and randomly picked 83 colonies were sequenced. Analysis of the insert sequences revealed 12 clones ...

  1. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Science.gov (United States)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  2. The Complete Mitochondrial DNA Sequence of Scenedesmus obliquus Reflects an Intermediate Stage in the Evolution of the Green Algal Mitochondrial Genome

    Science.gov (United States)

    Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud

    2000-01-01

    Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413

  3. Chloroplast DNA sequence of the green alga Oedogonium cardiacum (Chlorophyceae: Unique genome architecture, derived characters shared with the Chaetophorales and novel genes acquired through horizontal transfer

    Directory of Open Access Journals (Sweden)

    Lemieux Claude

    2008-06-01

    Full Text Available Abstract Background To gain insight into the branching order of the five main lineages currently recognized in the green algal class Chlorophyceae and to expand our understanding of chloroplast genome evolution, we have undertaken the sequencing of chloroplast DNA (cpDNA from representative taxa. The complete cpDNA sequences previously reported for Chlamydomonas (Chlamydomonadales, Scenedesmus (Sphaeropleales, and Stigeoclonium (Chaetophorales revealed tremendous variability in their architecture, the retention of only few ancestral gene clusters, and derived clusters shared by Chlamydomonas and Scenedesmus. Unexpectedly, our recent phylogenies inferred from these cpDNAs and the partial sequences of three other chlorophycean cpDNAs disclosed two major clades, one uniting the Chlamydomonadales and Sphaeropleales (CS clade and the other uniting the Oedogoniales, Chaetophorales and Chaetopeltidales (OCC clade. Although molecular signatures provided strong support for this dichotomy and for the branching of the Oedogoniales as the earliest-diverging lineage of the OCC clade, more data are required to validate these phylogenies. We describe here the complete cpDNA sequence of Oedogonium cardiacum (Oedogoniales. Results Like its three chlorophycean homologues, the 196,547-bp Oedogonium chloroplast genome displays a distinctive architecture. This genome is one of the most compact among photosynthetic chlorophytes. It has an atypical quadripartite structure, is intron-rich (17 group I and 4 group II introns, and displays 99 different conserved genes and four long open reading frames (ORFs, three of which are clustered in the spacious inverted repeat of 35,493 bp. Intriguingly, two of these ORFs (int and dpoB revealed high similarities to genes not usually found in cpDNA. At the gene content and gene order levels, the Oedogonium genome most closely resembles its Stigeoclonium counterpart. Characters shared by these chlorophyceans but missing in members

  4. Genomic sequencing in clinical trials

    OpenAIRE

    Mestan, Karen K; Ilkhanoff, Leonard; Mouli, Samdeep; Lin, Simon

    2011-01-01

    Abstract Human genome sequencing is the process by which the exact order of nucleic acid base pairs in the 24 human chromosomes is determined. Since the completion of the Human Genome Project in 2003, genomic sequencing is rapidly becoming a major part of our translational research efforts to understand and improve human health and disease. This article reviews the current and future directions of clinical research with respect to genomic sequencing, a technology that is just beginning to fin...

  5. The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands

    Science.gov (United States)

    de Cambiaire, Jean-Charles; Otis, Christian; Lemieux, Claude; Turmel, Monique

    2006-01-01

    Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. While the basal position of the Prasinophyceae is well established, the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain. The five complete chloroplast DNA (cpDNA) sequences currently available for representatives of these classes display considerable variability in overall structure, gene content, gene density, intron content and gene order. Among these genomes, that of the chlorophycean green alga Chlamydomonas reinhardtii has retained the least ancestral features. The two single-copy regions, which are separated from one another by the large inverted repeat (IR), have similar sizes, rather than unequal sizes, and differ radically in both gene contents and gene organizations relative to the single-copy regions of prasinophyte and ulvophyte cpDNAs. To gain insights into the various changes that underwent the chloroplast genome during the evolution of chlorophycean green algae, we have sequenced the cpDNA of Scenedesmus obliquus, a member of a distinct chlorophycean lineage. Results The 161,452 bp IR-containing genome of Scenedesmus features single-copy regions of similar sizes, encodes 96 genes, i.e. only two additional genes (infA and rpl12) relative to its Chlamydomonas homologue and contains seven group I and two group II introns. It is clearly more compact than the four UTC algal cpDNAs that have been examined so far, displays the lowest proportion of short repeats among these algae and shows a stronger bias in clustering of genes on the same DNA strand compared to Chlamydomonas cpDNA. Like the latter genome, Scenedesmus cpDNA displays only a few ancestral gene clusters. The two chlorophycean genomes share 11 gene clusters that are not found in previously sequenced trebouxiophyte and ulvophyte cpDNAs as well as a few genes that have an unusual structure; however, their single-copy regions differ

  6. Next-Generation Sequencing Reveals the Impact of Repetitive DNA Across Phylogenetically Closely Related Genomes of Orobanchaceae

    Czech Academy of Sciences Publication Activity Database

    Piednoël, M.; Aberer, A.J.; Schneeweiss, G. M.; Macas, Jiří; Novák, Petr; Gundlach, H.; Temsch, E.M.; Renner, S.S.

    2012-01-01

    Roč. 29, č. 11 (2012), s. 3601-3611 ISSN 0737-4038 Institutional research plan: CEZ:AV0Z50510513 Institutional support: RVO:60077344 Keywords : next-generation sequencing * polyploidy * genome size * Ty3/Gypsy * transposable elements Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 10.353, year: 2012

  7. Molecular cloning of a human glycophorin B cDNA: nucleotide sequence and genomic relationship to glycophorin A

    International Nuclear Information System (INIS)

    Siebert, P.D.; Fukuda, M.

    1987-01-01

    The authors describe the isolation and nucleotide sequence of a human glycophorin B cDNA. The cDNA was identified by differential hybridization of synthetic oligonucleotide probes to a human erythroleukemic cell line (K562) cDNA library constructed in phage vector λgt10. The nucleotide sequence of the glycophorin B cDNA was compared with that of a previously cloned glycophorin A cDNA. The nucleotide sequences encoding the NH 2 -terminal leader peptide and first 26 amino acids of the two proteins are nearly identical. This homologous region is followed by areas specific to either glycophorin A or B and a number of small regions of homology, which in turn are followed by a very homologous region encoding the presumed membrane-spanning portion of the proteins. They used RNA blot hybridization with both cDNA and synthetic oligonucleotide probes to prove our previous hypothesis that glycophorin B is encoded by a single 0.5- to 0.6-kb mRNA and to show that glycophorins A and B are negatively and coordinately regulated by a tumor-promoting phorbol ester, phorbol 12-myristate 13-acetate. They established the intron/exon structure of the glycophorin A and B genes by oligonucleotide mapping; the results suggest a complex evolution of the glycophorin genes

  8. Sequencing and comparing whole mitochondrial genomes ofanimals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning,sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  9. Mitochondrial DNA sequence evolution in shorebird populations

    NARCIS (Netherlands)

    Wenink, P.W.

    1994-01-01

    This thesis describes the global molecular population structure of two shorebird species, in particular of the dunlin, Calidris alpina, by means of comparative sequence analysis of the most variable part of the mitochondrial DNA (mtDNA) genome. There are several reasons

  10. Isolation and sequence characterization of DNA-A genome of a new begomovirus strain associated with severe leaf curling symptoms of Jatropha curcas L.

    Science.gov (United States)

    Chauhan, Sushma; Rahman, Hifzur; Mastan, Shaik G; Pamidimarri, D V N Sudheer; Reddy, Muppala P

    2018-07-20

    Begomoviruses belong to the family Geminiviridae are associated with several disease symptoms, such as mosaic and leaf curling in Jatropha curcas. The molecular characterization of these viral strains will help in developing management strategies to control the disease. In this study, J. curcas that was infected with begomovirus and showed acute leaf curling symptoms were identified. DNA-A segment from pathogenic viral strain was isolated and sequenced. The sequenced genome was assembled and characterized in detail. The full-length DNA-A sequence was covered by primer walking. The genome sequence showed the general organization of DNA-A from begomovirus by the distribution of ORFs in both viral and anti-viral strands. The genome size ranged from 2844 bp-2852 bp. Three strains with minor nucleotide variations were identified, and a phylogenetic analysis was performed by comparing the DNA-A segments from other reported begomovirus isolates. The maximum sequence similarity was observed with Euphorbia yellow mosaic virus (FN435995). In the phylogenetic tree, no clustering was observed with previously reported begomovirus strains isolated from J. curcas host. The strains isolated in this study belong to new begomoviral strain that elicits symptoms of leaf curling in J. curcas. The results indicate that the probable origin of the strains is from Jatropha mosaic virus infecting J. gassypifolia. The strains isolated in this study are referred as Jatropha curcas leaf curl India virus (JCLCIV) based on the major symptoms exhibited by host J. curcas. Copyright © 2018 Elsevier B.V. All rights reserved.

  11. A Rapid and Reproducible Genomic DNA Extraction Protocol for Sequence-Based Identification of Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and Green Algae

    Directory of Open Access Journals (Sweden)

    Farkhondeh Saba

    2017-01-01

    Full Text Available Background:  Sequence-based identification of various microorganisms including Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and green algae necessitates an efficient and reproducible genome extraction procedure though which a pure template DNA is yielded and it can be used in polymerase chain reactions (PCR. Considering the fact that DNA extraction from these microorganisms is time consuming and laborious, we developed and standardized a safe, rapid and inexpensive miniprep protocol. Methods:  According to our results, amplification of various genomic regions including SSU, LSU, ITS, β-tubulin, actin, RPB2, and EF-1 resulted in a reproducible and efficient DNA extraction from a wide range of microorganisms yielding adequate pure genomic material for reproducible PCR-amplifications. Results:   This method relies on a temporary shock of increased concentrations of detergent which can be applied concomitant with multiple freeze-thaws to yield sufficient amount of DNA for PCR amplification of multiple or single fragments(s of the genome. As an advantage, the recipe seems very flexible, thus, various optional steps can be included depending on the samples used.Conclusion:   Having the needed flexibility in each step, this protocol is applicable on a very wide range of samples. Hence, various steps can be included depending on the desired quantity and quality.

  12. A Rapid and Reproducible Genomic DNA Extraction Protocol for Sequence-Based Identification of Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and Green Algae

    Directory of Open Access Journals (Sweden)

    Farkhondeh Saba

    2016-09-01

    Full Text Available Background:  Sequence-based identification of various microorganisms including Archaea, Bacteria, Cyanobacteria, Diatoms, Fungi, and green algae necessitates an efficient and reproducible genome extraction procedure though which a pure template DNA is yielded and it can be used in polymerase chain reactions (PCR. Considering the fact that DNA extraction from these microorganisms is time consuming and laborious, we developed and standardized a safe, rapid and inexpensive miniprep protocol. Methods:  According to our results, amplification of various genomic regions including SSU, LSU, ITS, β-tubulin, actin, RPB2, and EF-1 resulted in a reproducible and efficient DNA extraction from a wide range of microorganisms yielding adequate pure genomic material for reproducible PCR-amplifications. Results:   This method relies on a temporary shock of increased concentrations of detergent which can be applied concomitant with multiple freeze-thaws to yield sufficient amount of DNA for PCR amplification of multiple or single fragments(s of the genome. As an advantage, the recipe seems very flexible, thus, various optional steps can be included depending on the samples used.Conclusion:   Having the needed flexibility in each step, this protocol is applicable on a very wide range of samples. Hence, various steps can be included depending on the desired quantity and quality.

  13. Full mitochondrial genome sequences of two endemic Philippine hornbill species (Aves: Bucerotidae) provide evidence for pervasive mitochondrial DNA recombination.

    Science.gov (United States)

    Sammler, Svenja; Bleidorn, Christoph; Tiedemann, Ralph

    2011-01-14

    Although nowaday it is broadly accepted that mitochondrial DNA (mtDNA) may undergo recombination, the frequency of such recombination remains controversial. Its estimation is not straightforward, as recombination under homoplasmy (i.e., among identical mt genomes) is likely to be overlooked. In species with tandem duplications of large mtDNA fragments the detection of recombination can be facilitated, as it can lead to gene conversion among duplicates. Although the mechanisms for concerted evolution in mtDNA are not fully understood yet, recombination rates have been estimated from "one per speciation event" down to 850 years or even "during every replication cycle". Here we present the first complete mt genome of the avian family Bucerotidae, i.e., that of two Philippine hornbills, Aceros waldeni and Penelopides panini. The mt genomes are characterized by a tandemly duplicated region encompassing part of cytochrome b, 3 tRNAs, NADH6, and the control region. The duplicated fragments are identical to each other except for a short section in domain I and for the length of repeat motifs in domain III of the control region. Due to the heteroplasmy with regard to the number of these repeat motifs, there is some size variation in both genomes; with around 21,657 bp (A. waldeni) and 22,737 bp (P. panini), they significantly exceed the hitherto longest known avian mt genomes, that of the albatrosses. We discovered concerted evolution between the duplicated fragments within individuals. The existence of differences between individuals in coding genes as well as in the control region, which are maintained between duplicates, indicates that recombination apparently occurs frequently, i.e., in every generation. The homogenised duplicates are interspersed by a short fragment which shows no sign of recombination. We hypothesize that this region corresponds to the so-called Replication Fork Barrier (RFB), which has been described from the chicken mitochondrial genome. As this RFB

  14. Full mitochondrial genome sequences of two endemic Philippine hornbill species (Aves: Bucerotidae provide evidence for pervasive mitochondrial DNA recombination

    Directory of Open Access Journals (Sweden)

    Bleidorn Christoph

    2011-01-01

    Full Text Available Abstract Background Although nowaday it is broadly accepted that mitochondrial DNA (mtDNA may undergo recombination, the frequency of such recombination remains controversial. Its estimation is not straightforward, as recombination under homoplasmy (i.e., among identical mt genomes is likely to be overlooked. In species with tandem duplications of large mtDNA fragments the detection of recombination can be facilitated, as it can lead to gene conversion among duplicates. Although the mechanisms for concerted evolution in mtDNA are not fully understood yet, recombination rates have been estimated from "one per speciation event" down to 850 years or even "during every replication cycle". Results Here we present the first complete mt genome of the avian family Bucerotidae, i.e., that of two Philippine hornbills, Aceros waldeni and Penelopides panini. The mt genomes are characterized by a tandemly duplicated region encompassing part of cytochrome b, 3 tRNAs, NADH6, and the control region. The duplicated fragments are identical to each other except for a short section in domain I and for the length of repeat motifs in domain III of the control region. Due to the heteroplasmy with regard to the number of these repeat motifs, there is some size variation in both genomes; with around 21,657 bp (A. waldeni and 22,737 bp (P. panini, they significantly exceed the hitherto longest known avian mt genomes, that of the albatrosses. We discovered concerted evolution between the duplicated fragments within individuals. The existence of differences between individuals in coding genes as well as in the control region, which are maintained between duplicates, indicates that recombination apparently occurs frequently, i.e., in every generation. Conclusions The homogenised duplicates are interspersed by a short fragment which shows no sign of recombination. We hypothesize that this region corresponds to the so-called Replication Fork Barrier (RFB, which has been

  15. Whole genome DNA methylation: beyond genes silencing

    OpenAIRE

    Tirado-Magallanes, Roberto; Rebbani, Khadija; Lim, Ricky; Pradhan, Sriharsa; Benoukraf, Touati

    2016-01-01

    The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation at near base pair level resolution, far beyond that of the kilobase-long canonical CpG islands that initially revealed the biological relevance of this covalent DNA modification. The latest high-resolution studies have revealed a role for very punctual DNA methylation in chromatin plasticity, gene regulation and splicing. Here, we aim to outline the ...

  16. Karyotype divergence and spreading of 5S rDNA sequences between genomes of two species: darter and emerald gobies ( Ctenogobius , Gobiidae).

    Science.gov (United States)

    Lima-Filho, P A; Bertollo, L A C; Cioffi, M B; Costa, G W W F; Molina, W F

    2014-01-01

    Karyotype analyses of the cryptobenthic marine species Ctenogobius boleosoma and C. smaragdus were performed by means of classical and molecular cytogenetics, including physical mapping of the multigene 18S and 5S rDNA families. C. boleosoma has 2n = 44 chromosomes (2 submetacentrics + 42 acrocentrics; FN = 46) with a single chromosome pair each carrying 18S and 5S ribosomal sites; whereas C. smaragdus has 2n = 48 chromosomes (2 submetacentrics + 46 acrocentrics; FN = 50), also with a single pair bearing 18S rDNA, but an extensive increase in the number of GC-rich 5S rDNA sites in 21 chromosome pairs. The highly divergent karyotypes among Ctenogobius species contrast with observations in several other marine fish groups, demonstrating an accelerated rate of chromosomal evolution mediated by both chromosomal rearrangements and the extensive dispersion of 5S rDNA sequences in the genome. © 2014 S. Karger AG, Basel.

  17. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi

    2018-01-01

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  18. Genome Sequences of Oryza Species

    KAUST Repository

    Kumagai, Masahiko

    2018-02-14

    This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.

  19. The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes

    Directory of Open Access Journals (Sweden)

    Lemieux Claude

    2006-02-01

    Full Text Available Abstract Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. The basal position of the Prasinophyceae has been well documented, but the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae is currently debated. The four complete chloroplast DNA (cpDNA sequences presently available for representatives of these classes have revealed extensive variability in overall structure, gene content, intron composition and gene order. The chloroplast genome of Pseudendoclonium (Ulvophyceae, in particular, is characterized by an atypical quadripartite architecture that deviates from the ancestral type by a large inverted repeat (IR featuring an inverted rRNA operon and a small single-copy (SSC region containing 14 genes normally found in the large single-copy (LSC region. To gain insights into the nature of the events that led to the reorganization of the chloroplast genome in the Ulvophyceae, we have determined the complete cpDNA sequence of Oltmannsiellopsis viridis, a representative of a distinct, early diverging lineage. Results The 151,933 bp IR-containing genome of Oltmannsiellopsis differs considerably from Pseudendoclonium and other chlorophyte cpDNAs in intron content and gene order, but shares close similarities with its ulvophyte homologue at the levels of quadripartite architecture, gene content and gene density. Oltmannsiellopsis cpDNA encodes 105 genes, contains five group I introns, and features many short dispersed repeats. As in Pseudendoclonium cpDNA, the rRNA genes in the IR are transcribed toward the single copy region featuring the genes typically found in the ancestral LSC region, and the opposite single copy region harbours genes characteristic of both the ancestral SSC and LSC regions. The 52 genes that were transferred from the ancestral LSC to SSC region include 12 of those observed in Pseudendoclonium cpDNA. Surprisingly, the overall gene organization of

  20. Chromatid interchanges at intrachromosomal telomeric DNA sequences

    International Nuclear Information System (INIS)

    Fernandez, J.L.; Vazquez-Gundin, F.; Bilbao, A.; Gosalvez, J.; Goyanes, V.

    1997-01-01

    Chinese hamster Don cells were exposed to X-rays, mitomycin C and teniposide (VM-26) to induce chromatid exchanges (quadriradials and triradials). After fluorescence in situ hybridization (FISH) of telomere sequences it was found that interstitial telomere-like DNA sequence arrays presented around five times more breakage-rearrangements than the genome overall. This high recombinogenic capacity was independent of the clastogen, suggesting that this susceptibility is not related to the initial mechanisms of DNA damage. (author)

  1. Detecting exact breakpoints of deletions with diversity in hepatitis B viral genomic DNA from next-generation sequencing data.

    Science.gov (United States)

    Cheng, Ji-Hong; Liu, Wen-Chun; Chang, Ting-Tsung; Hsieh, Sun-Yuan; Tseng, Vincent S

    2017-10-01

    Many studies have suggested that deletions of Hepatitis B Viral (HBV) are associated with the development of progressive liver diseases, even ultimately resulting in hepatocellular carcinoma (HCC). Among the methods for detecting deletions from next-generation sequencing (NGS) data, few methods considered the characteristics of virus, such as high evolution rates and high divergence among the different HBV genomes. Sequencing high divergence HBV genome sequences using the NGS technology outputs millions of reads. Thus, detecting exact breakpoints of deletions from these big and complex data incurs very high computational cost. We proposed a novel analytical method named VirDelect (Virus Deletion Detect), which uses split read alignment base to detect exact breakpoint and diversity variable to consider high divergence in single-end reads data, such that the computational cost can be reduced without losing accuracy. We use four simulated reads datasets and two real pair-end reads datasets of HBV genome sequence to verify VirDelect accuracy by score functions. The experimental results show that VirDelect outperforms the state-of-the-art method Pindel in terms of accuracy score for all simulated datasets and VirDelect had only two base errors even in real datasets. VirDelect is also shown to deliver high accuracy in analyzing the single-end read data as well as pair-end data. VirDelect can serve as an effective and efficient bioinformatics tool for physiologists with high accuracy and efficient performance and applicable to further analysis with characteristics similar to HBV on genome length and high divergence. The software program of VirDelect can be downloaded at https://sourceforge.net/projects/virdelect/. Copyright © 2017. Published by Elsevier Inc.

  2. Repeated DNA sequences in fungi

    Energy Technology Data Exchange (ETDEWEB)

    Dutta, S K

    1974-11-01

    Several fungal species, representatives of all broad groups like basidiomycetes, ascomycetes and phycomycetes, were examined for the nature of repeated DNA sequences by DNA:DNA reassociation studies using hydroxyapatite chromatography. All of the fungal species tested contained 10 to 20 percent repeated DNA sequences. There are approximately 100 to 110 copies of repeated DNA sequences of approximately 4 x 10/sup 7/ daltons piece size of each. Repeated DNA sequence homoduplexes showed on average 5/sup 0/C difference of T/sub e/50 (temperature at which 50 percent duplexes dissociate) values from the corresponding homoduplexes of unfractionated whole DNA. It is suggested that a part of repetitive sequences in fungi constitutes mitochondrial DNA and a part of it constitutes nuclear DNA. (auth)

  3. Genome, transcriptome and methylome sequencing of a primitively eusocial wasp reveal a greatly reduced DNA methylation system in a social insect.

    Science.gov (United States)

    Standage, Daniel S; Berens, Ali J; Glastad, Karl M; Severin, Andrew J; Brendel, Volker P; Toth, Amy L

    2016-04-01

    Comparative genomics of social insects has been intensely pursued in recent years with the goal of providing insights into the evolution of social behaviour and its underlying genomic and epigenomic basis. However, the comparative approach has been hampered by a paucity of data on some of the most informative social forms (e.g. incipiently and primitively social) and taxa (especially members of the wasp family Vespidae) for studying social evolution. Here, we provide a draft genome of the primitively eusocial model insect Polistes dominula, accompanied by analysis of caste-related transcriptome and methylome sequence data for adult queens and workers. Polistes dominula possesses a fairly typical hymenopteran genome, but shows very low genomewide GC content and some evidence of reduced genome size. We found numerous caste-related differences in gene expression, with evidence that both conserved and novel genes are related to caste differences. Most strikingly, these -omics data reveal a major reduction in one of the major epigenetic mechanisms that has been previously suggested to be important for caste differences in social insects: DNA methylation. Along with a conspicuous loss of a key gene associated with environmentally responsive DNA methylation (the de novo DNA methyltransferase Dnmt3), these wasps have greatly reduced genomewide methylation to almost zero. In addition to providing a valuable resource for comparative analysis of social insect evolution, our integrative -omics data for this important behavioural and evolutionary model system call into question the general importance of DNA methylation in caste differences and evolution in social insects. © 2016 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  4. Protecting genomic sequence anonymity with generalization lattices.

    Science.gov (United States)

    Malin, B A

    2005-01-01

    Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, are obscured, removed, or encrypted. While demographic features can directly compromise an individual's identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k -anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequences privacy. The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific datasharing scenarios.

  5. Bacterial identification and subtyping using DNA microarray and DNA sequencing.

    Science.gov (United States)

    Al-Khaldi, Sufian F; Mossoba, Magdi M; Allard, Marc M; Lienau, E Kurt; Brown, Eric D

    2012-01-01

    The era of fast and accurate discovery of biological sequence motifs in prokaryotic and eukaryotic cells is here. The co-evolution of direct genome sequencing and DNA microarray strategies not only will identify, isotype, and serotype pathogenic bacteria, but also it will aid in the discovery of new gene functions by detecting gene expressions in different diseases and environmental conditions. Microarray bacterial identification has made great advances in working with pure and mixed bacterial samples. The technological advances have moved beyond bacterial gene expression to include bacterial identification and isotyping. Application of new tools such as mid-infrared chemical imaging improves detection of hybridization in DNA microarrays. The research in this field is promising and future work will reveal the potential of infrared technology in bacterial identification. On the other hand, DNA sequencing by using 454 pyrosequencing is so cost effective that the promise of $1,000 per bacterial genome sequence is becoming a reality. Pyrosequencing technology is a simple to use technique that can produce accurate and quantitative analysis of DNA sequences with a great speed. The deposition of massive amounts of bacterial genomic information in databanks is creating fingerprint phylogenetic analysis that will ultimately replace several technologies such as Pulsed Field Gel Electrophoresis. In this chapter, we will review (1) the use of DNA microarray using fluorescence and infrared imaging detection for identification of pathogenic bacteria, and (2) use of pyrosequencing in DNA cluster analysis to fingerprint bacterial phylogenetic trees.

  6. Targeted sequencing of plant genomes

    Science.gov (United States)

    Mark D. Huynh

    2014-01-01

    Next-generation sequencing (NGS) has revolutionized the field of genetics by providing a means for fast and relatively affordable sequencing. With the advancement of NGS, wholegenome sequencing (WGS) has become more commonplace. However, sequencing an entire genome is still not cost effective or even beneficial in all cases. In studies that do not require a whole-...

  7. Complete Genome Sequences of 44 Arthrobacter Phages.

    Science.gov (United States)

    Klyczek, Karen K; Jacobs-Sera, Deborah; Adair, Tamarah L; Adams, Sandra D; Ball, Sarah L; Benjamin, Robert C; Bonilla, J Alfred; Breitenberger, Caroline A; Daniels, Charles J; Gaffney, Bobby L; Harrison, Melinda; Hughes, Lee E; King, Rodney A; Krukonis, Gregory P; Lopez, A Javier; Monsen-Collar, Kirsten; Pizzorno, Marie C; Rinehart, Claire A; Staples, Amanda K; Stowe, Emily L; Garlena, Rebecca A; Russell, Daniel A; Cresawn, Steven G; Pope, Welkin H; Hatfull, Graham F

    2018-02-01

    We report here the complete genome sequences of 44 phages infecting Arthrobacter sp. strain ATCC 21022. These phages have double-stranded DNA genomes with sizes ranging from 15,680 to 70,707 bp and G+C contents from 45.1% to 68.5%. All three tail types (belonging to the families Siphoviridae , Myoviridae , and Podoviridae ) are represented. Copyright © 2018 Klyczek et al.

  8. Second generation DNA sequencing of the mitogenome of the Chinstrap penguin and comparative genomics of Antarctic penguins.

    Science.gov (United States)

    Subramanian, Sankar; Lingala, Syamala Gowri; Swaminathan, Siva; Huynen, Leon; Lambert, David

    2014-08-01

    The complete mitochondrial genome of the Chinstrap penguin (Pygoscelis antarcticus) was sequenced and compared with other penguin mitogenomes. The genome is 15,972 bp in length with the number and order of protein coding genes and RNAs being very similar to that of other known penguin mitogenomes. Comparative nucleotide analysis showed the Chinstrap mitogenome shares 94% homology with the mitogenome of its sister species, Pygoscelis adelie (Adélie penguin). Divergence at nonsynonymous nucleotide positions was found to be up to 23 times less than that observed in synonymous positions of protein coding genes, suggesting high selection constraints. The complete mitogenome data will be useful for genetic and evolutionary studies of penguins.

  9. Towards clinical molecular diagnosis of inherited cardiac conditions: a comparison of bench-top genome DNA sequencers.

    Directory of Open Access Journals (Sweden)

    Xinzhong Li

    Full Text Available Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations.We evaluated two next-generation sequencing (NGS platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm, and sequenced on the MiSeq (Illumina and Ion Torrent PGM (Life Technologies. Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%. At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs, with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS.MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias

  10. Genome shotgun sequencing and development of microsatellite ...

    African Journals Online (AJOL)

    Analysis of the gerbera genome DNA ('Raon') general library showed that sequences of (AT), (AG), (AAG) and (AAT) repeats appeared most often, whereas (AC), (AAC) and (ACC) were the least frequent. Primer pairs were designed for 80 loci. Only eight primer pairs produced reproducible polymorphic bands in the 28 ...

  11. Levenshtein error-correcting barcodes for multiplexed DNA sequencing

    NARCIS (Netherlands)

    Buschmann, Tilo; Bystrykh, Leonid V.

    2013-01-01

    Background: High-throughput sequencing technologies are improving in quality, capacity and costs, providing versatile applications in DNA and RNA research. For small genomes or fraction of larger genomes, DNA samples can be mixed and loaded together on the same sequencing track. This so-called

  12. cDNA, genomic sequence cloning, and overexpression of EIF1 from the giant panda (Ailuropoda Melanoleuca) and the black bear (Ursus Thibetanus Mupinensis).

    Science.gov (United States)

    Hou, Wan-ru; Tang, Yun; Hou, Yi-ling; Song, Yan; Zhang, Tian; Wu, Guang-fu

    2010-07-01

    Eukaryotic initiation factor (eIF) EIF1 is a universally conserved translation factor that is involved in translation initiation site selection. The cDNA and the genomic sequences of EIF1 were cloned successfully from the giant panda (Ailuropoda melanoleuca) and the black bear (Ursus thibetanus mupinensis) using reverse transcription polymerase chain reaction (RT-PCR) technology and touchdown-polymerase chain reaction, respectively. The cDNAs of the EIF1 cloned from the giant panda and the black bear are 418 bp in size, containing an open reading frame (ORF) of 342 bp encoding 113 amino acids. The length of the genomic sequence of the giant panda is 1909 bp, which contains four exons and three introns. The length of the genomic sequence of the black bear is 1897 bp, which also contains four exons and three introns. Sequence alignment indicates a high degree of homology to those of Homo sapiens, Mus musculus, Rattus norvegicus, and Bos Taurus at both amino acid and DNA levels. Topology prediction shows there are one N-glycosylation site, two Casein kinase II phosphorylation sites, and a Amidation site in the EIF1 protein of the giant panda and black bear. In addition, there is a protein kinase C phosphorylation site in EIF1 of the giant panda. The giant panda and the black bear EIF1 genes were overexpressed in E. coli BL21. The results indicated that the both EIF1 fusion proteins with the N-terminally His-tagged form gave rise to the accumulation of two expected 19 kDa polypeptide. The expression products obtained could be used to purify the proteins and study their function further.

  13. Puzzling sequences: studying microbial genomes from 'Ötzi'

    International Nuclear Information System (INIS)

    Rattei, T.

    2012-01-01

    Ancient remains, and mummies in particular, are of central value for archaeological research. The Tyrolean iceman “Ötzi” was conserved in a glacier of the Ötztal Alps about 5000 years ago. Aside from morphological and phenotypical classification, the determination of DNA sequences and the subsequent genome analyses have been first applied to mitochondrial DNA and then been extended to genomic DNA. Typically also ancient microbial DNA is sequenced. These sequences allow the identification of pathogens as well as studying the evolution of microorganisms. The talk will explain the metagenomic aspects of the “Ötzi” genome project and discuss the first results. (author)

  14. Biosensors for DNA sequence detection

    Science.gov (United States)

    Vercoutere, Wenonah; Akeson, Mark

    2002-01-01

    DNA biosensors are being developed as alternatives to conventional DNA microarrays. These devices couple signal transduction directly to sequence recognition. Some of the most sensitive and functional technologies use fibre optics or electrochemical sensors in combination with DNA hybridization. In a shift from sequence recognition by hybridization, two emerging single-molecule techniques read sequence composition using zero-mode waveguides or electrical impedance in nanoscale pores.

  15. Genome-wide DNA methylation sequencing reveals miR-663a is a novel epimutation candidate in CIMP-high endometrial cancer.

    Science.gov (United States)

    Yanokura, Megumi; Banno, Kouji; Adachi, Masataka; Aoki, Daisuke; Abe, Kuniya

    2017-06-01

    Aberrant DNA methylation is widely observed in many cancers. Concurrent DNA methylation of multiple genes occurs in endometrial cancer and is referred to as the CpG island methylator phenotype (CIMP). However, the features and causes of CIMP-positive endometrial cancer are not well understood. To investigate DNA methylation features characteristic to CIMP-positive endometrial cancer, we first classified samples from 25 patients with endometrial cancer based on the methylation status of three genes, i.e. MLH1, CDH1 (E-cadherin) and APC: CIMP-high (CIMP-H, 2/25, 8.0%), CIMP-low (CIMP-L, 7/25, 28.0%) and CIMP-negative (CIMP(-), 16/25, 64.0%). We then selected two samples each from CIMP-H and CIMP(-) classes, and analyzed DNA methylation status of both normal (peripheral blood cells: PBCs) and cancer tissues by genome-wide, targeted bisulfite sequencing. Genomes of the CIMP-H cancer tissues were significantly hypermethylated compared to those of the CIMP(-). Surprisingly, in normal tissues of the CIMP-H patients, promoter region of the miR-663a locus is hypermethylated relative to CIMP(-) samples. Consistent with this finding, miR-663a expression was lower in the CIMP-H PBCs than in the CIMP(-) PBCs. The same region of the miR663a locus is found to be highly methylated in cancer tissues of both CIMP-H and CIMP(-) cases. This is the first report showing that aberrant DNA methylation of the miR-663a promoter can occur in normal tissue of the cancer patients, suggesting a possible link between this epigenetic abnormality and endometrial cancer. This raises the possibility that the hypermethylation of the miR-663a promoter represents an epimutation associated with the CIMP-H endometrial cancers. Based on these findings, relationship of the aberrant DNA methylation and CIMP-H phenotype is discussed.

  16. From Sequence to Morphology - Long-Range Correlations in Complete Sequenced Genomes

    NARCIS (Netherlands)

    T.A. Knoch (Tobias)

    2004-01-01

    textabstractThe largely unresolved sequential organization, i.e. the relations within DNA sequences, and its connection to the three-dimensional organization of genomes was investigated by correlation analyses of completely sequenced chromosomes from Viroids, Archaea, Bacteria, Arabidopsis

  17. Whitefly (Bemisia tabaci genome project: analysis of sequenced clones from egg, instar, and adult (viruliferous and non-viruliferous cDNA libraries

    Directory of Open Access Journals (Sweden)

    Czosnek Henryk

    2006-04-01

    Full Text Available Abstract Background The past three decades have witnessed a dramatic increase in interest in the whitefly Bemisia tabaci, owing to its nature as a taxonomically cryptic species, the damage it causes to a large number of herbaceous plants because of its specialized feeding in the phloem, and to its ability to serve as a vector of plant viruses. Among the most important plant viruses to be transmitted by B. tabaci are those in the genus Begomovirus (family, Geminiviridae. Surprisingly, little is known about the genome of this whitefly. The haploid genome size for male B. tabaci has been estimated to be approximately one billion bp by flow cytometry analysis, about five times the size of the fruitfly Drosophila melanogaster. The genes involved in whitefly development, in host range plasticity, and in begomovirus vector specificity and competency, are unknown. Results To address this general shortage of genomic sequence information, we have constructed three cDNA libraries from non-viruliferous whiteflies (eggs, immature instars, and adults and two from adult insects that fed on tomato plants infected by two geminiviruses: Tomato yellow leaf curl virus (TYLCV and Tomato mottle virus (ToMoV. In total, the sequence of 18,976 clones was determined. After quality control, and removal of 5,542 clones of mitochondrial origin 9,110 sequences remained which included 3,843 singletons and 1,017 contigs. Comparisons with public databases indicated that the libraries contained genes involved in cellular and developmental processes. In addition, approximately 1,000 bases aligned with the genome of the B. tabaci endosymbiotic bacterium Candidatus Portiera aleyrodidarum, originating primarily from the egg and instar libraries. Apart from the mitochondrial sequences, the longest and most abundant sequence encodes vitellogenin, which originated from whitefly adult libraries, indicating that much of the gene expression in this insect is directed toward the production

  18. Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery

    Directory of Open Access Journals (Sweden)

    Kirkness Ewen

    2006-10-01

    Full Text Available Abstract Background Population genetic studies of dogs have so far mainly been based on analysis of mitochondrial DNA, describing only the history of female dogs. To get a picture of the male history, as well as a second independent marker, there is a need for studies of biallelic Y-chromosome polymorphisms. However, there are no biallelic polymorphisms reported, and only 3200 bp of non-repetitive dog Y-chromosome sequence deposited in GenBank, necessitating the identification of dog Y chromosome sequence and the search for polymorphisms therein. The genome has been only partially sequenced for one male dog, disallowing mapping of the sequence into specific chromosomes. However, by comparing the male genome sequence to the complete female dog genome sequence, candidate Y-chromosome sequence may be identified by exclusion. Results The male dog genome sequence was analysed by Blast search against the human genome to identify sequences with a best match to the human Y chromosome and to the female dog genome to identify those absent in the female genome. Candidate sequences were then tested for male specificity by PCR of five male and five female dogs. 32 sequences from the male genome, with a total length of 24 kbp, were identified as male specific, based on a match to the human Y chromosome, absence in the female dog genome and male specific PCR results. 14437 bp were then sequenced for 10 male dogs originating from Europe, Southwest Asia, Siberia, East Asia, Africa and America. Nine haplotypes were found, which were defined by 14 substitutions. The genetic distance between the haplotypes indicates that they originate from at least five wolf haplotypes. There was no obvious trend in the geographic distribution of the haplotypes. Conclusion We have identified 24159 bp of dog Y-chromosome sequence to be used for population genetic studies. We sequenced 14437 bp in a worldwide collection of dogs, identifying 14 SNPs for future SNP analyses, and

  19. RANDNA: a random DNA sequence generator.

    Science.gov (United States)

    Piva, Francesco; Principato, Giovanni

    2006-01-01

    Monte Carlo simulations are useful to verify the significance of data. Genomic regularities, such as the nucleotide correlations or the not uniform distribution of the motifs throughout genomic or mature mRNA sequences, exist and their significance can be checked by means of the Monte Carlo test. The test needs good quality random sequences in order to work, moreover they should have the same nucleotide distribution as the sequences in which the regularities have been found. Random DNA sequences are also useful to estimate the background score of an alignment, that is a threshold below which the resulting score is merely due to chance. We have developed RANDNA, a free software which allows to produce random DNA or RNA sequences setting both their length and the percentage of nucleotide composition. Sequences having the same nucleotide distribution of exonic, intronic or intergenic sequences can be generated. Its graphic interface makes it possible to easily set the parameters that characterize the sequences being produced and saved in a text format file. The pseudo-random number generator function of Borland Delphi 6 is used, since it guarantees a good randomness, a long cycle length and a high speed. We have checked the quality of sequences generated by the software, by means of well-known tests, both by themselves and versus genuine random sequences. We show the good quality of the generated sequences. The software, complete with examples and documentation, is freely available to users from: http://www.introni.it/en/software.

  20. On site DNA barcoding by nanopore sequencing.

    Directory of Open Access Journals (Sweden)

    Michele Menegon

    Full Text Available Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a mimicking tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error-rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time-on-site DNA sequencing thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.

  1. Whole-genome sequencing of veterinary pathogens

    DEFF Research Database (Denmark)

    Ronco, Troels

    -electrophoresis and single-locus sequencing has been widely used to characterize such types of veterinary pathogens. However, DNA sequencing techniques have become fast and cost effective in recent years and whole-genome sequencing data provide a much higher discriminative power and reproducibility than any...... genetic background. This indicates that dairy cows can be natural carriers of S. aureus subtypes that in certain cases lead to CM. A group of isolates that mostly belonged to ST151 carried three pathogenicity islands that were primarily found in this group. The prevalence of resistance genes was generally...

  2. The Sequenced Angiosperm Genomes and Genome Databases.

    Science.gov (United States)

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of human, animals, and the planet earth. Despite the numerous advances in genome reports or sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in the database reconstruction in the last few years, here we provide a comprehensive review for three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of databases and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular for specific group of researchers, while a timely-updated comprehensive database is more powerful for address of major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote the collaborative studies of important questions in plant biology.

  3. Genome-Wide Footprints of Pig Domestication and Selection Revealed through Massive Parallel Sequencing of Pooled DNA

    NARCIS (Netherlands)

    Amaral, A.J.; Ferretti, L.; Megens, H.J.W.C.; Crooijmans, R.P.M.A.; Nie, H.; Ramos-Onsins, S.E.; Perez-Enciso, M.; Schook, L.B.; Groenen, M.A.M.

    2011-01-01

    Background Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can contribute to uncover the genetic basis of phenotypic diversity. Methodology/Main Findings Genome wide footprints of pig domestication and

  4. SigWin-detector: A Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences

    NARCIS (Netherlands)

    Inda, M.A.; van Batenburg, M.F.; Roos, M.; Belloum, A.S.Z.; Vasunin, D.; Wibisono, A.; van Kampen, A.H.C.; Breit, T.M.

    2008-01-01

    Background: Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To

  5. SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences

    NARCIS (Netherlands)

    Inda, Marcia A.; van Batenburg, Marinus F.; Roos, Marco; Belloum, Adam Sz; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H. C.; Breit, Timo M.

    2008-01-01

    ABSTRACT: BACKGROUND: Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome

  6. Insights from 20 years of bacterial genome sequencing

    DEFF Research Database (Denmark)

    Land, Miriam; Hauser, Loren; Jun, Se-Ran

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along...... the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative...... genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling...

  7. Detection of Non-Amplified Genomic DNA

    CERN Document Server

    Corradini, Roberto

    2012-01-01

    This book offers a state-of-the-art overview on non amplified DNA detection methods and provides chemists, biochemists, biotechnologists and material scientists with an introduction to these methods. In fact all these fields have dedicated resources to the problem of nucleic acid detection, each contributing with their own specific methods and concepts. This book will explain the basic principles of the different non amplified DNA detection methods available, highlighting their respective advantages and limitations. The importance of non-amplified DNA sequencing technologies will be also discussed. Non-amplified DNA detection can be achieved by adopting different techniques. Such techniques have allowed the commercialization of innovative platforms for DNA detection that are expected to break into the DNA diagnostics market. The enhanced sensitivity required for the detection of non amplified genomic DNA has prompted new strategies that can achieve ultrasensitivity by combining specific materials with specifi...

  8. Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

    Directory of Open Access Journals (Sweden)

    Holt Robert A

    2010-04-01

    Full Text Available Abstract Background Salmonids are one of the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar, but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate.

  9. Using Partial Genomic Fosmid Libraries for Sequencing CompleteOrganellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  10. Gene Discovery through Genomic Sequencing of Brucella abortus

    OpenAIRE

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and human. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10−5) to sequences deposit...

  11. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences

    Directory of Open Access Journals (Sweden)

    Shairul Izan

    2017-08-01

    Full Text Available Whole Genome Shotgun (WGS sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb, Aegilops tauschii (4 Gb and Paphiopedilum henryanum (25 Gb. We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.

  12. Special Issue: Next Generation DNA Sequencing

    Directory of Open Access Journals (Sweden)

    Paul Richardson

    2010-10-01

    Full Text Available Next Generation Sequencing (NGS refers to technologies that do not rely on traditional dideoxy-nucleotide (Sanger sequencing where labeled DNA fragments are physically resolved by electrophoresis. These new technologies rely on different strategies, but essentially all of them make use of real-time data collection of a base level incorporation event across a massive number of reactions (on the order of millions versus 96 for capillary electrophoresis for instance. The major commercial NGS platforms available to researchers are the 454 Genome Sequencer (Roche, Illumina (formerly Solexa Genome analyzer, the SOLiD system (Applied Biosystems/Life Technologies and the Heliscope (Helicos Corporation. The techniques and different strategies utilized by these platforms are reviewed in a number of the papers in this special issue. These technologies are enabling new applications that take advantage of the massive data produced by this next generation of sequencing instruments. [...

  13. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  14. Duplication in DNA Sequences

    Science.gov (United States)

    Ito, Masami; Kari, Lila; Kincaid, Zachary; Seki, Shinnosuke

    The duplication and repeat-deletion operations are the basis of a formal language theoretic model of errors that can occur during DNA replication. During DNA replication, subsequences of a strand of DNA may be copied several times (resulting in duplications) or skipped (resulting in repeat-deletions). As formal language operations, iterated duplication and repeat-deletion of words and languages have been well studied in the literature. However, little is known about single-step duplications and repeat-deletions. In this paper, we investigate several properties of these operations, including closure properties of language families in the Chomsky hierarchy and equations involving these operations. We also make progress toward a characterization of regular languages that are generated by duplicating a regular language.

  15. A DNA Structure-Based Bionic Wavelet Transform and Its Application to DNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Fei Chen

    2003-01-01

    Full Text Available DNA sequence analysis is of great significance for increasing our understanding of genomic functions. An important task facing us is the exploration of hidden structural information stored in the DNA sequence. This paper introduces a DNA structure-based adaptive wavelet transform (WT – the bionic wavelet transform (BWT – for DNA sequence analysis. The symbolic DNA sequence can be separated into four channels of indicator sequences. An adaptive symbol-to-number mapping, determined from the structural feature of the DNA sequence, was introduced into WT. It can adjust the weight value of each channel to maximise the useful energy distribution of the whole BWT output. The performance of the proposed BWT was examined by analysing synthetic and real DNA sequences. Results show that BWT performs better than traditional WT in presenting greater energy distribution. This new BWT method should be useful for the detection of the latent structural features in future DNA sequence analysis.

  16. Adenoviral DNA replication: DNA sequences and enzymes required for initiation in vitro

    International Nuclear Information System (INIS)

    Stillman, B.W.; Tamanoi, F.

    1983-01-01

    In this paper evidence is provided that the 140,000-dalton DNA polymerase is encoded by the adenoviral genome and is required for the initiation of DNA replication in vitro. The DNA sequences in the template DNA that are required for the initiation of replication have also been identified, using both plasmid DNAs and synthetic oligodeoxyribonucleotides. 48 references, 7 figures, 1 table

  17. Graphene nanodevices for DNA sequencing

    NARCIS (Netherlands)

    Heerema, S.J.; Dekker, C.

    2016-01-01

    Fast, cheap, and reliable DNA sequencing could be one of the most disruptive innovations of this decade, as it will pave the way for personalized medicine. In pursuit of such technology, a variety of nanotechnology-based approaches have been explored and established, including sequencing with

  18. Complete mitochondrial DNA sequences of the Victoria tilapia (Oreochromis variabilis) and Redbelly Tilapia (Tilapia zilli): genome characterization and phylogeny analysis.

    Science.gov (United States)

    Kinaro, Zachary Omambia; Xue, Liangyi; Volatiana, Josies Ancella

    2016-07-01

    The Cichlid fishes have played an important role in evolutionary biology, population studies and aquaculture industry with East African species representing a model suited for studying adaptive radiation and speciation for cichlid genome projects in which closely related genomes are fast emerging presenting questions on phenotype-genotype relations. The complete mitochondrial genomes presented here are for two closely related but eco-morphologically distinct Lake Victoria basin cichlids, Oreochromis variabilis, an endangered native species and Tilapia zilli, an invasive species, both of which are important economic fishes in local areas. The complete mitochondrial genomes determined for O. variabilis and T. zilli are 16 626 and 16,619 bp, respectively. Both the mitogenomes contain 13 protein-coding genes, 22 tRNAs, 2 rRNAs and a non-coding control region, which are typical of vertebrate mitogenomes. Phylogenetic analyses of the two species revealed that though both lie within family Cichlidae, they are remotely related.

  19. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  20. Getting complete genomes from complex samples using nanopore sequencing

    DEFF Research Database (Denmark)

    Kirkegaard, Rasmus Hansen; Karst, Søren Michael; Albertsen, Mads

    Background Short read DNA sequencing and metagenomic binning workflows have made it possible to extract bacterial genome bins from environmental microbial samples containing hundreds to thousands of different species. However, these genome bins often do not represent complete genomes......, as they are mostly fragmented, incomplete and often contaminated with foreign DNA. The value of these `draft genomes` have limited, lasting value to the scientific community, as gene synteny is broken and there is some uncertainty of what is missing1. The genetic material most often missed is important multi......-copy and/or conserved marker genes such as the 16S rRNA gene, as sequence micro-heterogeneity prevents assembly of these genes in the de novo assembly. However, long read sequencing technologies are emerging promising an end to fragmented genome assemblies2. Experimental design We extracted DNA from a full...

  1. Investigating the prehistory of Tungusic peoples of Siberia and the Amur-Ussuri region with complete mtDNA genome sequences and Y-chromosomal markers.

    Science.gov (United States)

    Duggan, Ana T; Whitten, Mark; Wiebe, Victor; Crawford, Michael; Butthof, Anne; Spitsyn, Victor; Makarov, Sergey; Novgorodov, Innokentiy; Osakovsky, Vladimir; Pakendorf, Brigitte

    2013-01-01

    Evenks and Evens, Tungusic-speaking reindeer herders and hunter-gatherers, are spread over a wide area of northern Asia, whereas their linguistic relatives the Udegey, sedentary fishermen and hunter-gatherers, are settled to the south of the lower Amur River. The prehistory and relationships of these Tungusic peoples are as yet poorly investigated, especially with respect to their interactions with neighbouring populations. In this study, we analyse over 500 complete mtDNA genome sequences from nine different Evenk and even subgroups as well as their geographic neighbours from Siberia and their linguistic relatives the Udegey from the Amur-Ussuri region in order to investigate the prehistory of the Tungusic populations. These data are supplemented with analyses of Y-chromosomal haplogroups and STR haplotypes in the Evenks, Evens, and neighbouring Siberian populations. We demonstrate that whereas the North Tungusic Evenks and Evens show evidence of shared ancestry both in the maternal and in the paternal line, this signal has been attenuated by genetic drift and differential gene flow with neighbouring populations, with isolation by distance further shaping the maternal genepool of the Evens. The Udegey, in contrast, appear quite divergent from their linguistic relatives in the maternal line, with a mtDNA haplogroup composition characteristic of populations of the Amur-Ussuri region. Nevertheless, they show affinities with the Evenks, indicating that they might be the result of admixture between local Amur-Ussuri populations and Tungusic populations from the north.

  2. Investigating the Prehistory of Tungusic Peoples of Siberia and the Amur-Ussuri Region with Complete mtDNA Genome Sequences and Y-chromosomal Markers

    Science.gov (United States)

    Duggan, Ana T.; Whitten, Mark; Wiebe, Victor; Crawford, Michael; Butthof, Anne; Spitsyn, Victor; Makarov, Sergey; Novgorodov, Innokentiy; Osakovsky, Vladimir; Pakendorf, Brigitte

    2013-01-01

    Evenks and Evens, Tungusic-speaking reindeer herders and hunter-gatherers, are spread over a wide area of northern Asia, whereas their linguistic relatives the Udegey, sedentary fishermen and hunter-gatherers, are settled to the south of the lower Amur River. The prehistory and relationships of these Tungusic peoples are as yet poorly investigated, especially with respect to their interactions with neighbouring populations. In this study, we analyse over 500 complete mtDNA genome sequences from nine different Evenk and even subgroups as well as their geographic neighbours from Siberia and their linguistic relatives the Udegey from the Amur-Ussuri region in order to investigate the prehistory of the Tungusic populations. These data are supplemented with analyses of Y-chromosomal haplogroups and STR haplotypes in the Evenks, Evens, and neighbouring Siberian populations. We demonstrate that whereas the North Tungusic Evenks and Evens show evidence of shared ancestry both in the maternal and in the paternal line, this signal has been attenuated by genetic drift and differential gene flow with neighbouring populations, with isolation by distance further shaping the maternal genepool of the Evens. The Udegey, in contrast, appear quite divergent from their linguistic relatives in the maternal line, with a mtDNA haplogroup composition characteristic of populations of the Amur-Ussuri region. Nevertheless, they show affinities with the Evenks, indicating that they might be the result of admixture between local Amur-Ussuri populations and Tungusic populations from the north. PMID:24349531

  3. Investigating the prehistory of Tungusic peoples of Siberia and the Amur-Ussuri region with complete mtDNA genome sequences and Y-chromosomal markers.

    Directory of Open Access Journals (Sweden)

    Ana T Duggan

    Full Text Available Evenks and Evens, Tungusic-speaking reindeer herders and hunter-gatherers, are spread over a wide area of northern Asia, whereas their linguistic relatives the Udegey, sedentary fishermen and hunter-gatherers, are settled to the south of the lower Amur River. The prehistory and relationships of these Tungusic peoples are as yet poorly investigated, especially with respect to their interactions with neighbouring populations. In this study, we analyse over 500 complete mtDNA genome sequences from nine different Evenk and even subgroups as well as their geographic neighbours from Siberia and their linguistic relatives the Udegey from the Amur-Ussuri region in order to investigate the prehistory of the Tungusic populations. These data are supplemented with analyses of Y-chromosomal haplogroups and STR haplotypes in the Evenks, Evens, and neighbouring Siberian populations. We demonstrate that whereas the North Tungusic Evenks and Evens show evidence of shared ancestry both in the maternal and in the paternal line, this signal has been attenuated by genetic drift and differential gene flow with neighbouring populations, with isolation by distance further shaping the maternal genepool of the Evens. The Udegey, in contrast, appear quite divergent from their linguistic relatives in the maternal line, with a mtDNA haplogroup composition characteristic of populations of the Amur-Ussuri region. Nevertheless, they show affinities with the Evenks, indicating that they might be the result of admixture between local Amur-Ussuri populations and Tungusic populations from the north.

  4. Whole-genome methylation caller designed for methyl- DNA ...

    African Journals Online (AJOL)

    etchie

    2013-02-20

    Feb 20, 2013 ... Our method uses a single-CpG-resolution, whole-genome methylation ... Key words: Methyl-DNA immunoprecipitation, next-generation sequencing, ...... methylation is prevalent in embryonic stem cells andmaybe mediated.

  5. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    Energy Technology Data Exchange (ETDEWEB)

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  6. Sequence analysis of Leukemia DNA

    Science.gov (United States)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

    2018-03-01

    Cancer is a very deadly disease, one of which is leukemia disease or better known as blood cancer. The cancer cell can be detected by taking DNA in laboratory test. This study focused on local alignment of leukemia and non leukemia data resulting from NCBI in the form of DNA sequences by using Smith-Waterman algorithm. SmithWaterman algorithm was invented by TF Smith and MS Waterman in 1981. These algorithms try to find as much as possible similarity of a pair of sequences, by giving a negative value to the unequal base pair (mismatch), and positive values on the same base pair (match). So that will obtain the maximum positive value as the end of the alignment, and the minimum value as the initial alignment. This study will use sequences of leukemia and 3 sequences of non leukemia.

  7. Microbial genome sequencing using optical mapping and Illumina sequencing

    Science.gov (United States)

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  8. The sequence of the Helicoverpa armigera single nucleocapsid nucleopolyhedrovirus genome

    NARCIS (Netherlands)

    Chen, X.; IJkel, W.F.J.; Tarchini, R.; Sun, X.; Sandbrink, H.; Wang, H.; Peters, S.; Zuidema, D.; Klein Lankhorst, R.; Vlak, J.M.; Hu, Z.

    2001-01-01

    The nucleotide sequence of the Helicoverpa armigera single-nucleocapsid nucleopolyhedrovirus (HaSNPV) DNA genome was determined and analysed. The circular genome encompasses 131 403 bp, has a G C content of 39.1 molnd contains five homologous regions with a unique pattern of repeats.

  9. Fractals in DNA sequence analysis

    Institute of Scientific and Technical Information of China (English)

    Yu Zu-Guo(喻祖国); Vo Anh; Gong Zhi-Min(龚志民); Long Shun-Chao(龙顺潮)

    2002-01-01

    Fractal methods have been successfully used to study many problems in physics, mathematics, engineering, finance,and even in biology. There has been an increasing interest in unravelling the mysteries of DNA; for example, how can we distinguish coding and noncoding sequences, and the problems of classification and evolution relationship of organisms are key problems in bioinformatics. Although much research has been carried out by taking into consideration the long-range correlations in DNA sequences, and the global fractal dimension has been used in these works by other people, the models and methods are somewhat rough and the results are not satisfactory. In recent years, our group has introduced a time series model (statistical point of view) and a visual representation (geometrical point of view)to DNA sequence analysis. We have also used fractal dimension, correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. In this paper, we introduce these fractal models and methods and the results of DNA sequence analysis.

  10. Foundations for a syntatic pattern recognition system for genomic DNA sequences. [Annual] report, 1 December 1991--31 March 1993

    Energy Technology Data Exchange (ETDEWEB)

    Searles, D.B.

    1993-03-01

    The goal of the proposed work is the creation of a software system that will perform sophisticated pattern recognition and related functions at a level of abstraction and with expressive power beyond current general-purpose pattern-matching systems for biological sequences; and with a more uniform language, environment, and graphical user interface, and with greater flexibility, extensibility, embeddability, and ability to incorporate other algorithms, than current special-purpose analytic software.

  11. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  12. Toward a Better Compression for DNA Sequences Using Huffman Encoding.

    Science.gov (United States)

    Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

    2017-04-01

    Due to the significant amount of DNA data that are being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data to a less space and a faster transmission. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. The research suggests an improvement on all DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016 ).

  13. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  14. Oxford Nanopore MinION Sequencing and Genome Assembly

    Directory of Open Access Journals (Sweden)

    Hengyun Lu

    2016-10-01

    Full Text Available The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS technology. The third-generation sequencing (TGS technology, led by Pacific Biosciences (PacBio, is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT. MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production makes it suitable for real-time applications, the release of the long read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since repetitive regions can be easily expanded into using longer sequencing lengths, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

  15. Draft Genome Sequence of "Terrisporobacter othiniensis" Isolated from a Blood Culture from a Human Patient

    DEFF Research Database (Denmark)

    Lund, Lars Christian; Sydenham, Thomas Vognbjerg; Høgh, Silje Vermedal

    2015-01-01

    "Terrisporobacter othiniensis" (proposed species) was isolated from a blood culture. Genomic DNA was sequenced using a MiSeq benchtop sequencer (Illumina) and assembled using the SPAdes genome assembler. This resulted in a draft genome sequence comprising 3,980,019 bp in 167 contigs containing 3...

  16. Rapid Multiplex Small DNA Sequencing on the MinION Nanopore Sequencing Platform

    Directory of Open Access Journals (Sweden)

    Shan Wei

    2018-05-01

    Full Text Available Real-time sequencing of short DNA reads has a wide variety of clinical and research applications including screening for mutations, target sequences and aneuploidy. We recently demonstrated that MinION, a nanopore-based DNA sequencing device the size of a USB drive, could be used for short-read DNA sequencing. In this study, an ultra-rapid multiplex library preparation and sequencing method for the MinION is presented and applied to accurately test normal diploid and aneuploidy samples’ genomic DNA in under three hours, including library preparation and sequencing. This novel method shows great promise as a clinical diagnostic test for applications requiring rapid short-read DNA sequencing.

  17. Fast and secure retrieval of DNA sequences

    NARCIS (Netherlands)

    2014-01-01

    Sequence models are retrieved from a sequences index. The sequence models model DNA or RNA sequences stored in a database, and each comprises a finite memory tree source model and parameters for the finite memory tree source model. One or more DNA or RNA sequences stored in the database are

  18. Genome-wide DNA polymorphism analyses using VariScan

    Directory of Open Access Journals (Sweden)

    Vilella Albert J

    2006-09-01

    Full Text Available Abstract Background DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis. Results We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i exhaustive population-genetic analyses including those based on the coalescent theory; ii analysis adapted to the shallow data generated by the high-throughput genome projects; iii use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v visualization of the results integrated with current genome annotations in commonly available genome browsers. Conclusion VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.

  19. Scrutinizing virus genome termini by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Shasha Li

    Full Text Available Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging due to the time-consuming protocols and their inefficiency where critical details are lost easily. Recent advances in next generation sequencing (NGS have enabled it to be a powerful tool to study genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini, asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini as well as obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposed a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell.

  20. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  1. Somatic DNA recombination yielding circular DNA and deletion of a genomic region in embryonic brain

    International Nuclear Information System (INIS)

    Maeda, Toyoki; Chijiiwa, Yoshiharu; Tsuji, Hideo; Sakoda, Saburo; Tani, Kenzaburo; Suzuki, Tomokazu

    2004-01-01

    In this study, a mouse genomic region is identified that undergoes DNA rearrangement and yields circular DNA in brain during embryogenesis. External region-directed inverse polymerase chain reaction on circular DNA extracted from late embryonic brain tissue repeatedly detected DNA of this region containing recombination joints. Wide-range genomic PCR and digestion-circularization PCR analysis showed this region underwent recombination accompanied with deletion of intervening sequences, including the circularized regions. This region was mapped by fluorescence in situ hybridization to C1 on mouse chromosome 16, where no gene and no physiological DNA rearrangement had been identified. DNA sequence in the region has segmental homology to an orthologous region on human chromosome 3q.13. These observations demonstrated somatic DNA recombination yielding genomic deletions in brain during embryogenesis

  2. DNA Replication Profiling Using Deep Sequencing.

    Science.gov (United States)

    Saayman, Xanita; Ramos-Pérez, Cristina; Brown, Grant W

    2018-01-01

    Profiling of DNA replication during progression through S phase allows a quantitative snap-shot of replication origin usage and DNA replication fork progression. We present a method for using deep sequencing data to profile DNA replication in S. cerevisiae.

  3. Towards Point-of-Care Diagnosis of Pulmonary Tuberculosis and Drug Susceptibility Testing by Whole Genome Sequencing of DNA Isolated from Sputum

    Directory of Open Access Journals (Sweden)

    Kayzad S. Nilgiriwala

    2017-12-01

    Full Text Available Preliminary screening of pulmonary tuberculosis (TB in India still relies on sputum microscopy, which has low sensitivity leading to high rate of false negatives. Moreover, conventional phenotypic drug susceptibility testing (DST is conducted over a period of weeks leading to delays in correct treatment. Next generation sequencing technologies (Illumina and Oxford Nanopore have made it possible to sequence miniscule amount of DNA and generate enough data within a day for detecting specific microbes and their DST profile. Sputum samples from two pulmonary TB patients were processed by decontamination and DNA was isolated from the decontaminated sputum sediments. The isolated DNA was used for sequencing by Illumina and by MinION (Oxford Nanopore Technologies. The sequence data was used to diagnose TB and to determine the DST profiles for the first- and second-line drugs by Mykrobe Predictor. Validation was conducted by sequencing DNA (by Illumina isolated from pure growth culture from both the samples individually. DNA sequencing data (for both, Ilumina and MinION from one of the sputum samples indicated the presence of Mycobacterium tuberculosis (M. tb resistant to streptomycin, isoniazid, rifampicin and ethambutol and its lineage was predicted to be Beijing East Asia. The second sample indicated the presence of M. tb sensitive to the first- and second-line drugs by MinION and showed minor resistance call only to rifampicin by Illumina. Lineage of the second sample was predicted to be East Africa Indian Ocean, whereas Illumina data indicated it to be Delhi Central Asia. The two samples were correctly diagnosed for the presence of M. tb in the sputum DNA. Their DST profiles and lineage were also successfully determined from both the sequencing platforms (with minor discrepancies paving the way towards diagnosis and DST of TB from DNA isolated from sputum samples at point-of-care. Nanopore sequencing currently requires skilled personnel for DNA

  4. DNA Polymerases Drive DNA Sequencing-by-Synthesis Technologies: Both Past and Present

    Directory of Open Access Journals (Sweden)

    Cheng-Yao eChen

    2014-06-01

    Full Text Available Next-generation sequencing (NGS technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. E. coli DNA polymerase I proteolytic (Klenow fragment was originally utilized in Sanger's dideoxy chain terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific, genome sequence information accessible via public database. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today's standard capillary electrophoresis (CE and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides respectively. Third generation, real-time single molecule sequencing platform requires slightly different enzyme properties. Enterobacterial phage ⱷ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, ⱷ29 enzyme has also been utilized in emerging DNA sequencing technologies including nanopore-, and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies.

  5. Snake Genome Sequencing: Results and Future Prospects.

    Science.gov (United States)

    Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

    2016-12-01

    Snake genome sequencing is in its infancy-very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  6. Snake Genome Sequencing: Results and Future Prospects

    Directory of Open Access Journals (Sweden)

    Harald M. I. Kerkkamp

    2016-12-01

    Full Text Available Snake genome sequencing is in its infancy—very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  7. Whole-genome methylation caller designed for methyl- DNA ...

    African Journals Online (AJOL)

    etchie

    2013-02-20

    Feb 20, 2013 ... Key words: Methyl-DNA immunoprecipitation, next-generation sequencing, Hidden ... its response to environmental cues. .... have a great potential to become the most cost-effective ... hg18 reference genome (set to 0 if not present in retrieved reads). ..... DNA methylation patterns and epigenetic memory.

  8. Genomic diversity of Mycobacterium tuberculosis Beijing strains isolated in Tuscany, Italy, based on large sequence deletions, SNPs in putative DNA repair genes and MIRU-VNTR polymorphisms.

    Science.gov (United States)

    Garzelli, Carlo; Lari, Nicoletta; Rindi, Laura

    2016-03-01

    The Beijing genotype of Mycobacterium tuberculosis is cause of global concern as it is rapidly spreading worldwide, is considered hypervirulent, and is most often associated to massive spread of MDR/XDR TB, although these epidemiological or pathological properties have not been confirmed for all strains and in all geographic settings. In this paper, to gain new insights into the biogeographical heterogeneity of the Beijing family, we investigated a global sample of Beijing strains (22% from Italian-born, 78% from foreign-born patients) by determining large sequence polymorphism of regions RD105, RD181, RD150 and RD142, single nucleotide polymorphism of putative DNA repair genes mutT4 and mutT2 and MIRU-VNTR profiles based on 11 discriminative loci. We found that, although our sample of Beijing strains showed a considerable genomic heterogeneity, yielding both ancient and recent phylogenetic strains, the prevalent successful Beijing subsets were characterized by deletions of RD105 and RD181 and by one nucleotide substitution in one or both mutT genes. MIRU-VNTR analysis revealed 47 unique patterns and 9 clusters including a total of 33 isolates (41% of total isolates); the relatively high proportion of Italian-born Beijing TB patients, often occurring in mixed clusters, supports the possibility of an ongoing cross-transmission of the Beijing genotype to autochthonous population. High rates of extra-pulmonary localization and drug-resistance, particularly MDR, frequently reported for Beijing strains in other settings, were not observed in our survey. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. [An intriguing model for 5S rDNA sequences dispersion in the genome of freshwater stingray Potamotrygon motoro (Chondrichthyes: Potamotrygonidae)].

    Science.gov (United States)

    Cruz, V P; Oliveira, C; Foresti, F

    2015-01-01

    5S rDNA genes of the stingray Potamotrygon motoro were PCR replicated, purified, cloned and sequenced. Two distinct classes of segments of different sizes were obtained. The smallest, with 342 bp units, was classified as class I, and the largest, with 1900 bp units, was designated as class II. Alignment with the consensus sequences for both classes showed changes in a few bases in the 5S rDNA genes. TATA-like sequences were detected in the nontranscribed spacer (NTS) regions of class I and a microsatellite (GCT) 10 sequence was detected in the NTS region of class II. The results obtained can help to understand the molecular organization of ribosomal genes and the mechanism of gene dispersion.

  10. Poincaré recurrences of DNA sequences

    Science.gov (United States)

    Frahm, K. M.; Shepelyansky, D. L.

    2012-01-01

    We analyze the statistical properties of Poincaré recurrences of Homo sapiens, mammalian, and other DNA sequences taken from the Ensembl Genome data base with up to 15 billion base pairs. We show that the probability of Poincaré recurrences decays in an algebraic way with the Poincaré exponent β≈4 even if the oscillatory dependence is well pronounced. The correlations between recurrences decay with an exponent ν≈0.6 that leads to an anomalous superdiffusive walk. However, for Homo sapiens sequences, with the largest available statistics, the diffusion coefficient converges to a finite value on distances larger than one million base pairs. We argue that the approach based on Poncaré recurrences determines new proximity features between different species and sheds a new light on their evolution history.

  11. An integrated semiconductor device enabling non-optical genome sequencing.

    Science.gov (United States)

    Rothberg, Jonathan M; Hinz, Wolfgang; Rearick, Todd M; Schultz, Jonathan; Mileski, William; Davey, Mel; Leamon, John H; Johnson, Kim; Milgrew, Mark J; Edwards, Matthew; Hoon, Jeremy; Simons, Jan F; Marran, David; Myers, Jason W; Davidson, John F; Branting, Annika; Nobile, John R; Puc, Bernard P; Light, David; Clark, Travis A; Huber, Martin; Branciforte, Jeffrey T; Stoner, Isaac B; Cawley, Simon E; Lyons, Michael; Fu, Yutao; Homer, Nils; Sedova, Marina; Miao, Xin; Reed, Brian; Sabina, Jeffrey; Feierstein, Erika; Schorn, Michelle; Alanjary, Mohammad; Dimalanta, Eileen; Dressman, Devin; Kasinskas, Rachel; Sokolsky, Tanya; Fidanza, Jacqueline A; Namsaraev, Eugeni; McKernan, Kevin J; Williams, Alan; Roth, G Thomas; Bustillo, James

    2011-07-20

    The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing the ions produced by template-directed DNA polymerase synthesis using all-natural nucleotides on this massively parallel semiconductor-sensing device or ion chip. The ion chip contains ion-sensitive, field-effect transistor-based sensors in perfect register with 1.2 million wells, which provide confinement and allow parallel, simultaneous detection of independent sequencing reactions. Use of the most widely used technology for constructing integrated circuits, the complementary metal-oxide semiconductor (CMOS) process, allows for low-cost, large-scale production and scaling of the device to higher densities and larger array sizes. We show the performance of the system by sequencing three bacterial genomes, its robustness and scalability by producing ion chips with up to 10 times as many sensors and sequencing a human genome.

  12. Complete mitochondrial DNA sequences of the threadfin cichlid (Petrochromis trewavasae and the blunthead cichlid (Tropheus moorii and patterns of mitochondrial genome evolution in cichlid fishes.

    Directory of Open Access Journals (Sweden)

    Christoph Fischer

    Full Text Available The cichlid fishes of the East African Great Lakes represent a model especially suited to study adaptive radiation and speciation. With several African cichlid genome projects being in progress, a promising set of closely related genomes is emerging, which is expected to serve as a valuable data base to solve questions on genotype-phenotype relations. The mitochondrial (mt genomes presented here are the first results of the assembly and annotation process for two closely related but eco-morphologically highly distinct Lake Tanganyika cichlids, Petrochromis trewavasae and Tropheus moorii. The genomic sequences comprise 16,588 bp (P. trewavasae and 16,590 bp (T. moorii, and exhibit the typical mitochondrial structure, with 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and a non-coding control region. Analyses confirmed that the two species are very closely related with an overall sequence similarity of 96%. We analyzed the newly generated sequences in the phylogenetic context of 21 published labroid fish mitochondrial genomes. Consistent with other vertebrates, the D-loop region was found to evolve faster than protein-coding genes, which in turn are followed by the rRNAs; the tRNAs vary greatly in the rate of sequence evolution, but on average evolve the slowest. Within the group of coding genes, ND6 evolves most rapidly. Codon usage is similar among examined cichlid tribes and labroid families; although a slight shift in usage patterns down the gene tree could be observed. Despite having a clearly different nucleotide composition, ND6 showed a similar codon usage. C-terminal ends of Cox1 exhibit variations, where the varying number of amino acids is related to the structure of the obtained phylogenetic tree. This variation may be of functional relevance for Cox1 synthesis.

  13. Using nanopore sequencing to get complete genomes from complex samples

    DEFF Research Database (Denmark)

    Kirkegaard, Rasmus Hansen; Karst, Søren Michael; Nielsen, Per Halkjær

    The advantages of “next generation sequencing” has come at the cost of genome finishing. The dominant sequencing technology provides short reads of 150-300 bp, which has made genome assembly very difficult as the reads do not span important repeat regions. Genomes have thus been added...... to the databases as fragmented assemblies and not as finished contigs that resemble the chromosomes in which the DNA is organised within the cells. This is especially troublesome for genomes derived from complex metagenome sequencing. Databases with incomplete genomes can lead to false conclusions about...... the absence of genes and functional predictions of the organisms. Furthermore, it is common that repetitive elements and marker genes such as the 16S rRNA gene are missing completely from these genome bins. Using nanopore long reads, we demonstrate that it is possible to span these regions and make complete...

  14. Perspectives of Integrative Cancer Genomics in Next Generation Sequencing Era

    Directory of Open Access Journals (Sweden)

    So Mee Kwon

    2012-06-01

    Full Text Available The explosive development of genomics technologies including microarrays and next generation sequencing (NGS has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of the genetic aberrations could reveal the candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding on the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on the study design and analysis strategies. This might be helpful to understand the current trends and strategies of the rapidly evolving cancer genomics research.

  15. High-Throughput Block Optical DNA Sequence Identification.

    Science.gov (United States)

    Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant

    2018-01-01

    Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm -1 . Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. Genome-wide mapping of DNA strand breaks.

    Directory of Open Access Journals (Sweden)

    Frédéric Leduc

    Full Text Available Determination of cellular DNA damage has so far been limited to global assessment of genome integrity whereas nucleotide-level mapping has been restricted to specific loci by the use of specific primers. Therefore, only limited DNA sequences can be studied and novel regions of genomic instability can hardly be discovered. Using a well-characterized yeast model, we describe a straightforward strategy to map genome-wide DNA strand breaks without compromising nucleotide-level resolution. This technique, termed "damaged DNA immunoprecipitation" (dDIP, uses immunoprecipitation and the terminal deoxynucleotidyl transferase-mediated dUTP-biotin end-labeling (TUNEL to capture DNA at break sites. When used in combination with microarray or next-generation sequencing technologies, dDIP will allow researchers to map genome-wide DNA strand breaks as well as other types of DNA damage and to establish a clear profiling of altered genes and/or intergenic sequences in various experimental conditions. This mapping technique could find several applications for instance in the study of aging, genotoxic drug screening, cancer, meiosis, radiation and oxidative DNA damage.

  17. DNA Sequencing in Cultural Heritage.

    Science.gov (United States)

    Vai, Stefania; Lari, Martina; Caramelli, David

    2016-02-01

    During the last three decades, DNA analysis on degraded samples revealed itself as an important research tool in anthropology, archaeozoology, molecular evolution, and population genetics. Application on topics such as determination of species origin of prehistoric and historic objects, individual identification of famous personalities, characterization of particular samples important for historical, archeological, or evolutionary reconstructions, confers to the paleogenetics an important role also for the enhancement of cultural heritage. A really fast improvement in methodologies in recent years led to a revolution that permitted recovering even complete genomes from highly degraded samples with the possibility to go back in time 400,000 years for samples from temperate regions and 700,000 years for permafrozen remains and to analyze even more recent material that has been subjected to hard biochemical treatments. Here we propose a review on the different methodological approaches used so far for the molecular analysis of degraded samples and their application on some case studies.

  18. Targeted DNA Methylation Analysis by High Throughput Sequencing in Porcine Peri-attachment Embryos

    OpenAIRE

    MORRILL, Benson H.; COX, Lindsay; WARD, Anika; HEYWOOD, Sierra; PRATHER, Randall S.; ISOM, S. Clay

    2013-01-01

    Abstract The purpose of this experiment was to implement and evaluate the effectiveness of a next-generation sequencing-based method for DNA methylation analysis in porcine embryonic samples. Fourteen discrete genomic regions were amplified by PCR using bisulfite-converted genomic DNA derived from day 14 in vivo-derived (IVV) and parthenogenetic (PA) porcine embryos as template DNA. Resulting PCR products were subjected to high-throughput sequencing using the Illumina Genome Analyzer IIx plat...

  19. Advanced Whole-Genome Sequencing and Analysis of Fetal Genomes from Amniotic Fluid.

    Science.gov (United States)

    Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Shi, Quan; Peters, Erin E; Gulbahce, Natali; Li, Zhenyu; Chen, Fang; Drmanac, Radoje; Peters, Brock A

    2018-04-01

    Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 ( CHD8 ) and LDL receptor-related protein 1 ( LRP1 ), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures. © 2018 American Association for Clinical Chemistry.

  20. Aspects of coverage in medical DNA sequencing

    Directory of Open Access Journals (Sweden)

    Wilson Richard K

    2008-05-01

    Full Text Available Abstract Background DNA sequencing is now emerging as an important component in biomedical studies of diseases like cancer. Short-read, highly parallel sequencing instruments are expected to be used heavily for such projects, but many design specifications have yet to be conclusively established. Perhaps the most fundamental of these is the redundancy required to detect sequence variations, which bears directly upon genomic coverage and the consequent resolving power for discerning somatic mutations. Results We address the medical sequencing coverage problem via an extension of the standard mathematical theory of haploid coverage. The expected diploid multi-fold coverage, as well as its generalization for aneuploidy are derived and these expressions can be readily evaluated for any project. The resulting theory is used as a scaling law to calibrate performance to that of standard BAC sequencing at 8× to 10× redundancy, i.e. for expected coverages that exceed 99% of the unique sequence. A differential strategy is formalized for tumor/normal studies wherein tumor samples are sequenced more deeply than normal ones. In particular, both tumor alleles should be detected at least twice, while both normal alleles are detected at least once. Our theory predicts these requirements can be met for tumor and normal redundancies of approximately 26× and 21×, respectively. We explain why these values do not differ by a factor of 2, as might intuitively be expected. Future technology developments should prompt even deeper sequencing of tumors, but the 21× value for normal samples is essentially a constant. Conclusion Given the assumptions of standard coverage theory, our model gives pragmatic estimates for required redundancy. The differential strategy should be an efficient means of identifying potential somatic mutations for further study.

  1. Understanding human DNA sequence variation.

    Science.gov (United States)

    Kidd, K K; Pakstis, A J; Speed, W C; Kidd, J R

    2004-01-01

    Over the past century researchers have identified normal genetic variation and studied that variation in diverse human populations to determine the amounts and distributions of that variation. That information is being used to develop an understanding of the demographic histories of the different populations and the species as a whole, among other studies. With the advent of DNA-based markers in the last quarter century, these studies have accelerated. One of the challenges for the next century is to understand that variation. One component of that understanding will be population genetics. We present here examples of many of the ways these new data can be analyzed from a population perspective using results from our laboratory on multiple individual DNA-based polymorphisms, many clustered in haplotypes, studied in multiple populations representing all major geographic regions of the world. These data support an "out of Africa" hypothesis for human dispersal around the world and begin to refine the understanding of population structures and genetic relationships. We are also developing baseline information against which we can compare findings at different loci to aid in the identification of loci subject, now and in the past, to selection (directional or balancing). We do not yet have a comprehensive understanding of the extensive variation in the human genome, but some of that understanding is coming from population genetics.

  2. Complete Genome Sequence of Pseudomonas aeruginosa Phage AAT-1.

    Science.gov (United States)

    Andrade-Domínguez, Andrés; Kolter, Roberto

    2016-08-25

    Aspects of the interaction between phages and animals are of interest and importance for medical applications. Here, we report the genome sequence of the lytic Pseudomonas phage AAT-1, isolated from mammalian serum. AAT-1 is a double-stranded DNA phage, with a genome of 57,599 bp, containing 76 predicted open reading frames. Copyright © 2016 Andrade-Domínguez and Kolter.

  3. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    OpenAIRE

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-01-01

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic...

  4. Genome sequence of Lactobacillus rhamnosus ATCC 8530.

    Science.gov (United States)

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

    2012-02-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  5. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    OpenAIRE

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.; Ziola, Barry

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  6. Value of a newly sequenced bacterial genome

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj

    2014-01-01

    and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses...... heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting...

  7. Genomic sequencing and in vivo footprinting of an expression-specific DNase I-hypersensitive site of avian vitellogenin II promoter reveal a demethylation of a mCpG and a change in specific interactions of proteins with DNA.

    Science.gov (United States)

    Saluz, H P; Feavers, I M; Jiricny, J; Jost, J P

    1988-01-01

    Genomic sequencing was used to study the in vivo methylation pattern of two CpG sites in the promoter region of the avian vitellogenin gene. The CpG at position +10 was fully methylated in DNA isolated from tissues that do not express the gene but was unmethylated in the liver of mature hens and estradiol-treated roosters. In the latter tissue, this site became demethylated and DNase I hypersensitive after estradiol treatment. A second CpG (position -52) was unmethylated in all tissues examined. In vivo genomic footprinting with dimethyl sulfate revealed different patterns of DNA protection in silent and expressed genes. In rooster liver cells, at least 10 base pairs of DNA, including the methylated CpG, were protected by protein(s). Gel-shift assays indicated that a protein factor, present in rooster liver nuclear extract, bound at this site only when it was methylated. In hen liver cells, the same unmethylated CpG lies within a protected region of approximately equal to 20 base pairs. In vitro DNase I protection and gel-shift assays indicate that this sequence is bound by a protein, which binds both double- and single-stranded DNA. For the latter substrate, this factor was shown to bind solely the noncoding (i.e., mRNA-like) strand. Images PMID:3413118

  8. DNA fingerprinting, DNA barcoding, and next generation sequencing technology in plants.

    Science.gov (United States)

    Sucher, Nikolaus J; Hennell, James R; Carles, Maria C

    2012-01-01

    DNA fingerprinting of plants has become an invaluable tool in forensic, scientific, and industrial laboratories all over the world. PCR has become part of virtually every variation of the plethora of approaches used for DNA fingerprinting today. DNA sequencing is increasingly used either in combination with or as a replacement for traditional DNA fingerprinting techniques. A prime example is the use of short, standardized regions of the genome as taxon barcodes for biological identification of plants. Rapid advances in "next generation sequencing" (NGS) technology are driving down the cost of sequencing and bringing large-scale sequencing projects into the reach of individual investigators. We present an overview of recent publications that demonstrate the use of "NGS" technology for DNA fingerprinting and DNA barcoding applications.

  9. Sequence periodicity in nucleosomal DNA and intrinsic curvature.

    Science.gov (United States)

    Nair, T Murlidharan

    2010-05-17

    Most eukaryotic DNA contained in the nucleus is packaged by wrapping DNA around histone octamers. Histones are ubiquitous and bind most regions of chromosomal DNA. In order to achieve smooth wrapping of the DNA around the histone octamer, the DNA duplex should be able to deform and should possess intrinsic curvature. The deformability of DNA is a result of the non-parallelness of base pair stacks. The stacking interaction between base pairs is sequence dependent. The higher the stacking energy the more rigid the DNA helix, thus it is natural to expect that sequences that are involved in wrapping around the histone octamer should be unstacked and possess intrinsic curvature. Intrinsic curvature has been shown to be dictated by the periodic recurrence of certain dinucleotides. Several genome-wide studies directed towards mapping of nucleosome positions have revealed periodicity associated with certain stretches of sequences. In the current study, these sequences have been analyzed with a view to understand their sequence-dependent structures. Higher order DNA structures and the distribution of molecular bend loci associated with 146 base nucleosome core DNA sequence from C. elegans and chicken have been analyzed using the theoretical model for DNA curvature. The curvature dispersion calculated by cyclically permuting the sequences revealed that the molecular bend loci were delocalized throughout the nucleosome core region and had varying degrees of intrinsic curvature. The higher order structures associated with nucleosomes of C.elegans and chicken calculated from the sequences revealed heterogeneity with respect to the deviation of the DNA axis. The results points to the possibility of context dependent curvature of varying degrees to be associated with nucleosomal DNA.

  10. PREDICTION OF CHROMATIN STATES USING DNA SEQUENCE PROPERTIES

    KAUST Repository

    Bahabri, Rihab R.

    2013-06-01

    Activities of DNA are to a great extent controlled epigenetically through the internal struc- ture of chromatin. This structure is dynamic and is influenced by different modifications of histone proteins. Various combinations of epigenetic modification of histones pinpoint to different functional regions of the DNA determining the so-called chromatin states. How- ever, the characterization of chromatin states by the DNA sequence properties remains largely unknown. In this study we aim to explore whether DNA sequence patterns in the human genome can characterize different chromatin states. Using DNA sequence motifs we built binary classifiers for each chromatic state to eval- uate whether a given genomic sequence is a good candidate for belonging to a particular chromatin state. Of four classification algorithms (C4.5, Naive Bayes, Random Forest, and SVM) used for this purpose, the decision tree based classifiers (C4.5 and Random Forest) yielded best results among those we evaluated. Our results suggest that in general these models lack sufficient predictive power, although for four chromatin states (insulators, het- erochromatin, and two types of copy number variation) we found that presence of certain motifs in DNA sequences does imply an increased probability that such a sequence is one of these chromatin states.

  11. The diploid genome sequence of an individual human.

    Directory of Open Access Journals (Sweden)

    Samuel Levy

    2007-09-01

    Full Text Available Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel included 3,213,401 single nucleotide polymorphisms (SNPs, 53,823 block substitutions (2-206 bp, 292,102 heterozygous insertion/deletion events (indels(1-571 bp, 559,473 homozygous indels (1-82,711 bp, 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.

  12. Human Genome Sequencing in Health and Disease

    Science.gov (United States)

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  13. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

  14. Getting complete genomes from complex samples using nanopore sequencing

    DEFF Research Database (Denmark)

    Kirkegaard, Rasmus Hansen; Karst, Søren Michael; Albertsen, Mads

    Short read sequencing and metagenomic binning workflows have made it possible to extract bacterial genome bins from environmental microbial samples containing hundreds to thousands of different species. However, these genome bins often do not represent complete genomes, as they are mostly...... fragmented, incomplete and often contaminated with foreign DNA and with no robust strategies to validate the quality. The value of these `draft genomes` have limited, lasting value to the scientific community, as gene synteny is broken and the uncertainty of what is missing. The genetic material most often...... missed is important multi-copy and/or conserved marker genes such as the 16S rRNA gene, as sequence micro-heterogeneity prevents assembly of these genes in the de novo assembly. We demonstrate that using nanopore long reads it is now possible to overcome these issues and make complete genomes from...

  15. The complete chloroplast genome sequence of Abies nephrolepis (Pinaceae: Abietoideae

    Directory of Open Access Journals (Sweden)

    Dong-Keun Yi

    2016-06-01

    Full Text Available The plant chloroplast (cp genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp in length including a pair of short inverted repeat regions (IRa and IRb of 139 bp each separated by a small single copy (SSC region of 54,323 bp (SSC and a large single copy region of 66,735 bp (LSC. It contains 114 genes, 68 of which are protein coding genes, 35 tRNA and four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSR have been detected in A. nephrolepis cp genome. Large IR sequences locate in 42-kb inversion points (1186 bp. The A. nephrolepis cp genome is identical to Abies koreana’s which is closely related to taxa. Pairwise comparison between two cp genomes revealed 140 polymorphic sites in each. Complete cp genome sequence of A. nephrolepis has a significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for development of DNA markers for easy identification and classification.

  16. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    Science.gov (United States)

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.

  17. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.

  18. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  19. The Release 6 reference sequence of the Drosophila melanogaster genome.

    Science.gov (United States)

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. © 2015 Hoskins et al.; Published by Cold Spring Harbor Laboratory Press.

  20. Genomic analysis of murine DNA-dependent protein kinase

    International Nuclear Information System (INIS)

    Fujimori, A.; Abe, M.

    2003-01-01

    Full text: The gene of catalytic subunit of DNA dependent protein kinase is responsible gene for SCID mice. The molecules play a critical role in non-homologous end joining including the V(D)J recombination. Contribution of the molecules to the difference of radiosensitivity and the susceptibility to cancer has been suggested. Here we show the entire nucleotide sequence of approximately 193 kbp and 84 kbp genomic regions encoding the entire DNA-PKcs gene in the mouse and chicken respectively. Retroposon was found in the intron 51 of mouse genomic DNA-PKcs gene but in human and chicken. Comparative analysis of these two species strongly suggested that only two genes, DNA-PKcs and MCM4, exist in the region of both species. Several conserved sequences and cis elements, however, were predicted. Recently, the orthologous region for the human DNA-PKcs locus was completed. The results of further comparative study will be discussed

  1. Complete sequence of the mitochondrial genome of ...

    Indian Academy of Sciences (India)

    products were purified using the DNA Gel Extraction Kit. (Tiangen, Shanghai, China). The purified products obtained ..... Base composition of O. rubicundus mitochondrial genome. .... the help of fish sampled and identified by morphology.

  2. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main bias: the guanine-cytosine (GC) content and 3-bp (base pairs) periodicity of DNA sequence. We perform the test constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in 3-bp periodicity neither in the variable GC content.

  3. Detecting differential DNA methylation from sequencing of bisulfite converted DNA of diverse species.

    Science.gov (United States)

    Huh, Iksoo; Wu, Xin; Park, Taesung; Yi, Soojin V

    2017-07-21

    DNA methylation is one of the most extensively studied epigenetic modifications of genomic DNA. In recent years, sequencing of bisulfite-converted DNA, particularly via next-generation sequencing technologies, has become a widely popular method to study DNA methylation. This method can be readily applied to a variety of species, dramatically expanding the scope of DNA methylation studies beyond the traditionally studied human and mouse systems. In parallel to the increasing wealth of genomic methylation profiles, many statistical tools have been developed to detect differentially methylated loci (DMLs) or differentially methylated regions (DMRs) between biological conditions. We discuss and summarize several key properties of currently available tools to detect DMLs and DMRs from sequencing of bisulfite-converted DNA. However, the majority of the statistical tools developed for DML/DMR analyses have been validated using only mammalian data sets, and less priority has been placed on the analyses of invertebrate or plant DNA methylation data. We demonstrate that genomic methylation profiles of non-mammalian species are often highly distinct from those of mammalian species using examples of honey bees and humans. We then discuss how such differences in data properties may affect statistical analyses. Based on these differences, we provide three specific recommendations to improve the power and accuracy of DML and DMR analyses of invertebrate data when using currently available statistical tools. These considerations should facilitate systematic and robust analyses of DNA methylation from diverse species, thus advancing our understanding of DNA methylation. © The Author 2017. Published by Oxford University Press.

  4. Methylation patterns of repetitive DNA sequences in germ cells of Mus musculus.

    Science.gov (United States)

    Sanford, J; Forrester, L; Chapman, V; Chandley, A; Hastie, N

    1984-03-26

    The major and the minor satellite sequences of Mus musculus were undermethylated in both sperm and oocyte DNAs relative to the amount of undermethylation observed in adult somatic tissue DNA. This hypomethylation was specific for satellite sequences in sperm DNA. Dispersed repetitive and low copy sequences show a high degree of methylation in sperm DNA; however, a dispersed repetitive sequence was undermethylated in oocyte DNA. This finding suggests a difference in the amount of total genomic DNA methylation between sperm and oocyte DNA. The methylation levels of the minor satellite sequences did not change during spermiogenesis, and were not associated with the onset of meiosis or a specific stage in sperm development.

  5. Building a model: developing genomic resources for common milkweed (Asclepias syriaca with low coverage genome sequencing

    Directory of Open Access Journals (Sweden)

    Weitemier Kevin

    2011-05-01

    Full Text Available Abstract Background Milkweeds (Asclepias L. have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L. could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp and 5S rDNA (120 bp sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp, with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae unigenes (median coverage of 0.29× and 66% of single copy orthologs (COSII in asterids (median coverage of 0.14×. From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites and phylogenetics (low-copy nuclear genes studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species

  6. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

    Science.gov (United States)

    Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

    2011-05-04

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first

  7. Image correlation method for DNA sequence alignment.

    Science.gov (United States)

    Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

    2012-01-01

    The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation and sequence alignment is handled as object recognition in a scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, one million base pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%), specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed process of a real experimental optical correlator. By doing this, we expect to fully exploit optical correlation light properties. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods on sequence alignment.

  8. "First generation" automated DNA sequencing technology.

    Science.gov (United States)

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  9. Simple sequence repeats in mycobacterial genomes

    Indian Academy of Sciences (India)

    2006-12-18

    Dec 18, 2006 ... Although prokaryotic genomes derive some plasticity due to microsatellite mutations they have in-built mechanisms to arrest undue expansions of microsatellites and one such mechanism is constituted by post-replicative DNA repair enzymes MutL, MutH and MutS. The mycobacterial genomes lack these ...

  10. USE OF COMPETITIVE DNA HYBRIDIZATION TO IDENTIFY DIFFERENCES IN THE GENOMES OF TWO CLOSELY RELATED FECAL INDICATOR BACTERIA

    Science.gov (United States)

    Although recent technological advances in DNA sequencing and computational biology now allow scientists to compare entire microbial genomes, comparisons of closely related bacterial species and individual isolates by whole-genome sequencing approaches remains prohibitively expens...

  11. Mapping copy number variation by population-scale genome sequencing

    DEFF Research Database (Denmark)

    Mills, Ryan E.; Walter, Klaudia; Stewart, Chip

    2011-01-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is......, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications...

  12. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  13. The proviral genome of radiation leukemia virus: Molecular cloning, nucleotide sequence of its long terminal repeat and integration in lymphoma cell DNA

    International Nuclear Information System (INIS)

    Janowski, M.; Merregaert, J.; Boniver, J.; Maisin, J.R.

    1985-01-01

    The proviral genome of a thymotropic and leukemogenic C57BL/Ka mouse retrovirus, RadLV/VL/sub 3/(T+L+), was cloned as a biologically active PstI insert in the bacterial plasmid pBR322. Its restriction map was compared to those, already known, of two nonthymotropic and nonleukemogenic viruses of the same mouse strain, the ecotropic BL/Ka(B) and the xenotropic constituent of the radiation leukemia virus complex (RadLV). Differences were observed in the pol gene and in the env gene. Moreover, the nucleotide sequence of the RadLV/VL/sub 3/(T+L+) long terminal repeat revealed the existence of two copies of a 42 bp long sequence, separated by 11 nucleotides and of which BL/Ka(B) possesses only one copy

  14. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, nois...... patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer....

  15. Complete Genome Sequence of EtG, the First Phage Sequenced from Erwinia tracheiphila.

    Science.gov (United States)

    Andrade-Domínguez, Andrés; Kolter, Roberto; Shapiro, Lori R

    2018-02-22

    Erwinia tracheiphila is the causal agent of bacterial wilt of cucurbits. Here, we report the genome sequence of the temperate phage EtG, which was isolated from an E. tracheiphila -infected cucumber plant. Phage EtG has a linear 30,413-bp double-stranded DNA genome with cohesive ends and 45 predicted open reading frames. Copyright © 2018 Andrade-Domínguez et al.

  16. The DNA-encoded nucleosome organization of a eukaryotic genome.

    Science.gov (United States)

    Kaplan, Noam; Moore, Irene K; Fondufe-Mittendorf, Yvonne; Gossett, Andrea J; Tillo, Desiree; Field, Yair; LeProust, Emily M; Hughes, Timothy R; Lieb, Jason D; Widom, Jonathan; Segal, Eran

    2009-03-19

    Nucleosome organization is critical for gene regulation. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers, competition with site-specific DNA-binding proteins, and the DNA sequence preferences of the nucleosomes themselves. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo, because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for approximately 40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.

  17. Methyl-Analyzer--whole genome DNA methylation profiling.

    Science.gov (United States)

    Xin, Yurong; Ge, Yongchao; Haghighi, Fatemeh G

    2011-08-15

    Methyl-Analyzer is a python package that analyzes genome-wide DNA methylation data produced by the Methyl-MAPS (methylation mapping analysis by paired-end sequencing) method. Methyl-MAPS is an enzymatic-based method that uses both methylation-sensitive and -dependent enzymes covering >80% of CpG dinucleotides within mammalian genomes. It combines enzymatic-based approaches with high-throughput next-generation sequencing technology to provide whole genome DNA methylation profiles. Methyl-Analyzer processes and integrates sequencing reads from methylated and unmethylated compartments and estimates CpG methylation probabilities at single base resolution. Methyl-Analyzer is available at http://github.com/epigenomics/methylmaps. Sample dataset is available for download at http://epigenomicspub.columbia.edu/methylanalyzer_data.html. fgh3@columbia.edu Supplementary data are available at Bioinformatics online.

  18. DNA sequence polymorphisms within the bovine guanine nucleotide-binding protein Gs subunit alpha (Gsα-encoding (GNAS genomic imprinting domain are associated with performance traits

    Directory of Open Access Journals (Sweden)

    Mullen Michael P

    2011-01-01

    43101491 SNP. Following adjustment for multiple-testing, significant association (q ≤ 0.05 remained between the rs41694646 SNP and four traits (animal stature, body depth, direct calving difficulty and milk yield only. Notably, the single SNP in the bovine NESP55 gene (rs41694656 was associated (P ≤ 0.01 with somatic cell count--an often-cited indicator of resistance to mastitis and overall health status of the mammary system--and previous studies have demonstrated that the chromosomal region to where the GNAS domain maps underlies an important quantitative trait locus for this trait. This association, however, was not significant after adjustment for multiple testing. The three remaining SNPs assayed were not associated with any of the performance traits analysed in this study. Analysis of all pairwise linkage disequilibrium (r2 values suggests that most allele substitution effects for the assayed SNPs observed are independent. Finally, the polymorphic coding SNP in the putative bovine NESP55 gene was used to test the imprinting status of this gene across a range of foetal bovine tissues. Conclusions Previous studies in other mammalian species have shown that DNA sequence variation within the imprinted GNAS gene cluster contributes to several physiological and metabolic disorders, including obesity in humans and mice. Similarly, the results presented here indicate an important role for the imprinted GNAS cluster in underlying complex performance traits in cattle such as animal growth, calving, fertility and health. These findings suggest that GNAS domain-associated polymorphisms may serve as important genetic markers for future livestock breeding programs and support previous studies that candidate imprinted loci may act as molecular targets for the genetic improvement of agricultural populations. In addition, we present new evidence that the bovine NESP55 gene is epigenetically regulated as a maternally expressed imprinted gene in placental and intestinal tissues

  19. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  20. Dialects of the DNA uptake sequence in Neisseriaceae.

    Directory of Open Access Journals (Sweden)

    Stephan A Frye

    2013-04-01

    Full Text Available In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS, which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS-dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5'-CTG-3' is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS-dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic

  1. Dialects of the DNA Uptake Sequence in Neisseriaceae

    Science.gov (United States)

    Frye, Stephan A.; Nilsen, Mariann; Tønjum, Tone; Ambur, Ole Herman

    2013-01-01

    In all sexual organisms, adaptations exist that secure the safe reassortment of homologous alleles and prevent the intrusion of potentially hazardous alien DNA. Some bacteria engage in a simple form of sex known as transformation. In the human pathogen Neisseria meningitidis and in related bacterial species, transformation by exogenous DNA is regulated by the presence of a specific DNA Uptake Sequence (DUS), which is present in thousands of copies in the respective genomes. DUS affects transformation by limiting DNA uptake and recombination in favour of homologous DNA. The specific mechanisms of DUS–dependent genetic transformation have remained elusive. Bioinformatic analyses of family Neisseriaceae genomes reveal eight distinct variants of DUS. These variants are here termed DUS dialects, and their effect on interspecies commutation is demonstrated. Each of the DUS dialects is remarkably conserved within each species and is distributed consistent with a robust Neisseriaceae phylogeny based on core genome sequences. The impact of individual single nucleotide transversions in DUS on meningococcal transformation and on DNA binding and uptake is analysed. The results show that a DUS core 5′-CTG-3′ is required for transformation and that transversions in this core reduce DNA uptake more than two orders of magnitude although the level of DNA binding remains less affected. Distinct DUS dialects are efficient barriers to interspecies recombination in N. meningitidis, N. elongata, Kingella denitrificans, and Eikenella corrodens, despite the presence of the core sequence. The degree of similarity between the DUS dialect of the recipient species and the donor DNA directly correlates with the level of transformation and DNA binding and uptake. Finally, DUS–dependent transformation is documented in the genera Eikenella and Kingella for the first time. The results presented here advance our understanding of the function and evolution of DUS and genetic transformation

  2. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    Full Text Available We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

  3. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Full Text Available Abstract Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore http://bioinfo.hku.hk/genochore.html, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis associated to related organisms for comparison.

  4. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    Science.gov (United States)

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  5. Genotypic Characterization of Bradyrhizobium Strains Nodulating Endemic Woody Legumes of the Canary Islands by PCR-Restriction Fragment Length Polymorphism Analysis of Genes Encoding 16S rRNA (16S rDNA) and 16S-23S rDNA Intergenic Spacers, Repetitive Extragenic Palindromic PCR Genomic Fingerprinting, and Partial 16S rDNA Sequencing

    Science.gov (United States)

    Vinuesa, Pablo; Rademaker, Jan L. W.; de Bruijn, Frans J.; Werner, Dietrich

    1998-01-01

    We present a phylogenetic analysis of nine strains of symbiotic nitrogen-fixing bacteria isolated from nodules of tagasaste (Chamaecytisus proliferus) and other endemic woody legumes of the Canary Islands, Spain. These and several reference strains were characterized genotypically at different levels of taxonomic resolution by computer-assisted analysis of 16S ribosomal DNA (rDNA) PCR-restriction fragment length polymorphisms (PCR-RFLPs), 16S-23S rDNA intergenic spacer (IGS) RFLPs, and repetitive extragenic palindromic PCR (rep-PCR) genomic fingerprints with BOX, ERIC, and REP primers. Cluster analysis of 16S rDNA restriction patterns with four tetrameric endonucleases grouped the Canarian isolates with the two reference strains, Bradyrhizobium japonicum USDA 110spc4 and Bradyrhizobium sp. strain (Centrosema) CIAT 3101, resolving three genotypes within these bradyrhizobia. In the analysis of IGS RFLPs with three enzymes, six groups were found, whereas rep-PCR fingerprinting revealed an even greater genotypic diversity, with only two of the Canarian strains having similar fingerprints. Furthermore, we show that IGS RFLPs and even very dissimilar rep-PCR fingerprints can be clustered into phylogenetically sound groupings by combining them with 16S rDNA RFLPs in computer-assisted cluster analysis of electrophoretic patterns. The DNA sequence analysis of a highly variable 264-bp segment of the 16S rRNA genes of these strains was found to be consistent with the fingerprint-based classification. Three different DNA sequences were obtained, one of which was not previously described, and all belonged to the B. japonicum/Rhodopseudomonas rDNA cluster. Nodulation assays revealed that none of the Canarian isolates nodulated Glycine max or Leucaena leucocephala, but all nodulated Acacia pendula, C. proliferus, Macroptilium atropurpureum, and Vigna unguiculata. PMID:9603820

  6. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology.The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology.Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories.As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future.Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  7. Draft genome sequence of the rubber tree Hevea brasiliensis

    Directory of Open Access Journals (Sweden)

    Rahman Ahmad Yamin Abdul

    2013-02-01

    Full Text Available Abstract Background Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR. NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber.

  8. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ’tsunami‘ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  9. A complete mitochondrial genome sequence from a mesolithic wild aurochs (Bos primigenius).

    LENUS (Irish Health Repository)

    Edwards, Ceiridwen J

    2010-01-01

    BACKGROUND: The derivation of domestic cattle from the extinct wild aurochs (Bos primigenius) has been well-documented by archaeological and genetic studies. Genetic studies point towards the Neolithic Near East as the centre of origin for Bos taurus, with some lines of evidence suggesting possible, albeit rare, genetic contributions from locally domesticated wild aurochsen across Eurasia. Inferences from these investigations have been based largely on the analysis of partial mitochondrial DNA sequences generated from modern animals, with limited sequence data from ancient aurochsen samples. Recent developments in DNA sequencing technologies, however, are affording new opportunities for the examination of genetic material retrieved from extinct species, providing new insight into their evolutionary history. Here we present DNA sequence analysis of the first complete mitochondrial genome (16,338 base pairs) from an archaeologically-verified and exceptionally-well preserved aurochs bone sample. METHODOLOGY: DNA extracts were generated from an aurochs humerus bone sample recovered from a cave site located in Derbyshire, England and radiocarbon-dated to 6,738+\\/-68 calibrated years before present. These extracts were prepared for both Sanger and next generation DNA sequencing technologies (Illumina Genome Analyzer). In total, 289.9 megabases (22.48%) of the post-filtered DNA sequences generated using the Illumina Genome Analyzer from this sample mapped with confidence to the bovine genome. A consensus B. primigenius mitochondrial genome sequence was constructed and was analysed alongside all available complete bovine mitochondrial genome sequences. CONCLUSIONS: For all nucleotide positions where both Sanger and Illumina Genome Analyzer sequencing methods gave high-confidence calls, no discrepancies were observed. Sequence analysis reveals evidence of heteroplasmy in this sample and places this mitochondrial genome sequence securely within a previously identified

  10. Multiple Genome Sequences of Lactobacillus plantarum Strains

    OpenAIRE

    Kafka, Thomas A.; Geissler, Andreas J.; Vogel, Rudi F.

    2017-01-01

    ABSTRACT We report here the genome sequences of four Lactobacillus plantarum strains which vary in surface hydrophobicity. Bioinformatic analysis, using additional genomes of Lactobacillus plantarum strains, revealed a possible correlation between the cell wall teichoic acid-type and cell surface hydrophobicity and provide the basis for consecutive analyses.

  11. Complete Genome Sequence of Staphylococcus epidermidis 1457.

    Science.gov (United States)

    Galac, Madeline R; Stam, Jason; Maybank, Rosslyn; Hinkle, Mary; Mack, Dietrich; Rohde, Holger; Roth, Amanda L; Fey, Paul D

    2017-06-01

    Staphylococcus epidermidis 1457 is a frequently utilized strain that is amenable to genetic manipulation and has been widely used for biofilm-related research. We report here the whole-genome sequence of this strain, which encodes 2,277 protein-coding genes and 81 RNAs within its 2.4-Mb genome and plasmid. Copyright © 2017 Galac et al.

  12. Comparison of 61 Sequenced Escherichia coli Genomes

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana; Wassenaar, T. M.; Ussery, David

    2010-01-01

    Escherichia coli is an important component of the biosphere and is an ideal model for studies of processes involved in bacterial genome evolution. Sixty-one publically available E. coli and Shigella spp. sequenced genomes are compared, using basic methods to produce phylogenetic and proteomics...

  13. Nucleotide sequence preservation of human mitochondrial DNA

    International Nuclear Information System (INIS)

    Monnat, R.J. Jr.; Loeb, L.A.

    1985-01-01

    Recombinant DNA techniques have been used to quantitate the amount of nucleotide sequence divergence in the mitochondrial DNA population of individual normal humans. Mitochondrial DNA was isolated from the peripheral blood lymphocytes of five normal humans and cloned in M13 mp11; 49 kilobases of nucleotide sequence information was obtained from 248 independently isolated clones from the five normal donors. Both between- and within-individual differences were identified. Between-individual differences were identified in approximately = to 1/200 nucleotides. In contrast, only one within-individual difference was identified in 49 kilobases of nucleotide sequence information. This high degree of mitochondrial nucleotide sequence homogeneity in human somatic cells is in marked contrast to the rapid evolutionary divergence of human mitochondrial DNA and suggests the existence of mechanisms for the concerted preservation of mammalian mitochondrial DNA sequences in single organisms

  14. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...

  15. Human Chromosome 7: DNA Sequence and Biology

    OpenAIRE

    Scherer, Stephen W.; Cheung, Joseph; MacDonald, Jeffrey R.; Osborne, Lucy R.; Nakabayashi, Kazuhiko; Herbrick, Jo-Anne; Carson, Andrew R.; Parker-Katiraee, Layla; Skaug, Jennifer; Khaja, Razi; Zhang, Junjun; Hudek, Alexander K.; Li, Martin; Haddad, May; Duggan, Gavin E.

    2003-01-01

    DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures, are presented. To generate a higher order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. This approach enabled the discovery of candidate gene...

  16. Genetic basis of olfactory cognition: extremely high level of DNA sequence polymorphism in promoter regions of the human olfactory receptor genes revealed using the 1000 Genomes Project dataset.

    Science.gov (United States)

    Ignatieva, Elena V; Levitsky, Victor G; Yudin, Nikolay S; Moshkin, Mikhail P; Kolchanov, Nikolay A

    2014-01-01

    The molecular mechanism of olfactory cognition is very complicated. Olfactory cognition is initiated by olfactory receptor proteins (odorant receptors), which are activated by olfactory stimuli (ligands). Olfactory receptors are the initial player in the signal transduction cascade producing a nerve impulse, which is transmitted to the brain. The sensitivity to a particular ligand depends on the expression level of multiple proteins involved in the process of olfactory cognition: olfactory receptor proteins, proteins that participate in signal transduction cascade, etc. The expression level of each gene is controlled by its regulatory regions, and especially, by the promoter [a region of DNA about 100-1000 base pairs long located upstream of the transcription start site (TSS)]. We analyzed single nucleotide polymorphisms using human whole-genome data from the 1000 Genomes Project and revealed an extremely high level of single nucleotide polymorphisms in promoter regions of olfactory receptor genes and HLA genes. We hypothesized that the high level of polymorphisms in olfactory receptor promoters was responsible for the diversity in regulatory mechanisms controlling the expression levels of olfactory receptor proteins. Such diversity of regulatory mechanisms may cause the great variability of olfactory cognition of numerous environmental olfactory stimuli perceived by human beings (air pollutants, human body odors, odors in culinary etc.). In turn, this variability may provide a wide range of emotional and behavioral reactions related to the vast variety of olfactory stimuli.

  17. Genetic basis of olfactory cognition: extremely high level of DNA sequence polymorphism in promoter regions of the human olfactory receptor genes revealed using the 1000 Genomes Project dataset

    Directory of Open Access Journals (Sweden)

    Elena V. Ignatieva

    2014-03-01

    Full Text Available The molecular mechanism of olfactory cognition is very complicated. Olfactory cognition is initiated by olfactory receptor proteins (odorant receptors, which are activated by olfactory stimuli (ligands. Olfactory receptors are the initial player in the signal transduction cascade producing a nerve impulse, which is transmitted to the brain. The sensitivity to a particular ligand depends on the expression level of multiple proteins involved in the process of olfactory cognition: olfactory receptor proteins, proteins that participate in signal transduction cascade, etc. The expression level of each gene is controlled by its regulatory regions, and especially, by the promoter (a region of DNA about 100–1000 base pairs long located upstream of the transcription start site. We analyzed single nucleotide polymorphisms using human whole-genome data from the 1000 Genomes Project and revealed an extremely high level of single nucleotide polymorphisms in promoter regions of olfactory receptor genes and HLA genes. We hypothesized that the high level of polymorphisms in olfactory receptor promoters was responsible for the diversity in regulatory mechanisms controlling the expression levels of olfactory receptor proteins. Such diversity of regulatory mechanisms may cause the great variability of olfactory cognition of numerous environmental olfactory stimuli perceived by human beings (air pollutants, human body odors, odors in culinary etc.. In turn, this variability may provide a wide range of emotional and behavioral reactions related to the vast variety of olfactory stimuli.

  18. Multiple tag labeling method for DNA sequencing

    Science.gov (United States)

    Mathies, R.A.; Huang, X.C.; Quesada, M.A.

    1995-07-25

    A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.

  19. The complete mitochondrial genome sequence of Eimeria magna (Apicomplexa: Coccidia).

    Science.gov (United States)

    Tian, Si-Qin; Cui, Ping; Fang, Su-Fang; Liu, Guo-Hua; Wang, Chun-Ren; Zhu, Xing-Quan

    2015-01-01

    In the present study, we determined the complete mitochondrial DNA (mtDNA) sequence of Eimeria magna from rabbits for the first time, and compared its gene contents and genome organizations with that of seven Eimeria spp. from domestic chickens. The size of the complete mt genome sequence of E. magna is 6249 bp, which consists of 3 protein-coding genes (cytb, cox1 and cox3), 12 gene fragments for the large subunit (LSU) rRNA, and 7 gene fragments for the small subunit (SSU) rRNA, without transfer RNA genes, in accordance with that of Eimeria spp. from chickens. The putative direction of translation for three genes (cytb, cox1 and cox3) was the same as those of Eimeria species from domestic chickens. The content of A + T is 65.16% for E. magna mt genome (29.73% A, 35.43% T, 17.09 G and 17.75% C). The E. magna mt genome sequence provides novel mtDNA markers for studying the molecular epidemiology and population genetics of Eimeria spp. and has implications for the molecular diagnosis and control of rabbit coccidiosis.

  20. Highly multiplexed targeted DNA sequencing from single nuclei.

    Science.gov (United States)

    Leung, Marco L; Wang, Yong; Kim, Charissa; Gao, Ruli; Jiang, Jerry; Sei, Emi; Navin, Nicholas E

    2016-02-01

    Single-cell DNA sequencing methods are challenged by poor physical coverage, high technical error rates and low throughput. To address these issues, we developed a single-cell DNA sequencing protocol that combines flow-sorting of single nuclei, time-limited multiple-displacement amplification (MDA), low-input library preparation, DNA barcoding, targeted capture and next-generation sequencing (NGS). This approach represents a major improvement over our previous single nucleus sequencing (SNS) Nature Protocols paper in terms of generating higher-coverage data (>90%), thereby enabling the detection of genome-wide variants in single mammalian cells at base-pair resolution. Furthermore, by pooling 48-96 single-cell libraries together for targeted capture, this approach can be used to sequence many single-cell libraries in parallel in a single reaction. This protocol greatly reduces the cost of single-cell DNA sequencing, and it can be completed in 5-6 d by advanced users. This single-cell DNA sequencing protocol has broad applications for studying rare cells and complex populations in diverse fields of biological research and medicine.

  1. Googling DNA sequences on the World Wide Web.

    Science.gov (United States)

    Hajibabaei, Mehrdad; Singer, Gregory A C

    2009-11-10

    New web-based technologies provide an excellent opportunity for sharing and accessing information and using web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment independent character based algorithm based on dividing a sequence library (DNA barcodes) and query sequence to words. The actual search is conducted by conventional search tools such as freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows a high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.

  2. Genome instabilities arising from ribonucleotides in DNA.

    Science.gov (United States)

    Klein, Hannah L

    2017-08-01

    Genomic DNA is transiently contaminated with ribonucleotide residues during the process of DNA replication through misincorporation by the replicative DNA polymerases α, δ and ε, and by the normal replication process on the lagging strand, which uses RNA primers. These ribonucleotides are efficiently removed during replication by RNase H enzymes and the lagging strand synthesis machinery. However, when ribonucleotides remain in DNA they can distort the DNA helix, affect machineries for DNA replication, transcription and repair, and can stimulate genomic instabilities which are manifest as increased mutation, recombination and chromosome alterations. The genomic instabilities associated with embedded ribonucleotides are considered here, along with a discussion of the origin of the lesions that stimulate particular classes of instabilities. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Genomic insight into the common carp (Cyprinus carpio genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Full Text Available Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio, a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding region of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs have genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have physical map of common carp. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximate 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  4. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of common carp genome. The first survey of common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding region of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs have genes on both ends. To evaluate the similarity to the genome of closely related zebrafish, BES of common carp were aligned against zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on zebrafish genome which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once we have physical map of common carp. Conclusion BAC end sequences are great resources for the first genome wide survey of common carp. The repetitive DNA was estimated to be approximate 28% of common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3,100 microsyntenies, covering over 50% of

  5. [Complete genome sequencing and sequence analysis of BCG Tice].

    Science.gov (United States)

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study is to obtain the complete genome sequence of Bacillus Calmette-Guerin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and design more reasonable vaccines to prevent tuberculosis. We assembled the data from high-throughput sequencing with SOAPdenovo software, with many contigs and scaffolds obtained. There are many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the end of contigs and performed PCR amplification in order to link these contigs and scaffolds. With various enzymes to perform PCR amplification, adjustment of PCR reaction conditions, and combined with clone construction to sequence, all the gaps were finished. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank of National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4334064 base pairs in length, with GC content 65.65%. The problems and strategies during the finishing step of BCG Tice sequencing are illuminated here, with the hope of affording some experience to those who are involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  6. Highly sensitive polymerase chain reaction-free quantum dot-based quantification of forensic genomic DNA

    International Nuclear Information System (INIS)

    Tak, Yu Kyung; Kim, Won Young; Kim, Min Jung; Han, Eunyoung; Han, Myun Soo; Kim, Jong Jin; Kim, Wook; Lee, Jong Eun; Song, Joon Myong

    2012-01-01

    Highlights: ► Genomic DNA quantification were performed using a quantum dot-labeled Alu sequence. ► This probe provided PCR-free determination of human genomic DNA. ► Qdot-labeled Alu probe-hybridized genomic DNAs had a 2.5-femtogram detection limit. ► Qdot-labeled Alu sequence was used to assess DNA samples for human identification. - Abstract: Forensic DNA samples can degrade easily due to exposure to light and moisture at the crime scene. In addition, the amount of DNA acquired at a criminal site is inherently limited. This limited amount of human DNA has to be quantified accurately after the process of DNA extraction. The accurately quantified extracted genomic DNA is then used as a DNA template in polymerase chain reaction (PCR) amplification for short tandem repeat (STR) human identification. Accordingly, highly sensitive and human-specific quantification of forensic DNA samples is an essential issue in forensic study. In this work, a quantum dot (Qdot)-labeled Alu sequence was developed as a probe to simultaneously satisfy both the high sensitivity and human genome selectivity for quantification of forensic DNA samples. This probe provided PCR-free determination of human genomic DNA and had a 2.5-femtogram detection limit due to the strong emission and photostability of the Qdot. The Qdot-labeled Alu sequence has been used successfully to assess 18 different forensic DNA samples for STR human identification.

  7. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and therefore is an important group for understanding the evolution of vertebrate genomes including the human genome. Our laboratory has proposed elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  8. Cyprinus carpio Genome sequencing and assembly

    NARCIS (Netherlands)

    Kolder, I.C.R.M.; Plas-Duivesteijn, van der Suzanne J.; Tan, G.; Wiegertjes, G.; Forlenza, M.; Guler, A.T.; Travin, D.Y.; Nakao, M.; Moritomo, T.; Irnazarow, I.; Jansen, H.J.

    2013-01-01

    Sequencing of the common carp (Cyprinus carpio carpio Linnaeus, 1758) genome, with the objective of establishing carp as a model organism to supplement the closely related zebrafish (Danio rerio). The sequenced individual is a homozygous female (by gynogenesis) of R3 x R8 carp, the heterozygous

  9. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  10. 10KP: A phylodiverse genome sequencing plan

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun

    2018-01-01

    Abstract Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here. PMID:29618049

  11. 10KP: A phylodiverse genome sequencing plan.

    Science.gov (United States)

    Cheng, Shifeng; Melkonian, Michael; Smith, Stephen A; Brockington, Samuel; Archibald, John M; Delaux, Pierre-Marc; Li, Fay-Wei; Melkonian, Barbara; Mavrodiev, Evgeny V; Sun, Wenjing; Fu, Yuan; Yang, Huanming; Soltis, Douglas E; Graham, Sean W; Soltis, Pamela S; Liu, Xin; Xu, Xun; Wong, Gane Ka-Shu

    2018-03-01

    Understanding plant evolution and diversity in a phylogenomic context is an enormous challenge due, in part, to limited availability of genome-scale data across phylodiverse species. The 10KP (10,000 Plants) Genome Sequencing Project will sequence and characterize representative genomes from every major clade of embryophytes, green algae, and protists (excluding fungi) within the next 5 years. By implementing and continuously improving leading-edge sequencing technologies and bioinformatics tools, 10KP will catalogue the genome content of plant and protist diversity and make these data freely available as an enduring foundation for future scientific discoveries and applications. 10KP is structured as an international consortium, open to the global community, including botanical gardens, plant research institutes, universities, and private industry. Our immediate goal is to establish a policy framework for this endeavor, the principles of which are outlined here.

  12. Deep whole-genome sequencing of 90 Han Chinese genomes.

    Science.gov (United States)

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000

  13. Templated sequence insertion polymorphisms in the human genome

    Science.gov (United States)

    Onozawa, Masahiro; Aplan, Peter

    2016-11-01

    Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.

  14. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

    Science.gov (United States)

    Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

    2016-07-01

    The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  15. A novel method of genomic DNA extraction for Cactaceae1

    Science.gov (United States)

    Fehlberg, Shannon D.; Allen, Jessica M.; Church, Kathleen

    2013-01-01

    • Premise of the study: Genetic studies of Cactaceae can at times be impeded by difficult sampling logistics and/or high mucilage content in tissues. Simplifying sampling and DNA isolation through the use of cactus spines has not previously been investigated. • Methods and Results: Several protocols for extracting DNA from spines were tested and modified to maximize yield, amplification, and sequencing. Sampling of and extraction from spines resulted in a simplified protocol overall and complete avoidance of mucilage as compared to typical tissue extractions. Sequences from one nuclear and three plastid regions were obtained across eight genera and 20 species of cacti using DNA extracted from spines. • Conclusions: Genomic DNA useful for amplification and sequencing can be obtained from cactus spines. The protocols described here are valuable for any cactus species, but are particularly useful for investigators interested in sampling living collections, extensive field sampling, and/or conservation genetic studies. PMID:25202521

  16. Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.

    Science.gov (United States)

    Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo

    2018-05-01

    Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. For further understanding ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.

  17. Rhipicephalus microplus strain Deutsch, whole genome shotgun sequencing project Version 2

    Science.gov (United States)

    The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and with over 70% of repetitive DNA, this genome would prove very costly to sequence at today's prices and difficult to assemble and analyze. Cot filtration/selection techniques were used ...

  18. The first complete genome sequences of clinical isolates of human coronavirus 229E

    NARCIS (Netherlands)

    Farsani, Seyed Mohammad Jazaeri; Dijkman, Ronald; Jebbink, Maarten F.; Goossens, Herman; Ieven, Margareta; Deijs, Martin; Molenkamp, Richard; van der Hoek, Lia

    2012-01-01

    Human coronavirus 229E has been identified in the mid-1960s, yet still only one full-genome sequence is available. This full-length sequence has been determined from the cDNA-clone Inf-1 that is based on the lab-adapted strain VR-740. Lab-adaptation might have resulted in genomic changes, due to

  19. Genome sequencing of Deutsch strain of cattle ticks, Rhipicephalus microplus: Raw Pac Bio reads.

    Science.gov (United States)

    Pac Bio RS II whole genome shotgun sequencing technology was used to sequence the genome of the cattle tick, Rhipicephalus microplus. The DNA was derived from 14 day old eggs from the Deutsch Texas outbreak strain reared at the USDA-ARS Cattle Fever Tick Research Laboratory, Edinburg, TX. Each corre...

  20. The mitochondrial and plastid genomes of Volvox carteri: bloated molecules rich in repetitive DNA

    Directory of Open Access Journals (Sweden)

    Lee Robert W

    2009-03-01

    Full Text Available Abstract Background The magnitude of noncoding DNA in organelle genomes can vary significantly; it is argued that much of this variation is attributable to the dissemination of selfish DNA. The results of a previous study indicate that the mitochondrial DNA (mtDNA of the green alga Volvox carteri abounds with palindromic repeats, which appear to be selfish elements. We became interested in the evolution and distribution of these repeats when, during a cursory exploration of the V. carteri nuclear DNA (nucDNA and plastid DNA (ptDNA sequences, we found palindromic repeats with similar structural features to those of the mtDNA. Upon this discovery, we decided to investigate the diversity and evolutionary implications of these palindromic elements by sequencing and characterizing large portions of mtDNA and ptDNA and then comparing these data to the V. carteri draft nuclear genome sequence. Results We sequenced 30 and 420 kilobases (kb of the mitochondrial and plastid genomes of V. carteri, respectively – resulting in partial assemblies of these genomes. The mitochondrial genome is the most bloated green-algal mtDNA observed to date: ~61% of the sequence is noncoding, most of which is comprised of short palindromic repeats spread throughout the intergenic and intronic regions. The plastid genome is the largest (>420 kb and most expanded (>80% noncoding ptDNA sequence yet discovered, with a myriad of palindromic repeats in the noncoding regions, which have a similar size and secondary structure to those of the mtDNA. We found that 15 kb (~0.01% of the nuclear genome are homologous to the palindromic elements of the mtDNA, and 50 kb (~0.05% are homologous to those of the ptDNA. Conclusion Selfish elements in the form of short palindromic repeats have propagated in the V. carteri mtDNA and ptDNA, resulting in the distension of these genomes. Copies of these same repeats are also found in a small fraction of the nucDNA, but appear to be inert in this

  1. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  2. Application of synthetic DNA probes to the analysis of DNA sequence variants in man

    International Nuclear Information System (INIS)

    Wallace, R.B.; Petz, L.D.; Yam, P.Y.

    1986-01-01

    Oligonucleotide probes provide a tool to discriminate between any two alleles on the basis of hybridization. Random sampling of the genome with different oligonucleotide probes should reveal polymorphism in a certain percentage of the cases. In the hope of identifying polymorphic regions more efficiently, we chose to take advantage of the proposed hypermutability of repeated DNA sequences and the specificity of oligonucleotide hybridization. Since, under appropriate conditions, oligonucleotide probes require complete base pairing for hybridization to occur, they will only hybridize to a subset of the members of a repeat family when all members of the family are not identical. The results presented here suggest that oligonucleotide hybridization can be used to extend the genomic sequences that can be tested for the presence of RFLPs. This expands the tools available to human genetics. In addition, the results suggest that repeated DNA sequences are indeed more polymorphic than single-copy sequences. 28 references, 2 figures

  3. EGNAS: an exhaustive DNA sequence design algorithm

    Directory of Open Access Journals (Sweden)

    Kick Alfred

    2012-06-01

    Full Text Available Abstract Background The molecular recognition based on the complementary base pairing of deoxyribonucleic acid (DNA is the fundamental principle in the fields of genetics, DNA nanotechnology and DNA computing. We present an exhaustive DNA sequence design algorithm that allows to generate sets containing a maximum number of sequences with defined properties. EGNAS (Exhaustive Generation of Nucleic Acid Sequences offers the possibility of controlling both interstrand and intrastrand properties. The guanine-cytosine content can be adjusted. Sequences can be forced to start and end with guanine or cytosine. This option reduces the risk of “fraying” of DNA strands. It is possible to limit cross hybridizations of a defined length, and to adjust the uniqueness of sequences. Self-complementarity and hairpin structures of certain length can be avoided. Sequences and subsequences can optionally be forbidden. Furthermore, sequences can be designed to have minimum interactions with predefined strands and neighboring sequences. Results The algorithm is realized in a C++ program. TAG sequences can be generated and combined with primers for single-base extension reactions, which were described for multiplexed genotyping of single nucleotide polymorphisms. Thereby, possible foldback through intrastrand interaction of TAG-primer pairs can be limited. The design of sequences for specific attachment of molecular constructs to DNA origami is presented. Conclusions We developed a new software tool called EGNAS for the design of unique nucleic acid sequences. The presented exhaustive algorithm allows to generate greater sets of sequences than with previous software and equal constraints. EGNAS is freely available for noncommercial use at http://www.chm.tu-dresden.de/pc6/EGNAS.

  4. Species tree phylogeny and character evolution in the genus Centipeda (Asteraceae): evidence from DNA sequences from coding and non-coding loci from the plastid and nuclear genomes.

    Science.gov (United States)

    Nylinder, Stephan; Cronholm, Bodil; de Lange, Peter J; Walsh, Neville; Anderberg, Arne A

    2013-08-01

    A species tree phylogeny of the Australian/New Zealand genus Centipeda (Asteraceae) is estimated based on nucleotide sequence data. We analysed sequences of nuclear ribosomal DNA (ETS, ITS) and three plasmid loci (ndhF, psbA-trnH, and trnL-F) using the multi-species coalescent module in BEAST. A total of 129 individuals from all 10 recognised species of Centipeda were sampled throughout the species distribution ranges, including two subspecies. We conclude that the inferred species tree topology largely conform previous assumptions on species relationships. Centipeda racemosa (Snuffweed) is the sister to remaining species, which is also the only consistently perennial representative in the genus. Centipeda pleiocephala (Tall Sneezeweed) and C. nidiformis (Cotton Sneezeweed) constitute a species pair, as does C. borealis and C. minima (Spreading Sneezeweed), all sharing the symplesiomorphic characters of spherical capitulum and convex receptacle with C. racemosa. Another species group comprising C. thespidioides (Desert Sneezeweed), C. cunninghamii (Old man weed, or Common sneeze-weed), C. crateriformis is well-supported but then include the morphologically aberrant C. aotearoana, all sharing the character of having capitula that mature more slowly relative the subtending shoot. Centipeda elatinoides takes on a weakly supported intermediate position between the two mentioned groups, and is difficult to relate to any of the former groups based on morphological characters. Copyright © 2013 Elsevier Inc. All rights reserved.

  5. Mitochondrial genome sequence of the Tibetan wild ass (Equus kiang).

    Science.gov (United States)

    Luo, Yongjun; Chen, Yu; Liu, Fuyu; Jiang, Chunhua; Gao, Yuqi

    2011-02-01

    The Tibetan wild ass, or kiang (Equus kiang) is endemic to the cold and hypoxic (4000-7000 m above sea level) climates of the montane and alpine grasslands of the Tibetan Plateau. We report here the complete nucleotide sequence of the E. kiang mitochondrial genome. Our results show that E. kiang mitochondrial DNA is 16,634 bp long, and predicted to encode all the 37 genes that are typical for vertebrates.

  6. Next-Generation Sequencing and Genome Editing in Plant Virology

    Directory of Open Access Journals (Sweden)

    Ahmed Hadidi

    2016-08-01

    Full Text Available Next-generation sequencing (NGS has been applied to plant virology since 2009. NGS provides highly efficient, rapid, low cost DNA or RNA high-throughput sequencing of the genomes of plant viruses and viroids and of the specific small RNAs generated during the infection process. These small RNAs, which cover frequently the whole genome of the infectious agent, are 21-24 nt long and are known as vsRNAs for viruses and vd-sRNAs for viroids. NGS has been used in a number of studies in plant virology including, but not limited to, discovery of novel viruses and viroids as well as detection and identification of those pathogens already known, analysis of genome diversity and evolution, and study of pathogen epidemiology. The genome engineering editing method, clustered regularly interspaced short palindromic repeats (CRISPR-Cas9 system has been successfully used recently to engineer resistance to DNA geminiviruses (family, Geminiviridae by targeting different viral genome sequences in infected Nicotiana benthamiana or Arabidopsis plants. The DNA viruses targeted include tomato yellow leaf curl virus and merremia mosaic virus (begomovirus; beet curly top virus and beet severe curly top virus (curtovirus; and bean yellow dwarf virus (mastrevirus. The technique has also been used against the RNA viruses zucchini yellow mosaic virus, papaya ringspot virus and turnip mosaic virus (potyvirus and cucumber vein yellowing virus (ipomovirus, family, Potyviridae by targeting the translation initiation genes eIF4E in cucumber or Arabidopsis plants. From these recent advances of major importance, it is expected that NGS and CRISPR-Cas technologies will play a significant role in the very near future in advancing the field of plant virology and connecting it with other related fields of biology.Keywords: Next-generation sequencing, NGS, plant virology, plant viruses, viroids, resistance to plant viruses by CRISPR-Cas9

  7. Mitochondrial genome sequencing and development of genetic markers for the detection of DNA of invasive bighead and silver carp (Hypophthalmichthys nobilis and H. molitrix in environmental water samples from the United States.

    Directory of Open Access Journals (Sweden)

    Heather L Farrington

    Full Text Available Invasive Asian bighead and silver carp (Hypophthalmichthys nobilis and H. molitrix pose a substantial threat to North American aquatic ecosystems. Recently, environmental DNA (eDNA, genetic material shed by organisms into their environment that can be detected by non-invasive sampling strategies and genetic assays, has gained recognition as a tool for tracking the invasion front of these species toward the Great Lakes. The goal of this study was to develop new species-specific conventional PCR (cPCR and quantitative (qPCR markers for detection of these species in North American surface waters. We first generated complete mitochondrial genome sequences from 33 bighead and 29 silver carp individuals collected throughout their introduced range. These sequences were aligned with those from other common and closely related fish species from the Illinois River watershed to identify and design new species-specific markers for the detection of bighead and silver carp DNA in environmental water samples. We then tested these genetic markers in the laboratory for species-specificity and sensitivity. Newly developed markers performed well in field trials, did not have any false positive detections, and many markers had much higher detection rates and sensitivity compared to the markers currently used in eDNA surveillance programs. We also explored the use of multiple genetic markers to determine whether it would improve detection rates, results of which showed that using multiple highly sensitive markers should maximize detection rates in environmental samples. The new markers developed in this study greatly expand the number of species-specific genetic markers available to track the invasion front of bighead and silver carp and will improve the resolution of these assays. Additionally, the use of the qPCR markers developed in this study may reduce sample processing time and cost of eDNA monitoring for these species.

  8. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls....... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...

  9. Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

    Science.gov (United States)

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, of which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genome with only minor difference. The inconsistency possibly derived from long branch attraction in mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using Illumina sequencing technology. The parallel episodic evolution of mt and chloroplast

  10. Genome-wide identification, sequence characterization, and protein-protein interaction properties of DDB1 (damaged DNA binding protein-1)-binding WD40-repeat family members in Solanum lycopersicum.

    Science.gov (United States)

    Zhu, Yunye; Huang, Shengxiong; Miao, Min; Tang, Xiaofeng; Yue, Junyang; Wang, Wenjie; Liu, Yongsheng

    2015-06-01

    One hundred DDB1 (damaged DNA binding protein-1)-binding WD40-repeat domain (DWD) family genes were identified in the S. lycopersicum genome. The DWD genes encode proteins presumably functioning as the substrate recognition subunits of the cullin4-ring ubiquitin E3 ligase complex. These findings provide candidate genes and a research platform for further gene functionality and molecular breeding study. A subclass of DDB1 (damaged DNA binding protein-1)-binding WD40-repeat domain (DWD) family proteins has been demonstrated to function as the substrate recognition subunits of the cullin4-ring ubiquitin E3 ligase complex. However, little information is available about the cognate subfamily genes in tomato (S. lycopersicum). In this study, based on the recently released tomato genome sequences, 100 tomato genes encoding DWD proteins that potentially interact with DDB1 were identified and characterized, including analyses of the detailed annotations, chromosome locations and compositions of conserved amino acid domains. In addition, a phylogenetic tree, which comprises of three main groups, of the subfamily genes was constructed. The physical interaction between tomato DDB1 and 14 representative DWD proteins was determined by yeast two-hybrid and co-immunoprecipitation assays. The subcellular localization of these 14 representative DWD proteins was determined. Six of them were localized in both nucleus and cytoplasm, seven proteins exclusively in cytoplasm, and one protein either in nucleus and cytoplasm, or exclusively in cytoplasm. Comparative genomic analysis demonstrated that the expansion of these subfamily members in tomato predominantly resulted from two whole-genome triplication events in the evolution history.

  11. 2D-dynamic representation of DNA sequences as a graphical tool in bioinformatics

    Science.gov (United States)

    Bielińska-Wa̧Ż, D.; Wa̧Ż, P.

    2016-10-01

    2D-dynamic representation of DNA sequences is briefly reviewed. Some new examples of 2D-dynamic graphs which are the graphical tool of the method are shown. Using the examples of the complete genome sequences of the Zika virus it is shown that the present method can be applied for the study of the evolution of viral genomes.

  12. Directed PCR-free engineering of highly repetitive DNA sequences

    Directory of Open Access Journals (Sweden)

    Preissler Steffen

    2011-09-01

    Full Text Available Abstract Background Highly repetitive nucleotide sequences are commonly found in nature e.g. in telomeres, microsatellite DNA, polyadenine (poly(A tails of eukaryotic messenger RNA as well as in several inherited human disorders linked to trinucleotide repeat expansions in the genome. Therefore, studying repetitive sequences is of biological, biotechnological and medical relevance. However, cloning of such repetitive DNA sequences is challenging because specific PCR-based amplification is hampered by the lack of unique primer binding sites resulting in unspecific products. Results For the PCR-free generation of repetitive DNA sequences we used antiparallel oligonucleotides flanked by restriction sites of Type IIS endonucleases. The arrangement of recognition sites allowed for stepwise and seamless elongation of repetitive sequences. This facilitated the assembly of repetitive DNA segments and open reading frames encoding polypeptides with periodic amino acid sequences of any desired length. By this strategy we cloned a series of polyglutamine encoding sequences as well as highly repetitive polyadenine tracts. Such repetitive sequences can be used for diverse biotechnological applications. As an example, the polyglutamine sequences were expressed as His6-SUMO fusion proteins in Escherichia coli cells to study their aggregation behavior in vitro. The His6-SUMO moiety enabled affinity purification of the polyglutamine proteins, increased their solubility, and allowed controlled induction of the aggregation process. We successfully purified the fusions proteins and provide an example for their applicability in filter retardation assays. Conclusion Our seamless cloning strategy is PCR-free and allows the directed and efficient generation of highly repetitive DNA sequences of defined lengths by simple standard cloning procedures.

  13. Inaugural Genomics Automation Congress and the coming deluge of sequencing data.

    Science.gov (United States)

    Creighton, Chad J

    2010-10-01

    Presentations at Select Biosciences's first 'Genomics Automation Congress' (Boston, MA, USA) in 2010 focused on next-generation sequencing and the platforms and methodology around them. The meeting provided an overview of sequencing technologies, both new and emerging. Speakers shared their recent work on applying sequencing to profile cells for various levels of biomolecular complexity, including DNA sequences, DNA copy, DNA methylation, mRNA and microRNA. With sequencing time and costs continuing to drop dramatically, a virtual explosion of very large sequencing datasets is at hand, which will probably present challenges and opportunities for high-level data analysis and interpretation, as well as for information technology infrastructure.

  14. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Full Text Available Abstract Background Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis. Conclusion This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  15. Defining functional DNA elements in the human genome

    Science.gov (United States)

    Kellis, Manolis; Wold, Barbara; Snyder, Michael P.; Bernstein, Bradley E.; Kundaje, Anshul; Marinov, Georgi K.; Ward, Lucas D.; Birney, Ewan; Crawford, Gregory E.; Dekker, Job; Dunham, Ian; Elnitski, Laura L.; Farnham, Peggy J.; Feingold, Elise A.; Gerstein, Mark; Giddings, Morgan C.; Gilbert, David M.; Gingeras, Thomas R.; Green, Eric D.; Guigo, Roderic; Hubbard, Tim; Kent, Jim; Lieb, Jason D.; Myers, Richard M.; Pazin, Michael J.; Ren, Bing; Stamatoyannopoulos, John A.; Weng, Zhiping; White, Kevin P.; Hardison, Ross C.

    2014-01-01

    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease. PMID:24753594

  16. Recurrence plot analysis of DNA sequences

    Energy Technology Data Exchange (ETDEWEB)

    Wu Zuobing [State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080 (China)]. E-mail: wuzb@lnm.imech.ac.cn

    2004-11-15

    Recurrence plot technique of DNA sequences is established on metric representation and employed to analyze correlation structure of nucleotide strings. It is found that, in the transference of nucleotide strings, a human DNA fragment has a major correlation distance, but a yeast chromosome's correlation distance has a constant increasing.

  17. Microbial species delineation using whole genome sequences.

    Science.gov (United States)

    Varghese, Neha J; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T; Mavrommatis, Kostas; Kyrpides, Nikos C; Pati, Amrita

    2015-08-18

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Genomic Sequence Variation Markup Language (GSVML).

    Science.gov (United States)

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. GSVML was developed as

  19. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    Science.gov (United States)

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  20. Chaos game representation (CGR)-walk model for DNA sequences

    International Nuclear Information System (INIS)

    Jie, Gao; Zhen-Yuan, Xu

    2009-01-01

    Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to determine the coordinates of their positions in a continuous space. This distribution of positions has two features: one is unique, and the other is source sequence that can be recovered from the coordinates so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model is proposed based on CGR coordinates for the DNA sequences. The CGR coordinates are converted into a time series, and a long-memory ARFIMA (p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into the DNA sequence analysis. This model is applied to simulating real CGR-walk sequence data of ten genomic sequences. Remarkably long-range correlations are uncovered in the data, and the results from these models are reasonably fitted with those from the ARFIMA (p, d, q) model. (cross-disciplinary physics and related areas of science and technology)

  1. Complete genome sequence of Ikoma lyssavirus.

    Science.gov (United States)

    Marston, Denise A; Ellis, Richard J; Horton, Daniel L; Kuzmin, Ivan V; Wise, Emma L; McElhinney, Lorraine M; Banyard, Ashley C; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E; Fooks, Anthony R

    2012-09-01

    Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isolated from an African civet in Tanzania displaying clinical signs of rabies. Genetically, this virus is the most divergent within the genus Lyssavirus. Characterization of the genome will help to improve our understanding of lyssavirus diversity and enable investigation into vaccine-induced immunity and protection.

  2. Novel DNA sequence detection method based on fluorescence energy transfer

    International Nuclear Information System (INIS)

    Kobayashi, S.; Tamiya, E.; Karube, I.

    1987-01-01

    Recently the detection of specific DNA sequence, DNA analysis, has been becoming more important for diagnosis of viral genomes causing infections disease and human sequences related to inherited disorders. These methods typically involve electrophoresis, the immobilization of DNA on a solid support, hybridization to a complementary probe, the detection using labeled with /sup 32/P or nonisotopically with a biotin-avidin-enzyme system, and so on. These techniques are highly effective, but they are very time-consuming and expensive. A principle of fluorescene energy transfer is that the light energy from an excited donor (fluorophore) is transferred to an acceptor (fluorophore), if the acceptor exists in the vicinity of the donor and the excitation spectrum of donor overlaps the emission spectrum of acceptor. In this study, the fluorescence energy transfer was applied to the detection of specific DNA sequence using the hybridization method. The analyte, single-stranded DNA labeled with the donor fluorophore is hybridized to a probe DNA labeled with the acceptor. Because of the complementary DNA duplex formation, two fluorophores became to be closed to each other, and the fluorescence energy transfer was occurred

  3. Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome.

    Science.gov (United States)

    Robicheau, Brent M; Susko, Edward; Harrigan, Amye M; Snyder, Marlene

    2017-02-01

    Approximately 35% of the human genome can be identified as sequence devoid of a selected-effect function, and not derived from transposable elements or repeated sequences. We provide evidence supporting a known origin for a fraction of this sequence. We show that: 1) highly degraded, but near full length, ribosomal DNA (rDNA) units, including both 45S and Intergenic Spacer (IGS), can be found at multiple sites in the human genome on chromosomes without rDNA arrays, 2) that these rDNA sequences have a propensity for being centromere proximal, and 3) that sequence at all human functional rDNA array ends is divergent from canonical rDNA to the point that it is pseudogenic. We also show that small sequence strings of rDNA (from 45S + IGS) can be found distributed throughout the genome and are identifiable as an "rDNA-like signal", representing 0.26% of the q-arm of HSA21 and ∼2% of the total sequence of other regions tested. The size of sequence strings found in the rDNA-like signal intergrade into the size of sequence strings that make up the full-length degrading rDNA units found scattered throughout the genome. We conclude that the displaced and degrading rDNA sequences are likely of a similar origin but represent different stages in their evolution towards random sequence. Collectively, our data suggests that over vast evolutionary time, rDNA arrays contribute to the production of junk DNA. The concept that the production of rDNA pseudogenes is a by-product of concerted evolution represents a previously under-appreciated process; we demonstrate here its importance. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  4. The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus)

    DEFF Research Database (Denmark)

    Miller, Webb; Drautz, Daniela I; Janecka, Jan E

    2009-01-01

    We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the ......We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support...... for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus). Surprisingly, both of our thylacine sequences differ by 11%-15% from putative thylacine mitochondrial genes in GenBank, with one of our samples originating...... at a very low genetic diversity shortly before extinction. Despite the samples' heavy contamination with bacterial and human DNA and their temperate storage history, we estimate that as much as one-third of the total DNA in each sample is from the thylacine. The microbial content of the two thylacine...

  5. Management of High-Throughput DNA Sequencing Projects: Alpheus.

    Science.gov (United States)

    Miller, Neil A; Kingsmore, Stephen F; Farmer, Andrew; Langley, Raymond J; Mudge, Joann; Crow, John A; Gonzalez, Alvaro J; Schilkey, Faye D; Kim, Ryan J; van Velkinburgh, Jennifer; May, Gregory D; Black, C Forrest; Myers, M Kathy; Utsey, John P; Frost, Nicholas S; Sugarbaker, David J; Bueno, Raphael; Gullans, Stephen R; Baxter, Susan M; Day, Steve W; Retzel, Ernest F

    2008-12-26

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystem's SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis.

  6. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Univ. of California, Berkeley, CA (United States)

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS`s do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the ``Extensible Object Model``, to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  7. Sequence modelling and an extensible data model for genomic database

    Energy Technology Data Exchange (ETDEWEB)

    Li, Peter Wei-Der (California Univ., San Francisco, CA (United States) Lawrence Berkeley Lab., CA (United States))

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of these information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMS's do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the Extensible Object Model'', to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implement the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  8. Complete Genome Sequence of Ikoma Lyssavirus

    OpenAIRE

    Marston, Denise A.; Ellis, Richard J.; Horton, Daniel L.; Kuzmin, Ivan V.; Wise, Emma L.; McElhinney, Lorraine M.; Banyard, Ashley C.; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E.; Fooks, Anthony R.

    2012-01-01

    Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isol...

  9. A nonsense mutation causing decreased levels of insulin receptor mRNA: Detection by a simplified technique for direct sequencing of genomic DNA amplified by the polymerase chain reaction

    International Nuclear Information System (INIS)

    Kadowaki, T.; Kadowaki, H.; Taylor, S.I.

    1990-01-01

    Mutations in the insulin receptor gene can render the cell resistant to the biological action of insulin. The authors have studied a patient with leprechaunism (leprechaun/Minn-1), a genetic syndrome associated with intrauterine growth retardation and extreme insulin resistance. Genomic DNA from the patient was amplified by the polymerase chain reaction catalyzed by Thermus aquaticus (Taq) DNA polymerase, and the amplified DNA was directly sequenced. A nonsense mutations was identified at codon 897 in exon 14 in the paternal allele of the patient's insulin receptor gene. Levels of insulin receptor mRNA are decreased to <10% of normal in Epstein-Barr virus-transformed lymphoblasts and cultured skin fibroblasts from this patient. Thus, this nonsense mutation appears to cause a decrease in the levels of insulin receptor mRNA. In addition, they have obtained indirect evidence that the patient's maternal allele of the insulin receptor gene contains a cis-acting dominant mutation that also decreases the level of mRNA, but by a different mechanism. The nucleotide sequence of the entire protein-coding domain and the sequences of the intron-exon boundaries for all 22 exons of the maternal allele were normal. Presumably, the mutation in the maternal allele maps elsewhere in the insulin receptor gene. Thus, they conclude that the patient is a compound heterozygote for two cis-acting dominant mutations in the insulin receptor gene: (i) a nonsense mutation in the paternal allel that reduces the level of insulin receptor mRNA and (ii) an as yet unidentified mutation in the maternal allele that either decreases the rate of transcription or decreases the stability of the mRNA

  10. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  11. Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics' GemCode Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Lauren Coombe

    Full Text Available The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads, to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis. Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis- P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly.

  12. Whole genome sequencing and bioinformatics analysis of two Egyptian genomes.

    Science.gov (United States)

    ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong

    2018-05-15

    We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~ 30× sequencing depths. EGP1 had 4.7 million variants, where 198,877 were novel variants while EGP2 had 209,109 novel variants out of 4.8 million variants. The mitochondrial haplogroup of the two individuals were identified to be H7b1 and L2a1c, respectively. We also identified the Y haplogroup of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the NADH gene of the mitochondrial genome ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with an increased level of cholesterol and triglycerides, probably related with Egyptians obesity. Comparison of these genomes with African and Western-Asian genomes can provide insights on Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and functional classification of variants as well as human migration and evolution across Africa and Western-Asia. Copyright © 2017. Published by Elsevier B.V.

  13. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    Science.gov (United States)

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from a cosmic database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing

  14. Genome sequence of the lager brewing yeast, an interspecies hybrid.

    Science.gov (United States)

    Nakao, Yoshihiro; Kanamori, Takeshi; Itoh, Takehiko; Kodama, Yukiko; Rainieri, Sandra; Nakamura, Norihisa; Shimonaga, Tomoko; Hattori, Masahira; Ashikari, Toshihiko

    2009-04-01

    This work presents the genome sequencing of the lager brewing yeast (Saccharomyces pastorianus) Weihenstephan 34/70, a strain widely used in lager beer brewing. The 25 Mb genome comprises two nuclear sub-genomes originating from Saccharomyces cerevisiae and Saccharomyces bayanus and one circular mitochondrial genome originating from S. bayanus. Thirty-six different types of chromosomes were found including eight chromosomes with translocations between the two sub-genomes, whose breakpoints are within the orthologous open reading frames. Several gene loci responsible for typical lager brewing yeast characteristics such as maltotriose uptake and sulfite production have been increased in number by chromosomal rearrangements. Despite an overall high degree of conservation of the synteny with S. cerevisiae and S. bayanus, the syntenies were not well conserved in the sub-telomeric regions that contain lager brewing yeast characteristic and specific genes. Deletion of larger chromosomal regions, a massive unilateral decrease of the ribosomal DNA cluster and bilateral truncations of over 60 genes reflect a post-hybridization evolution process. Truncations and deletions of less efficient maltose and maltotriose uptake genes may indicate the result of adaptation to brewing. The genome sequence of this interspecies hybrid yeast provides a new tool for better understanding of lager brewing yeast behavior in industrial beer production.

  15. Genome-wide survey of repetitive DNA elements in the button mushroom Agaricus bisporus

    NARCIS (Netherlands)

    Foulongne-Oriol, M.; Murat, C.; Castanera, R.; Ramírez, L.; Sonnenberg, A.S.M.

    2013-01-01

    Repetitive DNA elements are ubiquitous constituents of eukaryotic genomes. The biological roles of these repetitive elements, supposed to impact genome organization and evolution, are not completely elucidated yet. The availability of whole genome sequence offers the opportunity to draw a picture of

  16. Sequencing of bovine herpesvirus 4 v.test strain reveals important genome features

    Directory of Open Access Journals (Sweden)

    Gillet Laurent

    2011-08-01

    Full Text Available Abstract Background Bovine herpesvirus 4 (BoHV-4 is a useful model for the human pathogenic gammaherpesviruses Epstein-Barr virus and Kaposi's Sarcoma-associated Herpesvirus. Although genome manipulations of this virus have been greatly facilitated by the cloning of the BoHV-4 V.test strain as a Bacterial Artificial Chromosome (BAC, the lack of a complete genome sequence for this strain limits its experimental use. Methods In this study, we have determined the complete sequence of BoHV-4 V.test strain by a pyrosequencing approach. Results The long unique coding region (LUR consists of 108,241 bp encoding at least 79 open reading frames and is flanked by several polyrepetitive DNA units (prDNA. As previously suggested, we showed that the prDNA unit located at the left prDNA-LUR junction (prDNA-G differs from the other prDNA units (prDNA-inner. Namely, the prDNA-G unit lacks the conserved pac-2 cleavage and packaging signal in its right terminal region. Based on the mechanisms of cleavage and packaging of herpesvirus genomes, this feature implies that only genomes bearing left and right end prDNA units are encapsulated into virions. Conclusions In this study, we have determined the complete genome sequence of the BAC-cloned BoHV-4 V.test strain and identified genome organization features that could be important in other herpesviruses.

  17. Differential DNA Methylation Analysis without a Reference Genome

    Directory of Open Access Journals (Sweden)

    Johanna Klughammer

    2015-12-01

    Full Text Available Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS, which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish. Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org. The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.

  18. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from total DNA Sequences.

    NARCIS (Netherlands)

    Izan, Shairul; Esselink, G.; Visser, R.G.F.; Smulders, M.J.M.; Borm, T.J.A.

    2017-01-01

    Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This

  19. Genome sequencing for obstetricians & gynaecologists | Kent ...

    African Journals Online (AJOL)

    The medical profession has been waiting for a decade to be invigorated by the sequencing of the human genome, arguably the greatest scientific project ever. The technology has been spectacular but the results of the project have yielded more unexpected results than definitive answers – many about the very nature of our ...

  20. The genomic sequence of cowpea aphid-borne mosaic virus and its similarities with other potyviruses

    NARCIS (Netherlands)

    Mlotshwa, S.; Verver, J.; Sithole-Niang, I.; Kampen, van T.; Kammen, van A.; Wellink, J.

    2002-01-01

    The genomic sequence of a Zimbabwe isolate of Cowpea aphid-borne mosaic virus (CABMV-Z) was determined by sequencing overlapping viral cDNA clones generated by RT-PCR using degenerate and/or specific primers. The sequence is 9465 nucleotides in length excluding the 3' terminal poly (A) tail and

  1. Identification of genomic insertion and flanking sequence of G2-EPSPS and GAT transgenes in soybean using whole genome sequencing method

    Directory of Open Access Journals (Sweden)

    Bingfu Guo

    2016-07-01

    Full Text Available Molecular characterization of sequences flanking exogenous fragment insertions is essential for safety assessment and labeling of genetically modified organisms (GMO. In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS method. About 21 Gb sequence data (~21× coverage for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundary of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of the genomic insertion site of the G2-EPSPS and GAT transgenes will facilitate the use of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS is a cost-effective and rapid method of identifying sites of T-DNA insertions and flanking sequences in soybean.

  2. Sequence heterogeneity accelerates protein search for targets on DNA

    International Nuclear Information System (INIS)

    Shvets, Alexey A.; Kolomeisky, Anatoly B.

    2015-01-01

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome

  3. Sequence heterogeneity accelerates protein search for targets on DNA

    Energy Technology Data Exchange (ETDEWEB)

    Shvets, Alexey A.; Kolomeisky, Anatoly B., E-mail: tolya@rice.edu [Department of Chemistry and Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005 (United States)

    2015-12-28

    The process of protein search for specific binding sites on DNA is fundamentally important since it marks the beginning of all major biological processes. We present a theoretical investigation that probes the role of DNA sequence symmetry, heterogeneity, and chemical composition in the protein search dynamics. Using a discrete-state stochastic approach with a first-passage events analysis, which takes into account the most relevant physical-chemical processes, a full analytical description of the search dynamics is obtained. It is found that, contrary to existing views, the protein search is generally faster on DNA with more heterogeneous sequences. In addition, the search dynamics might be affected by the chemical composition near the target site. The physical origins of these phenomena are discussed. Our results suggest that biological processes might be effectively regulated by modifying chemical composition, symmetry, and heterogeneity of a genome.

  4. Compressing DNA sequence databases with coil

    Directory of Open Access Journals (Sweden)

    Hendy Michael D

    2008-05-01

    Full Text Available Abstract Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  5. DNA sequence responsible for the amplification of adjacent genes.

    Science.gov (United States)

    Pasion, S G; Hartigan, J A; Kumar, V; Biswas, D K

    1987-10-01

    A 10.3-kb DNA fragment in the 5'-flanking region of the rat prolactin (rPRL) gene was isolated from F1BGH(1)2C1, a strain of rat pituitary tumor cells (GH cells) that produces prolactin in response to 5-bromodeoxyuridine (BrdU). Following transfection and integration into genomic DNA of recipient mouse L cells, this DNA induced amplification of the adjacent thymidine kinase gene from Herpes simplex virus type 1 (HSV1TK). We confirmed the ability of this "Amplicon" sequence to induce amplification of other linked or unlinked genes in DNA-mediated gene transfer studies. When transferred into the mouse L cells with the 10.3-5'rPRL gene sequence of BrdU-responsive cells, both the human growth hormone and the HSV1TK genes are amplified in response to 5-bromodeoxyuridine. This observation is substantiated by BrdU-induced amplification of the cotransferred bacterial Neo gene. Cotransfection studies reveal that the BrdU-induced amplification capability is associated with a 4-kb DNA sequence in the 5'-flanking region of the rPRL gene of BrdU-responsive cells. These results demonstrate that genes of heterologous origin, linked or unlinked, and selected or unselected, can be coamplified when located within the amplification boundary of the Amplicon sequence.

  6. The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants.

    Science.gov (United States)

    Reuter, Miriam S; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K C; Trost, Brett; Paton, Tara A; Pereira, Sergio L; Herbrick, Jo-Anne; Wintle, Richard F; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W L; Wang, Zhuozhi; Patel, Rohan V; Pellecchia, Giovanna; Wei, John; Strug, Lisa J; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M; Bassett, Anne S; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D; Stavropoulos, Dimitri J; Bowdin, Sarah; Hildebrandt, Matthew R; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M Stephen; Monfared, Nasim; Hosseini, S Mohsen; Joseph-George, Ann M; Keeley, Fred W; Cook, Ryan A; Fiume, Marc; Lee, Hin C; Marshall, Christian R; Davies, Jill; Hazell, Allison; Buchanan, Janet A; Szego, Michael J; Scherer, Stephen W

    2018-02-05

    The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set ( n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants - associated with cancer, cardiac or neurodegenerative phenotypes - remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. © 2018 Joule Inc. or its licensors.

  7. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    OpenAIRE

    Arabi E. keshk

    2014-01-01

    The merging of biology and computer science has created a new field called computational biology that explore the capacities of computers to gain knowledge from biological data, bioinformatics. Computational biology is rooted in life sciences as well as computers, information sciences, and technologies. The main problem in computational biology is sequence alignment that is a way of arranging the sequences of DNA, RNA or protein to identify the region of similarity and relationship between se...

  8. DNA Microarrays in Comparative Genomics and Transcriptomics

    DEFF Research Database (Denmark)

    Willenbrock, Hanni

    2007-01-01

    at identifying the exact breakpoints where DNA has been gained or lost. In this thesis, three popular methods are compared and a realistic simulation model is presented for generating artificial data with known breakpoints and known DNA copy number. By using simulated data, we obtain a realistic evaluation......During the past few years, innovations in the DNA sequencing technology has led to an explosion in available DNA sequence information. This has revolutionized biological research and promoted the development of high throughput analysis methods that can take advantage of the vast amount of sequence...... data. For this, the DNA microarray technology has gained enormous popularity due to its ability to measure the presence or the activity of thousands of genes simultaneously. Microarrays for high throughput data analyses are not limited to a few organisms but may be applied to everything from bacteria...

  9. Pericentric satellite DNA sequences in Pipistrellus pipistrellus (Vespertilionidae; Chiroptera).

    Science.gov (United States)

    Barragán, M J L; Martínez, S; Marchal, J A; Fernández, R; Bullejos, M; Díaz de la Guardia, R; Sánchez, A

    2003-09-01

    This paper reports the molecular and cytogenetic characterization of a HindIII family of satellite DNA in the bat species Pipistrellus pipistrellus. This satellite is organized in tandem repeats of 418 bp monomer units, and represents approximately 3% of the whole genome. The consensus sequence from five cloned monomer units has an A-T content of 62.20%. We have found differences in the ladder pattern of bands between two populations of the same species. These differences are probably because of the absence of the target sites for the HindIII enzyme in most monomer units of one population, but not in the other. Fluorescent in situ hybridization (FISH) localized the satellite DNA in the pericentromeric regions of all autosomes and the X chromosome, but it was absent from the Y chromosome. Digestion of genomic DNAs with HpaII and its isoschizomer MspI demonstrated that these repetitive DNA sequences are not methylated. Other bat species were tested for the presence of this repetitive DNA. It was absent in five Vespertilionidae and one Rhinolophidae species, indicating that it could be a species/genus specific, repetitive DNA family.

  10. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    Science.gov (United States)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  11. Isolation of Retroelement from Plant Genomic DNA

    OpenAIRE

    sprotocols

    2014-01-01

    Author: Pat Heslop-Harrison ### Abstract: Retroelements and their derivatives are an ubiquitous and abundant component of plant genomes. From the 1990s, PCR based techniques have been developed to isolate the elements from genomic DNA of different plants, and the methods and primers used are presented here. Major classes of retroelements include the Ty1-copia, the Ty3-gypsy and the LINE (non-LTR) groups. Mixed PCR products representing the full heterogeneous pool of retrotransposo...

  12. Agaricus bisporus genome sequence: a commentary.

    Science.gov (United States)

    Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

    2013-06-01

    The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus has resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses has revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus biporus genomes contain a collection of polysaccharide and lignin-degrading genes and more interestingly an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium. Copyright © 2013 Elsevier Inc. All rights reserved.

  13. Genome sequence of Aspergillus luchuensis NBRC 4314

    Science.gov (United States)

    Yamada, Osamu; Machida, Masayuki; Hosoyama, Akira; Goto, Masatoshi; Takahashi, Toru; Futagami, Taiki; Yamagata, Youhei; Takeuchi, Michio; Kobayashi, Tetsuo; Koike, Hideaki; Abe, Keietsu; Asai, Kiyoshi; Arita, Masanori; Fujita, Nobuyuki; Fukuda, Kazuro; Higa, Ken-ichi; Horikawa, Hiroshi; Ishikawa, Takeaki; Jinno, Koji; Kato, Yumiko; Kirimura, Kohtaro; Mizutani, Osamu; Nakasone, Kaoru; Sano, Motoaki; Shiraishi, Yohei; Tsukahara, Masatoshi; Gomi, Katsuya

    2016-01-01

    Awamori is a traditional distilled beverage made from steamed Thai-Indica rice in Okinawa, Japan. For brewing the liquor, two microbes, local kuro (black) koji mold Aspergillus luchuensis and awamori yeast Saccharomyces cerevisiae are involved. In contrast, that yeasts are used for ethanol fermentation throughout the world, a characteristic of Japanese fermentation industries is the use of Aspergillus molds as a source of enzymes for the maceration and saccharification of raw materials. Here we report the draft genome of a kuro (black) koji mold, A. luchuensis NBRC 4314 (RIB 2604). The total length of nonredundant sequences was nearly 34.7 Mb, comprising approximately 2,300 contigs with 16 telomere-like sequences. In total, 11,691 genes were predicted to encode proteins. Most of the housekeeping genes, such as transcription factors and N-and O-glycosylation system, were conserved with respect to Aspergillus niger and Aspergillus oryzae. An alternative oxidase and acid-stable α-amylase regarding citric acid production and fermentation at a low pH as well as a unique glutamic peptidase were also found in the genome. Furthermore, key biosynthetic gene clusters of ochratoxin A and fumonisin B were absent when compared with A. niger genome, showing the safety of A. luchuensis for food and beverage production. This genome information will facilitate not only comparative genomics with industrial kuro-koji molds, but also molecular breeding of the molds in improvements of awamori fermentation. PMID:27651094

  14. Genome-wide DNA methylation sequencing reveals miR-663a is a novel epimutation candidate in CIMP-high endometrial cancer

    OpenAIRE

    Yanokura, Megumi; Banno, Kouji; Adachi, Masataka; Aoki, Daisuke; Abe, Kuniya

    2017-01-01

    Aberrant DNA methylation is widely observed in many cancers. Concurrent DNA methylation of multiple genes occurs in endometrial cancer and is referred to as the CpG island methylator phenotype (CIMP). However, the features and causes of CIMP-positive endometrial cancer are not well understood. To investigate DNA methylation features characteristic to CIMP-positive endometrial cancer, we first classified samples from 25 patients with endometrial cancer based on the methylation status of three ...

  15. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity

  16. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic

  17. Second generation sequencing of the mesothelioma tumor genome.

    Directory of Open Access Journals (Sweden)

    Raphael Bueno

    2010-05-01

    Full Text Available The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types. Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.

  18. Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea.

    Science.gov (United States)

    Han, Joon-Hee; Chon, Jae-Kyung; Ahn, Jong-Hwa; Choi, Ik-Young; Lee, Yong-Hwan; Kim, Kyoung Su

    2016-06-01

    Colletotrichum acutatum is a destructive fungal pathogen which causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from the KC05 strain was used for the whole genome sequencing using a PacBio sequencer and the MiSeq system. The KC05 genome was determined to be 52,190,760 bp in size with a G + C content of 51.73% in 27 scaffolds and to contain 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of the KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.

  19. An intact sequence-specific DNA-binding domain is required for human cytomegalovirus-mediated sequestration of p53 and may promote in vivo binding to the viral genome during infection

    International Nuclear Information System (INIS)

    Rosenke, Kyle; Samuel, Melanie A.; McDowell, Eric T.; Toerne, Melissa A.; Fortunato, Elizabeth A.

    2006-01-01

    The p53 protein is stabilized during infection of primary human fibroblasts with human cytomegalovirus (HCMV). However, the p53 in HCMV-infected cells is unable to activate its downstream targets. HCMV accomplishes this inactivation, at least in part, by sequestering p53 into viral replication centers within the cell's nucleus soon after they are established. In order to better understand the interplay between HCMV and p53 and the mechanism of sequestration, we constructed a panel of mutant p53-GFP fusion constructs for use in transfection/infection experiments. These mutants affected several post-translational modification sites and several sites within the central sequence-specific DNA-binding domain of the protein. Two categories of p53 sequestration were observed when the mutant constructs were transfected into primary fibroblasts and then infected at either high or low multiplicity. The first category, including all of the post-translational modification mutants, showed sequestration comparable to a wild-type (wt) control, while the second category, mutants affecting the DNA-binding core, were not specifically sequestered above control GFP levels. This suggested that the DNA-binding ability of the protein was required for sequestration. When the HCMV genome was analyzed for p53 consensus binding sites, 21 matches were found, which localized either to the promoters or the coding regions of viral proteins involved in DNA replication and processing as well as structural proteins. An analysis of in vivo binding to these identified sites via chromatin immunoprecipitation assays revealed differential binding to several of the sites over the course of infection

  20. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    Segmental duplications are >1kb segments of duplicated DNA present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been...... extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...

  1. Genomic gigantism: DNA loss is slow in mountain grasshoppers.

    Science.gov (United States)

    Bensasson, D; Petrov, D A; Zhang, D X; Hartl, D L; Hewitt, G M

    2001-02-01

    Several studies have shown DNA loss to be inversely correlated with genome size in animals. These studies include a comparison between Drosophila and the cricket, Laupala, but there has been no assessment of DNA loss in insects with very large genomes. Podisma pedestris, the brown mountain grasshopper, has a genome over 100 times as large as that of Drosophila and 10 times as large as that of Laupala. We used 58 paralogous nuclear pseudogenes of mitochondrial origin to study the characteristics of insertion, deletion, and point substitution in P. pedestris and Italopodisma. In animals, these pseudogenes are "dead on arrival"; they are abundant in many different eukaryotes, and their mitochondrial origin simplifies the identification of point substitutions accumulated in nuclear pseudogene lineages. There appears to be a mononucleotide repeat within the 643-bp pseudogene sequence studied that acts as a strong hot spot for insertions or deletions (indels). Because the data for other insect species did not contain such an unusual region, hot spots were excluded from species comparisons. The rate of DNA loss relative to point substitution appears to be considerably and significantly lower in the grasshoppers studied than in Drosophila or Laupala. This suggests that the inverse correlation between genome size and the rate of DNA loss can be extended to comparisons between insects with large or gigantic genomes (i.e., Laupala and Podisma). The low rate of DNA loss implies that in grasshoppers, the accumulation of point mutations is a more potent force for obscuring ancient pseudogenes than their loss through indel accumulation, whereas the reverse is true for Drosophila. The main factor contributing to the difference in the rates of DNA loss estimated for grasshoppers, crickets, and Drosophila appears to be deletion size. Large deletions are relatively rare in Podisma and Italopodisma.

  2. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS

    Directory of Open Access Journals (Sweden)

    Alves-Ferreira Marcelo

    2008-09-01

    Full Text Available Abstract Background Genome survey sequences (GSS offer a preliminary global view of a genome since, unlike ESTs, they cover coding as well as non-coding DNA and include repetitive regions of the genome. A more precise estimation of the nature, quantity and variability of repetitive sequences very early in a genome sequencing project is of considerable importance, as such data strongly influence the estimation of genome coverage, library quality and progress in scaffold construction. Also, the elimination of repetitive sequences from the initial assembly process is important to avoid errors and unnecessary complexity. Repetitive sequences are also of interest in a variety of other studies, for instance as molecular markers. Results We designed and implemented a straightforward pipeline called ReRep, which combines bioinformatics tools for identifying repetitive structures in a GSS dataset. In a case study, we first applied the pipeline to a set of 970 GSSs, sequenced in our laboratory from the human pathogen Leishmania braziliensis, the causative agent of leishmaniosis, an important public health problem in Brazil. We also verified the applicability of ReRep to new sequencing technologies using a set of 454-reads of an Escheria coli. The behaviour of several parameters in the algorithm is evaluated and suggestions are made for tuning of the analysis. Conclusion The ReRep approach for identification of repetitive elements in GSS datasets proved to be straightforward and efficient. Several potential repetitive sequences were found in a L. braziliensis GSS dataset generated in our laboratory, and further validated by the analysis of a more complete genomic dataset from the EMBL and Sanger Centre databases. ReRep also identified most of the E. coli K12 repeats prior to assembly in an example dataset obtained by automated sequencing using 454 technology. The parameters controlling the algorithm behaved consistently and may be tuned to the properties

  3. Enhanced throughput for infrared automated DNA sequencing

    Science.gov (United States)

    Middendorf, Lyle R.; Gartside, Bill O.; Humphrey, Pat G.; Roemer, Stephen C.; Sorensen, David R.; Steffens, David L.; Sutter, Scott L.

    1995-04-01

    Several enhancements have been developed and applied to infrared automated DNA sequencing resulting in significantly higher throughput. A 41 cm sequencing gel (31 cm well- to-read distance) combines high resolution of DNA sequencing fragments with optimized run times yielding two runs per day of 500 bases per sample. A 66 cm sequencing gel (56 cm well-to-read distance) produces sequence read lengths of up to 1000 bases for ds and ss templates using either T7 polymerase or cycle-sequencing protocols. Using a multichannel syringe to load 64 lanes allows 16 samples (compatible with 96-well format) to be visualized for each run. The 41 cm gel configuration allows 16,000 bases per day (16 samples X 500 bases/sample X 2 ten hour runs/day) to be sequenced with the advantages of infrared technology. Enhancements to internal labeling techniques using an infrared-labeled dATP molecule (Boehringer Mannheim GmbH, Penzberg, Germany; Sequenase (U.S. Biochemical) have also been made. The inclusion of glycerol in the sequencing reactions yields greatly improved results for some primer and template combinations. The inclusion of (alpha) -Thio-dNTP's in the labeling reaction increases signal intensity two- to three-fold.

  4. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  5. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  6. Resurrection of DNA function in vivo from an extinct genome.

    Directory of Open Access Journals (Sweden)

    Andrew J Pask

    2008-05-01

    Full Text Available There is a burgeoning repository of information available from ancient DNA that can be used to understand how genomes have evolved and to determine the genetic features that defined a particular species. To assess the functional consequences of changes to a genome, a variety of methods are needed to examine extinct DNA function. We isolated a transcriptional enhancer element from the genome of an extinct marsupial, the Tasmanian tiger (Thylacinus cynocephalus or thylacine, obtained from 100 year-old ethanol-fixed tissues from museum collections. We then examined the function of the enhancer in vivo. Using a transgenic approach, it was possible to resurrect DNA function in transgenic mice. The results demonstrate that the thylacine Col2A1 enhancer directed chondrocyte-specific expression in this extinct mammalian species in the same way as its orthologue does in mice. While other studies have examined extinct coding DNA function in vitro, this is the first example of the restoration of extinct non-coding DNA and examination of its function in vivo. Our method using transgenesis can be used to explore the function of regulatory and protein-coding sequences obtained from any extinct species in an in vivo model system, providing important insights into gene evolution and diversity.

  7. Complete genome sequences and comparative genome analysis of Lactobacillus plantarum strain 5-2 isolated from fermented soybean.

    Science.gov (United States)

    Liu, Chen-Jian; Wang, Rui; Gong, Fu-Ming; Liu, Xiao-Feng; Zheng, Hua-Jun; Luo, Yi-Yong; Li, Xiao-Ran

    2015-12-01

    Lactobacillus plantarum is an important probiotic and is mostly isolated from fermented foods. We sequenced the genome of L. plantarum strain 5-2, which was derived from fermented soybean isolated from Yunnan province, China. The strain was determined to contain 3114 genes. Fourteen complete insertion sequence (IS) elements were found in 5-2 chromosome. There were 24 DNA replication proteins and 76 DNA repair proteins in the 5-2 genome. Consistent with the classification of L. plantarum as a facultative heterofermentative lactobacillus, the 5-2 genome encodes key enzymes required for the EMP (Embden-Meyerhof-Parnas) and phosphoketolase (PK) pathways. Several components of the secretion machinery are found in the 5-2 genome, which was compared with L. plantarum ST-III, JDM1 and WCFS1. Most of the specific proteins in the four genomes appeared to be related to their prophage elements. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Sequencing of a Cultivated Diploid Cotton Genome-Gossypium arboreum

    Institute of Scientific and Technical Information of China (English)

    WILKINS; Thea; A

    2008-01-01

    Sequencing the genomes of crop species and model systems contributes significantly to our understanding of the organization,structure and function of plant genomes.In a `white paper' published in 2007,the cotton community set forth a strategic plan for sequencing the AD genome of cultivated upland cotton that initially targets less complex diploid genomes.This strategy banks on the high degree

  9. Analysis of high-throughput sequencing and annotation strategies for phage genomes.

    Directory of Open Access Journals (Sweden)

    Matthew R Henn

    Full Text Available BACKGROUND: Bacterial viruses (phages play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage. METHODOLOGY/PRINCIPAL FINDINGS: To advance the study of phage biology through comparative genomic approaches we used marine cyanophage as a model system. We compared DNA preparation methodologies (DNA extraction directly from either phage lysates or CsCl purified phage particles, and sequencing strategies that utilize either Sanger sequencing of a linker amplification shotgun library (LASL or of a whole genome shotgun library (WGSL, or 454 pyrosequencing methods. We demonstrate that genomic DNA sample preparation directly from a phage lysate, combined with 454 pyrosequencing, is best suited for phage genome sequencing at scale, as this method is capable of capturing complete continuous genomes with high accuracy. In addition, we describe an automated annotation informatics pipeline that delivers high-quality annotation and yields few false positives and negatives in ORF calling. CONCLUSIONS/SIGNIFICANCE: These DNA preparation, sequencing and annotation strategies enable a high-throughput approach to the burgeoning field of phage genomics.

  10. Reducing assembly complexity of microbial genomes with single-molecule sequencing

    Science.gov (United States)

    Genome assembly algorithms cannot fully reconstruct microbial chromosomes from the DNA reads output by first or second-generation sequencing instruments. Therefore, most genomes are left unfinished due to the significant resources required to manually close gaps left in the draft assemblies. Single-...

  11. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer

    NARCIS (Netherlands)

    Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi

    Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and

  12. The genome sequence of a widespread apex Predator, the golden eagle (Aquila chrysaetos)

    Science.gov (United States)

    Jacqueline M. Doyle; Todd E. Katzner; Peter H. Bloom; Yanzhu Ji; Bhagya K. Wijayawardena; J. Andrew DeWoody; Ludovic. Orlando

    2014-01-01

    Biologists routinely use molecular markers to identify conservation units, to quantify genetic connectivity, to estimate population sizes, and to identify targets of selection. Many imperiled eagle populations require such efforts and would benefit from enhanced genomic resources. We sequenced, assembled, and annotated the first eagle genome using DNA from a male...

  13. DNA Qualification Workflow for Next Generation Sequencing of Histopathological Samples

    Science.gov (United States)

    Simbolo, Michele; Gottardi, Marisa; Corbo, Vincenzo; Fassan, Matteo; Mafficini, Andrea; Malpeli, Giorgio; Lawlor, Rita T.; Scarpa, Aldo

    2013-01-01

    Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA) and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR) was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF) tissues, 6 formalin-fixed paraffin-embedded (FFPE) tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard workflow for

  14. DNA qualification workflow for next generation sequencing of histopathological samples.

    Directory of Open Access Journals (Sweden)

    Michele Simbolo

    Full Text Available Histopathological samples are a treasure-trove of DNA for clinical research. However, the quality of DNA can vary depending on the source or extraction method applied. Thus a standardized and cost-effective workflow for the qualification of DNA preparations is essential to guarantee interlaboratory reproducible results. The qualification process consists of the quantification of double strand DNA (dsDNA and the assessment of its suitability for downstream applications, such as high-throughput next-generation sequencing. We tested the two most frequently used instrumentations to define their role in this process: NanoDrop, based on UV spectroscopy, and Qubit 2.0, which uses fluorochromes specifically binding dsDNA. Quantitative PCR (qPCR was used as the reference technique as it simultaneously assesses DNA concentration and suitability for PCR amplification. We used 17 genomic DNAs from 6 fresh-frozen (FF tissues, 6 formalin-fixed paraffin-embedded (FFPE tissues, 3 cell lines, and 2 commercial preparations. Intra- and inter-operator variability was negligible, and intra-methodology variability was minimal, while consistent inter-methodology divergences were observed. In fact, NanoDrop measured DNA concentrations higher than Qubit and its consistency with dsDNA quantification by qPCR was limited to high molecular weight DNA from FF samples and cell lines, where total DNA and dsDNA quantity virtually coincide. In partially degraded DNA from FFPE samples, only Qubit proved highly reproducible and consistent with qPCR measurements. Multiplex PCR amplifying 191 regions of 46 cancer-related genes was designated the downstream application, using 40 ng dsDNA from FFPE samples calculated by Qubit. All but one sample produced amplicon libraries suitable for next-generation sequencing. NanoDrop UV-spectrum verified contamination of the unsuccessful sample. In conclusion, as qPCR has high costs and is labor intensive, an alternative effective standard

  15. Continued colonization of the human genome by mitochondrial DNA.

    Directory of Open Access Journals (Sweden)

    Miria Ricchetti

    2004-09-01

    Full Text Available Integration of mitochondrial DNA fragments into nuclear chromosomes (giving rise to nuclear DNA sequences of mitochondrial origin, or NUMTs is an ongoing process that shapes nuclear genomes. In yeast this process depends on double-strand-break repair. Since NUMTs lack amplification and specific integration mechanisms, they represent the prototype of exogenous insertions in the nucleus. From sequence analysis of the genome of Homo sapiens, followed by sampling humans from different ethnic backgrounds, and chimpanzees, we have identified 27 NUMTs that are specific to humans and must have colonized human chromosomes in the last 4-6 million years. Thus, we measured the fixation rate of NUMTs in the human genome. Six such NUMTs show insertion polymorphism and provide a useful set of DNA markers for human population genetics. We also found that during recent human evolution, Chromosomes 18 and Y have been more susceptible to colonization by NUMTs. Surprisingly, 23 out of 27 human-specific NUMTs are inserted in known or predicted genes, mainly in introns. Some individuals carry a NUMT insertion in a tumor-suppressor gene and in a putative angiogenesis inhibitor. Therefore in humans, but not in yeast, NUMT integrations preferentially target coding or regulatory sequences. This is indeed the case for novel insertions associated with human diseases and those driven by environmental insults. We thus propose a mutagenic phenomenon that may be responsible for a variety of genetic diseases in humans and suggest that genetic or environmental factors that increase the frequency of chromosome breaks provide the impetus for the continued colonization of the human genome by mitochondrial DNA.

  16. Human genome sequencing with direct x-ray holographic imaging

    International Nuclear Information System (INIS)

    Rhodes, C.K.

    1993-01-01

    Direct holographic imaging of biological materials is widely applicable to the study of the structure, properties and action of genetic material. This particular application involves the sequencing of the human genome where prospective genomic imaging technology is composed of three subtechnologies, name an x-ray holographic camera, suitable chemistry and enzymology for the preparation of tagged DNA samples, and the illuminator in the form of an x-ray laser. We report appropriate x-ray camera, embodied by the instrument developed by MCR, is available and that suitable chemical and enzymatic procedures exist for the preparation of the necessary tagged DNA strands. Concerning the future development of the x-ray illuminator. We find that a practical small scale x-ray light source is indeed feasible. This outcome requires the use of unconventional physical processes in order to achieve the necessary power-compression in the amplifying medium. The understanding of these new physical mechanisms is developing rapidly. Importantly, although the x-ray source does not currently exist, the understanding of these new physical mechanisms is developing rapidly and the research has established the basic scaling laws that will determine the properties of the x-ray illuminator. When this x-ray source becomes available, an extremely rapid and cost effective instrument for 3-D imaging of biological materials can be applied to a wide range of biological structural assays, including the base-pair sequencing of the human genome and many questions regarding its higher levels of organization

  17. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

    Science.gov (United States)

    Kwok, Hin; Chiang, Alan Kwok Shing

    2016-02-24

    Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, increasing number of EBV genomes has been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  18. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes

    Directory of Open Access Journals (Sweden)

    Hin Kwok

    2016-02-01

    Full Text Available Genomic sequences of Epstein–Barr virus (EBV have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of the first-generation sequencing technology. With the advancement of next-generation sequencing (NGS and target enrichment strategies, increasing number of EBV genomes has been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  19. From Genome Sequence to Taxonomy - A Skeptic’s View

    DEFF Research Database (Denmark)

    Özen, Asli Ismihan; Vesth, Tammi Camilla; Ussery, David

    2012-01-01

    The relative ease of sequencing bacterial genomes has resulted in thousands of sequenced bacterial genomes available in the public databases. This same technology now allows for using the entire genome sequence as an identifier for an organism. There are many methods available which attempt to us...

  20. Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing.

    Science.gov (United States)

    Hribová, Eva; Neumann, Pavel; Matsumoto, Takashi; Roux, Nicolas; Macas, Jirí; Dolezel, Jaroslav

    2010-09-16

    Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic

  1. Inheritance of the yeast mitochondrial genome

    DEFF Research Database (Denmark)

    Piskur, Jure

    1994-01-01

    Mitochondrion, extrachromosomal genetics, intergenic sequences, genome size, mitochondrial DNA, petite mutation, yeast......Mitochondrion, extrachromosomal genetics, intergenic sequences, genome size, mitochondrial DNA, petite mutation, yeast...

  2. Structural properties of replication origins in yeast DNA sequences

    International Nuclear Information System (INIS)

    Cao Xiaoqin; Zeng Jia; Yan Hong

    2008-01-01

    Sequence-dependent DNA flexibility is an important structural property originating from the DNA 3D structure. In this paper, we investigate the DNA flexibility of the budding yeast (S. Cerevisiae) replication origins on a genome-wide scale using flexibility parameters from two different models, the trinucleotide and the tetranucleotide models. Based on analyzing average flexibility profiles of 270 replication origins, we find that yeast replication origins are significantly rigid compared with their surrounding genomic regions. To further understand the highly distinctive property of replication origins, we compare the flexibility patterns between yeast replication origins and promoters, and find that they both contain significantly rigid DNAs. Our results suggest that DNA flexibility is an important factor that helps proteins recognize and bind the target sites in order to initiate DNA replication. Inspired by the role of the rigid region in promoters, we speculate that the rigid replication origins may facilitate binding of proteins, including the origin recognition complex (ORC), Cdc6, Cdt1 and the MCM2-7 complex

  3. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  4. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available Few studies investigated the donkey (Equus asinus at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca. The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing and Ion Torrent (RRL runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  5. Complete Genome Sequence of the Halophilic Methylotrophic Methanogen Archaeon Methanohalophilus portucalensis Strain FDF-1T

    KAUST Repository

    L’Haridon, Stéphane

    2018-01-17

    We report here the complete genome sequence (2.08 Mb) of Methanohalophilus portucalensis strain FDF-1T, a halophilic methylotrophic methanogen isolated from the sediment of a saltern in Figeria da Foz, Portugal. The average nucleotide identity and DNA-DNA hybridization analyses show that Methanohalophilus mahii, M. halophilus, and M. portucalensis are three different species within the Methanosarcinaceae family.

  6. Complete Genome Sequence of the Halophilic Methylotrophic Methanogen Archaeon Methanohalophilus portucalensis Strain FDF-1T

    KAUST Repository

    L’ Haridon, Sté phane; Corre, Erwan; Guan, Yue; Vinu, Manikandan; La Cono, Violetta; Yakimov, Michail; Stingl, Ulrich; Toffin, Laurent; Jebbar, Mohamed

    2018-01-01

    We report here the complete genome sequence (2.08 Mb) of Methanohalophilus portucalensis strain FDF-1T, a halophilic methylotrophic methanogen isolated from the sediment of a saltern in Figeria da Foz, Portugal. The average nucleotide identity and DNA-DNA hybridization analyses show that Methanohalophilus mahii, M. halophilus, and M. portucalensis are three different species within the Methanosarcinaceae family.

  7. Analysis of the complete DNA sequence of the temperate bacteriophage TP901-1: Evolution, structure, and genome organization of lactococcal bacteriophages

    DEFF Research Database (Denmark)

    Brøndsted, Lone; Østergaard, Solvej; Pedersen, Margit

    2001-01-01

    A complete analysis of the entire genome of the temperate lactococcal bacteriophage TP901-1 has been performed and the function of 21 of 56 TP901-1-encoded ORFs has been assigned. This knowledge has been used to propose 10 functional modules each responsible for specific functions during...

  8. DNA Sequence Patterns – A Successful Example of Grid Computing in Genome Research and Building Virtual Super-Computers for the Research Commons of e-Societies

    NARCIS (Netherlands)

    T.A. Knoch (Tobias); A. Abuseiris (Anis); M. Lesnussa (Michael); F.N. Kepper (Nick); R.M. de Graaf (Rob); F.G. Grosveld (Frank)

    2011-01-01

    textabstractThe amount of information is growing exponentially with ever-new technologies emerging and is believed to be always at the limit. In contrast, huge resources are obviously available, which are underused in the IT sector, similar as e.g. in the renewable energy sector. Genome research is

  9. Solution-based targeted genomic enrichment for precious DNA samples

    Directory of Open Access Journals (Sweden)

    Shearer Aiden

    2012-05-01

    Full Text Available Abstract Background Solution-based targeted genomic enrichment (TGE protocols permit selective sequencing of genomic regions of interest on a massively parallel scale. These protocols could be improved by: 1 modifying or eliminating time consuming steps; 2 increasing yield to reduce input DNA and excessive PCR cycling; and 3 enhancing reproducible. Results We developed a solution-based TGE method for downstream Illumina sequencing in a non-automated workflow, adding standard Illumina barcode indexes during the post-hybridization amplification to allow for sample pooling prior to sequencing. The method utilizes Agilent SureSelect baits, primers and hybridization reagents for the capture, off-the-shelf reagents for the library preparation steps, and adaptor oligonucleotides for Illumina paired-end sequencing purchased directly from an oligonucleotide manufacturing company. Conclusions This solution-based TGE method for Illumina sequencing is optimized for small- or medium-sized laboratories and addresses the weaknesses of standard protocols by reducing the amount of input DNA required, increasing capture yield, optimizing efficiency, and improving reproducibility.

  10. The complete chloroplast genome sequence of Dendrobium officinale.

    Science.gov (United States)

    Yang, Pei; Zhou, Hong; Qian, Jun; Xu, Haibin; Shao, Qingsong; Li, Yonghua; Yao, Hui

    2016-01-01

    The complete chloroplast sequence of Dendrobium officinale, an endangered and economically important traditional Chinese medicine, was reported and characterized. The genome size is 152,018 bp, with 37.5% GC content. A pair of inverted repeats (IRs) of 26,284 bp are separated by a large single-copy region (LSC, 84,944 bp) and a small single-copy region (SSC, 14,506 bp). The complete cp DNA contains 83 protein-coding genes, 39 tRNA genes and 8 rRNA genes. Fourteen genes contained one or two introns.

  11. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  12. Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

    Science.gov (United States)

    Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

    2016-02-27

    In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequence were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a

  13. Complete chloroplast genome sequence of Elodea canadensis and comparative analyses with other monocot plastid genomes.

    Science.gov (United States)

    Huotari, Tea; Korpelainen, Helena

    2012-10-15

    Elodea canadensis is an aquatic angiosperm native to North America. It has attracted great attention due to its invasive nature when transported to new areas in its non-native range. We have determined the complete nucleotide sequence of the chloroplast (cp) genome of Elodea. Taxonomically Elodea is a basal monocot, and only few monocot cp genomes representing early lineages of monocots have been sequenced so far. The genome is a circular double-stranded DNA molecule 156,700 bp in length, and has a typical structure with large (LSC 86,194 bp) and small (SSC 17,810 bp) single-copy regions separated by a pair of inverted repeats (IRs 26,348 bp each). The Elodea cp genome contains 113 unique genes and 16 duplicated genes in the IR regions. A comparative analysis showed that the gene order and organization of the Elodea cp genome is almost identical to that of Amborella trichopoda, a basal angiosperm. The structure of IRs in Elodea is unique among monocot species with the whole cp genome sequenced. In Elodea and another monocot Lemna minor the borders between IRs and LSC are located upstream of rps 19 gene and downstream of trnH-GUG gene, while in most monocots, IR has extended to include both trnH and rps 19 genes. A phylogenetic analysis conducted using Bayesian method, based on the DNA sequences of 81 chloroplast genes from 17 monocot taxa provided support for the placement of Elodea together with Lemna as a basal monocot and the next diverging lineage of monocots after Acorales. In comparison with other monocots, the Elodea cp genome has gone through only few rearrangements or gene losses. IR of Elodea has a unique structure among the monocot species studied so far as its structure is similar to that of a basal angiosperm Amborella. This result together with phylogenetic analyses supports the placement of Elodea as a basal monocot to the next diverging lineage of monocots after Acorales. So far, only few cp genomes representing early lineages of monocots have been

  14. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

    Directory of Open Access Journals (Sweden)

    Jonathan A Shortt

    2017-01-01

    Full Text Available In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies.We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample.This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species

  15. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

    Science.gov (United States)

    Shortt, Jonathan A; Card, Daren C; Schield, Drew R; Liu, Yang; Zhong, Bo; Castoe, Todd A; Carlton, Elizabeth J; Pollock, David D

    2017-01-01

    In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other

  16. Development of a defined-sequence DNA system for use in DNA misrepair studies

    International Nuclear Information System (INIS)

    Sutton, S.; Tobias, C.A.

    1984-01-01

    The authors have developed a system that allows them to study cellular DNA repair processes at the molecular level. In particular, the authors are using this system to examine the consequences of a misrepair of radiation-induced DNA damage, as a function of dose. The cells being used are specially engineered haploid yeast cells. Maintained in the cells, at one copy per cell, is a cen plasmid, a plasmid that behaves like a functional chromosome. This plasmid carries a small defined sequence of DNA from the E. coli lac z gene. It is this lac z region (called the alpha region) that serves as the target for radiation damage. Two copies of the complimentary portion of the lac z gene are integrated into the yeast genome. Irradiated cells are screened for possible mutation in the alpha region by testing the cells' ability to hydrolyze xgal, a lactose substrate. The DNA of interest is then extracted from the cells, sequenced, and the sequence is compared to that of the control. Unlike the usual defined-sequence DNA systems, theirs is an in vivo system. A disadvantage is the relatively high background mutation rate. Results achieved with this system, as well as future applications, are discussed

  17. A complete mitochondrial genome sequence from a mesolithic wild aurochs (Bos primigenius.

    Directory of Open Access Journals (Sweden)

    Ceiridwen J Edwards

    Full Text Available BACKGROUND: The derivation of domestic cattle from the extinct wild aurochs (Bos primigenius has been well-documented by archaeological and genetic studies. Genetic studies point towards the Neolithic Near East as the centre of origin for Bos taurus, with some lines of evidence suggesting possible, albeit rare, genetic contributions from locally domesticated wild aurochsen across Eurasia. Inferences from these investigations have been based largely on the analysis of partial mitochondrial DNA sequences generated from modern animals, with limited sequence data from ancient aurochsen samples. Recent developments in DNA sequencing technologies, however, are affording new opportunities for the examination of genetic material retrieved from extinct species, providing new insight into their evolutionary history. Here we present DNA sequence analysis of the first complete mitochondrial genome (16,338 base pairs from an archaeologically-verified and exceptionally-well preserved aurochs bone sample. METHODOLOGY: DNA extracts were generated from an aurochs humerus bone sample recovered from a cave site located in Derbyshire, England and radiocarbon-dated to 6,738+/-68 calibrated years before present. These extracts were prepared for both Sanger and next generation DNA sequencing technologies (Illumina Genome Analyzer. In total, 289.9 megabases (22.48% of the post-filtered DNA sequences generated using the Illumina Genome Analyzer from this sample mapped with confidence to the bovine genome. A consensus B. primigenius mitochondrial genome sequence was constructed and was analysed alongside all available complete bovine mitochondrial genome sequences. CONCLUSIONS: For all nucleotide positions where both Sanger and Illumina Genome Analyzer sequencing methods gave high-confidence calls, no discrepancies were observed. Sequence analysis reveals evidence of heteroplasmy in this sample and places this mitochondrial genome sequence securely within a previously

  18. Genomic sequence and organization of two members of a human lectin gene family

    International Nuclear Information System (INIS)

    Gitt, M.A.; Barondes, S.H.

    1991-01-01

    The authors have isolated and sequenced the genomic DNA encoding a human dimeric soluble lactose-binding lectin. The gene has four exons, and its upstream region contains sequences that suggest control by glucocorticoids, heat (environmental) shock, metals, and other factors. They have also isolated and sequenced three exons of the gene encoding another human putative lectin, the existence of which was first indicated by isolation of its cDNA. Comparisons suggest a general pattern of genomic organization of members of this lectin gene family

  19. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    Czech Academy of Sciences Publication Activity Database

    Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

    2007-01-01

    Roč. 8, č. 1 (2007), s. 427 ISSN 1471-2164 R&D Projects: GA AV ČR IAA500960702; GA MŠk(CZ) LC06004 Institutional research plan: CEZ:AV0Z50510513 Keywords : DNA * Pisum sativum L. Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 4.180, year: 2007

  20. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

    Science.gov (United States)

    Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

    2013-06-01

    We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.

  1. Spectral sum rules and search for periodicities in DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.

    2011-01-01

    Periodic patterns play the important regulatory and structural roles in genomic DNA sequences. Commonly, the underlying periodicities should be understood in a broad statistical sense, since the corresponding periodic patterns have been strongly distorted by the random point mutations and insertions/deletions during molecular evolution. The latent periodicities in DNA sequences can be efficiently displayed by Fourier transform. The criteria of significance for observed periodicities are obtained via the comparison versus the counterpart characteristics of the reference random sequences. We show that the restrictions imposed on the significance criteria by the rigorous spectral sum rules can be rationally described with De Finetti distribution. This distribution provides the convenient intermediate asymptotic form between Rayleigh distribution and exact combinatoric theory. - Highlights: → We study the significance criteria for latent periodicities in DNA sequences. → The constraints imposed by sum rules can be described with De Finetti distribution. → It is intermediate between Rayleigh distribution and exact combinatoric theory. → Theory is applicable to the study of correlations between different periodicities. → The approach can be generalized to the arbitrary discrete Fourier transform.

  2. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DN...... on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn. Udgivelsesdato: 2005-Jan-1...

  3. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    Science.gov (United States)

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related from the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  4. An evaluation of Comparative Genome Sequencing (CGS by comparing two previously-sequenced bacterial genomes

    Directory of Open Access Journals (Sweden)

    Herring Christopher D

    2007-08-01

    Full Text Available Abstract Background With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions.

  5. The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.

    Science.gov (United States)

    Khoe, Clairine V; Chung, Long H; Murray, Vincent

    2018-06-01

    The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.

  6. Approaches for in silico finishing of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Frederico Schmitt Kremer

    Full Text Available Abstract The introduction of next-generation sequencing (NGS had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases tools that are available to facilitate genome finishing.

  7. Approaches for in silico finishing of microbial genome sequences.

    Science.gov (United States)

    Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

    The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.

  8. Genomic DNA extraction protocols from ovine hair

    Directory of Open Access Journals (Sweden)

    Jennifer Nonato da Silva Prate

    2013-12-01

    Full Text Available Genomic DNA extracted from animal cells can be used for several purposes, for example, to know genetic variability and genetic relationships between individuals, breeds and/or species, paternity tests, to describe the genetic profile for registration of the animal at association of breeders, detect genetic polymorphisms (SNP related to characteristics of commercial interest, disease diagnose, assess resistance or susceptibility to pathogens, etc. For such evaluations, in general, DNA is amplified by PCR (polymerase chain reaction, and then subjected to various techniques as RFLP (restriction fragments length polymorphism, SSCP (single strand conformation polymorphism, and sequencing. The DNA may be obtained from blood, buccal swabs, meat, cartilage or hair bulb. Among all, the last biological material has been preferred by farmers for its ease acquisition. Several methods for extracting DNA from hair bulb were reported without any consensus for its implementation. This study aimed to optimize a protocol for efficient DNA extraction for use in PCR-RFLP analysis of the Prion gene. For this study, were collected hair samples containing hair bulb from 131 Santa Inês sheep belonging to the Institute of Zootechny, Nova Odessa - SP. Two DNA extraction protocols were evaluated. The first, called phenol-chloroform-isoamyl alcohol (PCIA has long been used by Animal Genetic Laboratories, whose procedures are described below: in each microtube (1.5 mL containing 500 µL of TE-Tween solution (Tris-HCl 50 mM, EDTA 1 mM and 0.5% Tween 20 were added to approximately 30 hair bulb per animal which was incubated at 65°C with shaking at 170 rpm for 2 hours. Then was added 15 µL of proteinase K [10 mg mL-1] and incubated at 55°C at 170 rpm for 6-12 hours. At the end of digestion was added 1 volume of solution phenol-chloroform-isoamyl alcohol (25:24:1 followed by vigorous shaking for 10 seconds and centrifuged at 8000 rpm and 4°C for 10 minutes. The upper phase

  9. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis.

    Science.gov (United States)

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clap gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.

  10. Draft Genome Sequence of Mycobacterium chimaera Type ...

    Science.gov (United States)

    We report the draft genome sequence of the type strain Mycobacterium chimaera Fl-0169T, a member of the Mycobacterium avium complex (MAC). M. chimaera Fl-0169T was isolated from a patient in Italy and is highly similar to strains of M. chimaera isolated in Ireland, though Fl-0169T possesses unique virulence genes. Evidence suggests that M. avium, M. intracellulare, and M. chimaera are differently virulent and a comparative genomic analysis is critically needed to identify diagnostic targets that reliably differentiate species of MAC. With treatment costs for Mycobacterium infections estimated to be >$1.8 B annually in the U.S., correct species identification will result in improved treatment selection, lower costs, and improved patient outcomes.

  11. The complete sequence of the mitochondrial genome of the African Penguin (Spheniscus demersus).

    Science.gov (United States)

    Labuschagne, Christiaan; Kotzé, Antoinette; Grobler, J Paul; Dalton, Desiré L

    2014-01-15

    The complete mitochondrial genome of the African Penguin (Spheniscus demersus) was sequenced. The molecule was sequenced via next generation sequencing and primer walking. The size of the genome is 17,346 bp in length. Comparison with the mitochondrial DNA of two other penguin genomes that have so far been reported was conducted namely; Little blue penguin (Eudyptula minor) and the Rockhopper penguin (Eudyptes chrysocome). This analysis made it possible to identify common penguin mitochondrial DNA characteristics. The S. demersus mtDNA genome is very similar, both in composition and length to both the E. chrysocome and E. minor genomes. The gene content of the African penguin mitochondrial genome is typical of vertebrates and all three penguin species have the standard gene order originally identified in the chicken. The control region for S. demersus is located between tRNA-Glu and tRNA-Phe and all three species of penguins contain two sets of similar repeats with varying copy numbers towards the 3' end of the control region, accounting for the size variance. This is the first report of the complete nucleotide sequence for the mitochondrial genome of the African penguin, S. demersus. These results can be subsequently used to provide information for penguin phylogenetic studies and insights into the evolution of genomes. © 2013 Elsevier B.V. All rights reserved.

  12. Alu Mobile Elements: From Junk DNA to Genomic Gems

    Directory of Open Access Journals (Sweden)

    Sami Dridi

    2012-01-01

    Full Text Available Alus, the short interspersed repeated sequences (SINEs, are retrotransposons that litter the human genomes and have long been considered junk DNA. However, recent findings that these mobile elements are transcribed, both as distinct RNA polymerase III transcripts and as a part of RNA polymerase II transcripts, suggest biological functions and refute the notion that Alus are biologically unimportant. Indeed, Alu RNAs have been shown to control mRNA processing at several levels, to have complex regulatory functions such as transcriptional repression and modulating alternative splicing and to cause a host of human genetic diseases. Alu RNAs embedded in Pol II transcripts can promote evolution and proteome diversity, which further indicates that these mobile retroelements are in fact genomic gems rather than genomic junks.

  13. Comparative genomic hybridization analysis detects frequent over-representation of DNA sequences at 3q, 7p, 8q and 18q in head and neck carcinomas

    DEFF Research Database (Denmark)

    Bergamo, N A; Rogatto, S R; Poli-Frederico, R C

    2000-01-01

    Comparative genomic hybridization (CGH) was used to identify chromosomal imbalances in 19 samples of squamous cell carcinoma of the head and neck (HNSCC). The chromosome arms most often over-represented were 3q (48%), 8q (42%), and 7p (32%); in many cases, these changes were observed at high copy...... and 2q material were detected in patients exhibiting a clinical history of recurrence and/or metastasis followed by terminal disease. This association suggests that gain of 1q and 2q may be a new marker of head and neck tumors with a refractory clinical response....

  14. The peculiar landscape of repetitive sequences in the olive (Olea europaea L.) genome.

    Science.gov (United States)

    Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea

    2014-04-01

    Analyzing genome structure in different species allows to gain an insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with mean length higher than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome.

  15. Gene Discovery through Genomic Sequencing of Brucella abortus

    Science.gov (United States)

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and human. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10−5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  16. Simulating efficiently the evolution of DNA sequences.

    Science.gov (United States)

    Schöniger, M; von Haeseler, A

    1995-02-01

    Two menu-driven FORTRAN programs are described that simulate the evolution of DNA sequences in accordance with a user-specified model. This general stochastic model allows for an arbitrary stationary nucleotide composition and any transition-transversion bias during the process of base substitution. In addition, the user may define any hypothetical model tree according to which a family of sequences evolves. The programs suggest the computationally most inexpensive approach to generate nucleotide substitutions. Either reproducible or non-repeatable simulations, depending on the method of initializing the pseudo-random number generator, can be performed. The corresponding options are offered by the interface menu.

  17. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

    Science.gov (United States)

    Christen, Matthias; Del Medico, Luca; Christen, Heinz; Christen, Beat

    2017-01-01

    Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.

  18. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

    Directory of Open Access Journals (Sweden)

    Matthias Christen

    Full Text Available Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.

  19. Direct detection of chicken genomic DNA for gender determination by thymine-DNA glycosylase.

    Science.gov (United States)

    Porat, N; Bogdanov, K; Danielli, A; Arie, A; Samina, I; Hadani, A

    2011-02-01

    1. Birds, especially nestlings, are generally difficult to sex by morphology and early detection of chick gender in ovo in the hatchery would facilitate removal of unwanted chicks and diminish welfare objections regarding culling after hatch. 2. We describe a method to determine chicken gender without the need for PCR via use of Thymine-DNA Glycosylase (TDG). TDG restores thymine (T)/guanine (G) mismatches to cytosine (C)/G. We show here, that like DNA Polymerase, TDG can recognise, bind and function on a primer hybridised to chicken genomic DNA. 3. The primer contained a T to mismatch a G in a chicken genomic template and the T/G was cleaved with high fidelity by TDG. Thus, the chicken genomic DNA can be identified without PCR amplification via direct and linear detection. Sensitivity was increased using gender specific sequences from the chicken genome. 4. Currently, these are laboratory results, but we anticipate that further development will allow this method to be used in non-laboratory settings, where PCR cannot be employed.

  20. Early Lyme disease with spirochetemia - diagnosed by DNA sequencing

    Directory of Open Access Journals (Sweden)

    Jones William

    2010-11-01

    Full Text Available Abstract Background A sensitive and analytically specific nucleic acid amplification test (NAAT is valuable in confirming the diagnosis of early Lyme disease at the stage of spirochetemia. Findings Venous blood drawn from patients with clinical presentations of Lyme disease was tested for the standard 2-tier screen and Western Blot serology assay for Lyme disease, and also by a nested polymerase chain reaction (PCR for B. burgdorferi sensu lato 16S ribosomal DNA. The PCR amplicon was sequenced for B. burgdorferi genomic DNA validation. A total of 130 patients visiting emergency room (ER or Walk-in clinic (WALKIN, and 333 patients referred through the private physicians' offices were studied. While 5.4% of the ER/WALKIN patients showed DNA evidence of spirochetemia, none (0% of the patients referred from private physicians' offices were DNA-positive. In contrast, while 8.4% of the patients referred from private physicians' offices were positive for the 2-tier Lyme serology assay, only 1.5% of the ER/WALKIN patients were positive for this antibody test. The 2-tier serology assay missed 85.7% of the cases of early Lyme disease with spirochetemia. The latter diagnosis was confirmed by DNA sequencing. Conclusion Nested PCR followed by automated DNA sequencing is a valuable supplement to the standard 2-tier antibody assay in the diagnosis of early Lyme disease with spirochetemia. The best time to test for Lyme spirochetemia is when the patients living in the Lyme disease endemic areas develop unexplained symptoms or clinical manifestations that are consistent with Lyme disease early in the course of their illness.

  1. Distribution of Ds-like sequences in genomes of cereals

    International Nuclear Information System (INIS)

    Vershinin, A.V.; Salina, E.A.; Shumnii, V.K.; Svitashev, S.K.

    1986-01-01

    It has been suggested that insertions of Ds-elements may alter the effectiveness of transcription or translation of the genetic loci and the normal processing of introns and exons, and that they may impair coding frames, etc. The object of the present study was to determine the frequency of occurence of DNA sequences similar to the Ds-controlling elements of mazie (Ds-like sequences) among other representatives of cereals. The conservative feature of the primary structure of transposons from different eukaryotic species served as a basis in this investigation. By means of the ''nick-translation'' reaction with the aid of DNA-polymerase I (alpha- 32 P) dCTP or TTP was introduced into the Ds-element. The specific radioactivity of the preparations obtained was 5 x 10 7 to 1 x 10 8 cpm/gamma. From the results obtained, it is suggested that the genomes of cereals examined contain a collection of Ds-like sequences. The Ds-element may have a significant effect on gene expression in the presence of Ac-like or other sequences, which undergo transposition

  2. The complete chloroplast genome sequence of Dianthus superbus var. longicalycinus.

    Science.gov (United States)

    Gurusamy, Raman; Lee, Do-Hyung; Park, SeonJoo

    2016-05-01

    The complete chloroplast genome (cpDNA) sequence of Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicine was reported and characterized. The cpDNA of Dianthus superbus var. longicalycinus is 149,539 bp, with 36.3% GC content. A pair of inverted repeats (IRs) of 24,803 bp is separated by a large single-copy region (LSC, 82,805 bp) and a small single-copy region (SSC, 17,128 bp). It encodes 85 protein-coding genes, 36 tRNA genes and 8 rRNA genes. Of 129 individual genes, 13 genes encoded one intron and three genes have two introns.

  3. Genetic integrity of the Dark European honey bee (Apis mellifera mellifera) from protected populations: a genome-wide assessment using SNPs and mtDNA sequence data

    OpenAIRE

    Pinto, M. Alice; Henriques, Dora; Chavez-Galarza, Julio; Kryger, Per; Garnery, Lionel; Zee, Romée van der; Dahle, Bjørn; Soland-Reckeweg, Gabriele; De la Rúa, Pilar; Dall’ Olio, Raffaele; Carreck, Norman L.; Johnston, J. Spencer

    2014-01-01

    The recognition that the Dark European honey bee, Apis mellifera mellifera, is increasingly threatened in its native range has led to the establishment of conservation programmes and protected areas throughout western Europe. Previous molecular surveys showed that, despite management strategies to preserve the genetic integrity of A. m. mellifera, protected populations had a measurable component of their gene pool derived from commercial C-lineage honey bees. Here we used both sequence data f...

  4. Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

    Science.gov (United States)

    Evans, Teri; Johnson, Andrew D; Loose, Matthew

    2018-01-12

    Large repeat rich genomes present challenges for assembly using short read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .

  5. Google matrix analysis of DNA sequences.

    Science.gov (United States)

    Kandiah, Vivek; Shepelyansky, Dima L

    2013-01-01

    For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW). At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  6. Google matrix analysis of DNA sequences.

    Directory of Open Access Journals (Sweden)

    Vivek Kandiah

    Full Text Available For DNA sequences of various species we construct the Google matrix [Formula: see text] of Markov transitions between nearby words composed of several letters. The statistical distribution of matrix elements of this matrix is shown to be described by a power law with the exponent being close to those of outgoing links in such scale-free networks as the World Wide Web (WWW. At the same time the sum of ingoing matrix elements is characterized by the exponent being significantly larger than those typical for WWW networks. This results in a slow algebraic decay of the PageRank probability determined by the distribution of ingoing elements. The spectrum of [Formula: see text] is characterized by a large gap leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species which determines their statistical similarity from the view point of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks showing their similarities and distinctions with the WWW and linguistic networks.

  7. Open source tools to exploit DNA sequence data from livestock species

    Science.gov (United States)

    Next-Generation Sequencing (NGS) is a recent technological development that allows researchers to rapidly determine the DNA sequence of an individual. The decrease in cost of NGS has brought the technology into the realm of practical applications in livestock genomics, where it can be used to genera...

  8. Extraction of High Molecular Weight DNA from Fungal Rust Spores for Long Read Sequencing.

    Science.gov (United States)

    Schwessinger, Benjamin; Rathjen, John P

    2017-01-01

    Wheat rust fungi are complex organisms with a complete life cycle that involves two different host plants and five different spore types. During the asexual infection cycle on wheat, rusts produce massive amounts of dikaryotic urediniospores. These spores are dikaryotic (two nuclei) with each nucleus containing one haploid genome. This dikaryotic state is likely to contribute to their evolutionary success, making them some of the major wheat pathogens globally. Despite this, most published wheat rust genomes are highly fragmented and contain very little haplotype-specific sequence information. Current long-read sequencing technologies hold great promise to provide more contiguous and haplotype-phased genome assemblies. Long reads are able to span repetitive regions and phase structural differences between the haplomes. This increased genome resolution enables the identification of complex loci and the study of genome evolution beyond simple nucleotide polymorphisms. Long-read technologies require pure high molecular weight DNA as an input for sequencing. Here, we describe a DNA extraction protocol for rust spores that yields pure double-stranded DNA molecules with molecular weight of >50 kilo-base pairs (kbp). The isolated DNA is of sufficient purity for PacBio long-read sequencing, but may require additional purification for other sequencing technologies such as Nanopore and 10× Genomics.

  9. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA; Vos, M. de; Louw, GE; Merwe, RG van der; Dippenaar, A.; Streicher, EM; Abdallah, AM; Sampson, SL; Victor, TC; Dolby, T.; Simpson, JA; Helden, PD van; Warren, RM; Pain, Arnab

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug

  10. Molecular analysis and genomic organization of major DNA satellites in banana (Musa spp.).

    Science.gov (United States)

    Čížková, Jana; Hřibová, Eva; Humplíková, Lenka; Christelová, Pavla; Suchánková, Pavla; Doležel, Jaroslav

    2013-01-01

    Satellite DNA sequences consist of tandemly arranged repetitive units up to thousands nucleotides long in head-to-tail orientation. The evolutionary processes by which satellites arise and evolve include unequal crossing over, gene conversion, transposition and extra chromosomal circular DNA formation. Large blocks of satellite DNA are often observed in heterochromatic regions of chromosomes and are a typical component of centromeric and telomeric regions. Satellite-rich loci may show specific banding patterns and facilitate chromosome identification and analysis of structural chromosome changes. Unlike many other genomes, nuclear genomes of banana (Musa spp.) are poor in satellite DNA and the information on this class of DNA remains limited. The banana cultivars are seed sterile clones originating mostly from natural intra-specific crosses within M. acuminata (A genome) and inter-specific crosses between M. acuminata and M. balbisiana (B genome). Previous studies revealed the closely related nature of the A and B genomes, including similarities in repetitive DNA. In this study we focused on two main banana DNA satellites, which were previously identified in silico. Their genomic organization and molecular diversity was analyzed in a set of nineteen Musa accessions, including representatives of A, B and S (M. schizocarpa) genomes and their inter-specific hybrids. The two DNA satellites showed a high level of sequence conservation within, and a high homology between Musa species. FISH with probes for the satellite DNA sequences, rRNA genes and a single-copy BAC clone 2G17 resulted in characteristic chromosome banding patterns in M. acuminata and M. balbisiana which may aid in determining genomic constitution in interspecific hybrids. In addition to improving the knowledge on Musa satellite DNA, our study increases the number of cytogenetic markers and the number of individual chromosomes, which can be identified in Musa.

  11. In Vitro Whole Genome DNA Binding Analysis of the Bacterial Replication Initiator and Transcription Factor DnaA.

    Directory of Open Access Journals (Sweden)

    Janet L Smith

    2015-05-01

    Full Text Available DnaA, the replication initiation protein in bacteria, is an AAA+ ATPase that binds and hydrolyzes ATP and exists in a heterogeneous population of ATP-DnaA and ADP-DnaA. DnaA binds cooperatively to the origin of replication and several other chromosomal regions, and functions as a transcription factor at some of these regions. We determined the binding properties of Bacillus subtilis DnaA to genomic DNA in vitro at single nucleotide resolution using in vitro DNA affinity purification and deep sequencing (IDAP-Seq. We used these data to identify 269 binding regions, refine the consensus sequence of the DnaA binding site, and compare the relative affinity of binding regions for ATP-DnaA and ADP-DnaA. Most sites had a slightly higher affinity for ATP-DnaA than ADP-DnaA, but a few had a strong preference for binding ATP-DnaA. Of the 269 sites, only the eight strongest binding ones have been observed to bind DnaA in vivo, suggesting that other cellular factors or the amount of available DnaA in vivo restricts DnaA binding to these additional sites. Conversely, we found several chromosomal regions that were bound by DnaA in vivo but not in vitro, and that the nucleoid-associated protein Rok was required for binding in vivo. Our in vitro characterization of the inherent ability of DnaA to bind the genome at single nucleotide resolution provides a backdrop for interpreting data on in vivo binding and regulation of DnaA, and is an approach that should be adaptable to many other DNA binding proteins.

  12. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis

    DEFF Research Database (Denmark)

    Carlton, Jane M.; Hirt, Robert P.; Silva, Joana C.

    2007-01-01

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion...... environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria....

  13. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    Science.gov (United States)

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  14. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Directory of Open Access Journals (Sweden)

    Soichi Inagaki

    Full Text Available Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  15. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

    Science.gov (United States)

    Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

    2015-01-01

    Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.

  16. Comparison of DNA Quantification Methods for Next Generation Sequencing.

    Science.gov (United States)

    Robin, Jérôme D; Ludlow, Andrew T; LaRanger, Ryan; Wright, Woodring E; Shay, Jerry W

    2016-04-06

    Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C; 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChiA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. DdPCR-Tail is comparable to qPCR and fluorometry (QuBit) and allows sensitive quantification by analysis of barcode repartition after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality.

  17. Genome sequence data from 17 accessions of Ensete ventricosum, a staple food crop for millions in Ethiopia

    Directory of Open Access Journals (Sweden)

    Zerihun Yemataw

    2018-06-01

    Full Text Available We present raw sequence reads and genome assemblies derived from 17 accessions of the Ethiopian orphan crop plant enset (Ensete ventricosum (Welw. Cheesman using the Illumina HiSeq and MiSeq platforms. Also presented is a catalogue of single-nucleotide polymorphisms inferred from the sequence data at an average density of approximately one per kilobase of genomic DNA.

  18. A DNA minor groove electronegative potential genome map based on photo-chemical probing

    DEFF Research Database (Denmark)

    Lindemose, Søren; Nielsen, Peter Eigil; Hansen, Morten

    2011-01-01

    The double-stranded DNA of the genome contains both sequence information directly relating to the protein and RNA coding as well as functional and structural information relating to protein recognition. Only recently is the importance of DNA shape in this recognition process being fully appreciat...

  19. Primers for polymerase chain reaction to detect genomic DNA of Toxocara canis and T. cati.

    Science.gov (United States)

    Wu, Z; Nagano, I; Xu, D; Takahashi, Y

    1997-03-01

    Primers for polymerase chain reaction to amplify genomic DNA of both Toxocara canis and T. cati were constructed by adapting cloning and sequencing random amplified polymorphic DNA. The primers are expected to detect eggs and/or larvae of T. canis and T. cati, both of which are known to cause toxocariasis in humans.

  20. Applications of Genomic Sequencing in Pediatric CNS Tumors.

    Science.gov (United States)

    Bavle, Abhishek A; Lin, Frank Y; Parsons, D Williams

    2016-05-01

    Recent advances in genome-scale sequencing methods have resulted in a significant increase in our understanding of the biology of human cancers. When applied to pediatric central nervous system (CNS) tumors, these remarkable technological breakthroughs have facilitated the molecular characterization of multiple tumor types, provided new insights into the genetic basis of these cancers, and prompted innovative strategies that are changing the management paradigm in pediatric neuro-oncology. Genomic tests have begun to affect medical decision making in a number of ways, from