WorldWideScience

Sample records for genomic sequence divergence

  1. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Background: Approximately 11 Mb of finished high-quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however, substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10⁻⁹ change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10⁻⁹ change/site/year) was approximately half of the overall rate (1.9–2.0 × 10⁻⁹ change/site/year). Relative rate tests also indicated that cattle have a significantly faster rate of substitution than dog and that this difference is about 6%. Conclusion: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies.
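
    The rate figures in this abstract can be checked with simple arithmetic. The sketch below is illustrative only: it takes the midpoints of the quoted ranges, confirms that the coding rate is roughly half the overall rate, and back-calculates the split time that a strict molecular clock (d = 2 × rate × T) would imply. The abstract does not state which divergence times the authors used, so the strict-clock relation and the function names are assumptions.

```python
# Figures quoted in the abstract.
OVERALL_RATE = (1.9e-9, 2.0e-9)       # change/site/year
CODING_RATE = 1.1e-9                  # change/site/year
PAIRWISE_DIVERGENCE = (0.32, 0.37)    # change/site


def midpoint(bounds):
    """Midpoint of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2.0


def implied_split_time(divergence, rate):
    """Years since the split under a strict clock, d = 2 * rate * T.

    The strict-clock relation is an assumption made for illustration;
    it is not stated in the abstract.
    """
    return divergence / (2.0 * rate)


coding_to_overall = CODING_RATE / midpoint(OVERALL_RATE)
split_time_years = implied_split_time(
    midpoint(PAIRWISE_DIVERGENCE), midpoint(OVERALL_RATE)
)

print(f"coding / overall rate: {coding_to_overall:.2f}")
print(f"implied split time: {split_time_years / 1e6:.0f} million years")
```

    The factor of two reflects that substitutions accumulate along both lineages after a split, so a pairwise divergence covers twice the elapsed time.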

  2. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    Directory of Open Access Journals (Sweden)

    Yandell Mark

    2010-07-01

    Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is

  3. Using comparative genomic hybridization to survey genomic sequence divergence across species: a proof-of-concept from Drosophila

    Directory of Open Access Journals (Sweden)

    Kulathinal Rob J

    2010-04-01

    Background: Genome-wide analysis of sequence divergence among species offers profound insights into the evolutionary processes that shape lineages. When full-genome sequencing is not feasible for a broad comparative study, we propose the use of array-based comparative genomic hybridization (aCGH) in order to identify orthologous genes with high sequence divergence. Here we discuss experimental design, statistical power, success rate, sources of variation and potential confounding factors. We used a spotted PCR product microarray platform from Drosophila melanogaster to assess sequence divergence on a gene-by-gene basis in three fully sequenced heterologous species (D. sechellia, D. simulans, and D. yakuba). Because complete genome assemblies are available for these species, this study presents a powerful test for the use of aCGH as a tool to measure sequence divergence. Results: We found a consistent and linear relationship between hybridization ratio and sequence divergence of the sample to the platform species. At higher levels of sequence divergence (D. melanogaster ~84% of features had significantly less hybridization to the array in the heterologous species than the platform species, and thus could be identified as "diverged". At lower levels of divergence (≥ 97% identity), only 13% of genes were identified as diverged. While ~40% of the variation in hybridization ratio can be accounted for by variation in sequence identity of the heterologous sample relative to D. melanogaster, other individual characteristics of the DNA sequences, such as GC content, also contribute to variation in hybridization ratio, as does technical variation. Conclusions: Here we demonstrate that aCGH can accurately be used as a proxy to estimate genome-wide divergence, thus providing an efficient way to evaluate how evolutionary processes and genomic architecture can shape species diversity in non-model systems. Given the increased number of species for which
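
    The abstract above reports a linear relationship between hybridization ratio and sequence identity, with about 40% of the variation explained. One way to quantify such a relationship is an ordinary least-squares fit and its R² value. The following is a minimal sketch under stated assumptions: the function name `linear_fit` and the data points are hypothetical and are not taken from the study, and the authors' actual statistical procedure is not described in the abstract.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared), where r_squared is the
    fraction of variance in y explained by the fitted line.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical per-gene values, NOT data from the study:
# sequence identity to the platform species and log2 hybridization ratio.
identity = [0.99, 0.97, 0.95, 0.92, 0.90, 0.88, 0.85]
log2_ratio = [-0.05, -0.20, -0.15, -0.60, -0.45, -0.90, -0.80]

slope, intercept, r_squared = linear_fit(identity, log2_ratio)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.2f}")
```

    An R² well below 1 on real data would be consistent with the abstract's point that GC content and technical variation also affect the hybridization ratio.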

  4. Mitochondrial genome sequences reveal deep divergences among Anopheles punctulatus sibling species in Papua New Guinea

    Directory of Open Access Journals (Sweden)

    Logue Kyle

    2013-02-01

    Background: Members of the Anopheles punctulatus group (AP group) are the primary vectors of human malaria in Papua New Guinea. The AP group includes 13 sibling species, most of them morphologically indistinguishable. Understanding why only certain species are able to transmit malaria requires a better comprehension of their evolutionary history. In particular, understanding relationships and divergence times among Anopheles species may enable assessing how malaria-related traits (e.g. blood feeding behaviours, vector competence) have evolved. Methods: DNA sequences of 14 mitochondrial (mt) genomes from five AP sibling species and two species of the Anopheles dirus complex of Southeast Asia were sequenced. DNA sequences from all concatenated protein coding genes (10,770 bp) were then analysed using a Bayesian approach to reconstruct phylogenetic relationships and date the divergence of the AP sibling species. Results: Phylogenetic reconstruction using the concatenated DNA sequence of all mitochondrial protein coding genes indicates that the ancestors of the AP group arrived in Papua New Guinea 25 to 54 million years ago and rapidly diverged to form the current sibling species. Conclusion: Through evaluation of newly described mt genome sequences, this study has revealed a divergence among members of the AP group in Papua New Guinea that would significantly predate the arrival of humans in this region, 50 thousand years ago. The divergence observed among the mtDNA sequences studied here may have resulted from reproductive isolation during historical changes in sea-level through glacial minima and maxima. This leads to a hypothesis that the AP sibling species have evolved independently for potentially thousands of generations. This suggests that the evolution of many phenotypes, such as insecticide resistance, will arise independently in each of the AP sibling species studied here.

  5. The complete plastid genome sequence of Welwitschia mirabilis: an unusually compact plastome with accelerated divergence rates

    Directory of Open Access Journals (Sweden)

    Boore Jeffrey L

    2008-05-01

    Background: Welwitschia mirabilis is the only extant member of the family Welwitschiaceae, one of three lineages of gnetophytes, an enigmatic group of gymnosperms variously allied with flowering plants or conifers. Limited sequence data and rapid divergence rates have precluded consensus on the evolutionary placement of gnetophytes based on molecular characters. Here we report on the first complete gnetophyte chloroplast genome sequence, from Welwitschia mirabilis, as well as analyses on divergence rates of protein-coding genes, comparisons of gene content and order, and phylogenetic implications. Results: The chloroplast genome of Welwitschia mirabilis [GenBank: EU342371] is comprised of 119,726 base pairs and exhibits large and small single copy regions and two copies of the large inverted repeat (IR). Only 101 unique gene species are encoded. The Welwitschia plastome is the most compact photosynthetic land plant plastome sequenced to date; 66% of the sequence codes for product. The genome also exhibits a slightly expanded IR, a minimum of 9 inversions that modify gene order, and 19 genes that are lost or present as pseudogenes. Phylogenetic analyses, including one representative of each extant seed plant lineage and based on 57 concatenated protein-coding sequences, place Welwitschia at the base of all seed plants (distance, maximum parsimony) or as the sister to Pinus (the only conifer representative) in a monophyletic gymnosperm clade (maximum likelihood, Bayesian). Relative rate tests on these gene sequences show the Welwitschia sequences to be evolving at faster rates than other seed plants. For these genes individually, a comparison of average pairwise distances indicates that relative divergence in Welwitschia ranges from amounts about equal to other seed plants to amounts almost three times greater than the average for non-gnetophyte seed plants. Conclusion: Although the basic organization of the Welwitschia plastome is typical, its

  6. Phylogenetic analyses of complete mitochondrial genome sequences suggest a basal divergence of the enigmatic rodent Anomalurus

    Directory of Open Access Journals (Sweden)

    Gissi Carmela

    2007-02-01

    Background: Phylogenetic relationships between Lagomorpha, Rodentia and Primates and their allies (Euarchontoglires) have long been debated. While it is now generally agreed that Rodentia constitutes a monophyletic sister-group of Lagomorpha and that this clade (Glires) is sister to Primates and Dermoptera, higher-level relationships within Rodentia remain contentious. Results: We have sequenced and performed extensive evolutionary analyses on the mitochondrial genome of the scaly-tailed flying squirrel Anomalurus sp., an enigmatic rodent whose phylogenetic affinities have been obscure and extensively debated. Our phylogenetic analyses of the coding regions of available complete mitochondrial genome sequences from Euarchontoglires suggest that Anomalurus is a sister taxon to the Hystricognathi, and that this clade represents the most basal divergence among sampled Rodentia. Bayesian dating methods incorporating a relaxed molecular clock provide divergence-time estimates which are consistently in agreement with the fossil record and which indicate a rapid radiation within Glires around 60 million years ago. Conclusion: Taken together, the data presented provide a working hypothesis as to the phylogenetic placement of Anomalurus, underline the utility of mitochondrial sequences in the resolution of even relatively deep divergences and go some way to explaining the difficulty of conclusively resolving higher-level relationships within Glires with available data and methodologies.

  7. Characterization and phylogenetic analysis of α-gliadin gene sequences reveals significant genomic divergence in Triticeae species

    Indian Academy of Sciences (India)

    Guang-Rong Li; Tao Lang; En-Nian Yang; Cheng Liu; Zu-Jun Yang

    2014-12-01

    Although the unique properties of the wheat α-gliadin gene family are well characterized, little is known about the evolution and genomic divergence of the α-gliadin gene family within the Triticeae. We isolated a total of 203 α-gliadin gene sequences from 11 representative diploid and polyploid Triticeae species, and found 108 sequences putatively functional. Our results indicate that α-gliadin genes may have originated from wild Secale species, where the sequences contain the shortest repetitive domains and display minimum variation. A miniature inverted-repeat transposable element insertion is reported for the first time in an α-gliadin gene sequence of Thinopyrum intermedium in this study, indicating that the transposable element might have contributed to the diversification of the α-gliadin gene family among Triticeae genomes. The phylogenetic analyses revealed that the α-gliadin gene sequences of Dasypyrum, Australopyrum, Lophopyrum, Eremopyrum and Pseudoroegneria species have amplified several times. A search for four typical toxic epitopes for celiac disease within the Triticeae α-gliadin gene sequences showed that the α-gliadins of wild Secale, Australopyrum and Agropyron genomes lack all four epitopes, while other Triticeae species have accumulated these epitopes, suggesting that the evolution of these toxic epitope sequences occurred during the course of speciation, domestication or polyploidization of Triticeae.

  8. Sequence divergence and conservation in genomes of Helicobacter cetorum strains from a dolphin and a whale.

    Directory of Open Access Journals (Sweden)

    Dangeruta Kersulyte

    BACKGROUND AND OBJECTIVES: Strains of Helicobacter cetorum have been cultured from several marine mammals and have been found to be closely related in 16S rDNA sequence to the human gastric pathogen H. pylori, but their genomes were not characterized further. METHODS: The genomes of H. cetorum strains from a dolphin and a whale were sequenced completely using 454 technology and PCR and capillary sequencing. RESULTS: These genomes are 1.8 and 1.95 Mb in size, some 7-26% larger than H. pylori genomes, and differ markedly from one another in gene content, and sequences and arrangements of shared genes. However, each strain is more related overall to H. pylori and its descendant H. acinonychis than to other known species. These H. cetorum strains lack cag pathogenicity islands, but contain novel alleles of the virulence-associated vacuolating cytotoxin (vacA) gene. Of particular note are (i) an extra triplet of vacA genes with ≤50% protein-level identity to each other in the 5' two-thirds of the gene needed for host factor interaction; (ii) divergent sets of outer membrane protein genes; (iii) several metabolic genes distinct from those of H. pylori; (iv) genes for an iron-cofactored urease related to those of Helicobacter species from terrestrial carnivores, in addition to genes for a nickel-cofactored urease; and (v) members of the slr multigene family, some of which modulate host responses to infection and improve Helicobacter growth with mammalian cells. CONCLUSIONS: Our genome sequence data provide a glimpse into the novelty and great genetic diversity of marine helicobacters. These data should aid further analyses of microbial genome diversity and evolution and infection and disease mechanisms in vast and often fragile ocean ecosystems.

  9. Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds.

    Science.gov (United States)

    Singh, Rajinder; Ong-Abdullah, Meilina; Low, Eng-Ti Leslie; Manaf, Mohamad Arif Abdul; Rosli, Rozana; Nookiah, Rajanaidu; Ooi, Leslie Cheng-Li; Ooi, Siew-Eng; Chan, Kuang-Lim; Halim, Mohd Amin; Azizi, Norazah; Nagappan, Jayanthi; Bacher, Blaire; Lakey, Nathan; Smith, Steven W; He, Dong; Hogan, Michael; Budiman, Muhammad A; Lee, Ernest K; DeSalle, Rob; Kudrna, David; Goicoechea, Jose Luis; Wing, Rod A; Wilson, Richard K; Fulton, Robert S; Ordway, Jared M; Martienssen, Robert A; Sambanthamurthi, Ravigadevi

    2013-08-15

    Oil palm is the most productive oil-bearing crop. Although it is planted on only 5% of the total world vegetable oil acreage, palm oil accounts for 33% of vegetable oil and 45% of edible oil worldwide, but increased cultivation competes with dwindling rainforest reserves. We report the 1.8-gigabase (Gb) genome sequence of the African oil palm Elaeis guineensis, the predominant source of worldwide oil production. A total of 1.535 Gb of assembled sequence and transcriptome data from 30 tissue types were used to predict at least 34,802 genes, including oil biosynthesis genes and homologues of WRINKLED1 (WRI1), and other transcriptional regulators, which are highly expressed in the kernel. We also report the draft sequence of the South American oil palm Elaeis oleifera, which has the same number of chromosomes (2n = 32) and produces fertile interspecific hybrids with E. guineensis but seems to have diverged in the New World. Segmental duplications of chromosome arms define the palaeotetraploid origin of palm trees. The oil palm sequence enables the discovery of genes for important traits as well as somaclonal epigenetic alterations that restrict the use of clones in commercial plantings, and should therefore help to achieve sustainability for biofuels and edible oils, reducing the rainforest footprint of this tropical plantation crop.

  10. Abundant sequence divergence in the native Japanese cattle Mishima-Ushi (Bos taurus) detected using whole-genome sequencing.

    Science.gov (United States)

    Tsuda, Kaoru; Kawahara-Miki, Ryouka; Sano, Satoshi; Imai, Misaki; Noguchi, Tatsuo; Inayoshi, Yousuke; Kono, Tomohiro

    2013-10-01

    The native Japanese cattle Mishima-Ushi, a designated national natural treasure, are bred on a remote island, which has resulted in the conservation of their genealogy. We examined the genetic characteristics of 8 Mishima-Ushi individuals by using single nucleotide polymorphisms (SNPs), insertions, and deletions obtained by whole-genome sequencing. Mapping analysis with various criteria showed that predicted heterozygous SNPs were more prevalent than predicted homozygous SNPs in the exonic region, especially non-synonymous SNPs. From the identified 6.54 million polymorphisms, we found 400 non-synonymous SNPs in 313 genes specific to each of the 8 Mishima-Ushi individuals. Additionally, 3,170,833 polymorphisms were found between the 8 Mishima-Ushi individuals. Phylogenetic analysis confirmed that the Mishima-Ushi population diverged from another strain of Japanese cattle. This study provides a framework for further genetic studies of Mishima-Ushi and research on the function of SNP-containing genes as well as understanding the genetic relationship between the domestic and native Japanese cattle breeds.

  11. DNA Barcoding: Amplification and sequence analysis of rbcl and matK genome regions in three divergent plant species

    Directory of Open Access Journals (Sweden)

    Javed Iqbal Wattoo

    2016-11-01

    Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matK+rbcL) of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families: Solanaceae, Euphorbiaceae and Fabaceae, respectively. Biological sequence homology and divergence of amplified sequences were studied using the Basic Local Alignment Search Tool (BLAST). Results: Both primers (matK+rbcL) showed good amplification in three species. The sequenced regions revealed conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in the medicinal industry to establish DNA-based identification of different medicinal plant species to monitor adulteration.

  12. Genomic organization, sequence divergence, and recombination of feline immunodeficiency virus from lions in the wild

    Directory of Open Access Journals (Sweden)

    Sondgeroth Kerry

    2008-02-01

    Background: Feline immunodeficiency virus (FIV) naturally infects multiple species of cat and is related to human immunodeficiency virus in humans. FIV infection causes AIDS-like disease and mortality in the domestic cat (Felis catus) and serves as a natural model for HIV infection in humans. In African lions (Panthera leo) and other exotic felid species, disease etiology introduced by FIV infection is less clear, but recent studies indicate that FIV causes moderate to severe CD4 depletion. Results: In this study, comparative genomic methods are used to evaluate the full proviral genome of two geographically distinct FIV subtypes isolated from free-ranging lions. Genome organization of FIVPle subtype B (9891 bp) from lions in the Serengeti National Park in Tanzania and FIVPle subtype E (9899 bp) isolated from lions in the Okavango Delta in Botswana both resemble FIV genome sequence from puma, Pallas cat and domestic cat across 5' LTR, gag, pol, vif, orfA, env, rev and 3' LTR regions. Comparative analyses of available full-length FIV consisting of subtypes A, B and C from FIVFca, Pallas cat FIVOma and two puma FIVPco subtypes A and B recapitulate the species-specific monophyly of FIV marked by high levels of genetic diversity both within and between species. Across all FIVPle gene regions except env, lion subtypes B and E are monophyletic, and marginally more similar to Pallas cat FIVOma than to other FIV. Sequence analyses indicate the SU and TM regions of env vary substantially between subtypes, with FIVPle subtype E more related to domestic cat FIVFca than to FIVPle subtype B and FIVOma, likely reflecting recombination between strains in the wild. Conclusion: This study demonstrates the necessity of whole-genome analysis to complement population/gene-based studies, which are of limited utility in uncovering complex events such as recombination that may lead to functional differences in virulence and pathogenicity. These full-length lion

  13. The genome sequence of Lone Star virus, a highly divergent bunyavirus found in the Amblyomma americanum tick.

    Directory of Open Access Journals (Sweden)

    Andrea Swei

    Viruses in the family Bunyaviridae infect a wide range of plant, insect, and animal hosts. Tick-borne bunyaviruses in the Phlebovirus genus, including Severe Fever with Thrombocytopenia Syndrome virus (SFTSV) in China, Heartland virus (HRTV) in the United States, and Bhanja virus in Eurasia and Africa, have been associated with acute febrile illness in humans. Here we sought to characterize the growth characteristics and genome of Lone Star virus (LSV), an unclassified bunyavirus originally isolated from the lone star tick Amblyomma americanum. LSV was able to infect both human (HeLa) and monkey (Vero) cells. Cytopathic effects were seen within 72 h in both cell lines; vacuolization was observed in infected Vero, but not HeLa, cells. Viral culture supernatants were examined by unbiased deep sequencing and analysis using an in-house developed rapid computational pipeline for viral discovery, which definitively identified LSV as a phlebovirus. De novo assembly of the full genome revealed that LSV is highly divergent, sharing <61% overall amino acid identity with any other bunyavirus. Despite this sequence diversity, LSV was found by phylogenetic analysis to be part of a well-supported clade that includes members of the Bhanja group viruses, which are most closely related to SFTSV/HRTV. The genome sequencing of LSV is a critical first step in developing diagnostic tools to determine the risk of arbovirus transmission by A. americanum, a tick of growing importance given its expanding geographic range and competence as a disease vector. This study also underscores the power of deep sequencing analysis in rapidly identifying and sequencing the genomes of viruses of potential clinical and public health significance.

  14. Genome-wide analysis reveals diverged patterns of codon bias, gene expression, and rates of sequence evolution in picea gene families.

    Science.gov (United States)

    De La Torre, Amanda R; Lin, Yao-Cheng; Van de Peer, Yves; Ingvarsson, Pär K

    2015-03-05

    The recent sequencing of several gymnosperm genomes has greatly facilitated studying the evolution of their genes and gene families. In this study, we examine the evidence for expression-mediated selection in the first two fully sequenced representatives of the gymnosperm plant clade (Picea abies and Picea glauca). We use genome-wide estimates of gene expression (>50,000 expressed genes) to study the relationship between gene expression, codon bias, rates of sequence divergence, protein length, and gene duplication. We found that gene expression is correlated with rates of sequence divergence and codon bias, suggesting that natural selection is acting on Picea protein-coding genes for translational efficiency. Gene expression, rates of sequence divergence, and codon bias are correlated with the size of gene families, with large multicopy gene families having, on average, a lower expression level and breadth, lower codon bias, and higher rates of sequence divergence than single-copy gene families. Tissue-specific patterns of gene expression were more common in large gene families with large gene expression divergence than in single-copy families. Recent family expansions combined with large gene expression variation in paralogs and increased rates of sequence evolution suggest that some Picea gene families are rapidly evolving to cope with biotic and abiotic stress. Our study highlights the importance of gene expression and natural selection in shaping the evolution of protein-coding genes in Picea species, and sets the ground for further studies investigating the evolution of individual gene families in gymnosperms.

  15. Discovery of Highly Divergent Repeat Landscapes in Snake Genomes Using High-Throughput Sequencing

    Science.gov (United States)

    Castoe, Todd A.; Hall, Kathryn T.; Guibotsy Mboulas, Marcel L.; Gu, Wanjun; de Koning, A.P. Jason; Fox, Samuel E.; Poole, Alexander W.; Vemulapalli, Vijetha; Daza, Juan M.; Mockler, Todd; Smith, Eric N.; Feschotte, Cédric; Pollock, David D.

    2011-01-01

    We conducted a comprehensive assessment of genomic repeat content in two snake genomes, the venomous copperhead (Agkistrodon contortrix) and the Burmese python (Python molurus bivittatus). These two genomes are both relatively small (∼1.4 Gb) but have surprisingly extensive differences in the abundance and expansion histories of their repeat elements. In the python, the readily identifiable repeat element content is low (21%), similar to bird genomes, whereas that of the copperhead is higher (45%), similar to mammalian genomes. The copperhead's greater repeat content arises from the recent expansion of many different microsatellites and transposable element (TE) families, and the copperhead had 23-fold greater levels of TE-related transcripts than the python. This suggests the possibility that greater TE activity in the copperhead is ongoing. Expansion of CR1 LINEs in the copperhead genome has resulted in TE-mediated microsatellite expansion (“microsatellite seeding”) at a scale several orders of magnitude greater than previously observed in vertebrates. Snakes also appear to be prone to horizontal transfer of TEs, particularly in the copperhead lineage. The reason that the copperhead has such a small genome in the face of so much recent expansion of repeat elements remains an open question, although selective pressure related to extreme metabolic performance is an obvious candidate. TE activity can affect gene regulation as well as rates of recombination and gene duplication, and it is therefore possible that TE activity played a role in the evolution of major adaptations in snakes; some evidence suggests this may include the evolution of venom repertoires. PMID:21572095

  16. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  17. Complete genome sequences of two highly divergent Japanese isolates of Plantago asiatica mosaic virus

    NARCIS (Netherlands)

    Komatsu, Ken; Yamashita, Kazuo; Sugawara, Kota; Verbeek, Martin; Fujita, Naoko; Hanada, Kaoru; Uehara-Ichiki, Tamaki; Fuji, Shin Ichi

    2017-01-01

    Plantago asiatica mosaic virus (PlAMV) is a member of the genus Potexvirus and has an exceptionally wide host range. It causes severe damage to lilies. Here we report on the complete nucleotide sequences of two new Japanese PlAMV isolates, one from the eudicot weed Viola grypoceras (PlAMV-Vi), and t

  18. Genome sequencing of an Indian peste des petits ruminants virus isolate, Izatnagar/94, and its implications for virus diversity, divergence and phylogeography.

    Science.gov (United States)

    Sahu, Amit Ranjan; Wani, Sajad Ahmad; Saminathan, M; Rajak, Kaushal Kishor; Sahoo, Aditya Prasad; Pandey, Aruna; Saxena, Shikha; Kanchan, Sonam; Tiwari, Ashok Kumar; Mishra, Bina; Muthuchelvan, D; Singh, R P; Singh, Yaspal; Baig, Mumtaz; Mishra, Bishnu Prasad; Singh, Raj Kumar; Gandham, Ravi Kumar

    2017-06-01

    Peste des petits ruminants is an important transboundary disease infecting small ruminants. Genome or gene sequence analysis enriches our knowledge about the evolution and transboundary nature of the causative agent of this disease, peste des petits ruminants virus (PPRV). Although analysis using whole genome sequences of pathogens leads to more precise phylogenetic relationships, when compared to individual genes or partial sequences, there is still a need to identify specific genes/genomic regions that can provide evolutionary assessments consistent with those predicted with full-length genome sequences. Here the virulent Izatnagar/94 PPRV isolate was assembled and compared to all available complete genome sequences (currently in the NCBI database) to estimate nucleotide diversity and to deduce evolutionary relationships between genes/genomic regions and the full length genomes. Our aim was to identify the preferred candidate gene for use as a phylogenetic marker, as well as to predict divergence time and explore PPRV phylogeography. Among all the PPRV genes, the H gene was identified to be the most diverse with the highest evolutionary relationship with the full genome sequences. Hence it is considered as the most preferred candidate gene for phylogenetic study with 93% identity set as a nucleotide cutoff. A whole genome nucleotide sequence cutoff value of 94% permitted specific differentiation of PPRV lineages. All the isolates examined in the study were found to have a most recent common ancestor in the late 19th or in the early 20th century with high posterior probability values. The Bayesian skyline plot revealed a decrease in genetic diversity among lineage IV isolates since the start of the vaccination program and the network analysis localized the ancestry of PPRV to Africa.
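
    The nucleotide identity cutoffs mentioned above (93% for the H gene, 94% for whole genomes) can be illustrated with a small helper that scores two pre-aligned sequences. This is a minimal sketch, not the authors' pipeline: the function names, the gap-handling rule, and the decision to treat "identity at or above the cutoff" as "same lineage" are assumptions made for illustration, and the sequences are toy strings.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns with identical bases.

    Both inputs must already be aligned to the same length. Columns with
    a gap ('-') in either sequence are skipped (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_lineage(seq_a, seq_b, cutoff=93.0):
    """True if identity meets the cutoff (assumed decision rule)."""
    return percent_identity(seq_a, seq_b) >= cutoff


# Toy aligned sequences, not real PPRV data.
ref = "ACGTACGTAC"
query = "ACGTACGTAT"
print(percent_identity(ref, query), same_lineage(ref, query))
```

    In practice the identity values would come from a proper alignment of full gene or genome sequences; the helper only shows how a fixed cutoff turns pairwise identity into a lineage call.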

  19. Functional divergence in the genus Oenococcus as predicted by genome sequencing of the newly-described species, Oenococcus kitaharae.

    Directory of Open Access Journals (Sweden)

    Anthony R Borneman

    Full Text Available Oenococcus kitaharae is only the second member of the genus Oenococcus to be identified and is the closest relative of the industrially important wine bacterium Oenococcus oeni. To provide insight into this new species, the genome of the type strain of O. kitaharae, DSM 17330, was sequenced. Comparison of the sequenced genomes of both species shows that the genome of O. kitaharae DSM 17330 contains many genes with predicted functions in cellular defence (bacteriocins, antimicrobials, restriction-modification systems and a CRISPR locus) which are lacking in O. oeni. The two genomes also appear to differentially encode several metabolic pathways associated with amino acid biosynthesis and carbohydrate utilization and which have direct phenotypic consequences. This would indicate that the two species have evolved different survival techniques to suit their particular environmental niches. O. oeni has adapted to survive in the harsh, but predictable, environment of wine that provides very few competitive species. However O. kitaharae appears to have adapted to a growth environment in which biological competition provides a significant selective pressure by accumulating biological defence molecules, such as bacteriocins and restriction-modification systems, throughout its genome.

  20. The complete genome sequence of Dickeya zeae EC1 reveals substantial divergence from other Dickeya strains and species.

    Science.gov (United States)

    Zhou, Jianuan; Cheng, Yingying; Lv, Mingfa; Liao, Lisheng; Chen, Yufan; Gu, Yanfang; Liu, Shiyin; Jiang, Zide; Xiong, Yuanyan; Zhang, Lianhui

    2015-08-04

    Dickeya zeae is a bacterial species that infects monocotyledons and dicotyledons. Two antibiotic-like phytotoxins named zeamine and zeamine II were reported to play an important role in rice seed germination, and two genes associated with zeamines production, i.e., zmsA and zmsK, have been thoroughly characterized. However, other virulence factors and its molecular mechanisms of host specificity and pathogenesis are hardly known. The complete genome of D. zeae strain EC1 isolated from diseased rice plants was sequenced, annotated, and compared with the genomes of other Dickeya spp.. The pathogen contains a chromosome of 4,532,364 bp with 4,154 predicted protein-coding genes. Comparative genomics analysis indicates that D. zeae EC1 is most co-linear with D. chrysanthemi Ech1591, most conserved with D. zeae Ech586 and least similar to D. paradisiaca Ech703. Substantial genomic rearrangement was revealed by comparing EC1 with Ech586 and Ech703. Most virulence genes were well-conserved in Dickeya strains except Ech703. Significantly, the zms gene cluster involved in biosynthesis of zeamines, which were shown previously as key virulence determinants, is present in D. zeae strains isolated from rice, and some D. solani strains, but absent in other Dickeya species and the D. zeae strains isolated from other plants or sources. In addition, a DNA fragment containing 9 genes associated with fatty acid biosynthesis was found inserted in the fli gene cluster encoding flagellar biosynthesis of strain EC1 and other two rice isolates but not in other strains. This gene cluster shares a high protein similarity to the fatty acid genes from Pantoea ananatis. Our findings delineate the genetic background of D. zeae EC1, which infects both dicotyledons and monocotyledons, and suggest that D. zeae strains isolated from rice could be grouped into a distinct pathovar, i.e., D. zeae subsp. oryzae. 
In addition, the results of this study also unveiled that the zms gene cluster presented in

  1. Long-read sequencing improves assembly of Trichinella genomes 10-fold, revealing substantial synteny between lineages diverged over seven million years

    Science.gov (United States)

    Genome evolution influences a parasite’s pathogenicity, host-pathogen interactions, environmental constraints, and invasion biology, while genome assemblies form the basis of comparative sequence analyses. Given that closely related organisms typically maintain appreciable synteny, the genome asse...

  2. Karyotype divergence and spreading of 5S rDNA sequences between genomes of two species: darter and emerald gobies ( Ctenogobius , Gobiidae).

    Science.gov (United States)

    Lima-Filho, P A; Bertollo, L A C; Cioffi, M B; Costa, G W W F; Molina, W F

    2014-01-01

    Karyotype analyses of the cryptobenthic marine species Ctenogobius boleosoma and C. smaragdus were performed by means of classical and molecular cytogenetics, including physical mapping of the multigene 18S and 5S rDNA families. C. boleosoma has 2n = 44 chromosomes (2 submetacentrics + 42 acrocentrics; FN = 46) with a single chromosome pair each carrying 18S and 5S ribosomal sites; whereas C. smaragdus has 2n = 48 chromosomes (2 submetacentrics + 46 acrocentrics; FN = 50), also with a single pair bearing 18S rDNA, but an extensive increase in the number of GC-rich 5S rDNA sites in 21 chromosome pairs. The highly divergent karyotypes among Ctenogobius species contrast with observations in several other marine fish groups, demonstrating an accelerated rate of chromosomal evolution mediated by both chromosomal rearrangements and the extensive dispersion of 5S rDNA sequences in the genome. © 2014 S. Karger AG, Basel.
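    The fundamental numbers (FN) quoted in this abstract can be checked with a short calculation. The sketch below assumes the common convention that each bi-armed (here submetacentric) chromosome contributes two arms and each acrocentric contributes one; the function name is illustrative and not taken from the paper.

```python
def fundamental_number(biarmed: int, acrocentric: int) -> int:
    """Chromosome arm count (FN), assuming bi-armed chromosomes
    contribute two arms and acrocentrics contribute one."""
    return 2 * biarmed + acrocentric


# (submetacentrics, acrocentrics) as reported in the abstract
karyotypes = {
    "C. boleosoma": (2, 42),
    "C. smaragdus": (2, 46),
}

for species, (sm, a) in karyotypes.items():
    diploid = sm + a  # 2n is simply the total chromosome count
    print(f"{species}: 2n = {diploid}, FN = {fundamental_number(sm, a)}")
```

    Under that convention the calculation reproduces the reported values: 2n = 44 with FN = 46, and 2n = 48 with FN = 50.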

  3. The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes

    Directory of Open Access Journals (Sweden)

    Lemieux Claude

    2006-02-01

    Full Text Available Abstract Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. The basal position of the Prasinophyceae has been well documented, but the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae is currently debated. The four complete chloroplast DNA (cpDNA) sequences presently available for representatives of these classes have revealed extensive variability in overall structure, gene content, intron composition and gene order. The chloroplast genome of Pseudendoclonium (Ulvophyceae), in particular, is characterized by an atypical quadripartite architecture that deviates from the ancestral type by a large inverted repeat (IR) featuring an inverted rRNA operon and a small single-copy (SSC) region containing 14 genes normally found in the large single-copy (LSC) region. To gain insights into the nature of the events that led to the reorganization of the chloroplast genome in the Ulvophyceae, we have determined the complete cpDNA sequence of Oltmannsiellopsis viridis, a representative of a distinct, early diverging lineage. Results The 151,933 bp IR-containing genome of Oltmannsiellopsis differs considerably from Pseudendoclonium and other chlorophyte cpDNAs in intron content and gene order, but shares close similarities with its ulvophyte homologue at the levels of quadripartite architecture, gene content and gene density. Oltmannsiellopsis cpDNA encodes 105 genes, contains five group I introns, and features many short dispersed repeats. As in Pseudendoclonium cpDNA, the rRNA genes in the IR are transcribed toward the single copy region featuring the genes typically found in the ancestral LSC region, and the opposite single copy region harbours genes characteristic of both the ancestral SSC and LSC regions. The 52 genes that were transferred from the ancestral LSC to SSC region include 12 of those observed in Pseudendoclonium cpDNA. Surprisingly, the overall gene organization of

  4. Sequencing of Pax6 loci from the elephant shark reveals a family of Pax6 genes in vertebrate genomes, forged by ancient duplications and divergences.

    Directory of Open Access Journals (Sweden)

    Vydianathan Ravi

    Full Text Available Pax6 is a developmental control gene essential for eye development throughout the animal kingdom. In addition, Pax6 plays key roles in other parts of the CNS, olfactory system, and pancreas. In mammals a single Pax6 gene encoding multiple isoforms delivers these pleiotropic functions. Here we provide evidence that the genomes of many other vertebrate species contain multiple Pax6 loci. We sequenced Pax6-containing BACs from the cartilaginous elephant shark (Callorhinchus milii) and found two distinct Pax6 loci. Pax6.1 is highly similar to mammalian Pax6, while Pax6.2 encodes a paired-less Pax6. Using synteny relationships, we identify homologs of this novel paired-less Pax6.2 gene in lizard and in frog, as well as in zebrafish and in other teleosts. In zebrafish two full-length Pax6 duplicates were known previously, originating from the fish-specific genome duplication (FSGD) and expressed in divergent patterns due to paralog-specific loss of cis-elements. We show that teleosts other than zebrafish also maintain duplicate full-length Pax6 loci, but differences in gene and regulatory domain structure suggest that these Pax6 paralogs originate from a more ancient duplication event and are hence renamed as Pax6.3. Sequence comparisons between mammalian and elephant shark Pax6.1 loci highlight the presence of short- and long-range conserved noncoding elements (CNEs). Functional analysis demonstrates the ancient role of long-range enhancers for Pax6 transcription. We show that the paired-less Pax6.2 ortholog in zebrafish is expressed specifically in the developing retina. Transgenic analysis of elephant shark and zebrafish Pax6.2 CNEs with homology to the mouse NRE/Pα internal promoter revealed highly specific retinal expression. Finally, morpholino depletion of zebrafish Pax6.2 resulted in a "small eye" phenotype, supporting a role in retinal development. 
In summary, our study reveals that the pleiotropic functions of Pax6 in vertebrates are served by

  5. Chloroplast genome evolution in early diverged leptosporangiate ferns.

    Science.gov (United States)

    Kim, Hyoung Tae; Chung, Myong Gi; Kim, Ki-Joong

    2014-05-01

    In this study, the chloroplast (cp) genome sequences from three early diverged leptosporangiate ferns were completed and analyzed in order to understand the evolution of the genome of the fern lineages. The complete cp genome sequence of Osmunda cinnamomea (Osmundales) was 142,812 base pairs (bp). The cp genome structure was similar to that of eusporangiate ferns. The gene/intron losses that frequently occurred in the cp genome of leptosporangiate ferns were not found in the cp genome of O. cinnamomea. In addition, putative RNA editing sites in the cp genome were rare in O. cinnamomea, even though the sites were frequently predicted to be present in leptosporangiate ferns. The complete cp genome sequence of Diplopterygium glaucum (Gleicheniales) was 151,007 bp and has a 9.7 kb inversion between the trnL-CAA and trnV-GCA genes when compared to O. cinnamomea. Several repeated sequences were detected around the inversion break points. The complete cp genome sequence of Lygodium japonicum (Schizaeales) was 157,142 bp and a deletion of the rpoC1 intron was detected. This intron loss was shared by all of the studied species of the genus Lygodium. The GC contents and the effective numbers of codons (ENCs) in ferns varied significantly when compared to seed plants. The ENC values of the early diverged leptosporangiate ferns showed intermediate levels between eusporangiate and core leptosporangiate ferns. However, our phylogenetic tree based on all of the cp gene sequences clearly indicated that the cp genome similarities between O. cinnamomea (Osmundales) and eusporangiate ferns are symplesiomorphies, rather than synapomorphies. Therefore, our data is in agreement with the view that Osmundales is a distinct early diverged lineage in the leptosporangiate ferns.

  6. Mechanisms of protein sequence divergence and incompatibility.

    Directory of Open Access Journals (Sweden)

    Alon Wellner

    Full Text Available Alignments of orthologous protein sequences convey a complex picture. Some positions are utterly conserved whilst others have diverged to variable degrees. Amongst the latter, many are non-exchangeable between extant sequences. How do functionally critical and highly conserved residues diverge? Why and how did these exchanges become incompatible within contemporary sequences? Our model is phosphoglycerate kinase (PGK), where lysine 219 is an essential active-site residue completely conserved throughout Eukaryota and Bacteria, and serine is found only in archaeal PGKs. Contemporary sequences tested exhibited complete loss of function upon exchanges at 219. However, a directed evolution experiment revealed that two mutations were sufficient for human PGK to become functional with serine at position 219. These two mutations made position 219 permissive not only for serine and lysine, but also to a range of other amino acids seen in archaeal PGKs. The identified trajectories that enabled exchanges at 219 show marked sign epistasis - a relatively small loss of function with respect to one amino acid (lysine) versus a large gain with another (serine, and other amino acids). Our findings support the view that, as theoretically described, the trajectories underlining the divergence of critical positions are dominated by sign epistatic interactions. Such trajectories are an outcome of rare mutational combinations. Nonetheless, as suggested by the laboratory enabled K219S exchange, given enough time and variability in selection levels, even utterly conserved and functionally essential residues may change.

  7. Yeast genome sequencing:

    DEFF Research Database (Denmark)

    Piskur, Jure; Langkjær, Rikke Breinhold

    2004-01-01

    For decades, unicellular yeasts have been general models to help understand the eukaryotic cell and also our own biology. Recently, over a dozen yeast genomes have been sequenced, providing the basis to resolve several complex biological questions. Analysis of the novel sequence data has shown...... of closely related species helps in gene annotation and to answer how many genes there really are within the genomes. Analysis of non-coding regions among closely related species has provided an example of how to determine novel gene regulatory sequences, which were previously difficult to analyse because...... they are short and degenerate and occupy different positions. Comparative genomics helps to understand the origin of yeasts and points out crucial molecular events in yeast evolutionary history, such as whole-genome duplication and horizontal gene transfer(s). In addition, the accumulating sequence data provide...

  8. Pike and salmon as sister taxa: detailed intraclade resolution and divergence time estimation of Esociformes + Salmoniformes based on whole mitochondrial genome sequences.

    Science.gov (United States)

    Campbell, Matthew A; López, J Andrés; Sado, Tetsuya; Miya, Masaki

    2013-11-01

    The increasing number of taxa and loci in molecular phylogenetic studies of basal euteleosts has brought stability in a controversial area. A key emerging aspect to these studies is a sister Esociformes (pike) and Salmoniformes (salmon) relationship. We evaluate mitochondrial genome support for a sister Esociformes and Salmoniformes hypothesis by surveying many potential outgroups for these taxa, employing multiple phylogenetic approaches, and utilizing a thorough sampling scheme. Secondly, we conduct a simultaneous divergence time estimation and phylogenetic inference in a Bayesian framework with fossil calibrations focusing on relationships within Esociformes+Salmoniformes. Our dataset supports a sister relationship between Esociformes and Salmoniformes; however the nearest relatives of Esociformes+Salmoniformes are inconsistent among analyses. Within the order Esociformes, we advocate for a single family, Esocidae. Subfamily relationships within Salmonidae are poorly supported as Salmoninae sister to Thymallinae+Coregoninae.

  9. Segmenting the human genome based on states of neutral genetic divergence.

    Science.gov (United States)

    Kuruppumullage Don, Prabhani; Ananda, Guruprasad; Chiaromonte, Francesca; Makova, Kateryna D

    2013-09-03

    Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states--each uniquely characterized by specific combinations of divergence levels. We then parsed the mutagenic contributions of various biochemical processes associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants--including those associated with cancer and other diseases--and to improve computational predictions of noncoding functional elements.
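    To illustrate the general idea of segmenting a genome into divergence states with a hidden Markov model, the following is a minimal Viterbi sketch. It is not the authors' model: the two states, the binned observations ("L" and "H" for low and high divergence per window) and all probabilities are hypothetical values chosen only for illustration, whereas the study used several mutation types jointly.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a discrete-emission HMM.

    Probabilities are combined in log space to avoid numerical underflow.
    """
    if not observations:
        raise ValueError("observations must not be empty")

    # Scores for the first window
    scores = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    backpointers = []

    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best previous state from which to enter state s
            best_prev = max(
                states, key=lambda p: scores[p] + math.log(trans_p[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(trans_p[best_prev][s])
                + math.log(emit_p[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model with "sticky" transitions
STATES = ("low", "high")
START = {"low": 0.5, "high": 0.5}
TRANS = {"low": {"low": 0.9, "high": 0.1}, "high": {"low": 0.1, "high": 0.9}}
EMIT = {"low": {"L": 0.8, "H": 0.2}, "high": {"L": 0.2, "H": 0.8}}

segmentation = viterbi(list("LLLHHHH"), STATES, START, TRANS, EMIT)
print(segmentation)
```

    Because switching states is penalised, an isolated high-divergence window inside a low-divergence run is absorbed into the surrounding segment, while a sustained run of high-divergence windows produces a state change.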

  10. [Genomic structure of the autotetraploid oat species Avena macrostachya inferred from comparative analysis of the ITS1 and ITS2 sequences: on the oat karyotype evolution during the early stages of the Avena species divergence].

    Science.gov (United States)

    Rodionov, A V; Tiupa, N B; Kim, E S; Machs, E M; Loskutov, I G

    2005-05-01

    To examine the genomic structure of Avena macrostachya, internal transcribed spacers, ITS1 and ITS2, as well as nuclear 5.8S rRNA genes from three oat species with AsAs karyotype (A. wiestii, A. hirtula, and A. atlantica), and those from A. longiglumis (AlAl), A. canariensis (AcAc), A. ventricosa (CvCv), A. pilosa, and A. clauda (CpCp) were sequenced. All species of the genus Avena examined represented a monophyletic group (bootstrap index = 98), within which two branches, i.e., species with A- and C-genomes, were distinguished (bootstrap indices = 100). The subject of our study, A. macrostachya, albeit belonging to the phylogenetic branch of C-genome oat species (karyotype with submetacentric and subacrocentric chromosomes), has preserved an isobrachyal karyotype (i.e., that containing metacentric chromosomes), probably typical of the common Avena ancestor. It was suggested to classify the A. macrostachya genome as a specific form of C-genome, Cm-genome. Among the species from other genera studied, Arrhenatherum elatius was found to be the closest to Avena in ITS1 and ITS2 structure. Phylogenetic relationships between Avena and Helictotrichon remain intriguingly uncertain. The HPR389153 sequence from H. pratense genome was closest to the ITS1 sequences specific to the Avena A-genomes (p-distance = 0.0237), while the differences of this sequence from the ITS1 of A. macrostachya reached 0.1221. On the other hand, HAD389117 from H. adsurgens was close to the ITS1 specific to Avena C-genomes (p-distance = 0.0189), while its differences from the A-genome specific ITS1 sequences reached 0.1221. It seems likely that the appearance of highly polyploid (2n = 12-21x) species of H. pratense and H. adsurgens could be associated with interspecific hybridization involving Mediterranean oat species carrying A- and C-genomes. A hypothesis on the pathways of Avena chromosome evolution during the early stages of the oat species divergence is proposed.
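    The p-distances quoted above are the proportion of aligned sites at which two sequences differ. A minimal sketch of that calculation follows; the function name and the toy sequences are illustrative, and skipping gap and ambiguous (N) positions is an assumption, since the abstract does not say how such sites were handled.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped (an assumed, commonly used convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy alignment: one mismatch in ten comparable sites
print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))
```

    On this scale a value such as 0.0237 corresponds to roughly 2.4 differences per 100 compared sites.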

  11. Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

    Directory of Open Access Journals (Sweden)

    Abiyad eBaig

    2015-11-01

    Full Text Available Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but their genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of core genome from 793 to only 397 genes. Divergence was clear if phylogenetic analysis was performed on reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates with normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six out of nine divergent isolates possessed a S. suis-like capsule region with variation in capsular gene sequences but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of S. suis species but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates is warranted to understand the diversity of the species.

  12. Malaria Genome Sequencing Project

    Science.gov (United States)

    2004-01-01

    ...million cases and up to 2.7 million deaths from malaria each year. The mortality levels are greatest in sub-Saharan Africa... A whole chromosome shotgun sequencing strategy was used to determine the genome sequence of P. falciparum clone 3D7. This...aminolevulinic acid dehydratase. Curr. Genet. 40, 391-398 (2002). 15. Lasonder, E. et al. Analysis of the Plasmodium falciparum proteome by high-accuracy mass

  13. Genome sequencing conference II

    Energy Technology Data Exchange (ETDEWEB)

    1990-01-01

    Genome Sequencing Conference 2 was held September 30 to October 30, 1990. 26 speaker abstracts and 33 poster presentations were included in the program report. New and improved methods for DNA sequencing and genetic mapping were presented. Many of the papers were concerned with accuracy and speed of acquisition of data with computers and automation playing an increasing role. Individual papers have been processed separately for inclusion on the database.

  14. Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement

    Science.gov (United States)

    Blazier, J. Chris; Ruhlman, Tracey A.; Weng, Mao-Lun; Rehman, Sumaiyah K.; Sabir, Jamal S. M.; Jansen, Robert K.

    2016-01-01

    Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA. PMID:27087667

  15. History repeats itself: genomic divergence in copepods.

    Science.gov (United States)

    Renaut, Sébastien; Dion-Côté, Anne-Marie

    2016-04-01

    Press stop, erase everything from now till some arbitrary time in the past and start recording life as it evolves once again. Would you see the same tape of life playing itself over and over, or would a different story unfold every time? The late Stephen Jay Gould called this experiment replaying the tape of life and argued that any replay of the tape would lead evolution down a pathway radically different from the road actually taken (Gould 1989). This thought experiment has puzzled evolutionary biologists for a long time: how repeatable are evolutionary events? And if history does indeed repeat itself, what are the factors that may help us predict the path taken? A powerful means to address these questions at a small evolutionary scale is to study closely related populations that have evolved independently, under similar environmental conditions. This is precisely what Pereira et al. (2016) set out to do using marine copepods Tigriopus californicus, and present their results in this issue of Molecular Ecology. They show that evolution can be repeatable and even partly predictable, at least at the molecular level. As expected from theory, patterns of divergence were shaped by natural selection. At the same time, strong genetic drift due to small population sizes also constrained evolution down a similar evolutionary road, and probably contributed to repeatable patterns of genomic divergence.

  16. Strategies for complete plastid genome sequencing.

    Science.gov (United States)

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  17. Sequencing the maize genome.

    Science.gov (United States)

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.

  18. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed the human chromosome 22 and classified the upstream, exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that we could classify the functional regions of genome based on sequence feature and discriminant analysis.
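    One of the alignment-free features named above, dinucleotide relative abundance, is commonly defined as the observed frequency of a dinucleotide divided by the product of its two base frequencies. The sketch below uses that simple single-strand form; whether the authors used this exact definition (for example, without strand symmetrization) is an assumption, and the function name is illustrative.

```python
from collections import Counter


def dinucleotide_relative_abundance(seq: str) -> dict:
    """Return rho_xy = f_xy / (f_x * f_y) for each dinucleotide observed.

    Only A, C, G and T are counted; pairs containing any other
    character are skipped.
    """
    seq = seq.upper()
    bases = [b for b in seq if b in "ACGT"]
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    ]
    if not pairs:
        raise ValueError("sequence has no valid dinucleotides")

    base_freq = {b: c / len(bases) for b, c in Counter(bases).items()}
    pair_freq = {p: c / len(pairs) for p, c in Counter(pairs).items()}

    return {
        p: f / (base_freq[p[0]] * base_freq[p[1]])
        for p, f in pair_freq.items()
    }


# Toy example: values above 1 mean the pair is over-represented
# relative to what the base composition alone would predict.
rho = dinucleotide_relative_abundance("ACACAC")
print(rho)
```

    A vector of such values per region could then serve as input to principal component or discriminant analysis, as the abstract describes.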

  19. Testing models of speciation from genome sequences: divergence and asymmetric admixture in Island Southeast Asian Sus species during the Plio-Pleistocene climatic fluctuations

    NARCIS (Netherlands)

    Frantz, L.A.F.; Madsen, O.; Megens, H.J.W.C.; Groenen, M.; Lohse, H.

    2014-01-01

    In many temperate regions, ice ages promoted range contractions into refugia resulting in divergence (and potentially speciation), while warmer periods led to range expansions and hybridization. However, the impact these climatic oscillations had in many parts of the tropics remains elusive. Here, w

  20. A genomic island linked to ecotype divergence in Atlantic cod

    DEFF Research Database (Denmark)

    Hansen, Jakob Hemmer; Eg Nielsen, Einar; Therkildsen, Nina O.;

    2013-01-01

    The genomic architecture underlying ecological divergence and ecological speciation with gene flow is still largely unknown for most organisms. One central question is whether divergence is genome‐wide or localized in ‘genomic mosaics’ during early stages when gene flow is still pronounced. Empirical work has so far been limited, and the relative impacts of gene flow and natural selection on genomic patterns have not been fully explored. Here, we use ecotypes of Atlantic cod to investigate genomic patterns of diversity and population differentiation in a natural system characterized by high gene flow and large effective population sizes, properties which theoretically could restrict divergence in local genomic regions. We identify a genomic region of strong population differentiation, extending over approximately 20 cM, between pairs of migratory and stationary ecotypes examined at two...

  1. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
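    To put the two consensus error rates in perspective, the short calculation below estimates how many errors each would leave in a genome. The 5 Mb genome size is a hypothetical value chosen only for illustration, and errors are assumed to be spread uniformly.

```python
def expected_errors(genome_length_bp: int, bp_per_error: int) -> float:
    """Expected number of consensus errors, assuming a uniform rate
    of one error per `bp_per_error` bases."""
    return genome_length_bp / bp_per_error


genome_length = 5_000_000  # hypothetical microbial genome size (bp)

draft = expected_errors(genome_length, 2_000)      # draft: about 1 error per 2,000 bp
finished = expected_errors(genome_length, 10_000)  # finished: about 1 error per 10,000 bp

print(f"draft: {draft:.0f} errors, finished: {finished:.0f} errors")
```

    For a genome of this assumed size, finishing would reduce the expected error count about five-fold, from roughly 2,500 to roughly 500.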

  2. Next Generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis

    Science.gov (United States)

    The mitochondrial genome’s non-recombinant mode of inheritance and relatively rapid rate of evolution has promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been define...

  3. Viral genome sequencing by random priming methods

    Directory of Open Access Journals (Sweden)

    Zhang Xinsheng

    2008-01-01

    Full Text Available Abstract Background Most emerging health threats are of zoonotic origin. For the overwhelming majority, their causative agents are RNA viruses which include but are not limited to HIV, Influenza, SARS, Ebola, Dengue, and Hantavirus. Of increasing importance therefore is a better understanding of global viral diversity to enable better surveillance and prediction of pandemic threats; this will require rapid and flexible methods for complete viral genome sequencing. Results We have adapted the SISPA methodology [1-3] to genome sequencing of RNA and DNA viruses. We have demonstrated the utility of the method on various types and sources of viruses, obtaining near complete genome sequence of viruses ranging in size from 3,000–15,000 kb with a median depth of coverage of 14.33. We used this technique to generate full viral genome sequence in the presence of host contaminants, using viral preparations from cell culture supernatant, allantoic fluid and fecal matter. Conclusion The method described is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, previously uncharacterized viruses, or to more fully describe mixed viral infections.

  4. A Draft Sequence of the Neandertal Genome

    Science.gov (United States)

    Green, Richard E.; Li, Heng; Zhai, Weiwei; Fritz, Markus Hsi-Yang; Hansen, Nancy F.; Durand, Eric Y.; Malaspinas, Anna-Sapfo; Jensen, Jeffrey D.; Marques-Bonet, Tomas; Alkan, Can; Prüfer, Kay; Meyer, Matthias; Burbano, Hernán A.; Good, Jeffrey M.; Schultz, Rigo; Aximu-Petri, Ayinuer; Butthof, Anne; Höber, Barbara; Höffner, Barbara; Siegemund, Madlen; Weihmann, Antje; Nusbaum, Chad; Lander, Eric S.; Russ, Carsten; Novod, Nathaniel; Affourtit, Jason; Egholm, Michael; Verna, Christine; Rudan, Pavao; Brajkovic, Dejana; Kucan, Željko; Gušic, Ivan; Doronichev, Vladimir B.; Golovanova, Liubov V.; Lalueza-Fox, Carles; de la Rasilla, Marco; Fortea, Javier; Rosas, Antonio; Schmitz, Ralf W.; Johnson, Philip L. F.; Eichler, Evan E.; Falush, Daniel; Birney, Ewan; Mullikin, James C.; Slatkin, Montgomery; Nielsen, Rasmus; Kelso, Janet; Lachmann, Michael; Reich, David; Pääbo, Svante

    2016-01-01

    Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other. PMID:20448178

  5. The impact of selection, gene flow and demographic history on heterogeneous genomic divergence: three-spine sticklebacks in divergent environments.

    Science.gov (United States)

    Ferchaud, Anne-Laure; Hansen, Michael M

    2016-01-01

    Heterogeneous genomic divergence between populations may reflect selection, but should also be seen in conjunction with gene flow and drift, particularly population bottlenecks. Marine and freshwater three-spine stickleback (Gasterosteus aculeatus) populations often exhibit different lateral armour plate morphs. Moreover, strikingly parallel genomic footprints across different marine-freshwater population pairs are interpreted as parallel evolution and gene reuse. Nevertheless, in some geographic regions like the North Sea and Baltic Sea, different patterns are observed. Freshwater populations in coastal regions are often dominated by marine morphs, suggesting that gene flow overwhelms selection, and genomic parallelism may also be less pronounced. We used RAD sequencing for analysing 28 888 SNPs in two marine and seven freshwater populations in Denmark, Europe. Freshwater populations represented a variety of environments: river populations accessible to gene flow from marine sticklebacks and large and small isolated lakes with and without fish predators. Sticklebacks in an accessible river environment showed minimal morphological and genomewide divergence from marine populations, supporting the hypothesis of gene flow overriding selection. Allele frequency spectra suggested bottlenecks in all freshwater populations, and particularly two small lake populations. However, genomic footprints ascribed to selection could nevertheless be identified. No genomic regions were consistent freshwater-marine outliers, and parallelism was much lower than in other comparable studies. Two genomic regions previously described to be under divergent selection in freshwater and marine populations were outliers between different freshwater populations. We ascribe these patterns to stronger environmental heterogeneity among freshwater populations in our study as compared to most other studies, although the demographic history involving bottlenecks should also be considered in the

  6. Whole-exome/genome sequencing and genomics.

    Science.gov (United States)

    Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne

    2013-12-01

    As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.

  7. Two genome sequences of the same bacterial strain, Gluconacetobacter diazotrophicus PAl 5, suggest a new standard in genome sequence submission.

    Science.gov (United States)

    Giongo, Adriana; Tyler, Heather L; Zipperer, Ursula N; Triplett, Eric W

    2010-06-15

    Gluconacetobacter diazotrophicus PAl 5 is of agricultural significance due to its ability to provide fixed nitrogen to plants. Consequently, its genome sequence has been eagerly anticipated to enhance understanding of endophytic nitrogen fixation. Two groups have sequenced the PAl 5 genome from the same source (ATCC 49037), though the resulting sequences contain a surprisingly high number of differences. Therefore, an optical map of PAl 5 was constructed in order to determine which genome assembly more closely resembles the chromosomal DNA by aligning each sequence against a physical map of the genome. While one sequence aligned very well, over 98% of the second sequence contained numerous rearrangements. The many differences observed between these two genome sequences could be owing to either assembly errors or rapid evolutionary divergence. The extent of the differences derived from sequence assembly errors could be assessed if the raw sequencing reads were provided by both genome centers at the time of genome sequence submission. Hence, a new genome sequence standard is proposed whereby the investigator supplies the raw reads along with the closed sequence so that the community can make more accurate judgments on whether differences observed in a single strain may be of biological origin or are simply caused by differences in genome assembly procedures.

  8. Genome-Wide Divergence in the West-African Malaria Vector Anopheles melas

    Directory of Open Access Journals (Sweden)

    Kevin C. Deitz

    2016-09-01

    Full Text Available Anopheles melas is a member of the recently diverged An. gambiae species complex, a model for speciation studies, and is a locally important malaria vector along the West-African coast where it breeds in brackish water. A recent population genetic study of An. melas revealed species-level genetic differentiation between three population clusters. An. melas West extends from The Gambia to the village of Tiko, Cameroon. The other mainland cluster, An. melas South, extends from the southern Cameroonian village of Ipono to Angola. Bioko Island, Equatorial Guinea An. melas populations are genetically isolated from mainland populations. To examine how genetic differentiation between these An. melas forms is distributed across their genomes, we conducted a genome-wide analysis of genetic differentiation and selection using whole genome sequencing data of pooled individuals (Pool-seq from a representative population of each cluster. The An. melas forms exhibit high levels of genetic differentiation throughout their genomes, including the presence of numerous fixed differences between clusters. Although the level of divergence between the clusters is on a par with that of other species within the An. gambiae complex, patterns of genome-wide divergence and diversity do not provide evidence for the presence of pre- and/or postmating isolating mechanisms in the form of speciation islands. These results are consistent with an allopatric divergence process with little or no introgression.

  9. Shared and nonshared genomic divergence in parallel ecotypes of Littorina saxatilis at a local scale.

    Science.gov (United States)

    Ravinet, Mark; Westram, Anja; Johannesson, Kerstin; Butlin, Roger; André, Carl; Panova, Marina

    2016-01-01

    Parallel speciation occurs when selection drives repeated, independent adaptive divergence that reduces gene flow between ecotypes. Classical examples show parallel speciation originating from shared genomic variation, but this does not seem to be the case in the rough periwinkle (Littorina saxatilis) that has evolved considerable phenotypic diversity across Europe, including several distinct ecotypes. Small 'wave' ecotype snails inhabit exposed rocks and experience strong wave action, while thick-shelled, 'crab' ecotype snails are larger and experience crab predation on less exposed shores. Crab and wave ecotypes appear to have arisen in parallel, and recent evidence suggests only marginal sharing of molecular variation linked to evolution of similar ecotypes in different parts of Europe. However, the extent of genomic sharing is expected to increase with gene flow and more recent common ancestry. To test this, we used de novo RAD-sequencing to quantify the extent of shared genomic divergence associated with phenotypic similarities amongst ecotype pairs on three close islands (<10 km distance) connected by weak gene flow (Nm ~ 0.03) and with recent common ancestry (<10 000 years). After accounting for technical issues, including a large proportion of null alleles due to a large effective population size, we found ~8-28% of positive outliers were shared between two islands and ~2-9% were shared amongst all three islands. This low level of sharing suggests that parallel phenotypic divergence in this system is not matched by shared genomic divergence despite a high probability of gene flow and standing genetic variation.

  10. Pig genome sequence - analysis and publication strategy

    NARCIS (Netherlands)

    Archibald, A.L.; Bolund, L.; Churcher, C.; Fredholm, M.; Groenen, M.A.M.; Harlizius, B.

    2010-01-01

    Background - The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. Results - Assemblies of the B

  11. Genomic divergence during speciation driven by adaptation to altitude.

    Science.gov (United States)

    Chapman, Mark A; Hiscock, Simon J; Filatov, Dmitry A

    2013-12-01

    Even though Darwin's "On the Origin of Species" implied selection being the main driver of species formation, the role of natural selection in speciation remains poorly understood. In particular, it remains unclear how selection at a few genes can lead to genomewide divergence and the formation of distinct species. We used a particularly attractive clear-cut case of recent plant ecological speciation to investigate the demography and genomic bases of species formation driven by adaptation to contrasting conditions. High-altitude Senecio aethnensis and low-altitude S. chrysanthemifolius live at the extremes of a mountain slope on Mt. Etna, Sicily, and form a hybrid zone at intermediate altitudes but remain morphologically distinct. Genetic differentiation of these species was analyzed at the DNA polymorphism and gene expression levels by high-throughput sequencing of transcriptomes from multiple individuals. Out of ≈ 18,000 genes analyzed, only a small number (90) displayed differential expression between the two species. These genes showed significantly elevated species differentiation (FST and Dxy), consistent with diversifying selection acting on these genes. Genomewide genetic differentiation of the species is surprisingly low (FST = 0.19), while ≈ 200 genes showed significantly higher (false discovery rate 0.6) interspecific differentiation and evidence for local adaptation. Diversifying selection at only a handful of loci may be enough for the formation and maintenance of taxonomically well-defined species, despite ongoing gene flow. This provides an explanation of why many closely related species (in plants, in particular) remain phenotypically and ecologically distinct despite ongoing hybridization, a question that has long puzzled naturalists and geneticists alike.

  12. Discovery of a divergent HPIV4 from respiratory secretions using second and third generation metagenomic sequencing.

    Science.gov (United States)

    Alquezar-Planas, David E; Mourier, Tobias; Bruhn, Christian A W; Hansen, Anders J; Vitcetz, Sarah Nathalie; Mørk, Søren; Gorodkin, Jan; Nielsen, Hanne Abel; Guo, Yan; Sethuraman, Anand; Paxinos, Ellen E; Shan, Tongling; Delwart, Eric L; Nielsen, Lars P

    2013-01-01

    Molecular detection of viruses has been aided by high-throughput sequencing, permitting the genomic characterization of emerging strains. In this study, we comprehensively screened 500 respiratory secretions from children with upper and/or lower respiratory tract infections for viral pathogens. The viruses detected are described, including a divergent human parainfluenza virus type 4 from GS FLX pyrosequencing of 92 specimens. Complete full-genome characterization of the virus followed, using Single Molecule, Real-Time (SMRT) sequencing. Subsequent "primer walking" combined with Sanger sequencing validated the RS platform's utility in viral sequencing from complex clinical samples. Comparative genomics reveals the divergent strain clusters with the only completely sequenced HPIV4a subtype. However, it also exhibits various structural features present in one of the HPIV4b reference strains, opening questions regarding their lifecycle and evolutionary relationships among these viruses. Clinical data from patients infected with the strain, as well as viral prevalence estimates using real-time PCR, are also described.

  13. Genome size increases in recently diverged hornwort clades.

    Science.gov (United States)

    Bainard, Jillian D; Villarreal, Juan Carlos

    2013-08-01

    As our knowledge of plant genome size estimates continues to grow, one group has continually been neglected: the hornworts. Hornworts (Anthocerotophyta) have been traditionally grouped with liverworts and mosses because they share a haploid dominant life cycle; however, recent molecular studies place hornworts as the sister lineage to extant tracheophytes. Given the scarcity of information regarding the DNA content of hornworts, our objective was to estimate the 1C-value for a range of hornwort species within a phylogenetic context. Using flow cytometry, we estimated genome size for 36 samples representing 24 species. This accounts for roughly 10% of known hornwort species. Haploid genome sizes (1C-value) ranged from 160 Mbp or 0.16 pg (Leiosporoceros dussii) to 719 Mbp or 0.73 pg (Nothoceros endiviifolius). The average 1C-value was 261 ± 104 Mbp (0.27 ± 0.11 pg). Ancestral reconstruction of genome size on a hornwort phylogeny suggests a small ancestral genome size and revealed increases in genome size in the most recently divergent clades. Much more work is needed to understand DNA content variation in this phylogenetically important group, but this work has significantly increased our knowledge of genome size variation in hornworts.
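
    The abstract above reports each genome size twice, in megabase pairs and in picograms of DNA. The two units are related by the widely used conversion 1 pg ≈ 978 Mbp. The sketch below applies that factor to the reported values; the record does not state which factor the authors used, so treating it as 978 Mbp per pg is an assumption here, and small disagreements in the second decimal place are to be expected from rounding.

```python
# Widely used conversion factor between DNA mass and length (assumed here;
# the abstract does not say which factor the authors applied).
MBP_PER_PG = 978.0


def pg_to_mbp(picograms: float) -> float:
    """Convert a DNA amount in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


def mbp_to_pg(megabase_pairs: float) -> float:
    """Convert a genome size in megabase pairs to picograms."""
    return megabase_pairs / MBP_PER_PG


# 1C-values reported in the abstract, in Mbp.
reported_mbp = {
    "Leiosporoceros dussii (smallest)": 160,
    "Nothoceros endiviifolius (largest)": 719,
    "mean across samples": 261,
}

for label, mbp in reported_mbp.items():
    print(f"{label}: {mbp} Mbp is about {mbp_to_pg(mbp):.3f} pg")
```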

  14. A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes.

    Science.gov (United States)

    Wheeler, Nicole E; Barquist, Lars; Kingsley, Robert A; Gardner, Paul P

    2016-12-01

    Next generation sequencing technologies have provided us with a wealth of information on genetic variation, but predicting the functional significance of this variation is a difficult task. While many comparative genomics studies have focused on gene flux and large scale changes, relatively little attention has been paid to quantifying the effects of single nucleotide polymorphisms and indels on protein function, particularly in bacterial genomics. We present a hidden Markov model based approach we call delta-bitscore (DBS) for identifying orthologous proteins that have diverged at the amino acid sequence level in a way that is likely to impact biological function. We benchmark this approach with several widely used datasets and apply it to a proof-of-concept study of orthologous proteomes in an investigation of host adaptation in Salmonella enterica. We highlight the value of the method in identifying functional divergence of genes, and suggest that this tool may be a better approach than the commonly used dN/dS metric for identifying functionally significant genetic changes occurring in recently diverged organisms. A program implementing DBS for pairwise genome comparisons is freely available at https://github.com/UCanCompBio/deltaBS (contact: nicole.wheeler@pg.canterbury.ac.nz or lars.barquist@uni-wuerzburg.de). Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
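
    The core idea of a delta-bitscore comparison can be sketched in a few lines: score each ortholog from two genomes against the same profile HMM, then take the difference of the two bitscores. The code below is a minimal illustration of that subtraction only and is not the deltaBS program. The gene names, bitscores, the reference-minus-variant sign convention and the cutoff of 10 are all hypothetical values chosen for the example; in practice the bitscores would come from a profile HMM search tool.

```python
from typing import Dict, List


def delta_bitscores(
    reference_scores: Dict[str, float],
    variant_scores: Dict[str, float],
) -> Dict[str, float]:
    """Return reference-minus-variant bitscore for every gene scored in both genomes.

    A positive value means the variant protein fits the shared profile model
    less well than the reference protein does (sign convention assumed here).
    """
    shared_genes = reference_scores.keys() & variant_scores.keys()
    return {
        gene: reference_scores[gene] - variant_scores[gene]
        for gene in sorted(shared_genes)
    }


def flag_divergent(dbs: Dict[str, float], threshold: float) -> List[str]:
    """List genes whose absolute delta-bitscore reaches the (hypothetical) threshold."""
    return sorted(gene for gene, value in dbs.items() if abs(value) >= threshold)


# Hypothetical bitscores for orthologs scored against the same profile HMMs.
reference = {"geneA": 250.0, "geneB": 180.5, "geneC": 90.0}
variant = {"geneA": 248.0, "geneB": 150.5, "geneD": 10.0}

dbs = delta_bitscores(reference, variant)
candidates = flag_divergent(dbs, threshold=10.0)

print(dbs)
print(candidates)
```

    Genes present in only one genome are skipped rather than scored, since a difference is only defined for an orthologous pair.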

  15. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  16. Chloroplast Genome Evolution in Actinidiaceae: clpP Loss, Heterogenous Divergence and Phylogenomic Practice.

    Science.gov (United States)

    Wang, Wen-Cai; Chen, Si-Yun; Zhang, Xian-Zhi

    2016-01-01

    Actinidiaceae is a well-known economically important plant family in asterids. To elucidate the chloroplast (cp) genome evolution within this family, here we present complete genomes of three species from two sister genera (Clematoclethra and Actinidia) in the Actinidiaceae via genome skimming technique. Comparative analyses revealed that the genome structure and content were rather conservative in three cp genomes in spite of different inheritance patterns, i.e., paternal in Actinidia and maternal in Clematoclethra. The clpP gene was absent from all three sequenced cp genomes examined here, indicating that the clpP gene loss is likely a conspicuous synapomorphic characteristic during the cp genome evolution of Actinidiaceae. Comprehensive sequence comparisons in Actinidiaceae cp genomes uncovered that there were apparently heterogenous divergence patterns among the cpDNA regions, suggesting a preferred data-partitioned analysis for cp phylogenomics. Twenty non-coding cpDNA loci with fast evolutionary rates are further identified as potential molecular markers for systematics studies of Actinidiaceae. Moreover, the cp phylogenomic analyses including 31 angiosperm plastomes strongly supported the monophyly of Actinidia, being sister to Clematoclethra in Actinidiaceae which locates in the basal asterids, Ericales.

  17. Phycobilisomes linker family in cyanobacterial genomes: divergence and evolution

    Directory of Open Access Journals (Sweden)

    Xiangyu Guan, Song Qin, Fangqing Zhao, Xiaowen Zhang, Xuexi Tang

    2007-01-01

    Full Text Available Cyanobacteria are the oldest life form making important contributions to global CO2 fixation on the Earth. Phycobilisomes (PBSs) are the major light harvesting systems of most cyanobacteria species. Recent availability of the whole genome database of cyanobacteria provides us a global and further view on the complex structural PBSs. A PBSs linker family is crucial in structure and function of major light-harvesting PBSs complexes. Linker polypeptides are considered to have the same ancestor with other phycobiliproteins (PBPs), and might have diverged and evolved under particularly selective forces together. In this paper, a total of 192 putative linkers including 167 putative PBSs-associated linker genes and 25 Ferredoxin-NADP oxidoreductase (FNR) genes were detected through whole genome analysis of all 25 cyanobacterial genomes (20 finished and 5 in draft state). We compared the PBSs linker family of cyanobacteria in terms of gene structure, chromosome location, conservation domain, and polymorphic variants, and discussed the features and functions of the PBSs linker family. Most PBSs-associated linkers in the PBSs linker family are assembled into gene clusters with PBPs. A phylogenetic analysis based on protein data demonstrates a possibility of six classes of the linker family in cyanobacteria. Emergence, divergence, and disappearance of PBSs linkers among cyanobacterial species were due to speciation, gene duplication, gene transfer, or gene loss, and acclimation to various environmental selective pressures especially light.

  18. Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species

    Directory of Open Access Journals (Sweden)

    Wang Xiao-Wei

    2012-10-01

    Full Text Available Abstract Background Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. Results More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10^-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identity data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for

  19. Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species.

    Science.gov (United States)

    Wang, Xiao-Wei; Zhao, Qiong-Yi; Luan, Jun-Bo; Wang, Yu-Jun; Yan, Gen-Hong; Liu, Shu-Sheng

    2012-10-04

    Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10^-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identity data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for the investigation of association between
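
    Percent divergence figures such as the 1.73%, 1.84% and 0.83% quoted above are, in their simplest form, the proportion of aligned sites that differ between two orthologous sequences (the uncorrected p-distance). The sketch below shows that calculation on a toy alignment. The abstract does not say how the authors computed divergence, so the use of an uncorrected p-distance, the skipping of gaps and `N` bases, and the example sequences are all assumptions made for illustration.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap or an ambiguous base (N) in either sequence
    are excluded from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differing = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if gap in (base_a, base_b) or "N" in (base_a, base_b):
            continue
        compared += 1
        if base_a != base_b:
            differing += 1

    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differing / compared


# Toy aligned coding fragments (hypothetical, not from the study).
ortholog_1 = "ATGCATGCAT"
ortholog_2 = "ATGCATGAAT"

divergence = p_distance(ortholog_1, ortholog_2)
print(f"divergence: {divergence:.2%}")
```

    An uncorrected p-distance is adequate at divergences of a few percent like those reported here; at higher divergence, multiple substitutions at the same site make a corrected distance preferable.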

  20. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    OpenAIRE

    Brown, Pamela J.B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  1. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    Science.gov (United States)

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

  2. Genome Sequences of Eight Morphologically Diverse Alphaproteobacteria

    Science.gov (United States)

    Brown, Pamela J. B.; Kysela, David T.; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V.

    2011-01-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  3. Genome Sequence of Mycobacteriophage Momo.

    Science.gov (United States)

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc²155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. Copyright © 2015 Pope et al.

  4. Comparative Study of Genome Divergence in Salmonids with Various Rates of Genetic Isolation

    Directory of Open Access Journals (Sweden)

    Elena A. Shubina

    2013-01-01

    Full Text Available The aim of the study is a comparative investigation of changes that certain genome parts undergo during speciation. The research was focused on divergence of coding and noncoding sequences in different groups of salmonid fishes of the Salmonidae (Salmo, Parasalmo, Oncorhynchus, and Salvelinus genera) and the Coregonidae families under different levels of reproductive isolation. Two basic approaches were used: (1) PCR-RAPD with a 20–22 nt primer design with subsequent cloning and sequencing of the products and (2) a modified endonuclease restriction analysis. The restriction fragments were shown with sequencing to represent satellite DNA. Effects of speciation are found in repetitive sequences. The revelation of expressed sequences in the majority of the employed anonymous loci allows for assuming the adaptive selection during allopatric speciation in isolated char forms.

  5. Comparative study of genome divergence in salmonids with various rates of genetic isolation.

    Science.gov (United States)

    Shubina, Elena A; Nikitin, Mikhail A; Ponomareva, Ekaterina V; Goryunov, Denis V; Gritsenko, Oleg F

    2013-01-01

    The aim of the study is a comparative investigation of changes that certain genome parts undergo during speciation. The research was focused on divergence of coding and noncoding sequences in different groups of salmonid fishes of the Salmonidae (Salmo, Parasalmo, Oncorhynchus, and Salvelinus genera) and the Coregonidae families under different levels of reproductive isolation. Two basic approaches were used: (1) PCR-RAPD with a 20-22 nt primer design with subsequent cloning and sequencing of the products and (2) a modified endonuclease restriction analysis. The restriction fragments were shown with sequencing to represent satellite DNA. Effects of speciation are found in repetitive sequences. The revelation of expressed sequences in the majority of the employed anonymous loci allows for assuming the adaptive selection during allopatric speciation in isolated char forms.

  6. Complete Genome Sequences of Bordetella pertussis Vaccine Reference Strains 134 and 10536

    Science.gov (United States)

    Peng, Yanhui; Loparev, Vladimir; Batra, Dhwani; Burroughs, Mark; Johnson, Taccara; Juieng, Phalasy; Rowe, Lori; Tondella, M. Lucia; Williams, Margaret M.

    2016-01-01

    Vaccine formulations and vaccination programs against whooping cough (pertussis) vary worldwide. Here, we report the complete genome sequences of two divergent Bordetella pertussis reference strains used in the production of pertussis vaccines. PMID:27635001

  7. Complete Genome Sequences of Bordetella pertussis Vaccine Reference Strains 134 and 10536.

    Science.gov (United States)

    Weigand, Michael R; Peng, Yanhui; Loparev, Vladimir; Batra, Dhwani; Burroughs, Mark; Johnson, Taccara; Juieng, Phalasy; Rowe, Lori; Tondella, M Lucia; Williams, Margaret M

    2016-09-15

    Vaccine formulations and vaccination programs against whooping cough (pertussis) vary worldwide. Here, we report the complete genome sequences of two divergent Bordetella pertussis reference strains used in the production of pertussis vaccines.

  8. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    Directory of Open Access Journals (Sweden)

    Claros M Gonzalo

    2010-06-01

    Full Text Available Abstract Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". Conclusions AlignMiner can be used
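
    The idea of scoring alignment columns by entropy can be illustrated with a short example. The sketch below is not AlignMiner's implementation, whose details are not given in this abstract; it applies plain Shannon entropy to each column, and the function names, the threshold value and the toy alignment are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def divergent_columns(alignment, threshold=1.0):
    """Return 0-based indices of columns whose entropy is >= threshold.

    Assumption: every sequence has the same length, and a gap character
    is treated as just another symbol. The threshold is an arbitrary
    illustrative value, not one taken from AlignMiner.
    """
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if column_entropy(column) >= threshold
    ]


# Hypothetical toy alignment: columns 0, 1 and 4 are fully conserved.
toy_alignment = ["ACGTA", "ACGAA", "ACCCA", "ACTGA"]
scores = [column_entropy(column) for column in zip(*toy_alignment)]
flagged = divergent_columns(toy_alignment)
```

    Fully conserved columns score zero and columns with many different residues score highest, so a simple threshold is enough to separate the two in this toy case.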

  9. Northern Bobwhite (Colinus virginianus) Mitochondrial Population Genomics Reveals Structure, Divergence, and Evidence for Heteroplasmy.

    Directory of Open Access Journals (Sweden)

    Yvette A Halley

    Full Text Available Herein, we evaluated the concordance of population inferences and conclusions resulting from the analysis of short mitochondrial fragments (i.e., partial or complete D-Loop nucleotide sequences) versus complete mitogenome sequences for 53 bobwhites representing six ecoregions across TX and OK (USA). Median joining (MJ) haplotype networks demonstrated that analyses performed using small mitochondrial fragments were insufficient for estimating the true (i.e., complete) mitogenome haplotype structure, corresponding levels of divergence, and maternal population history of our samples. Notably, discordant demographic inferences were observed when mismatch distributions of partial (i.e., partial D-Loop) versus complete mitogenome sequences were compared, with the reduction in mitochondrial genomic information content observed to encourage spurious inferences in our samples. A probabilistic approach to variant prediction for the complete bobwhite mitogenomes revealed 344 segregating sites corresponding to 347 total mutations, including 49 putative nonsynonymous single nucleotide variants (SNVs) distributed across 12 protein coding genes. Evidence of gross heteroplasmy was observed for 13 bobwhites, with 10 of the 13 heteroplasmies involving one moderate to high frequency SNV. Haplotype network and phylogenetic analyses for the complete bobwhite mitogenome sequences revealed two divergent maternal lineages (dXY = 0.00731; FST = 0.849; P < 0.05), thereby supporting the potential for two putative subspecies. However, the diverged lineage (n = 103 variants) almost exclusively involved bobwhites geographically classified as Colinus virginianus texanus, which is discordant with the expectations of previous geographic subspecies designations. Tests of adaptive evolution for functional divergence (MKT), frequency distribution tests (D, FS) and phylogenetic analyses (RAxML) provide no evidence for positive selection or hybridization with the sympatric scaled quail

  10. Three divergent mitochondrial genomes from California populations of the copepod Tigriopus californicus.

    Science.gov (United States)

    Burton, Ronald S; Byrne, Rosemary J; Rawson, Paul D

    2007-11-15

    Previous work on the harpacticoid copepod Tigriopus californicus has focused on the extensive population differentiation in three mtDNA protein coding genes (COXI, COXII, Cytb). In order to get a more complete understanding of mtDNA evolution in this species, we sequenced three complete mitochondrial genomes (one from each of three California populations) and compared them to two published mtDNA genomes from an Asian congener, Tigriopus japonicus. Several features of the mtDNA genome appear to be conserved within the genus: 1) the unique order of the protein coding genes, rRNA genes and most of the tRNA genes, 2) the genome is compact, varying between 14.3 and 14.6 kb, and 3) all genes are encoded on the same strand of the mtDNA. Within T. californicus, extremely high levels of nucleotide divergence (>20%) are observed across much of the mitochondrial genome. Inferred amino acid sequences of the proteins encoded in the mtDNAs also show high levels of divergence; at the extreme, the three ND3 variants in T. californicus showed >25% amino acid substitutions, compared with californicus mtDNAs. Although not previously noted, this feature is also conserved in T. japonicus mtDNAs; whether this sequence is processed into a functional tRNA has not been determined. The putative control region contains a duplicated segment of different length (from 88 to 155 bp) in each of the T. californicus sequences. In each case, the duplicated segments are not tandem repeats; despite their different lengths, the distance between the start of the first and the start of the second repeat is conserved (520 bp). The functional significance, if any, of this repeat structure remains unknown.

  11. Sequence divergence of Mus spretus and Mus musculus across a skin cancer susceptibility locus

    Directory of Open Access Journals (Sweden)

    Balmain Allan

    2008-12-01

    Full Text Available Abstract Background Mus spretus diverged from Mus musculus over one million years ago. These mice are genetically and phenotypically divergent. Despite the value of utilizing M. musculus and M. spretus for quantitative trait locus (QTL) mapping, relatively little genomic information on M. spretus exists, and most of the available sequence and polymorphic data is for one strain of M. spretus, Spret/Ei. In previous work, we mapped fifteen loci for skin cancer susceptibility using four different M. spretus by M. musculus F1 backcrosses. One locus, skin tumor susceptibility 5 (Skts5) on chromosome 12, shows strong linkage in one cross. Results To identify potential candidate genes for Skts5, we sequenced 65 named and unnamed genes and coding elements mapping to the peak linkage area in outbred spretus, Spret/EiJ, FVB/NJ, and NIH/Ola. We identified polymorphisms in 62 of 65 genes including 122 amino acid substitutions. To look for polymorphisms consistent with the linkage data, we sequenced exons with amino acid polymorphisms in two additional M. spretus strains and one additional M. musculus strain generating 40.1 kb of sequence data. Eight candidate variants were identified that fit with the linkage data. To determine the degree of variation across M. spretus, we conducted phylogenetic analyses. The relatedness of the M. spretus strains at this locus is consistent with the proximity of region of ascertainment of the ancestral mice. Conclusion Our analyses suggest that, if Skts5 on chromosome 12 is representative of other regions in the genome, then published genomic data for Spret/EiJ are likely to be of high utility for genomic studies in other M. spretus strains.

  12. Translational genomics for plant breeding with the genome sequence explosion.

    Science.gov (United States)

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies.

  13. RAD sequencing reveals genomewide divergence between independent invasions of the European green crab (Carcinus maenas) in the Northwest Atlantic.

    Science.gov (United States)

    Jeffery, Nicholas W; DiBacco, Claudio; Van Wyngaarden, Mallory; Hamilton, Lorraine C; Stanley, Ryan R E; Bernier, Renée; FitzGerald, Jennifer; Matheson, K; McKenzie, C H; Nadukkalam Ravindran, Praveen; Beiko, Robert; Bradbury, Ian R

    2017-04-01

    Genomic studies of invasive species can reveal both invasive pathways and functional differences underpinning patterns of colonization success. The European green crab (Carcinus maenas) was initially introduced to eastern North America nearly 200 years ago where it expanded northwards to eastern Nova Scotia. A subsequent invasion to Nova Scotia from a northern European source allowed further range expansion, providing a unique opportunity to study the invasion genomics of a species with multiple invasions. Here, we use restriction-site-associated DNA sequencing-derived SNPs to explore fine-scale genomewide differentiation between these two invasions. We identified 9137 loci from green crab sampled from 11 locations along eastern North America and compared spatial variation to mitochondrial COI sequence variation used previously to characterize these invasions. Overall spatial divergence among invasions was high (pairwise FST ~0.001 to 0.15) and spread across many loci, with a mean FST ~0.052 and 52% of loci examined characterized by FST values >0.05. The majority of the most divergent loci (i.e., outliers, ~1.2%) displayed latitudinal clines in allele frequency highlighting extensive genomic divergence among the invasions. Discriminant analysis of principal components (both neutral and outlier loci) clearly resolved the two invasions spatially and was highly correlated with mitochondrial divergence. Our results reveal extensive cryptic intraspecific genomic diversity associated with differing patterns of colonization success and demonstrate clear utility for genomic approaches to delineating the distribution and colonization success of aquatic invasive species.

  14. Divergence in cis-regulatory sequences surrounding the opsin gene arrays of African cichlid fishes

    Directory of Open Access Journals (Sweden)

    Streelman J Todd

    2011-05-01

    Full Text Available Abstract Background Divergence within cis-regulatory sequences may contribute to the adaptive evolution of gene expression, but functional alleles in these regions are difficult to identify without abundant genomic resources. Among African cichlid fishes, the differential expression of seven opsin genes has produced adaptive differences in visual sensitivity. Quantitative genetic analysis suggests that cis-regulatory alleles near the SWS2-LWS opsins may contribute to this variation. Here, we sequence BACs containing the opsin genes of two cichlids, Oreochromis niloticus and Metriaclima zebra. We use phylogenetic footprinting and shadowing to examine divergence in conserved non-coding elements, promoter sequences, and 3'-UTRs surrounding each opsin in search of candidate cis-regulatory sequences that influence cichlid opsin expression. Results We identified 20 conserved non-coding elements surrounding the opsins of cichlids and other teleosts, including one known enhancer and a retinal microRNA. Most conserved elements contained computationally-predicted binding sites that correspond to transcription factors that function in vertebrate opsin expression; O. niloticus and M. zebra were significantly divergent in two of these. Similarly, we found a large number of relevant transcription factor binding sites within each opsin's proximal promoter, and identified five opsins that were considerably divergent in both expression and the number of transcription factor binding sites shared between O. niloticus and M. zebra. We also found several microRNA target sites within the 3'-UTR of each opsin, including two 3'-UTRs that differ significantly between O. niloticus and M. zebra. Finally, we examined interspecific divergence among 18 phenotypically diverse cichlids from Lake Malawi for one conserved non-coding element, two 3'-UTRs, and five opsin proximal promoters. We found that all regions were highly conserved with some evidence of CRX transcription

  15. Genomic islands of divergence are not affected by geography of speciation in sunflowers.

    Science.gov (United States)

    Renaut, S; Grassa, C J; Yeaman, S; Moyers, B T; Lai, Z; Kane, N C; Bowers, J E; Burke, J M; Rieseberg, L H

    2013-01-01

    Genomic studies of speciation often report the presence of highly differentiated genomic regions interspersed within a milieu of weakly diverged loci. The formation of these speciation islands is generally attributed to reduced inter-population gene flow near loci under divergent selection, but few studies have critically evaluated this hypothesis. Here, we report on transcriptome scans among four recently diverged pairs of sunflower (Helianthus) species that vary in the geographical context of speciation. We find that genetic divergence is lower in sympatric and parapatric comparisons, consistent with a role for gene flow in eroding neutral differences. However, genomic islands of divergence are numerous and small in all comparisons, and contrary to expectations, island number and size are not significantly affected by levels of interspecific gene flow. Rather, island formation is strongly associated with reduced recombination rates. Overall, our results indicate that the functional architecture of genomes plays a larger role in shaping genomic divergence than does the geography of speciation.

  16. Sequencing intractable DNA to close microbial genomes.

    Science.gov (United States)

    Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  17. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  18. Fungal genome sequencing: basic biology to biotechnology.

    Science.gov (United States)

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  19. Value of a newly sequenced bacterial genome

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Aburjaile, Flavia F; Ramos, Rommel Tj

    2014-01-01

    and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses...

  20. Ongoing ecological divergence in an emerging genomic model.

    Science.gov (United States)

    Arnegard, Matthew E

    2009-07-01

    Much of Earth's biodiversity has arisen through adaptive radiation. Important avenues of phenotypic divergence during this process include the evolution of body size and life history (Schluter 2000). Extensive adaptive radiations of cichlid fishes have occurred in the Great Lakes of Africa, giving rise to behaviours that are remarkably sophisticated and diverse across species. In Tanganyikan shell-brooding cichlids of the tribe Lamprologini, tremendous intraspecific variation in body size accompanies complex breeding systems and use of empty snail shells to hide from predators and rear offspring. A study by Takahashi et al. (2009) in this issue of Molecular Ecology reveals the first case of genetic divergence between dwarf and normal-sized morphs of the same nominal lamprologine species, Telmatochromis temporalis. Patterns of population structure suggest that the dwarf, shell-dwelling morph of T. temporalis might have arisen from the normal, rock-dwelling morph independently in more than one region of the lake, and that pairs of morphs at different sites may represent different stages early in the process of ecological speciation. The findings of Takahashi et al. are important first steps towards understanding the evolution of these intriguing morphs, yet many questions remain unanswered about the mating system, gene flow, plasticity and selection. Despite these limitations, descriptive work like theirs takes on much significance in African cichlids due to forthcoming resources for comparative genomics.

  1. Real-time, portable genome sequencing for Ebola surveillance

    Science.gov (United States)

    Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan HJ; Becker-Ziaja, Beate; Boettcher, Jan-Peter; Cabeza-Cabrerizo, Mar; Camino-Sanchez, Alvaro; Carter, Lisa L.; Doerrbecker, Juiliane; Enkirch, Theresa; Dorival, Isabel Graciela García; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigail; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallash, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Yasmin; Sachse, Andrea; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Trina, Racine; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N’Faly; Williams, Cecelia V.; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Franck; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, Jamie; Rachwal, Philip; Turner, Daniel; Pollakis, Georgios; Hiscox, Julian A.; Matthews, David A.; O’Shea, Matthew K.; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Woelfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A.; Koivogui, Lamine; Diallo, Boubacar; Keita, Sakoba; Rambaut, Andrew; Formenty, Pierre; Gunther, Stephan; Carroll, Miles W.

    2016-01-01

    The Ebola virus disease (EVD) epidemic in West Africa is the largest on record, responsible for >28,599 cases and >11,299 deaths 1. Genome sequencing in viral outbreaks is desirable in order to characterize the infectious agent to determine its evolutionary rate, signatures of host adaptation, identification and monitoring of diagnostic targets and responses to vaccines and treatments. The Ebola virus genome (EBOV) substitution rate in the Makona strain has been estimated at between 0.87 × 10⁻³ and 1.42 × 10⁻³ mutations per site per year. This is equivalent to 16 to 27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic 2-7. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought-after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions 8. Genomic surveillance during the epidemic has been sporadic due to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities 9. In order to address this problem, we devised a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. Here we present sequence data and analysis of 142 Ebola virus (EBOV) samples collected during the period March to October 2015. We were able to generate results in less than 24 hours after receiving an Ebola positive sample, with the sequencing process taking as little as 15-60 minutes. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks. PMID:26840485
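
    The correspondence between the per-site substitution rate and the per-genome mutation count quoted in this abstract can be checked with simple arithmetic. The sketch below assumes an EBOV genome length of about 18,959 nt, a figure that is not stated in the abstract, and reads the per-genome count as a yearly figure; the function name is hypothetical.

```python
# Assumption: EBOV genome length of roughly 18,959 nt (not given in the abstract).
GENOME_LENGTH_NT = 18_959


def substitutions_per_genome_per_year(rate_per_site_per_year,
                                      genome_length=GENOME_LENGTH_NT):
    """Expected substitutions per genome per year for a per-site yearly rate."""
    return rate_per_site_per_year * genome_length


# Lower and upper rate estimates quoted in the abstract.
low = substitutions_per_genome_per_year(0.87e-3)
high = substitutions_per_genome_per_year(1.42e-3)
print(f"{low:.1f} to {high:.1f} substitutions per genome per year")
```

    Under that assumed genome length, the two rates round to the 16 and 27 mutations per genome given in the abstract.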

  2. Draft Genome Sequence of Lactobacillus rhamnosus 2166.

    OpenAIRE

    Karlyshev, Andrey V.; Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

    2014-01-01

    In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

  4. Value of a newly sequenced bacterial genome

    Institute of Scientific and Technical Information of China (English)

    Eudes GV Barbosa; Flavia F Aburjaile; Rommel TJ Ramos; Adriana R Carneiro; Yves Le Loir; Jan Baumbach; Anderson Miyoshi; Artur Silva; Vasco Azevedo

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the "scientific value" of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.

  5. Genome divergence during evolutionary diversification as revealed in replicate lake-stream stickleback population pairs.

    Science.gov (United States)

    Roesti, Marius; Hendry, Andrew P; Salzburger, Walter; Berner, Daniel

    2012-06-01

    Evolutionary diversification is often initiated by adaptive divergence between populations occupying ecologically distinct environments while still exchanging genes. The genetic foundations of this divergence process are largely unknown and are here explored through genome scans in multiple independent lake-stream population pairs of threespine stickleback. We find that across the pairs, overall genomic divergence is associated with the magnitude of divergence in phenotypes known to be under divergent selection. Along this same axis of increasing diversification, genomic divergence becomes increasingly biased towards the centre of chromosomes as opposed to the peripheries. We explain this pattern by within-chromosome variation in the physical extent of hitchhiking, as recombination is greatly reduced in chromosome centres. Correcting for this effect suggests that a great number of genes distributed widely across the genome are involved in the divergence into lake vs. stream habitats. Analyzing additional allopatric population pairs, however, reveals that strong divergence in some genomic regions has been driven by selection unrelated to lake-stream ecology. Our study highlights a major contribution of large-scale variation in recombination rate to generating heterogeneous genomic divergence and indicates that elucidating the genetic basis of adaptive divergence might be more challenging than currently recognized.

  6. Microbial genomics: from sequence to function.

    OpenAIRE

    Schwartz, I

    2000-01-01

    The era of genomics (the study of genes and their function) began a scant dozen years ago with a suggestion by James Watson that the complete DNA sequence of the human genome be determined. Since that time, the human genome project has attracted a great deal of attention in the scientific world and the general media; the scope of the sequencing effort, and the extraordinary value that it will provide, has served to mask the enormous progress in sequencing other genomes. Microbial genome seque...

  7. Conservation and divergence of ADAM family proteins in the Xenopus genome

    Directory of Open Access Journals (Sweden)

    Shah Anoop

    2010-07-01

    Full Text Available Abstract Background Members of the disintegrin metalloproteinase (ADAM) family play important roles in cellular and developmental processes through their functions as proteases and/or binding partners for other proteins. The amphibian Xenopus has long been used as a model for early vertebrate development, but genome-wide analyses for large gene families were not possible until the recent completion of the X. tropicalis genome sequence and the availability of large scale expression sequence tag (EST) databases. In this study we carried out a systematic analysis of the X. tropicalis genome and uncovered several interesting features of ADAM genes in this species. Results Based on the X. tropicalis genome sequence and EST databases, we identified Xenopus orthologues of mammalian ADAMs and obtained full-length cDNA clones for these genes. The deduced protein sequences, synteny and exon-intron boundaries are conserved between most human and X. tropicalis orthologues. The alternative splicing patterns of certain Xenopus ADAM genes, such as adams 22 and 28, are similar to those of their mammalian orthologues. However, we were unable to identify an orthologue for ADAM7 or 8. The Xenopus orthologue of ADAM15, an active metalloproteinase in mammals, does not contain the conserved zinc-binding motif and is hence considered proteolytically inactive. We also found evidence for gain of ADAM genes in Xenopus as compared to other species. There is a homologue of ADAM10 in Xenopus that is missing in most mammals. Furthermore, a single scaffold of X. tropicalis genome contains four genes encoding ADAM28 homologues, suggesting genome duplication in this region. Conclusions Our genome-wide analysis of ADAM genes in X. tropicalis revealed both conservation and evolutionary divergence of these genes in this amphibian species. On the one hand, all ADAMs implicated in normal development and health in other species are conserved in X. tropicalis. On the other hand, some

  8. The mitochondrial genome of Indonesian coelacanth Latimeria menadoensis (Sarcopterygii: Coelacanthiformes) and divergence time estimation between the two coelacanths.

    Science.gov (United States)

    Inoue, Jun G; Miya, Masaki; Venkatesh, Byrappa; Nishida, Mutsumi

    2005-04-11

    We determined the whole mitochondrial genome sequence for Indonesian coelacanth Latimeria menadoensis. The genome content and organization were identical to that of typical vertebrates including Comoran coelacanth, Latimeria chalumnae. The overall nucleotide difference between the two species (excluding the control region) was 4.28%. The divergence time between the two species was estimated using whole mitochondrial genome data from the two coelacanths and 26 actinopterygians that represent major actinopterygian lineages plus an outgroup. Partitioned Bayesian analyses were conducted with the two data sets that comprised concatenated amino acid sequences from 12 protein-coding genes (excluding ND6 gene) and concatenated nucleotide sequences from 12 protein-coding genes (without 3rd codon positions), 22 transfer RNA genes, and two ribosomal RNA genes. The molecular clock analysis was also conducted with the concatenated amino acid sequences from the 12 protein-coding genes after removing faster or more slowly evolving sequences. Using the sarcopterygian-actinopterygian split as a calibration point (450 Mya), divergence time estimation between L. menadoensis and L. chalumnae fell in the range of 40-30 Mya, which is much older than those of the previous studies (... coelacanth habitat disjunction that allowed populations on either side of India to diverge.
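    The 4.28% figure in the record above is an overall nucleotide difference between two aligned genomes. As a minimal sketch only, assuming that such a figure is an uncorrected proportion of differing sites (a p-distance) over aligned, ungapped positions, the calculation could look like the following. The function name `p_distance`, the `gap_chars` parameter and the toy sequences are hypothetical and are not taken from the study, which may have used a different procedure.

```python
def p_distance(seq_a, seq_b, gap_chars="-N"):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or ambiguity character
    (by default '-' or 'N') are skipped. Hypothetical helper for illustration.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0    # sites where both sequences carry a usable base
    mismatches = 0  # compared sites where the bases differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return mismatches / compared


if __name__ == "__main__":
    # Toy alignment: 8 comparable sites, 1 mismatch.
    print(f"{p_distance('ACGTACGT', 'ACGTACGA'):.3%}")
```

    A p-distance ignores multiple substitutions at the same site, so it tends to underestimate divergence between distant lineages; model-based corrections are normally preferred for dating.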

  9. Complete Mitochondrial Genome Sequence of the Eastern Gorilla (Gorilla beringei) and Implications for African Ape Biogeography

    OpenAIRE

    Das, Ranajit; Hergenrother, Scott D.; Soto-Calderón, Iván D.; Dew, J. Larry; Anthony, Nicola M; Jensen-Seaman, Michael I

    2014-01-01

    The Western and Eastern species of gorillas (Gorilla gorilla and Gorilla beringei) began diverging in the mid-Pleistocene, but in a complex pattern with ongoing gene flow following their initial split. We sequenced the complete mitochondrial genomes of 1 Eastern and 1 Western gorilla to provide the most accurate date for their mitochondrial divergence, and to analyze patterns of nucleotide substitutions. The most recent common ancestor of these genomes existed about 1.9 million years ago, sli...

  10. Sequence of inequalities among fuzzy mean difference divergence measures and their applications.

    Science.gov (United States)

    Tomar, Vijay Prakash; Ohlan, Anshu

    2014-01-01

    This paper presents a sequence of fuzzy mean difference divergence measures. The validity of these fuzzy mean difference divergence measures is proved axiomatically. In addition, it introduces a sequence of inequalities among some of these fuzzy mean difference divergence measures. The applications of proposed fuzzy mean difference divergence measures in the context of pattern recognition have been presented using a numerical example. It is shown that the proposed fuzzy mean difference divergence measures are well suited to use with linguistic variables. Finally, on establishing inequalities, we find that our proposed measures are computationally much more efficient.

  11. Genome Sequence of Lactobacillus rhamnosus ATCC 8530

    OpenAIRE

    Pittet, Vanessa; Ewen, Emily; Bushell, Barry R.; Ziola, Barry

    2012-01-01

    Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.

  12. Maize genome sequencing by methylation filtration.

    Science.gov (United States)

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  13. Human Genome Sequencing in Health and Disease

    Science.gov (United States)

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  14. Streamlining and core genome conservation among highly divergent members of the SAR11 clade.

    Science.gov (United States)

    Grote, Jana; Thrash, J Cameron; Huggett, Megan J; Landry, Zachary C; Carini, Paul; Giovannoni, Stephen J; Rappé, Michael S

    2012-01-01

    SAR11 is an ancient and diverse clade of heterotrophic bacteria that are abundant throughout the world's oceans, where they play a major role in the ocean carbon cycle. Correlations between the phylogenetic branching order and spatiotemporal patterns in cell distributions from planktonic ocean environments indicate that SAR11 has evolved into perhaps a dozen or more specialized ecotypes that span evolutionary distances equivalent to a bacterial order. We isolated and sequenced genomes from diverse SAR11 cultures that represent three major lineages and encompass the full breadth of the clade. The new data expand observations about genome evolution and gene content that previously had been restricted to the SAR11 Ia subclade, providing a much broader perspective on the clade's origins, evolution, and ecology. We found small genomes throughout the clade and a very high proportion of core genome genes (48 to 56%), indicating that small genome size is probably an ancestral characteristic. In their level of core genome conservation, the members of SAR11 are outliers, the most conserved free-living bacteria known. Shared features of the clade include low GC content, high gene synteny, a large hypervariable region bounded by rRNA genes, and low numbers of paralogs. Variation among the genomes included genes for phosphorus metabolism, glycolysis, and C1 metabolism, suggesting that adaptive specialization in nutrient resource utilization is important to niche partitioning and ecotype divergence within the clade. These data provide support for the conclusion that streamlining selection for efficient cell replication in the planktonic habitat has occurred throughout the evolution and diversification of this clade. IMPORTANCE The SAR11 clade is the most abundant group of marine microorganisms worldwide, making them key players in the global carbon cycle. Growing knowledge about their biochemistry and metabolism is leading to a more mechanistic understanding of organic carbon

  15. Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals

    DEFF Research Database (Denmark)

    Hellmann, Ines; Mang, Yuan; Gu, Zhiping

    2008-01-01

    for individual reads. Applying this method to data from the Celera human genome sequencing and SNP discovery project, we obtain estimates of nucleotide diversity in windows spanning the human genome and show that the diversity to divergence ratio is reduced in regions of low recombination. Furthermore, we show...

  16. New complete genome sequences of human rhinoviruses shed light on their phylogeny and genomic features

    Directory of Open Access Journals (Sweden)

    Zdobnov Evgeny M

    2007-07-01

    Full Text Available Abstract Background Human rhinoviruses (HRV), the most frequent cause of respiratory infections, include 99 different serotypes segregating into two species, A and B. Rhinoviruses share extensive genomic sequence similarity with enteroviruses and both are part of the picornavirus family. Nevertheless they differ significantly at the phenotypic level. The lack of HRV full-length genome sequences and the absence of analysis comparing picornaviruses at the whole genome level limit our knowledge of the genomic features supporting these differences. Results Here we report complete genome sequences of 12 HRV-A and HRV-B serotypes, more than doubling the current number of available HRV sequences. The whole-genome maximum-likelihood phylogenetic analysis suggests that HRV-B and human enteroviruses (HEV) diverged from the last common ancestor after their separation from HRV-A. On the other hand, compared to HEV, HRV-B are more related to HRV-A in the capsid and 3B-C regions. We also identified the presence of a 2C cis-acting replication element (cre) in HRV-B that is not present in HRV-A, and that had been previously characterized only in HEV. In contrast to HEV viruses, HRV-A and HRV-B share also markedly lower GC content along the whole genome length. Conclusion Our findings provide basis to speculate about both the biological similarities and the differences (e.g. tissue tropism, temperature adaptation or acid lability) of these three groups of viruses.

  17. The genome sequence of parrot bornavirus 5.

    Science.gov (United States)

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence is often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74% with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated this new isolate is an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornaviruses, we support the proposal that PaBV-5 be assigned to a new bornavirus species: Psittaciform 2 bornavirus.

  18. Genomic sequencing of Pleistocene cave bears

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, James R.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  19. Heterogeneous genome divergence, differential introgression, and the origin and structure of hybrid zones.

    Science.gov (United States)

    Harrison, Richard G; Larson, Erica L

    2016-06-01

    Hybrid zones have been promoted as windows on the evolutionary process and as laboratories for studying divergence and speciation. Patterns of divergence between hybridizing species can now be characterized on a genomewide scale, and recent genome scans have focused on the presence of 'islands' of divergence. Patterns of heterogeneous genomic divergence may reflect differential introgression following secondary contact and provide insights into which genome regions contribute to local adaptation, hybrid unfitness and positive assortative mating. However, heterogeneous genome divergence can also arise in the absence of any gene flow, as a result of variation in selection and recombination across the genome. We suggest that to understand hybrid zone origins and dynamics, it is essential to distinguish between genome regions that are divergent between pure parental populations and regions that show restricted introgression where these populations interact in hybrid zones. The latter, more so than the former, reveal the likely genetic architecture of reproductive isolation. Mosaic hybrid zones, because of their complex structure and multiple contacts, are particularly good subjects for distinguishing primary intergradation from secondary contact. Comparisons among independent hybrid zones or transects that involve the 'same' species pair can also help to distinguish between divergence with gene flow and secondary contact. However, data from replicate hybrid zones or replicate transects do not reveal consistent patterns; in a few cases, patterns of introgression are similar across independent transects, but for many taxa, there is distinct lack of concordance, presumably due to variation in environmental context and/or variation in the genetics of the interacting populations.

  20. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    Science.gov (United States)

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  1. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence.

    Directory of Open Access Journals (Sweden)

    Kacy L Gordon

    2015-05-01

    Full Text Available Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements.

  2. "Islands of Divergence" in the Atlantic Cod Genome Represent Polymorphic Chromosomal Rearrangements.

    Science.gov (United States)

    Sodeland, Marte; Jorde, Per Erik; Lien, Sigbjørn; Jentoft, Sissel; Berg, Paul R; Grove, Harald; Kent, Matthew P; Arnyasi, Mariann; Olsen, Esben Moland; Knutsen, Halvor

    2016-04-11

    In several species genetic differentiation across environmental gradients or between geographically separate populations has been reported to center at "genomic islands of divergence," resulting in heterogeneous differentiation patterns across genomes. Here, genomic regions of elevated divergence were observed on three chromosomes of the highly mobile fish Atlantic cod (Gadus morhua) within geographically fine-scaled coastal areas. The "genomic islands" extended at least 5, 9.5, and 13 megabases on linkage groups 2, 7, and 12, respectively, and coincided with large blocks of linkage disequilibrium. For each of these three chromosomes, pairs of segregating, highly divergent alleles were identified, with little or no gene exchange between them. These patterns of recombination and divergence mirror genomic signatures previously described for large polymorphic inversions, which have been shown to repress recombination across extensive chromosomal segments. The lack of genetic exchange permits divergence between noninverted and inverted chromosomes in spite of gene flow. For the rearrangements on linkage groups 2 and 12, allelic frequency shifts between coastal and oceanic environments suggest a role in ecological adaptation, in agreement with recently reported associations between molecular variation within these genomic regions and temperature, oxygen, and salinity levels. Elevated genetic differentiation in these genomic regions has previously been described on both sides of the Atlantic Ocean, and we therefore suggest that these polymorphisms are involved in adaptive divergence across the species distributional range. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  3. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  4. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics.

    Directory of Open Access Journals (Sweden)

    Lincoln D Stein

    2003-11-01

    Full Text Available The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C
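    The record above attributes the size difference between the two genomes to repetitive sequence. The short sketch below simply works through that arithmetic using the rounded figures quoted in the abstract (approximately 104 Mbp at 22.4% repeats, and 100.3 Mbp at 16.5% repeats). The dictionary layout and function name are illustrative assumptions, and because the inputs are rounded estimates the outputs are approximate.

```python
# Figures as quoted in the abstract; names and structure are illustrative only.
GENOMES = {
    "C. briggsae": {"size_mbp": 104.0, "repeat_fraction": 0.224},
    "C. elegans": {"size_mbp": 100.3, "repeat_fraction": 0.165},
}


def split_repeat_content(genomes):
    """Return repetitive and non-repetitive sequence (in Mbp) per genome."""
    result = {}
    for name, genome in genomes.items():
        repeat_mbp = genome["size_mbp"] * genome["repeat_fraction"]
        result[name] = {
            "repeat_mbp": repeat_mbp,
            "non_repeat_mbp": genome["size_mbp"] - repeat_mbp,
        }
    return result


if __name__ == "__main__":
    summary = split_repeat_content(GENOMES)
    for name, parts in summary.items():
        print(f"{name}: {parts['repeat_mbp']:.1f} Mbp repeats, "
              f"{parts['non_repeat_mbp']:.1f} Mbp non-repetitive")

    size_gap = GENOMES["C. briggsae"]["size_mbp"] - GENOMES["C. elegans"]["size_mbp"]
    repeat_gap = (summary["C. briggsae"]["repeat_mbp"]
                  - summary["C. elegans"]["repeat_mbp"])
    print(f"Genome size difference: {size_gap:.1f} Mbp")
    print(f"Repeat content difference: {repeat_gap:.1f} Mbp")
```

    With these inputs the repeat content differs by more than the total genome size does, which is consistent with the abstract's statement that repeats account for the size difference.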

  5. Cucumis melo endornavirus: Genome organization, host range and co-divergence with the host.

    Science.gov (United States)

    Sabanadzovic, Sead; Wintermantel, William M; Valverde, Rodrigo A; McCreight, James D; Aboughanem-Sabanadzovic, Nina

    2016-03-02

    A high molecular weight dsRNA was isolated from a Cucumis melo L. plant (referred to as 'CL01') of an unknown cultivar and completely sequenced. Sequence analyses showed that the dsRNA is associated with an endornavirus for which the name Cucumis melo endornavirus (CmEV) is proposed. The genome of CmEV-CL01 consists of 15,078 nt, contains a single 4939-codon ORF and terminates with a stretch of 10 cytosine residues. Comparisons of the putative CmEV-encoded polyprotein with available references in protein databases revealed a unique genome organization characterized by the presence of the following domains: viral helicase Superfamily 1 (Hel-1), three glucosyltransferases (doublet of putative capsular polysaccharide synthesis proteins and a putative C_28_Glycosyltransferase), and an RNA-dependent RNA polymerase (RdRp). The presence of three glycome-related domains of different origin makes the genome organization of CmEV unique among endornaviruses. Phylogenetic analyses of viral RdRp domains showed that CmEV belongs to a specific lineage within the family Endornaviridae made exclusively of plant-infecting endornaviruses. An RT-PCR based survey demonstrated high incidence of CmEV among melon germplasm accessions (>87% of tested samples). Analyses of partial genome sequences of CmEV isolates from 26 different melon genotypes suggest fine-tuned virus adaptation and co-divergence with the host. Finally, results of the present study revealed that CmEV is present in plants belonging to three different genera in the family Cucurbitaceae. Such a diverse host range is unreported for known endornaviruses and suggests a long history of CmEV association with cucurbits predating their speciation.

  6. Genome Sequence of the Yeast Clavispora lusitaniae Type Strain CBS 6936.

    Science.gov (United States)

    Durrens, Pascal; Klopp, Christophe; Biteau, Nicolas; Fitton-Ouhabi, Valérie; Dementhon, Karine; Accoceberry, Isabelle; Sherman, David J; Noël, Thierry

    2017-08-03

    Clavispora lusitaniae, an environmental saprophytic yeast belonging to the CTG clade of Candida, can behave occasionally as an opportunistic pathogen in humans. We report here the genome sequence of the type strain CBS 6936. Comparison with sequences of strain ATCC 42720 indicates conservation of chromosomal structure but significant nucleotide divergence. Copyright © 2017 Durrens et al.

  7. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Directory of Open Access Journals (Sweden)

    Francesca Bertolini

    Full Text Available Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  8. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    Science.gov (United States)

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.
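
    The abstract attributes the difference in read alignment to the larger N50 of the horse assembly. As a minimal sketch of how N50 is usually computed (the function name and the example lengths are illustrative, not taken from the study), N50 is the length of the contig or scaffold at which the running total, counted from the longest sequence down, first reaches half of the assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths only (not real assembly data).
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

    A more fragmented assembly has a smaller N50, so a read is more likely to run off the end of a contig or fall into an unassembled gap.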

  9. Detecting functional divergence after gene duplication through evolutionary changes in posttranslational regulatory sequences.

    Science.gov (United States)

    Nguyen Ba, Alex N; Strome, Bob; Hua, Jun Jie; Desmond, Jonathan; Gagnon-Arsenault, Isabelle; Weiss, Eric L; Landry, Christian R; Moses, Alan M

    2014-12-01

    Gene duplication is an important evolutionary mechanism that can result in functional divergence in paralogs due to neo-functionalization or sub-functionalization. Consistent with functional divergence after gene duplication, recent studies have shown accelerated evolution in retained paralogs. However, little is known in general about the impact of this accelerated evolution on the molecular functions of retained paralogs. For example, do new functions typically involve changes in enzymatic activities, or changes in protein regulation? Here we study the evolution of posttranslational regulation by examining the evolution of important regulatory sequences (short linear motifs) in retained duplicates created by the whole-genome duplication in budding yeast. To do so, we identified short linear motifs whose evolutionary constraint has relaxed after gene duplication with a likelihood-ratio test that can account for heterogeneity in the evolutionary process by using a non-central chi-squared null distribution. We find that short linear motifs are more likely to show changes in evolutionary constraints in retained duplicates compared to single-copy genes. We examine changes in constraints on known regulatory sequences and show that for the Rck1/Rck2, Fkh1/Fkh2, Ace2/Swi5 paralogs, they are associated with previously characterized differences in posttranslational regulation. Finally, we experimentally confirm our prediction that for the Ace2/Swi5 paralogs, Cbk1 regulated localization was lost along the lineage leading to SWI5 after gene duplication. Our analysis suggests that changes in posttranslational regulation mediated by short regulatory motifs systematically contribute to functional divergence after gene duplication.

  10. Microbial species delineation using whole genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.
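
    The abstract names two quantities, gANI and AF, but does not give their formulas or the cut-off values. The sketch below assumes one plausible formulation: gANI as the alignment-length-weighted mean identity over matched genes, and AF as the summed length of matched genes divided by the total gene length of the query genome. The class, the function names and the cut-off parameters are hypothetical and would need to be replaced by the definitions in the full paper.

```python
from dataclasses import dataclass


@dataclass
class GeneHit:
    identity: float      # fractional identity of the aligned region, 0..1
    aligned_length: int  # number of aligned bases
    gene_length: int     # full length of the query gene


def gani_and_af(hits, total_gene_length):
    """Return (gANI, AF) under the assumed formulation described above."""
    aligned = sum(h.aligned_length for h in hits)
    if aligned == 0 or total_gene_length <= 0:
        raise ValueError("need at least one aligned base and a positive genome gene length")
    gani = sum(h.identity * h.aligned_length for h in hits) / aligned
    af = sum(h.gene_length for h in hits) / total_gene_length
    return gani, af


def same_species(gani, af, gani_cutoff, af_cutoff):
    """Apply hard cut-offs; the caller supplies the thresholds (not stated in the abstract)."""
    return gani >= gani_cutoff and af >= af_cutoff


# Illustrative values only.
hits = [GeneHit(0.98, 900, 1000), GeneHit(0.96, 300, 500)]
print(gani_and_af(hits, total_gene_length=3000))
```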

  11. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  12. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using eithe...

  13. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    DEFF Research Database (Denmark)

    Larsen, Mette Voldby; Cosentino, Salvatore; Rasmussen, Simon

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS...... the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56...... MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types...
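
    The final step described above, deriving the sequence type from the combination of alleles, amounts to a table lookup. The sketch below illustrates only that step; the locus names, allele numbers and profile table are hypothetical, and the BLAST-based allele ranking that precedes it is not shown.

```python
def sequence_type(allele_calls, scheme_loci, profiles):
    """Look up the sequence type (ST) for one isolate.

    allele_calls: dict mapping locus name -> allele number for the isolate
    scheme_loci:  ordered list of loci that define the MLST scheme
    profiles:     dict mapping a tuple of allele numbers -> ST
    Returns None when the allele combination is not in the profile table.
    """
    missing = [locus for locus in scheme_loci if locus not in allele_calls]
    if missing:
        raise KeyError(f"no allele call for loci: {missing}")
    key = tuple(allele_calls[locus] for locus in scheme_loci)
    return profiles.get(key)


# Hypothetical three-locus scheme and profile table.
loci = ["locusA", "locusB", "locusC"]
profiles = {(1, 4, 2): 10, (1, 4, 3): 11}
print(sequence_type({"locusA": 1, "locusB": 4, "locusC": 3}, loci, profiles))
```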

  14. The characterization of twenty sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Kimberly Pelak

    2010-09-01

    Full Text Available We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.

  15. The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.

    Science.gov (United States)

    Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin

    2013-01-01

    Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
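
    The figures reported for the quadripartite structure can be checked against each other: a chloroplast genome with this layout consists of one LSC, one SSC and two copies of the IR. The short sketch below uses only the numbers given in the abstract; the function name is illustrative.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a genome made of LSC + SSC + two inverted repeat copies."""
    return lsc + ssc + 2 * ir


# Values as reported in the abstract (bp).
LSC, SSC, IR = 82_695, 17_555, 25_539
REPORTED_TOTAL = 151_328

total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)

# The unique gene count is likewise the sum of its reported parts.
protein_coding, trnas, rrnas = 80, 30, 4
print(protein_coding + trnas + rrnas)
```

    Both sums agree with the totals stated in the abstract (151,328 bp and 114 unique genes).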

  16. The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.

    Directory of Open Access Journals (Sweden)

    Jun Qian

    Full Text Available Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.

  17. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola eCremonesi

    2013-01-01

    Full Text Available The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  18. Sequencing and comparing whole mitochondrial genomes of animals

    Energy Technology Data Exchange (ETDEWEB)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  19. Mechanistically Distinct Pathways of Divergent Regulatory DNA Creation Contribute to Evolution of Human-Specific Genomic Regulatory Networks Driving Phenotypic Divergence of Homo sapiens.

    Science.gov (United States)

    Glinsky, Gennadi V

    2016-09-19

    Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that unique to human phenotypes result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, the conservation patterns analysis of 18,364 candidate HSRS was carried out requiring that 100% of bases must remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The exaptation of highly conserved ancestral DNA pathway seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS and termed rapidly evolving in humans TADs (revTADs) comprise 0.8-10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of

  20. Low-pass sequencing for microbial comparative genomics

    Directory of Open Access Journals (Sweden)

    Kennedy Sean

    2004-01-01

    Full Text Available Abstract Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the

  1. Complete Genome Sequences of Nine Enterovirus D68 Strains from Patients of the Lower Hudson Valley, New York, 2016

    Science.gov (United States)

    Huang, Weihua; Yin, Changhong; Zhuge, Jian; Farooq, Taliya; Yoon, Esther C.; Nolan, Sheila M.; Chen, Donald; Fallon, John T.

    2016-01-01

    Complete genome sequences of nine enterovirus D68 (EV-D68) strains from patients in New York were obtained in 2016 by metagenomic next-generation sequencing. Comparative genomic analysis suggests that a new subclade B3, with ~4.5% nucleotide divergence from subclade B1 strains causing the 2014 outbreak, is circulating in the United States in 2016. PMID:27979945
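
    A figure such as "~4.5% nucleotide divergence" is commonly obtained as the proportion of differing sites between two aligned sequences (the p-distance). The abstract does not say which measure the authors used, so the sketch below is an assumption-labelled illustration: the function name is hypothetical, and sites with a gap or an ambiguous N in either sequence are skipped.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Both sequences must come from the same alignment (equal length).
    Columns containing a gap or an N in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in (gap, "N") or b in (gap, "N"):
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


# Toy alignment: 1 difference over 10 comparable sites.
print(p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

    The p-distance is the complement of percent identity over the same sites, so a reported 56.2% identity corresponds to a p-distance of 0.438 under this definition.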

  2. Complete genome sequence of arracacha mottle virus.

    Science.gov (United States)

    Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

    2013-01-01

    Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2 % nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses.

  3. Simple sequence repeats in mycobacterial genomes

    Indian Academy of Sciences (India)

    Vattipally B Sreenu; Pankaj Kumar; Javaregowda Nagaraju; Hampapathalu A Nagarajaram

    2007-01-01

    Simple sequence repeats (SSRs) or microsatellites are the repetitive nucleotide sequences of motifs of length 1–6 bp. They are scattered throughout the genomes of all the known organisms ranging from viruses to eukaryotes. Microsatellites undergo mutations in the form of insertions and deletions (INDELS) of their repeat units with some bias towards insertions that lead to microsatellite tract expansion. Although prokaryotic genomes derive some plasticity due to microsatellite mutations they have in-built mechanisms to arrest undue expansions of microsatellites and one such mechanism is constituted by post-replicative DNA repair enzymes MutL, MutH and MutS. The mycobacterial genomes lack these enzymes and as a null hypothesis one could expect these genomes to harbour many long tracts. It is therefore interesting to analyse the mycobacterial genomes for distribution and abundance of microsatellites tracts and to look for potentially polymorphic microsatellites. Available mycobacterial genomes, Mycobacterium avium, M. leprae, M. bovis and the two strains of M. tuberculosis (CDC1551 and H37Rv) were analysed for frequencies and abundance of SSRs. Our analysis revealed that the SSRs are distributed throughout the mycobacterial genomes at an average of 220–230 SSR tracts per kb. All the mycobacterial genomes contain few regions that are conspicuously denser or poorer in microsatellites compared to their expected genome averages. The genomes distinctly show scarcity of long microsatellites despite the absence of a post-replicative DNA repair system. Such severe scarcity of long microsatellites could arise as a result of strong selection pressures operating against long and unstable sequences although influence of GC-content and role of point mutations in arresting microsatellite expansions can not be ruled out. Nonetheless, the long tracts occasionally found in coding as well as non-coding regions may account for limited genome plasticity in these genomes.
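
    The definition above (tandem repeats of a 1–6 bp motif) can be turned into a simple scan. The sketch below is illustrative only: it uses a regular expression with a back-reference, it is not the tool used in the study, and the minimum repeat count is a hypothetical parameter because the abstract does not state the thresholds applied.

```python
import re


def find_ssrs(sequence, min_repeats=3, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the engine try the shortest motif first,
    so a run such as AAAAAA is reported as A x 6 rather than AA x 3.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        results.append((match.start(), motif, repeats))
    return results


print(find_ssrs("GGCATATATATCC"))
```

    Tract length in bp is the motif length multiplied by the repeat count, which is the quantity relevant to the abstract's observation that long tracts are scarce.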

  4. Local synteny and codon usage contribute to asymmetric sequence divergence of Saccharomyces cerevisiae gene duplicates

    Directory of Open Access Journals (Sweden)

    Bergthorsson Ulfar

    2011-09-01

    Full Text Available Abstract Background Duplicated genes frequently experience asymmetric rates of sequence evolution. Relaxed selective constraints and positive selection have both been invoked to explain the observation that one paralog within a gene-duplicate pair exhibits an accelerated rate of sequence evolution. In the majority of studies where asymmetric divergence has been established, there is no indication as to which gene copy, ancestral or derived, is evolving more rapidly. In this study we investigated the effect of local synteny (gene-neighborhood conservation) and codon usage on the sequence evolution of gene duplicates in the S. cerevisiae genome. We further distinguish the gene duplicates into those that originated from a whole-genome duplication (WGD) event (ohnologs) versus small-scale duplications (SSD) to determine if there exist any differences in their patterns of sequence evolution. Results For SSD pairs, the derived copy evolves faster than the ancestral copy. However, there is no relationship between rate asymmetry and synteny conservation (ancestral-like versus derived-like) in ohnologs. mRNA abundance and optimal codon usage as measured by the CAI is lower in the derived SSD copies relative to ancestral paralogs. Moreover, in the case of ohnologs, the faster-evolving copy has lower CAI and lowered expression. Conclusions Together, these results suggest that relaxation of selection for codon usage and gene expression contribute to rate asymmetry in the evolution of duplicated genes and that in SSD pairs, the relaxation of selection stems from the loss of ancestral regulatory information in the derived copy.
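
    The codon adaptation index (CAI) mentioned above is conventionally the geometric mean of per-codon relative adaptiveness weights, where each weight is the codon's frequency in a reference gene set divided by that of the most frequent synonymous codon. The sketch below assumes the weights have already been computed; the weight values shown are hypothetical, not yeast data, and codons without a weight (for example Met, Trp and stop codons) are skipped.

```python
import math


def cai(coding_sequence, weights):
    """Geometric mean of relative adaptiveness weights over a coding sequence.

    weights: dict mapping codon -> weight in (0, 1]
    """
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    log_sum = 0.0
    counted = 0
    for i in range(0, len(seq), 3):
        w = weights.get(seq[i:i + 3])
        if w is None:
            continue  # codon not scored in this weight table
        log_sum += math.log(w)
        counted += 1
    if counted == 0:
        raise ValueError("no scorable codons")
    return math.exp(log_sum / counted)


# Hypothetical weights for two amino acids.
example_weights = {"GCT": 1.0, "GCC": 0.5, "AAA": 1.0, "AAG": 0.25}
print(cai("ATGGCCAAG", example_weights))
```

    A gene built only from the preferred codons scores 1.0, so a lower CAI in the derived copy indicates a shift toward less-preferred synonymous codons.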

  5. Sequencing the Cotton Genomes-Gossypium spp.

    Institute of Scientific and Technical Information of China (English)

    PATERSON Andrew H

    2008-01-01

    The genomes of most major crops, including cotton, will be fully sequenced in the next few years. Cotton is unusual, although not unique, in that we will need to sequence not only cultivated (tetraploid) genotypes but their diploid progenitors, to understand how elite cottons have surpassed the productivity and quality of their progenitors.

  6. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    Science.gov (United States)

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  7. Comparative analysis of wolbachia genomes reveals streamlining and divergence of minimalist two-component systems.

    Science.gov (United States)

    Christensen, Steen; Serbus, Laura Renee

    2015-03-24

    Two-component regulatory systems are commonly used by bacteria to coordinate intracellular responses with environmental cues. These systems are composed of functional protein pairs consisting of a sensor histidine kinase and cognate response regulator. In contrast to the well-studied Caulobacter crescentus system, which carries dozens of these pairs, the streamlined bacterial endosymbiont Wolbachia pipientis encodes only two pairs: CckA/CtrA and PleC/PleD. Here, we used bioinformatic tools to compare characterized two-component system relays from C. crescentus, the related Anaplasmataceae species Anaplasma phagocytophilum and Ehrlichia chaffeensis, and 12 sequenced Wolbachia strains. We found the core protein pairs and a subset of interacting partners to be highly conserved within Wolbachia and these other Anaplasmataceae. Genes involved in two-component signaling were positioned differently within the various Wolbachia genomes, whereas the local context of each gene was conserved. Unlike Anaplasma and Ehrlichia, Wolbachia two-component genes were more consistently found clustered with metabolic genes. The domain architecture and key functional residues standard for two-component system proteins were well-conserved in Wolbachia, although residues that specify cognate pairing diverged substantially from other Anaplasmataceae. These findings indicate that Wolbachia two-component signaling pairs share considerable functional overlap with other α-proteobacterial systems, whereas their divergence suggests the potential for regulatory differences and cross-talk.

  8. Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species

    National Research Council Canada - National Science Library

    Wang, Xiao-Wei; Zhao, Qiong-Yi; Luan, Jun-Bo; Wang, Yu-Jun; Yan, Gen-Hong; Liu, Shu-Sheng

    2012-01-01

    .... In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies...

  9. Using multi-locus allelic sequence data to estimate genetic divergence among four Lilium (Liliaceae) cultivars

    NARCIS (Netherlands)

    Shahin, A.; Smulders, M.J.M.; Tuyl, van J.M.; Arens, P.F.P.; Bakker, F.T.

    2014-01-01

    Next Generation Sequencing (NGS) may enable estimating relationships among genotypes using allelic variation of multiple nuclear genes simultaneously. We explored the potential and caveats of this strategy in four genetically distant Lilium cultivars to estimate their genetic divergence from transcr

  10. Divergence in cis-regulatory sequences surrounding the opsin gene arrays of African cichlid fishes

    National Research Council Canada - National Science Library

    O'Quin, Kelly E; Smith, Daniel; Naseer, Zan; Schulte, Jane; Engel, Samuel D; Loh, Yong-Hwee E; Streelman, J Todd; Boore, Jeffrey L; Carleton, Karen L

    2011-01-01

    .... We use phylogenetic footprinting and shadowing to examine divergence in conserved non-coding elements, promoter sequences, and 3'-UTRs surrounding each opsin in search of candidate cis-regulatory...

  11. Genome Sequence of the Palaeopolyploid soybean

    Energy Technology Data Exchange (ETDEWEB)

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  12. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Science.gov (United States)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  13. Next-generation sequencing reveals recent horizontal transfer of a DNA transposon between divergent mosquitoes.

    Directory of Open Access Journals (Sweden)

    Yupu Diao

    Full Text Available Horizontal transfer of genetic material between complex organisms often involves transposable elements (TEs). For example, a DNA transposon mariner has been shown to undergo horizontal transfer between different orders of insects and between different phyla of animals. Here we report the discovery and characterization of an ITmD37D transposon, MJ1, in Anopheles sinensis. We show that some MJ1 elements in Aedes aegypti and An. sinensis contain intact open reading frames and share nearly 99% nucleotide identity over the entire transposon, which is unexpectedly high given that these two genera had diverged 145-200 million years ago. Chromosomal hybridization and TE-display showed that MJ1 copy number is low in An. sinensis. Among 24 mosquito species surveyed, MJ1 is only found in Ae. aegypti and the hyrcanus group of anopheline mosquitoes to which An. sinensis belongs. Phylogenetic analysis is consistent with horizontal transfer and provides the basis for inference of its timing and direction. Although reports of horizontal transfer of DNA transposons between higher eukaryotes are accumulating, our analysis is one of a small number of cases in which horizontal transfer of nearly identical TEs among highly divergent species has been thoroughly investigated and strongly supported. Horizontal transfer involving mosquitoes is of particular interest because there are ongoing investigations of the possibility of spreading pathogen-resistant genes into mosquito populations to control malaria and other infectious diseases. The initial indication of horizontal transfer of MJ1 came from comparisons between a 0.4x coverage An. sinensis 454 sequence database and available TEs in mosquito genomes. Therefore we have shown that it is feasible to use low coverage sequencing to systematically uncover horizontal transfer events. Expanding such efforts across a wide range of species will generate novel insights into the relative frequency of horizontal transfer of

  14. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    Science.gov (United States)

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  15. Genome-wide effects of long-term divergent selection.

    Directory of Open Access Journals (Sweden)

    Anna M Johansson

    2010-11-01

    Full Text Available To understand the genetic mechanisms leading to phenotypic differentiation, it is important to identify genomic regions under selection. We scanned the genome of two chicken lines from a single trait selection experiment, where 50 generations of selection have resulted in a 9-fold difference in body weight. Analyses of nearly 60,000 SNP markers showed that the effects of selection on the genome are dramatic. The lines were fixed for alternative alleles in more than 50 regions as a result of selection. Another 10 regions displayed strong evidence for ongoing differentiation during the last 10 generations. Many more regions across the genome showed large differences in allele frequency between the lines, indicating that the phenotypic evolution in the lines over 50 generations is the result of an exploitation of standing genetic variation at hundreds of loci across the genome.

  16. Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication.

    Science.gov (United States)

    Amores, Angel; Catchen, Julian; Ferrara, Allyse; Fontenot, Quenton; Postlethwait, John H

    2011-08-01

    Genomic resources for hundreds of species of evolutionary, agricultural, economic, and medical importance are unavailable due to the expense of well-assembled genome sequences and difficulties with multigenerational studies. Teleost fish provide many models for human disease but possess anciently duplicated genomes that sometimes obfuscate connectivity. Genomic information representing a fish lineage that diverged before the teleost genome duplication (TGD) would provide an outgroup for exploring the mechanisms of evolution after whole-genome duplication. We exploited massively parallel DNA sequencing to develop meiotic maps with thrift and speed by genotyping F(1) offspring of a single female and a single male spotted gar (Lepisosteus oculatus) collected directly from nature utilizing only polymorphisms existing in these two wild individuals. Using Stacks, software that automates the calling of genotypes from polymorphisms assayed by Illumina sequencing, we constructed a map containing 8406 markers. RNA-seq on two map-cross larvae provided a reference transcriptome that identified nearly 1000 mapped protein-coding markers and allowed genome-wide analysis of conserved synteny. Results showed that the gar lineage diverged from teleosts before the TGD and its genome is organized more similarly to that of humans than teleosts. Thus, spotted gar provides a critical link between medical models in teleost fish, to which gar is biologically similar, and humans, to which gar is genomically similar. Application of our F(1) dense mapping strategy to species with no prior genome information promises to facilitate comparative genomics and provide a scaffold for ordering the numerous contigs arising from next generation genome sequencing.

  17. Genomic and polyploid evolution in genus Avena as revealed by RFLPs of repeated DNA sequences.

    Science.gov (United States)

    Morikawa, Toshinobu; Nishihara, Miho

    2009-06-01

    Phylogenetic relationships and genome affinities were investigated by utilizing all the biological Avena species consisting of 11 diploid species (15 accessions), 8 tetraploid species (9 accessions) and 4 hexaploid species (5 accessions). Genomic DNA regions of As120a, avenin, and globulin were amplified by PCR. A total of 130 polymorphic fragments were detected out of 156 fragments generated by digesting the PCR-amplified fragments with 11 restriction enzymes. The number of fragments generated by PCR amplification followed by digestion with restriction enzymes was almost the same among the three repeated DNA sequences. A high level of genetic distance was detected between A. damascena (Ad) and A. canariensis (Ac) genomes, which reflected their different morphology and reproductive isolation. The A. longiglumis (Al) and A. prostrata (Ap) genomes were closely related to the As genome group. The AB genome species formed a cluster with the AsAs genome artificial autotetraploid and the As genome diploids, indicating near-autotetraploid origin. A. macrostachya is an outbreeding autotetraploid closely related to the C genome diploid and the AC genome tetraploid species. The differences in genetic distances estimated from the repeated DNA sequence divergence among the Avena species were consistent with genome divergences, and it was possible to compare the genetic intra- and inter-ploidy relationships produced by RFLPs. These results suggested that the PCR-mediated analysis of repeated DNA polymorphism can be used as a tool to examine genomic relationships of polyploid species.

  18. The highest-copy repeats are methylated in the small genome of the early divergent vascular plant Selaginella moellendorffii

    Directory of Open Access Journals (Sweden)

    Quan Hui

    2008-06-01

    Full Text Available Abstract Background The lycophyte Selaginella moellendorffii is a vascular plant that diverged from the fern/seed plant lineage at least 400 million years ago. Although genomic information for S. moellendorffii is starting to be produced, little is known about basic aspects of its molecular biology. In order to provide the first glimpse of the epigenetic landscape of this early divergent vascular plant, we used the methylation filtration technique. Methylation filtration genomic libraries select unmethylated DNA clones due to the presence of the methylation-dependent restriction endonuclease McrBC in the bacterial host. Results We conducted a characterization of the DNA methylation patterns of the S. moellendorffii genome by sequencing a set of S. moellendorffii shotgun genomic clones, along with a set of methylation filtered clones. Chloroplast DNA, which is typically unmethylated, was enriched in the filtered library relative to the shotgun library, showing that there is DNA methylation in the extremely small S. moellendorffii genome. The filtered library also showed enrichment in expressed and gene-like sequences, while the highest-copy repeats were largely under-represented in this library. These results show that genes and repeats are differentially methylated in the S. moellendorffii genome, as occurs in other plants studied. Conclusion Our results shed light on the genome methylation pattern in a member of a relatively unexplored plant lineage. The DNA methylation data reported here will help in understanding the involvement of this epigenetic mark in fundamental biological processes, as well as the evolutionary aspects of epigenetics in land plants.

  19. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    Science.gov (United States)

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-07-20

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures.

  20. Sequencing and Analysis of a Genomic Fragment Provide an Insight into the Dunaliella viridis Genomic Sequence

    Institute of Scientific and Technical Information of China (English)

    Xiao-Ming SUN; Yuan-Ping TANG; Xiang-Zong MENG; Wen-Wen ZHANG; Shan LI; Zhi-Rui DENG; Zheng-Kai XU; Ren-Tao SONG

    2006-01-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large numbers of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features.
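    The GC content reported in the record above is a simple base-composition statistic. The following is a minimal sketch of how such a percentage could be computed; the function name `gc_content` and the toy sequence are illustrative assumptions and are not taken from the study, whose actual tooling is not described here.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among unambiguous A/C/G/T bases.

    Hypothetical helper for illustration only; ambiguous symbols such as N
    are ignored rather than counted in the denominator.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy fragment (made up): 16 unambiguous bases, 8 of which are G or C.
toy_fragment = "ACGTGGCCATATNNACAC"
print(f"GC content: {gc_content(toy_fragment):.1f}%")
```

    Excluding ambiguous bases from the denominator is one common convention; counting them instead would lower the reported percentage for sequences containing many N symbols.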

  1. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    Science.gov (United States)

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bp. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary histories, each having been frozen since a different evolutionary era along the vertebrate lineage.

  2. Sequencing of Australian wild rice genomes reveals ancestral relationships with domesticated rice.

    Science.gov (United States)

    Brozynska, Marta; Copetti, Dario; Furtado, Agnelo; Wing, Rod A; Crayn, Darren; Fox, Glen; Ishikawa, Ryuji; Henry, Robert J

    2016-11-27

    The related A genome species of the Oryza genus are the effective gene pool for rice. Here, we report draft genomes for two Australian wild A genome taxa: O. rufipogon-like population, referred to as Taxon A, and O. meridionalis-like population, referred to as Taxon B. These two taxa were sequenced and assembled by integration of short- and long-read next-generation sequencing (NGS) data to create a genomic platform for a wider rice gene pool. Here, we report that, despite the distinct chloroplast genome, the nuclear genome of the Australian Taxon A has a sequence that is much closer to that of domesticated rice (O. sativa) than to the other Australian wild populations. Analysis of 4643 genes in the A genome clade showed that the Australian annual, O. meridionalis, and related perennial taxa have the most divergent (around 3 million years) genome sequences relative to domesticated rice. A test for admixture showed possible introgression into the Australian Taxon A (diverged around 1.6 million years ago) especially from the wild indica/O. nivara clade in Asia. These results demonstrate that northern Australia may be the centre of diversity of the A genome Oryza and suggest the possibility that this might also be the centre of origin of this group and represent an important resource for rice improvement.

  3. A new isolation with migration model along complete genomes infers very different divergence processes among closely related great ape species.

    Directory of Open Access Journals (Sweden)

    Thomas Mailund

    Full Text Available We present a hidden Markov model (HMM) for inferring gradual isolation between two populations during speciation, modelled as a time interval with restricted gene flow. The HMM describes the history of adjacent nucleotides in two genomic sequences, such that the nucleotides can be separated by recombination, can migrate between populations, or can coalesce at variable time points, all dependent on the parameters of the model, which are the effective population sizes, splitting times, recombination rate, and migration rate. We show by extensive simulations that the HMM can accurately infer all parameters except the recombination rate, which is biased downwards. Inference is robust to variation in the mutation rate and the recombination rate over the sequence and also robust to unknown phase of genomes unless they are very closely related. We provide a test for whether divergence is gradual or instantaneous, and we apply the model to three key divergence processes in great apes: (a) the bonobo and common chimpanzee, (b) the eastern and western gorilla, and (c) the Sumatran and Bornean orang-utan. We find that the bonobo and chimpanzee appear to have undergone a clear split, whereas the divergence processes of the gorilla and orang-utan species occurred over several hundred thousand years with gene flow stopping quite recently. We also apply the model to the Homo/Pan speciation event and find that the most likely scenario involves an extended period of gene flow during speciation.

  4. Genomic Sequence Variation Markup Language (GSVML).

    Science.gov (United States)

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. 
GSVML was developed as

  5. Sorghum genome sequencing by methylation filtration.

    Directory of Open Access Journals (Sweden)

    Joseph A Bedell

    2005-01-01

    Full Text Available Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  6. Sorghum genome sequencing by methylation filtration.

    Science.gov (United States)

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  7. Comparative genomics reveals surprising divergence of two closely related strains of uncultivated UCYN-A cyanobacteria.

    Science.gov (United States)

    Bombar, Deniz; Heller, Philip; Sanchez-Baracaldo, Patricia; Carter, Brandon J; Zehr, Jonathan P

    2014-12-01

    Marine planktonic cyanobacteria capable of fixing molecular nitrogen (termed 'diazotrophs') are key in biogeochemical cycling, and the nitrogen fixed is one of the major external sources of nitrogen to the open ocean. Candidatus Atelocyanobacterium thalassa (UCYN-A) is a diazotrophic cyanobacterium known for its widespread geographic distribution in tropical and subtropical oligotrophic oceans, unusually reduced genome and symbiosis with a single-celled prymnesiophyte alga. Recently a novel strain of this organism was also detected in coastal waters sampled from the Scripps Institution of Oceanography pier. We analyzed the metagenome of this UCYN-A2 population by concentrating cells by flow cytometry. Phylogenomic analysis provided strong bootstrap support for the monophyly of UCYN-A (here called UCYN-A1) and UCYN-A2 within the marine Crocosphaera sp. and Cyanothece sp. clade. UCYN-A2 shares 1159 of the 1200 UCYN-A1 protein-coding genes (96.6%) with high synteny, yet the average amino-acid sequence identity between these orthologs is only 86%. UCYN-A2 lacks the same major pathways and proteins that are absent in UCYN-A1, suggesting that both strains can be grouped at the same functional and ecological level. Our results suggest that UCYN-A1 and UCYN-A2 had a common ancestor and diverged after genome reduction. These two variants may reflect adaptation of the host to different niches, which could be coastal and open ocean habitats.
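    The shared-gene percentage quoted in the record above follows directly from the two counts it gives (1159 shared genes out of 1200). As a small arithmetic check, the sketch below recomputes it; the helper name `percent_shared` is an assumption introduced only for this illustration.

```python
def percent_shared(shared: int, total: int) -> float:
    """Percentage of a reference gene set that is also found in a second genome.

    Hypothetical helper: `shared` is the number of reference genes with a
    counterpart in the other genome, `total` is the reference gene count.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= shared <= total:
        raise ValueError("shared must lie between 0 and total")
    return 100.0 * shared / total


# Counts as stated in the abstract: 1159 of 1200 UCYN-A1 protein-coding genes.
print(f"{percent_shared(1159, 1200):.1f}%")  # prints "96.6%"
```

    Note that this gene-content figure is a different quantity from the 86% average amino-acid identity, which is measured within the shared orthologs rather than across the gene set.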

  8. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene and cis-element evolution

    Energy Technology Data Exchange (ETDEWEB)

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Todd, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catherine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenee; Verduzco, Daniel; Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2004-04-01

    The genome sequence of a second fruit fly, D. pseudoobscura, presents an opportunity for comparative analysis of a primary model organism, D. melanogaster. The vast majority of Drosophila genes have remained on the same arm, but within each arm gene order has been extensively reshuffled, leading to the identification of approximately 1300 syntenic blocks. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 35 My since divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than control sequences between the species but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, repeat-mediated chromosomal rearrangement and high co-adaptation of both male genes and cis-regulatory sequences emerge as important themes of genome divergence between these species of Drosophila.

  9. Intrachromosomal recombination between highly diverged DNA sequences is enabled in human cells deficient in Bloom helicase.

    Science.gov (United States)

    Wang, Yibin; Li, Shen; Smith, Krissy; Waldman, Barbara Criscuolo; Waldman, Alan S

    2016-05-01

    Mutation of Bloom helicase (BLM) causes Bloom syndrome (BS), a rare human genetic disorder associated with genome instability, elevation of sister chromatid exchanges, and predisposition to cancer. Deficiency in BLM homologs in Drosophila and yeast brings about significantly increased rates of recombination between imperfectly matched sequences ("homeologous recombination," or HeR). To assess whether BLM deficiency provokes an increase in HeR in human cells, we transfected an HeR substrate into a BLM-null cell line derived from a BS patient. The substrate contained a thymidine kinase (tk)-neo fusion gene disrupted by the recognition site for endonuclease I-SceI, as well as a functional tk gene to serve as a potential recombination partner for the tk-neo gene. The two tk sequences on the substrate displayed 19% divergence. A double-strand break was introduced by expression of I-SceI and repair events were recovered by selection for G418-resistant clones. Among 181 events recovered, 30 were accomplished via HeR with the balance accomplished by nonhomologous end-joining. The frequency of HeR events in the BS cells was elevated significantly compared to that seen in normal human fibroblasts or in BS cells complemented for BLM expression. We conclude that BLM deficiency enables HeR in human cells.
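    The record above describes two sequences that "displayed 19% divergence". One common way to express such a figure is the proportion of mismatched sites in a pairwise alignment (the p-distance). The sketch below illustrates that calculation under stated assumptions: the abstract does not say how the 19% was computed, and the function name `p_distance` and the toy sequences are invented for this example.

```python
def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Assumptions for this sketch: both inputs come from the same alignment
    (equal length), '-' marks a gap, and any column containing a gap is
    skipped rather than counted as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


# Toy alignment (made up): 10 ungapped columns, 2 of which differ.
divergence = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(f"divergence: {100 * divergence:.0f}%")
```

    Treating gapped columns as missing data is a design choice; counting each gap as a difference would give a higher value for alignments with insertions or deletions.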

  10. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    Science.gov (United States)

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of the evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  11. Genome duplication, subfunction partitioning, and lineage divergence: Sox9 in stickleback and zebrafish.

    Science.gov (United States)

    Cresko, William A; Yan, Yi-Lin; Baltrus, David A; Amores, Angel; Singer, Amy; Rodríguez-Marí, Adriana; Postlethwait, John H

    2003-11-01

    Teleosts are the most species-rich group of vertebrates, and a genome duplication (tetraploidization) event in ray-fin fish appears to have preceded this remarkable explosion of biodiversity. What is the relationship of the ray-fin genome duplication to the teleost radiation? Genome duplication may have facilitated lineage divergence by partitioning different ancestral gene subfunctions among co-orthologs of tetrapod genes in different teleost lineages. To test this hypothesis, we investigated gene expression patterns for Sox9 gene duplicates in stickleback and zebrafish, teleosts whose lineages diverged early in Euteleost evolution. Most expression domains appear to have been partitioned between Sox9a and Sox9b before the divergence of stickleback and zebrafish lineages, but some ancestral expression domains were distributed differentially in each lineage. We conclude that some gene subfunctions, as represented by lineage-specific expression domains, may have assorted differently in separate lineages and that these may have contributed to lineage diversification during teleost evolution.

  12. Divergence is focused on few genomic regions early in speciation: incipient speciation of sunflower ecotypes.

    Science.gov (United States)

    Andrew, Rose L; Rieseberg, Loren H

    2013-09-01

    Early in speciation, as populations undergo the transition from local adaptation to incipient species, is when a number of transient, but potentially important, processes appear to be most easily detected. These include signatures of selective sweeps that can point to asymmetry in selection between habitats, divergence hitchhiking, and associations of adaptive genes with environments. In a genomic comparison of ecotypes of the prairie sunflower, Helianthus petiolaris, occurring at Great Sand Dunes National Park and Preserve (Colorado), we found that selective sweeps were mainly restricted to the dune ecotype and that there was variation across the genome in whether proximity to the nondune population constrained or promoted divergence. The major regions of divergence were few and large between ecotypes, in contrast with an interspecific comparison between H. petiolaris and a sympatric congener, Helianthus annuus. In general, the large regions of divergence observed in the ecotypic comparison swamped locus-specific associations with environmental variables. In both comparisons, regions of high divergence occurred in portions of the genetic map with high marker density, probably reflecting regions of low recombination. The difference in genomic distributions of highly divergent regions between ecotypic and interspecific comparisons highlights the value of studies spanning the spectrum of speciation in related taxa.

  13. Cactus: Algorithms for genome multiple sequence alignment

    OpenAIRE

    Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

    2011-01-01

    Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms...

  14. Mapping and Sequencing the Human Genome

    Science.gov (United States)

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  15. [Mapping and human genome sequence program].

    Science.gov (United States)

    Weissenbach, J

    1997-03-01

    Until recently, human genome programs focused primarily on establishing maps that would provide signposts to researchers seeking to identify genes responsible for inherited diseases, as well as a basis for genome sequencing studies. Preestablished gene mapping goals have been reached. The over 7,000 microsatellite markers identified to date provide a map of sufficient density to allow localization of the gene of a monogenic disease with a precision of 1 to 2 million base pairs. The physical map, based on systematically arranged overlapping sets of yeast artificial chromosomes (YACs), has also made considerable headway during the last few years. The most recently published map covers more than 90% of the genome. However, currently available physical maps cannot be used for sequencing studies because multiple rearrangements occur in YACs. The recently developed sets of radiation hybrids are extremely useful for incorporating genes into existing maps. A network of American and European laboratories has successfully used these radiation hybrids to map 15,000 gene tags from large-scale cDNA library sequencing programs. There are increasingly pressing reasons for initiating large-scale human genome sequencing studies.

  16. Genome sequence of Lactobacillus farciminis KCTC 3681.

    Science.gov (United States)

    Nam, Seong-Hyeuk; Choi, Sang-Haeng; Kang, Aram; Kim, Dong-Wook; Kim, Ryong Nam; Kim, Aeri; Kim, Dae-Soo; Park, Hong-Seog

    2011-04-01

    Lactobacillus farciminis is one of the most prevalent lactic acid bacterial species present during the manufacturing process of kimchi, the best-known traditional Korean dish. Here, we present the draft genome sequence of the type strain Lactobacillus farciminis KCTC 3681 (2,498,309 bp, with a G+C content of 36.4%), which consists of 5 scaffolds.

  17. The diploid genome sequence of an Asian individual

    DEFF Research Database (Denmark)

    Wang, Jun; Wang, Wei; Li, Ruiqiang

    2008-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we...

  18. Efficient algorithm for video sequence matching using the Hausdorff distance and the directed divergence

    Science.gov (United States)

    Kim, Sang Hyun; Park, Rae-Hong

    2000-12-01

    To manipulate large video databases, effective video indexing and retrieval are required. While most algorithms for video retrieval can be used for frame-wise user query or video content query, video sequence matching has not been investigated much. In this paper, we propose an efficient algorithm to match the video sequences using the modified Hausdorff distance, and a video indexing method using the directed divergence of histograms between successive frames. To effectively match the video sequences and to reduce the computational complexity, we use the key frames extracted by the cumulative directed divergence, and compare the set of key frames, using the Hausdorff distance. Experimental results show that the proposed video sequence matching and video indexing algorithms using the Hausdorff distance and the directed divergence yield the remarkably high accuracy and performances compared with conventional algorithms such as histogram difference or histogram intersection methods.
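    The two measures named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that "directed divergence" means the Kullback directed divergence between normalized frame histograms, that the "modified Hausdorff distance" is the mean-of-minimum-distances variant, and that key frames are emitted whenever the accumulated divergence passes a threshold. The function names, the L1 histogram distance and the `threshold` parameter are all hypothetical choices made for the example.

```python
import math


def directed_divergence(p, q, eps=1e-12):
    """Kullback directed divergence sum(p_i * log(p_i / q_i)) of two histograms.

    Histograms are normalized to sum to 1; eps guards against empty bins in q
    (an assumption of this sketch, not something stated in the abstract).
    """
    sp, sq = sum(p), sum(q)
    total = 0.0
    for a, b in zip(p, q):
        a, b = a / sp, b / sq
        if a > 0:
            total += a * math.log(a / max(b, eps))
    return total


def key_frames(histograms, threshold):
    """Select a key frame each time the cumulative divergence exceeds threshold."""
    keys = [0]  # the first frame always starts a segment
    acc = 0.0
    for i in range(1, len(histograms)):
        acc += directed_divergence(histograms[i - 1], histograms[i])
        if acc > threshold:
            keys.append(i)
            acc = 0.0
    return keys


def l1(h1, h2):
    """L1 distance between two histograms (hypothetical choice of base metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def modified_hausdorff(set_a, set_b, dist=l1):
    """Mean-of-minima variant of the Hausdorff distance between two frame sets."""
    def directed(xs, ys):
        return sum(min(dist(x, y) for y in ys) for x in xs) / len(xs)
    return max(directed(set_a, set_b), directed(set_b, set_a))
```

    Two clips would then be compared by extracting key-frame histograms from each with `key_frames` and passing the two sets to `modified_hausdorff`; a smaller value indicates more similar sequences.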

  19. Sympatric speciation revealed by genome-wide divergence in the blind mole rat Spalax.

    Science.gov (United States)

    Li, Kexin; Hong, Wei; Jiao, Hengwu; Wang, Guo-Dong; Rodriguez, Karl A; Buffenstein, Rochelle; Zhao, Yang; Nevo, Eviatar; Zhao, Huabin

    2015-09-22

    Sympatric speciation (SS), i.e., speciation within a freely breeding population or in contiguous populations, was first proposed by Darwin [Darwin C (1859) On the Origins of Species by Means of Natural Selection] and is still controversial despite theoretical support [Gavrilets S (2004) Fitness Landscapes and the Origin of Species (MPB-41)] and mounting empirical evidence. Speciation of subterranean mammals generally, including the genus Spalax, was considered hitherto allopatric, whereby new species arise primarily through geographic isolation. Here we show in Spalax a case of genome-wide divergence analysis in mammals, demonstrating that SS in continuous populations, with gene flow, encompasses multiple widespread genomic adaptive complexes, associated with the sharply divergent ecologies. The two abutting soil populations of S. galili in northern Israel habituate the ancestral Senonian chalk population and abutting derivative Plio-Pleistocene basalt population. Population divergence originated ∼0.2-0.4 Mya based on both nuclear and mitochondrial genome analyses. Population structure analysis displayed two distinctly divergent clusters of chalk and basalt populations. Natural selection has acted on 300+ genes across the genome, diverging Spalax chalk and basalt soil populations. Gene ontology enrichment analysis highlights strong but differential soil population adaptive complexes: in basalt, sensory perception, musculature, metabolism, and energetics, and in chalk, nutrition and neurogenetics are outstanding. Population differentiation of chemoreceptor genes suggests intersoil population's mate and habitat choice substantiating SS. Importantly, distinctions in protein degradation may also contribute to SS. Natural selection and natural genetic engineering [Shapiro JA (2011) Evolution: A View From the 21st Century] overrule gene flow, evolving divergent ecological adaptive complexes. Sharp ecological divergences abound in nature; therefore, SS appears to be an

  20. Draft genome sequence of the dimorphic yeast Yarrowia lipolytica, strain W29

    Energy Technology Data Exchange (ETDEWEB)

    Pomraning, Kyle R.; Baker, Scott E.

    2015-11-25

    Here we present the draft genome sequence of the dimorphic ascomycete yeast Yarrowia lipolytica strain W29 (ATCC20460TM). Y. lipolytica is a commonly employed model for industrial production of lipases, small molecules, and more recently for its ability to accumulate lipids. It has also been used to study genome evolution in yeast and filamentous fungi due to its position as an early diverging branch of the subphylum Saccharomycotina.

  1. The mitochondrial genomes of Campodea fragilis and C. lubbocki (Hexapoda: Diplura): high genetic divergence in a morphologically uniform taxon

    Energy Technology Data Exchange (ETDEWEB)

    Podsiadlowski, L.; Carapelli, A.; Nardi, F.; Dallai, R.; Koch, M.; Boore, J.L.; Frati, F.

    2005-12-01

    Mitochondrial genomes from two dipluran hexapods of the genus Campodea have been sequenced. Gene order is the same as in most other hexapods and crustaceans. Secondary structures of tRNAs reveal specific structural changes in tRNA-C, tRNA-R, tRNA-S1 and tRNA-S2. Comparative analyses of nucleotide and amino acid composition, as well as structural features of both ribosomal RNA subunits, reveal substantial differences among the analyzed taxa. Although the two Campodea species are morphologically highly uniform, genetic divergence is larger than expected, suggesting a long evolutionary history under stable ecological conditions.

  2. Genomic Analysis of Phylotype I Strain EP1 Reveals Substantial Divergence from Other Strains in the Ralstonia solanacearum Species Complex

    Science.gov (United States)

    Li, Peng; Wang, Dechen; Yan, Jinli; Zhou, Jianuan; Deng, Yinyue; Jiang, Zide; Cao, Bihao; He, Zifu; Zhang, Lianhui

    2016-01-01

    Ralstonia solanacearum species complex is a devastating group of phytopathogens with an unusually wide host range and broad geographical distribution. R. solanacearum isolates may differ considerably in various properties including host range and pathogenicity, but the underlying genetic bases remain vague. Here, we conducted the genome sequencing of strain EP1 isolated from Guangdong Province of China, which belongs to phylotype I and is highly virulent to a range of solanaceous crops. Its complete genome contains a 3.95-Mb chromosome and a 2.05-Mb mega-plasmid, which is considerably bigger than reported genomes of other R. solanacearum strains. Both the chromosome and the mega-plasmid have essential house-keeping genes and many virulence genes. Comparative analysis of strain EP1 with other 3 phylotype I and 3 phylotype II, III, IV strains unveiled substantial genome rearrangements, insertions and deletions. Genome sequences are relatively conserved among the 4 phylotype I strains, but more divergent among strains of different phylotypes. Moreover, the strains exhibited considerable variations in their key virulence genes, including those encoding secretion systems and type III effectors. Our results provide valuable information for further elucidation of the genetic basis of diversified virulences and host range of R. solanacearum species. PMID:27833603

  3. BSMAP: whole genome bisulfite sequence MAPping program

    Directory of Open Access Journals (Sweden)

    Li Wei

    2009-07-01

    Full Text Available Abstract Background Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the increased searching space, reduced complexity of bisulfite sequence, asymmetric cytosine to thymine alignments, and multiple CpG heterogeneous methylation. Results We developed an efficient bisulfite reads mapping algorithm BSMAP to address the above issues. BSMAP combines genome hashing and bitwise masking to achieve fast and accurate bisulfite mapping. Compared with existing bisulfite mapping approaches, BSMAP is faster, more sensitive and more flexible. Conclusion BSMAP is the first general-purpose bisulfite mapping software. It is able to map high-throughput bisulfite reads at whole genome level with feasible memory and CPU usage. It is freely available under GPL v3 license at http://code.google.com/p/bsmap/.
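    The asymmetric cytosine-to-thymine alignment mentioned above can be illustrated with a small sketch. It is not BSMAP's algorithm: instead of BSMAP's bitwise masking, this example substitutes a simpler, plainly different technique (hashing k-mers of an in-silico C-to-T converted genome, then verifying each candidate with an asymmetric match rule), and it handles only the forward strand. All function names and the seed length `k` are hypothetical.

```python
from collections import defaultdict


def bisulfite_compatible(read, ref):
    """True if read could derive from ref after bisulfite treatment.

    A reference C may appear in the read as C (methylated) or T (converted);
    every other base must match exactly. The rule is asymmetric: a reference T
    never matches a read C.
    """
    return len(read) == len(ref) and all(
        r == g or (g == "C" and r == "T") for r, g in zip(read, ref)
    )


def build_seed_index(genome, k):
    """Hash every k-mer of the C->T converted genome to its start positions."""
    converted = genome.replace("C", "T")
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[converted[i:i + k]].append(i)
    return index


def map_read(read, genome, index, k):
    """Return genome positions where the read aligns under the bisulfite rule."""
    seed = read[:k].replace("C", "T")  # convert the seed the same way as the genome
    hits = []
    for pos in index.get(seed, []):
        window = genome[pos:pos + len(read)]
        if bisulfite_compatible(read, window):
            hits.append(pos)
    return hits
```

    For example, with the toy genome `AACGTTCGAA` and `k = 4`, the read `ATGT` maps to position 1, where the reference reads `ACGT` and the cytosine has been converted.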

  4. Agaricus bisporus genome sequence: a commentary.

    Science.gov (United States)

    Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

    2013-06-01

    The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus has resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses has revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus bisporus genomes contain a collection of polysaccharide and lignin-degrading genes and more interestingly an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium.

  5. Synaptotagmin gene content of the sequenced genomes

    Directory of Open Access Journals (Sweden)

    Craxton Molly

    2004-07-01

    Full Text Available Abstract Background Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. Results I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. 
    Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their

  6. Improved genome sequencing using an engineered transposase.

    Science.gov (United States)

    Kia, Amirali; Gloeckner, Christian; Osothprarop, Trina; Gormley, Niall; Bomati, Erin; Stephenson, Michelle; Goryshin, Igor; He, Molly Min

    2017-01-17

    Next-generation sequencing (NGS) has transformed genomic research by reducing turnaround time and cost. However, no major breakthrough had been made in the upstream library preparation methods until the transposase-based Nextera method was invented. Nextera combines DNA fragmentation and barcoding in a single tube reaction and therefore enables a very fast workflow to sequencing-ready DNA libraries within a couple of hours. When compared to the traditional ligation-based methods, transposase-based Nextera has a slight insertion bias. Here we present the discovery of a mutant transposase (Tn5-059) with a lowered GC insertion bias through protein engineering. We demonstrate Tn5-059 reduces AT dropout and increases uniformity of genome coverage in both bacterial genomes and human genome. We also observe higher library diversity generated by Tn5-059 when compared to Nextera v2 for human exomes, which leads to less sequencing and lower cost per genome. In addition, when used for human exomes, Tn5-059 delivers consistent library insert size over a range of input DNA, allowing up to a tenfold variance from the 50 ng input recommendation. Enhanced DNA input tolerance of Tn5-059 can translate to flexibility and robustness of workflow. DNA input tolerance together with superior uniformity of coverage and lower AT dropouts extend the applications of transposase-based library preps. We discuss possible mechanisms of improvements in Tn5-059, and potential advantages of using the new mutant in a variety of applications including microbiome sequencing and chromatin profiling.

  7. The sequence and de novo assembly of the giant panda genome

    Science.gov (United States)

    Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

    2013-01-01

    Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809

  8. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs.

    Science.gov (United States)

    Yang, Jun-Bo; Li, De-Zhu; Li, Hong-Tao

    2014-09-01

    Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, all known angiosperm chloroplast genomes available to date were analysed, and then we designed nine universal primer pairs corresponding to the highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified using long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, eight species representing major groups of angiosperms, that is, early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids, were sequenced and their complete chloroplast genomes assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provided an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs provide a possibility to accelerate genome-scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms. © 2014 John Wiley & Sons Ltd.

  9. Complete Genome Sequence of Treponema paraluiscuniculi, Strain Cuniculi A: The Loss of Infectivity to Humans Is Associated with Genome Decay

    Science.gov (United States)

    Šmajs, David; Zobaníková, Marie; Strouhal, Michal; Čejková, Darina; Dugan-Rocha, Shannon; Pospíšilová, Petra; Norris, Steven J.; Albert, Tom; Qin, Xiang; Hallsworth-Pepin, Kym; Buhay, Christian; Muzny, Donna M.; Chen, Lei; Gibbs, Richard A.; Weinstock, George M.

    2011-01-01

    Treponema paraluiscuniculi is the causative agent of rabbit venereal spirochetosis. It is not infectious to humans, although its genome structure is very closely related to other pathogenic Treponema species including Treponema pallidum subspecies pallidum, the etiological agent of syphilis. In this study, the genome sequence of Treponema paraluiscuniculi, strain Cuniculi A, was determined by a combination of several high-throughput sequencing strategies. Whereas the overall size (1,133,390 bp), arrangement, and gene content of the Cuniculi A genome closely resembled those of the T. pallidum genome, the T. paraluiscuniculi genome contained a markedly higher number of pseudogenes and gene fragments (51). In addition to pseudogenes, 33 divergent genes were also found in the T. paraluiscuniculi genome. A set of 32 (out of 84) affected genes encoded proteins of known or predicted function in the Nichols genome. These proteins included virulence factors, gene regulators and components of DNA repair and recombination. The majority (52 or 61.9%) of the Cuniculi A pseudogenes and divergent genes were of unknown function. Our results indicate that T. paraluiscuniculi has evolved from a T. pallidum-like ancestor and adapted to a specialized host-associated niche (rabbits) during loss of infectivity to humans. The genes that are inactivated or altered in T. paraluiscuniculi are candidates for virulence factors important in the infectivity and pathogenesis of T. pallidum subspecies. PMID:21655244

  10. Complete genome sequence of Treponema paraluiscuniculi, strain Cuniculi A: the loss of infectivity to humans is associated with genome decay.

    Directory of Open Access Journals (Sweden)

    David Šmajs

    Full Text Available Treponema paraluiscuniculi is the causative agent of rabbit venereal spirochetosis. It is not infectious to humans, although its genome structure is very closely related to other pathogenic Treponema species including Treponema pallidum subspecies pallidum, the etiological agent of syphilis. In this study, the genome sequence of Treponema paraluiscuniculi, strain Cuniculi A, was determined by a combination of several high-throughput sequencing strategies. Whereas the overall size (1,133,390 bp), arrangement, and gene content of the Cuniculi A genome closely resembled those of the T. pallidum genome, the T. paraluiscuniculi genome contained a markedly higher number of pseudogenes and gene fragments (51). In addition to pseudogenes, 33 divergent genes were also found in the T. paraluiscuniculi genome. A set of 32 (out of 84) affected genes encoded proteins of known or predicted function in the Nichols genome. These proteins included virulence factors, gene regulators and components of DNA repair and recombination. The majority (52 or 61.9%) of the Cuniculi A pseudogenes and divergent genes were of unknown function. Our results indicate that T. paraluiscuniculi has evolved from a T. pallidum-like ancestor and adapted to a specialized host-associated niche (rabbits) during loss of infectivity to humans. The genes that are inactivated or altered in T. paraluiscuniculi are candidates for virulence factors important in the infectivity and pathogenesis of T. pallidum subspecies.

  11. PHYRN: a robust method for phylogenetic analysis of highly divergent sequences.

    Directory of Open Access Journals (Sweden)

    Gaurav Bhardwaj

    Full Text Available Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤ 25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.

  12. Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.

    Science.gov (United States)

    McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael

    2014-08-01

    Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the Guanine+Cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species implicate this process acts as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event.

  13. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein...

  14. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  15. Draft Genome Sequence of Rubrivivax gelatinosus CBS

    Energy Technology Data Exchange (ETDEWEB)

    Hu, P. S.; Lang, J.; Wawrousek, K.; Yu, J. P.; Maness, P. C.; Chen, J.

    2012-06-01

    Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N2 as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H2. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

  16. An EST-based genome scan using 454 sequencing in the marine snail Littorina saxatilis.

    Science.gov (United States)

    Galindo, J; Grahame, J W; Butlin, R K

    2010-09-01

    Genome scans have been used in the studies of ecological speciation to find genomic regions ('outlier loci') showing reduced gene flow between divergent populations/species. High-throughput sequencing ('454') offers new opportunities in this field via transcriptome sequencing. Divergent ecotypes of the marine gastropod Littorina saxatilis represent a good example of incipient ecological speciation. We performed a 454-based genome scan between H and M ecotypes of L. saxatilis from the British Isles using cDNA of pooled individuals. Allele frequencies were calculated for 2454 single nucleotide polymorphisms (SNPs), within 572 contigs, and 7% of loci were detected as outliers. Functional annotation of the contigs containing outlier SNPs showed that they included shell matrix and muscle proteins (lithostathine, mucin, titin), proteins involved in energetic metabolism (arginine kinase, NADH dehydrogenase) and reverse transcriptases. Follow-up investigations into these proteins and unannotated outliers will be a promising route in the study of ecological speciation in L. saxatilis.

  17. Genome sequencing highlights the dynamic early history of dogs.

    Directory of Open Access Journals (Sweden)

    Adam H Freedman

    2014-01-01

    Full Text Available To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11-16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog

  19. Genome scans for divergent selection in natural populations of the widespread hardwood species Eucalyptus grandis (Myrtaceae) using microsatellites

    Science.gov (United States)

    Song, Zhijiao; Zhang, Miaomiao; Li, Fagen; Weng, Qijie; Zhou, Chanpin; Li, Mei; Li, Jie; Huang, Huanhua; Mo, Xiaoyong; Gan, Siming

    2016-01-01

    Identification of loci or genes under natural selection is important for both understanding the genetic basis of local adaptation and practical applications, and genome scans provide a powerful means for such identification purposes. In this study, genome-wide simple sequence repeat (SSR) markers were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in coastal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations. PMID:27748400
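
    The FST statistic that these outlier scans rank loci by can be illustrated with a minimal sketch. It assumes the heterozygosity-based form FST = (HT - HS) / HT with equally weighted populations, which handles multi-allelic markers such as SSRs. The function names and the example frequencies are hypothetical, and the three outlier detection methods used in the study are not reproduced here.

```python
from typing import Dict, List


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(pop_freqs: List[Dict[str, float]]) -> float:
    """F_ST = (H_T - H_S) / H_T for one locus, weighting populations equally.

    pop_freqs holds one allele-frequency dict per population (hypothetical input format).
    """
    if not pop_freqs:
        raise ValueError("at least one population is required")
    n = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    # Allele frequencies in the pooled (total) population.
    mean_freqs = {a: sum(p.get(a, 0.0) for p in pop_freqs) / n for a in alleles}
    h_t = expected_heterozygosity(mean_freqs)
    # Mean within-population heterozygosity.
    h_s = sum(expected_heterozygosity(p) for p in pop_freqs) / n
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    # Two populations with mirrored frequencies at a biallelic locus.
    print(fst([{"A": 0.8, "B": 0.2}, {"A": 0.2, "B": 0.8}]))
```

    A locus whose FST is unusually high relative to the genome-wide distribution is the kind of candidate an outlier scan would flag; choosing the threshold is the part the dedicated methods handle.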

  20. Intermediate divergence levels maximize the strength of structure-sequence correlations in enzymes and viral proteins.

    Science.gov (United States)

    Jackson, Eleisha L; Shahmoradi, Amir; Spielman, Stephanie J; Jack, Benjamin R; Wilke, Claus O

    2016-07-01

    Structural properties such as solvent accessibility and contact number predict site-specific sequence variability in many proteins. However, the strength and significance of these structure-sequence relationships vary widely among different proteins, with absolute correlation strengths ranging from 0 to 0.8. In particular, two recent works have made contradictory observations. Yeh et al. (Mol. Biol. Evol. 31:135-139, 2014) found that both relative solvent accessibility (RSA) and weighted contact number (WCN) are good predictors of sitewise evolutionary rate in enzymes, with WCN clearly out-performing RSA. Shahmoradi et al. (J. Mol. Evol. 79:130-142, 2014) considered these same predictors (as well as others) in viral proteins and found much weaker correlations and no clear advantage of WCN over RSA. Because these two studies had substantial methodological differences, however, a direct comparison of their results is not possible. Here, we reanalyze the datasets of the two studies with one uniform analysis pipeline, and we find that many apparent discrepancies between the two analyses can be attributed to the extent of sequence divergence in individual alignments. Specifically, the alignments of the enzyme dataset are much more diverged than those of the virus dataset, and proteins with higher divergence exhibit, on average, stronger structure-sequence correlations. However, the highest structure-sequence correlations are observed at intermediate divergence levels, where both highly conserved and highly variable sites are present in the same alignment.

  1. Genome sequence of Psychrobacter cibarius strain W1

    DEFF Research Database (Denmark)

    Raghupathi, Prem Krishnan; Herschend, Jakob; Røder, Henriette Lyng

    2016-01-01

    Here, we report the draft genome sequence of Psychrobacter cibarius strain W1, which was isolated at a slaughterhouse in Denmark. The 3.63-Mb genome sequence was assembled into 241 contigs.

  2. Evaluation of intra- and interspecific divergence of satellite DNA sequences by nucleotide frequency calculation and pairwise sequence comparison

    Directory of Open Access Journals (Sweden)

    Kato Mikio

    2003-01-01

    Full Text Available Satellite DNA sequences are known to be highly variable and to have been subjected to concerted evolution that homogenizes member sequences within species. We have analyzed the mode of evolution of satellite DNA sequences in four fishes from the genus Diplodus by calculating the nucleotide frequency of the sequence array and the phylogenetic distances between member sequences. Calculation of nucleotide frequency and pairwise sequence comparison enabled us to characterize the divergence among member sequences in this satellite DNA family. The results suggest that the evolutionary rate of satellite DNA in D. bellottii is about two-fold greater than the average of the other three fishes, and that the sequence homogenization event occurred in D. puntazzo more recently than in the others. The procedures described here are effective for characterizing the mode of evolution of satellite DNA.
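
    The two calculations named in this abstract, per-position nucleotide frequency and pairwise sequence comparison, can be sketched as follows. This is an illustration only: it assumes the member sequences are already aligned to equal length and uses the simple proportion of differing sites (p-distance) as the pairwise measure, which may differ from the distance the authors used. All function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def column_frequencies(seqs: List[str]) -> List[Dict[str, float]]:
    """Nucleotide frequencies at each alignment position across member sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts.values())
        result.append({base: c / total for base, c in counts.items()})
    return result


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_distance(seqs: List[str]) -> float:
    """Average p-distance over all pairs of member sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    repeats = ["ACGT", "ACGA", "ACTT"]  # toy aligned repeat units
    print(column_frequencies(repeats)[3])
    print(mean_pairwise_distance(repeats))
```

    A low mean pairwise distance within a species alongside fixed differences between species is the pattern usually read as homogenization by concerted evolution.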

  3. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    Science.gov (United States)

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  4. What Will We Do with a Cotton Genome Sequence?

    Institute of Scientific and Technical Information of China (English)

    BRUBAKER Curt

    2008-01-01

    With the publication of "Toward Sequencing Cotton (Gossypium) Genomes" [Chen et al. Plant Physiology, 2007, 145:1303-1310] a clear consensus emerged from the cotton genomics community not only that cotton genome sequences were a critical resource for research and commercial innovation in cotton genomics, but that there was a logical means of achieving this goal.

  5. Sequencing of a Cultivated Diploid Cotton Genome-Gossypium arboreum

    Institute of Scientific and Technical Information of China (English)

    WILKINS; Thea; A

    2008-01-01

    Sequencing the genomes of crop species and model systems contributes significantly to our understanding of the organization, structure and function of plant genomes. In a 'white paper' published in 2007, the cotton community set forth a strategic plan for sequencing the AD genome of cultivated upland cotton that initially targets less complex diploid genomes. This strategy banks on the high degree

  6. Identification of ancient remains through genomic sequencing

    Science.gov (United States)

    Blow, Matthew J.; Zhang, Tao; Woyke, Tanja; Speller, Camilla F.; Krivoshapkin, Andrei; Yang, Dongya Y.; Derevianko, Anatoly; Rubin, Edward M.

    2008-01-01

    Studies of ancient DNA have been hindered by the preciousness of remains, the small quantities of undamaged DNA accessible, and the limitations associated with conventional PCR amplification. In these studies, we developed and applied a genomewide adapter-mediated emulsion PCR amplification protocol for ancient mammalian samples estimated to be between 45,000 and 69,000 yr old. Using 454 Life Sciences (Roche) and Illumina sequencing (formerly Solexa sequencing) technologies, we examined over 100 megabases of DNA from amplified extracts, revealing unbiased sequence coverage with substantial amounts of nonredundant nuclear sequences from the sample sources and negligible levels of human contamination. We consistently recorded over 500-fold increases, such that nanogram quantities of starting material could be amplified to microgram quantities. Application of our protocol to a 50,000-yr-old uncharacterized bone sample that was unsuccessful in mitochondrial PCR provided sufficient nuclear sequences for comparison with extant mammals and subsequent phylogenetic classification of the remains. The combined use of emulsion PCR amplification and high-throughput sequencing allows for the generation of large quantities of DNA sequence data from ancient remains. Using such techniques, even small amounts of ancient remains with low levels of endogenous DNA preservation may yield substantial quantities of nuclear DNA, enabling novel applications of ancient DNA genomics to the investigation of extinct phyla. PMID:18426903

  7. Enhanced Dynamic Algorithm of Genome Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Arabi E. keshk

    2014-05-01

    Full Text Available The merging of biology and computer science has created a new field called computational biology, or bioinformatics, that explores the capacity of computers to gain knowledge from biological data. Computational biology is rooted in the life sciences as well as in computer and information sciences and technologies. A central problem in computational biology is sequence alignment, a way of arranging DNA, RNA or protein sequences to identify regions of similarity and relationships between them. This paper introduces an enhanced dynamic algorithm for genome sequence alignment, called EDAGSA. It fills only the three main diagonals of the matrix rather than filling the entire matrix with unused data. It obtains the optimal solution while decreasing the execution time, thereby improving performance. To illustrate the effectiveness of the proposed algorithm, it is compared with traditional methods such as the Needleman-Wunsch, Smith-Waterman and longest common subsequence algorithms. In addition, a database is implemented so that the algorithm can be used in multi-sequence alignment to search for the optimal sequence that matches a given sequence.
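
    The idea of filling only a few diagonals of the alignment matrix can be illustrated with a generic banded global alignment. The sketch below is not the EDAGSA algorithm itself, whose details the abstract does not give; it is a standard banded variant of Needleman-Wunsch scoring, with a band half-width of 1 standing in for "the three main diagonals". The scoring values and the function name are assumptions chosen for illustration.

```python
def banded_global_score(a: str, b: str, band: int = 1,
                        match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score computed only for cells with |i - j| <= band.

    With band=1 only the main diagonal and its two neighbours are filled.
    The result equals the unrestricted optimum only if an optimal path
    stays inside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        raise ValueError("length difference exceeds band; no path stays inside it")
    # Row 0: leading gaps in `a`, restricted to the band.
    prev = {j: j * gap for j in range(0, min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i * gap  # leading gaps in `b`
                continue
            # The diagonal predecessor always lies inside the band.
            sub = match if a[i - 1] == b[j - 1] else mismatch
            best = prev[j - 1] + sub
            if j - 1 in cur:   # gap in `a` (move left)
                best = max(best, cur[j - 1] + gap)
            if j in prev:      # gap in `b` (move up)
                best = max(best, prev[j] + gap)
            cur[j] = best
        prev = cur
    return prev[m]


if __name__ == "__main__":
    print(banded_global_score("GATTACA", "GATTCA"))
```

    Each row touches at most 2 * band + 1 cells, so the work grows linearly with sequence length instead of with the product of the two lengths. The trade-off is that sequences differing by more than `band` net insertions or deletions cannot be aligned within the band.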

  8. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high

  9. Comparative Genomics of Early-Diverging Brucella Strains Reveals a Novel Lipopolysaccharide Biosynthesis Pathway

    Science.gov (United States)

    Wattam, Alice R.; Inzana, Thomas J.; Williams, Kelly P.; Mane, Shrinivasrao P.; Shukla, Maulik; Almeida, Nalvo F.; Dickerman, Allan W.; Mason, Steven; Moriyón, Ignacio; O’Callaghan, David; Whatmore, Adrian M.; Sobral, Bruno W.; Tiller, Rebekah V.; Hoffmaster, Alex R.; Frace, Michael A.; De Castro, Cristina; Molinaro, Antonio; Boyle, Stephen M.; De, Barun K.; Setubal, João C.

    2012-01-01

    ABSTRACT Brucella species are Gram-negative bacteria that infect mammals. Recently, two unusual strains (Brucella inopinata BO1T and B. inopinata-like BO2) have been isolated from human patients, and their similarity to some atypical brucellae isolated from Australian native rodent species was noted. Here we present a phylogenomic analysis of the draft genome sequences of BO1T and BO2 and of the Australian rodent strains 83-13 and NF2653 that shows that they form two groups well separated from the other sequenced Brucella spp. Several important differences were noted. Both BO1T and BO2 did not agglutinate significantly when live or inactivated cells were exposed to monospecific A and M antisera against O-side chain sugars composed of N-formyl-perosamine. While BO1T maintained the genes required to synthesize a typical Brucella O-antigen, BO2 lacked many of these genes but still produced a smooth LPS (lipopolysaccharide). Most missing genes were found in the wbk region involved in O-antigen synthesis in classic smooth Brucella spp. In their place, BO2 carries four genes that other bacteria use for making a rhamnose-based O-antigen. Electrophoretic, immunoblot, and chemical analyses showed that BO2 carries an antigenically different O-antigen made of repeating hexose-rich oligosaccharide units that made the LPS water-soluble, which contrasts with the homopolymeric O-antigen of other smooth brucellae that have a phenol-soluble LPS. The results demonstrate the existence of a group of early-diverging brucellae with traits that depart significantly from those of the Brucella species described thus far. PMID:22930339

  10. Transforming clinical microbiology with bacterial genome sequencing.

    Science.gov (United States)

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  11. Ancient Duplications and Expression Divergence in the Globin Gene Superfamily of Vertebrates: Insights from the Elephant Shark Genome and Transcriptome.

    Science.gov (United States)

    Opazo, Juan C; Lee, Alison P; Hoffmann, Federico G; Toloza-Villalobos, Jessica; Burmester, Thorsten; Venkatesh, Byrappa; Storz, Jay F

    2015-07-01

    Comparative analyses of vertebrate genomes continue to uncover a surprising diversity of genes in the globin gene superfamily, some of which have very restricted phyletic distributions despite their antiquity. Genomic analysis of the globin gene repertoire of cartilaginous fish (Chondrichthyes) should be especially informative about the duplicative origins and ancestral functions of vertebrate globins, as divergence between Chondrichthyes and bony vertebrates represents the most basal split within the jawed vertebrates. Here, we report a comparative genomic analysis of the vertebrate globin gene family that includes the complete globin gene repertoire of the elephant shark (Callorhinchus milii). Using genomic sequence data from representatives of all major vertebrate classes, integrated analyses of conserved synteny and phylogenetic relationships revealed that the last common ancestor of vertebrates possessed a repertoire of at least seven globin genes: single copies of androglobin and neuroglobin, four paralogous copies of globin X, and the single-copy progenitor of the entire set of vertebrate-specific globins. Combined with expression data, the genomic inventory of elephant shark globins yielded four especially surprising findings: 1) there is no trace of the neuroglobin gene (a highly conserved gene that is present in all other jawed vertebrates that have been examined to date), 2) myoglobin is highly expressed in heart, but not in skeletal muscle (reflecting a possible ancestral condition in vertebrates with single-circuit circulatory systems), 3) elephant shark possesses two highly divergent globin X paralogs, one of which is preferentially expressed in gonads, and 4) elephant shark possesses two structurally distinct α-globin paralogs, one of which is preferentially expressed in the brain. Expression profiles of elephant shark globin genes reveal distinct specializations of function relative to orthologs in bony vertebrates and suggest hypotheses about

  12. Detecting overlapping coding sequences in virus genomes

    Directory of Open Access Journals (Sweden)

    Brown Chris M

    2006-02-01

    Full Text Available Abstract Background Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret – especially within overlapping genes; and viruses often employ non-canonical translational mechanisms – e.g. frameshifting, stop codon read-through, leaky-scanning and internal ribosome entry sites – which can conceal potentially coding open reading frames (ORFs). Results In a previous paper we introduced a new statistic – MLOGD (Maximum Likelihood Overlapping Gene Detector) – for detecting and analysing overlapping CDSs. Here we present (a) an improved MLOGD statistic, (b) a greatly extended suite of software using MLOGD, (c) a database of results for 640 virus sequence alignments, and (d) a web-interface to the software and database. Tests show that, from an alignment with just 20 mutations, MLOGD can discriminate non-overlapping CDSs from non-coding ORFs with a typical accuracy of up to 98%, and can detect CDSs overlapping known CDSs with a typical accuracy of 90%. In addition, the software produces a variety of statistics and graphics, useful for analysing an input multiple sequence alignment. Conclusion MLOGD is an easy-to-use tool for virus genome annotation, detecting new CDSs – in particular overlapping or short CDSs – and for analysing overlapping CDSs following frameshift sites. The software, web-server, database and supplementary material are available at http://guinevere.otago.ac.nz/mlogd.html.

  13. An evaluation of Comparative Genome Sequencing (CGS) by comparing two previously-sequenced bacterial genomes

    Directory of Open Access Journals (Sweden)

    Herring Christopher D

    2007-08-01

    Full Text Available Abstract Background With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions.

  14. Why Assembling Plant Genome Sequences Is So Challenging

    Directory of Open Access Journals (Sweden)

    Pedro Seoane

    2012-09-01

    Full Text Available In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed.

  15. Why Assembling Plant Genome Sequences Is So Challenging

    Science.gov (United States)

    Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

    2012-01-01

    In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233

  16. Shedding Light on the Grey Zone of Speciation along a Continuum of Genomic Divergence.

    Science.gov (United States)

    Roux, Camille; Fraïsse, Christelle; Romiguier, Jonathan; Anciaux, Yoann; Galtier, Nicolas; Bierne, Nicolas

    2016-12-01

    Speciation results from the progressive accumulation of mutations that decrease the probability of mating between parental populations or reduce the fitness of hybrids, the so-called species barriers. The speciation genomic literature, however, is mainly a collection of case studies, each with its own approach and specificities, such that a global view of the gradual process of evolution from one to two species is currently lacking. Of primary importance is the prevalence of gene flow between diverging entities, which is central in most species concepts and has been widely discussed in recent years. Here, we explore the continuum of speciation thanks to a comparative analysis of genomic data from 61 pairs of populations/species of animals with variable levels of divergence. Gene flow between diverging gene pools is assessed under an approximate Bayesian computation (ABC) framework. We show that the intermediate "grey zone" of speciation, in which taxonomy is often controversial, spans from 0.5% to 2% of net synonymous divergence, irrespective of species life history traits or ecology. Thanks to appropriate modeling of among-locus variation in genetic drift and introgression rate, we clarify the status of the majority of ambiguous cases and uncover a number of cryptic species. Our analysis also reveals the high incidence in animals of semi-isolated species (when some but not all loci are affected by barriers to gene flow) and highlights the intrinsic difficulty, both statistical and conceptual, of delineating species in the grey zone of speciation.

  17. Comparative Genomics Including the Early-Diverging Smut Fungus Ceraceosorus bombacis Reveals Signatures of Parallel Evolution within Plant and Animal Pathogens of Fungi and Oomycetes.

    Science.gov (United States)

    Sharma, Rahul; Xia, Xiaojuan; Riess, Kai; Bauer, Robert; Thines, Marco

    2015-08-27

    Ceraceosorus bombacis is an early-diverging lineage of smut fungi and a pathogen of cotton trees (Bombax ceiba). To study the evolutionary genomics of smut fungi in comparison with other fungal and oomycete pathogens, the genome of C. bombacis was sequenced and comparative genomic analyses were performed. The genome of 26.09 Mb encodes 8,024 proteins, of which 576 are putative secreted effector proteins (PSEPs). Orthology analysis revealed 30 ortholog PSEPs among six Ustilaginomycotina genomes, the largest groups of which are lytic enzymes, such as aspartic peptidase and glycoside hydrolase. Positive selection analyses revealed the highest percentage of positively selected PSEPs in C. bombacis compared with other Ustilaginomycotina genomes. Metabolic pathway analyses revealed the absence of genes encoding nitrite and nitrate reductase in the genome of the human skin pathogen Malassezia globosa, but these enzymes are present in the sequenced plant pathogens in smut fungi. Interestingly, these genes are also absent in cultivable oomycete animal pathogens, while nitrate reductase has been lost in cultivable oomycete plant pathogens. Similar patterns were also observed for obligate biotrophic and hemi-biotrophic fungal and oomycete pathogens. Furthermore, it was found that both fungal and oomycete animal pathogen genomes lack cutinases and pectinesterases. Overall, these findings highlight the parallel evolution of certain genomic traits, revealing potential common evolutionary trajectories among fungal and oomycete pathogens, shaping the pathogen genomes according to their lifestyle.

  18. Complete mitochondrial genome sequence of the Eastern gorilla (Gorilla beringei) and implications for African ape biogeography.

    Science.gov (United States)

    Das, Ranajit; Hergenrother, Scott D; Soto-Calderón, Iván D; Dew, J Larry; Anthony, Nicola M; Jensen-Seaman, Michael I

    2014-01-01

    The Western and Eastern species of gorillas (Gorilla gorilla and Gorilla beringei) began diverging in the mid-Pleistocene, but in a complex pattern with ongoing gene flow following their initial split. We sequenced the complete mitochondrial genomes of 1 Eastern and 1 Western gorilla to provide the most accurate date for their mitochondrial divergence, and to analyze patterns of nucleotide substitutions. The most recent common ancestor of these genomes existed about 1.9 million years ago, slightly more recent than that of chimpanzee and bonobo. We in turn use this date as a calibration to reanalyze sequences from the Eastern lowland and mountain gorilla subspecies to estimate their mitochondrial divergence at approximately 380,000 years ago. These dates help frame a hypothesis whereby populations became isolated nearly 2 million years ago with restricted maternal gene flow, followed by ongoing male migration until the recent past. This process of divergence with prolonged hybridization occurred against the backdrop of the African Pleistocene, characterized by intense fluctuations in temperature and aridity, while at the same time experiencing tectonic uplifting and consequent shifts in the drainage of major river systems. Interestingly, this same pattern of introgression following divergence and discrepancies between mitochondrial and nuclear loci is seen in fossil hominins from Eurasia, suggesting that such processes may be common in hominids and that living gorillas may provide a useful model for understanding isolation and migration in our extinct relatives.

  19. Inference of gorilla demographic and selective history from whole-genome sequence data.

    Science.gov (United States)

    McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Great Ape Genome Project; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F

    2015-03-01

    Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.

  20. Ancient wolf genome reveals an early divergence of domestic dog ancestors and admixture into high-latitude breeds.

    Science.gov (United States)

    Skoglund, Pontus; Ersmark, Erik; Palkopoulou, Eleftheria; Dalén, Love

    2015-06-01

    The origin of domestic dogs is poorly understood [1-15], with suggested evidence of dog-like features in fossils that predate the Last Glacial Maximum [6, 9, 10, 14, 16] conflicting with genetic estimates of a more recent divergence between dogs and worldwide wolf populations [13, 15, 17-19]. Here, we present a draft genome sequence from a 35,000-year-old wolf from the Taimyr Peninsula in northern Siberia. We find that this individual belonged to a population that diverged from the common ancestor of present-day wolves and dogs very close in time to the appearance of the domestic dog lineage. We use the directly dated ancient wolf genome to recalibrate the molecular timescale of wolves and dogs and find that the mutation rate is substantially slower than assumed by most previous studies, suggesting that the ancestors of dogs were separated from present-day wolves before the Last Glacial Maximum. We also find evidence of introgression from the archaic Taimyr wolf lineage into present-day dog breeds from northeast Siberia and Greenland, contributing between 1.4% and 27.3% of their ancestry. This demonstrates that the ancestry of present-day dogs is derived from multiple regional wolf populations.

  1. Comparative population genomics of Fusarium graminearum reveals adaptive divergence among cereal head blight pathogens

    Science.gov (United States)

    In this study we sequenced the genomes of 60 Fusarium graminearum, the major fungal pathogen responsible for Fusarium head blight (FHB) in cereal crops world-wide. To investigate adaptive evolution of FHB pathogens, we performed population-level analyses to characterize genomic structure, signatures...

  2. Building the sequence map of the human pan-genome

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Zheng, Hancheng

    2010-01-01

    Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified approximately 5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel...... analysis of predicted genes indicated that the novel sequences contain potentially functional coding regions. We estimate that a complete human pan-genome would contain approximately 19-40 Mb of novel sequence not present in the extant reference genome. The extensive amount of novel sequence contributing...... to the genetic variation of the pan-genome indicates the importance of using complete genome sequencing and de novo assembly....

  3. Comparative Genomics of Campylobacter fetus from Reptiles and Mammals Reveals Divergent Evolution in Host-Associated Lineages.

    Science.gov (United States)

    Gilbert, Maarten J; Miller, William G; Yee, Emma; Zomer, Aldert L; van der Graaf-van Bloois, Linda; Fitzgerald, Collette; Forbes, Ken J; Méric, Guillaume; Sheppard, Samuel K; Wagenaar, Jaap A; Duim, Birgitta

    2016-07-02

    Campylobacter fetus currently comprises three recognized subspecies, which display distinct host association. Campylobacter fetus subsp. fetus and C. fetus subsp. venerealis are both associated with endothermic mammals, primarily ruminants, whereas C. fetus subsp. testudinum is primarily associated with ectothermic reptiles. Both C. fetus subsp. testudinum and C. fetus subsp. fetus have been associated with severe infections, often with a systemic component, in immunocompromised humans. To study the genetic factors associated with the distinct host dichotomy in C. fetus, whole-genome sequencing and comparison of mammal- and reptile-associated C. fetus was performed. The genomes of C. fetus subsp. testudinum isolated from either reptiles or humans were compared to elucidate the genetic factors associated with pathogenicity in humans. Genomic comparisons showed conservation of gene content and organization among C. fetus subspecies, but a clear distinction between mammal- and reptile-associated C. fetus was observed. Several genomic regions appeared to be subspecies specific, including a putative tricarballylate catabolism pathway, exclusively present in C. fetus subsp. testudinum strains. Within C. fetus subsp. testudinum, sapA, sapB, and sapAB type strains were observed. The recombinant locus iamABC (mlaFED) was exclusively associated with invasive C. fetus subsp. testudinum strains isolated from humans. A phylogenetic reconstruction was consistent with divergent evolution in host-associated strains and the existence of a barrier to lateral gene transfer between mammal- and reptile-associated C. fetus. Overall, this study shows that reptile-associated C. fetus subsp. testudinum is genetically divergent from mammal-associated C. fetus subspecies.

  4. Comparative Genomics of Campylobacter fetus from Reptiles and Mammals Reveals Divergent Evolution in Host-Associated Lineages

    Science.gov (United States)

    Gilbert, Maarten J.; Miller, William G.; Yee, Emma; Zomer, Aldert L.; van der Graaf-van Bloois, Linda; Fitzgerald, Collette; Forbes, Ken J.; Méric, Guillaume; Sheppard, Samuel K.; Wagenaar, Jaap A.; Duim, Birgitta

    2016-01-01

    Campylobacter fetus currently comprises three recognized subspecies, which display distinct host association. Campylobacter fetus subsp. fetus and C. fetus subsp. venerealis are both associated with endothermic mammals, primarily ruminants, whereas C. fetus subsp. testudinum is primarily associated with ectothermic reptiles. Both C. fetus subsp. testudinum and C. fetus subsp. fetus have been associated with severe infections, often with a systemic component, in immunocompromised humans. To study the genetic factors associated with the distinct host dichotomy in C. fetus, whole-genome sequencing and comparison of mammal- and reptile-associated C. fetus was performed. The genomes of C. fetus subsp. testudinum isolated from either reptiles or humans were compared to elucidate the genetic factors associated with pathogenicity in humans. Genomic comparisons showed conservation of gene content and organization among C. fetus subspecies, but a clear distinction between mammal- and reptile-associated C. fetus was observed. Several genomic regions appeared to be subspecies specific, including a putative tricarballylate catabolism pathway, exclusively present in C. fetus subsp. testudinum strains. Within C. fetus subsp. testudinum, sapA, sapB, and sapAB type strains were observed. The recombinant locus iamABC (mlaFED) was exclusively associated with invasive C. fetus subsp. testudinum strains isolated from humans. A phylogenetic reconstruction was consistent with divergent evolution in host-associated strains and the existence of a barrier to lateral gene transfer between mammal- and reptile-associated C. fetus. Overall, this study shows that reptile-associated C. fetus subsp. testudinum is genetically divergent from mammal-associated C. fetus subspecies. PMID:27333878

  5. Using multi-locus allelic sequence data to estimate genetic divergence among four Lilium (Liliaceae) cultivars.

    Science.gov (United States)

    Shahin, Arwa; Smulders, Marinus J M; van Tuyl, Jaap M; Arens, Paul; Bakker, Freek T

    2014-01-01

    Next Generation Sequencing (NGS) may enable estimating relationships among genotypes using allelic variation of multiple nuclear genes simultaneously. We explored the potential and caveats of this strategy in four genetically distant Lilium cultivars to estimate their genetic divergence from transcriptome sequences using three approaches: POFAD (Phylogeny of Organisms from Allelic Data, uses allelic information of sequence data), RAxML (Randomized Accelerated Maximum Likelihood, tree building based on concatenated consensus sequences) and Consensus Network (constructing a network summarizing among gene tree conflicts). Twenty six gene contigs were chosen based on the presence of orthologous sequences in all cultivars, seven of which also had an orthologous sequence in Tulipa, used as out-group. The three approaches generated the same topology. Although the resolution offered by these approaches is high, in this case there was no extra benefit in using allelic information. We conclude that these 26 genes can be widely applied to construct a species tree for the genus Lilium.

  6. Using multi-locus allelic sequence data to estimate genetic divergence among four Lilium (Liliaceae) cultivars

    Directory of Open Access Journals (Sweden)

    Arwa Shahin

    2014-10-01

    Next Generation Sequencing (NGS) may enable estimating relationships among genotypes using allelic variation of multiple nuclear genes simultaneously. We explored the potential and caveats of this strategy in four genetically distant Lilium cultivars to estimate their genetic divergence from transcriptome sequences using three approaches: POFAD (Phylogeny of Organisms from Allelic Data, uses allelic information of sequence data), RAxML (Randomized Accelerated Maximum Likelihood, tree building based on concatenated consensus sequences) and Consensus Network (constructing a network summarizing among gene tree conflicts). Twenty six gene contigs were chosen based on the presence of orthologous sequences in all cultivars, seven of which also had an orthologous sequence in Tulipa, used as out-group. The three approaches generated the same topology. Although the resolution offered by these approaches is high, in this case there was no extra benefit in using allelic information. We conclude that these 26 genes can be widely applied to construct a species tree for the genus Lilium.

  7. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    Science.gov (United States)

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA approach is not because of insufficient search of tree space, but because of the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing.
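    The idea of building a distance matrix from pairwise alignments only, as described in the abstract above, might be sketched roughly as follows. This is an illustration under stated assumptions, not the DAMBE implementation: the function names (global_align, p_distance, distance_matrix), the scoring values (match, mismatch, gap) and the use of a simple p-distance are all hypothetical choices made for the example. The abstract does not say which scoring scheme or distance PhyPA uses.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment by dynamic programming.

    Returns one optimal alignment as two gapped strings.
    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(x, y):
    """Proportion of differing sites among ungapped aligned columns."""
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 0.0
    return sum(p != q for p, q in pairs) / len(pairs)


def distance_matrix(seqs):
    """Pairwise distances computed from pairwise alignments only (no MSA)."""
    dist = {}
    for u, v in combinations(sorted(seqs), 2):
        x, y = global_align(seqs[u], seqs[v])
        dist[(u, v)] = p_distance(x, y)
    return dist
```

    The resulting matrix could then be passed to any distance-based tree-building method, such as neighbor joining. A model-corrected distance would normally replace the raw p-distance for highly diverged sequences.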

  8. Analysis of Complete Nucleotide Sequences of 12 Gossypium Chloroplast Genomes: Origin and Evolution of Allotetraploids

    Science.gov (United States)

    Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping

    2012-01-01

    Background: Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although ascertaining the donor species of allotetraploid cotton has been intensively studied, sequence comparison of Gossypium chloroplast genomes is still of interest to understand the mechanisms underlying the evolution of Gossypium allotetraploids, while it is generally accepted that the parents were A- and D-genome containing species. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. Methodology/Principal Findings: The size of the 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar, having >98% sequence identity. They encoded the same set of 112 unique genes, which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA). Conclusion: Despite the high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes show a preference for 5-bp indels, and 1–3-bp indels are mainly attributed to SSR polymorphisms. This study supports that the common ancestor of diploid A-genome species in

  9. Detecting long tandem duplications in genomic sequences

    Directory of Open Access Journals (Sweden)

    Audemard Eric

    2012-05-01

    Background: Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results: In this paper, we introduce ReD Tandem, software using a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS  Conclusions: ReD Tandem identifies large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein-gene-based approach, which ignores duplications involving non-coding regions. It is however inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help improve existing annotations.

  10. Adaptive divergence despite strong genetic drift: genomic analysis of the evolutionary mechanisms causing genetic differentiation in the island fox (Urocyon littoralis).

    Science.gov (United States)

    Funk, W Chris; Lovich, Robert E; Hohenlohe, Paul A; Hofman, Courtney A; Morrison, Scott A; Sillett, T Scott; Ghalambor, Cameron K; Maldonado, Jesus E; Rick, Torben C; Day, Mitch D; Polato, Nicholas R; Fitzpatrick, Sarah W; Coonan, Timothy J; Crooks, Kevin R; Dillon, Adam; Garcelon, David K; King, Julie L; Boser, Christina L; Gould, Nicholas; Andelt, William F

    2016-05-01

    The evolutionary mechanisms generating the tremendous biodiversity of islands have long fascinated evolutionary biologists. Genetic drift and divergent selection are predicted to be strong on islands and both could drive population divergence and speciation. Alternatively, strong genetic drift may preclude adaptation. We conducted a genomic analysis to test the roles of genetic drift and divergent selection in causing genetic differentiation among populations of the island fox (Urocyon littoralis). This species consists of six subspecies, each of which occupies a different California Channel Island. Analysis of 5293 SNP loci generated using Restriction-site Associated DNA (RAD) sequencing found support for genetic drift as the dominant evolutionary mechanism driving population divergence among island fox populations. In particular, populations had exceptionally low genetic variation, small Ne (range = 2.1-89.7; median = 19.4), and significant genetic signatures of bottlenecks. Moreover, islands with the lowest genetic variation (and, by inference, the strongest historical genetic drift) were most genetically differentiated from mainland grey foxes, and vice versa, indicating genetic drift drives genome-wide divergence. Nonetheless, outlier tests identified 3.6-6.6% of loci as high FST outliers, suggesting that despite strong genetic drift, divergent selection contributes to population divergence. Patterns of similarity among populations based on high FST outliers mirrored patterns based on morphology, providing additional evidence that outliers reflect adaptive divergence. Extremely low genetic variation and small Ne in some island fox populations, particularly on San Nicolas Island, suggest that they may be vulnerable to fixation of deleterious alleles, decreased fitness and reduced adaptive potential.

  11. Chaperonin genes on the rise: new divergent classes and intense duplication in human and other vertebrate genomes

    Directory of Open Access Journals (Sweden)

    Macario Alberto JL

    2010-03-01

    Background: Chaperonin proteins are well known for the critical role they play in protein folding and in disease. However, the recent identification of three diverged chaperonin paralogs associated with the human Bardet-Biedl and McKusick-Kaufman Syndromes (BBS and MKKS, respectively) indicates that the eukaryotic chaperonin-gene family is larger and more differentiated than previously thought. The availability of complete genome sequences makes possible a definitive characterization of the complete set of chaperonin sequences in human and other species. Results: We identified fifty-four chaperonin-like sequences in the human genome and similar numbers in the genomes of the model organisms mouse and rat. In mammal genomes we identified, besides the well-known CCT chaperonin genes and the three genes associated with the MKKS and BBS pathological conditions, a newly-defined class of chaperonin genes named CCT8L, represented in human by the two sequences CCT8L1 and CCT8L2. Comparative analyses from several vertebrate genomes established the monophyletic origin of chaperonin-like MKKS and BBS genes from the CCT8 lineage. The CCT8L gene originated from a later duplication also in the CCT8 lineage at the onset of mammal evolution and duplicated in primate genomes. The functionality of CCT8L genes in different species was confirmed by evolutionary analyses and in human by expression data. Detailed sequence analysis and structural predictions of MKKS, BBS and CCT8L proteins strongly suggested that they conserve a typical chaperonin-like core structure but that they are unlikely to form a CCT-like oligomeric complex. The characterization of many newly-discovered chaperonin pseudogenes uncovered the intense duplication activity of eukaryotic chaperonin genes. Conclusions: In vertebrates, chaperonin genes, driven by intense duplication processes, have diversified into multiple classes and functionalities that extend beyond their well-known protein

  12. Rapid whole genome sequencing and precision neonatology.

    Science.gov (United States)

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow, or perceived to be too impractical, for the initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and turnaround time of testing decrease, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions to alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care.

  13. A Mitochondrial Genome Sequence of the Tibetan Antelope (Pantholops hodgsonii)

    Institute of Scientific and Technical Information of China (English)

    Shu-Qing Xu; Xiao-Guang Zheng; Ri-Li Ge; Ying-Zhong Yang; Jun Zhou; Guo-En Jing; Yun-Tian Chen; Jun Wang; Huan-Ming Yang; Jian Wang; Jun Yu

    2005-01-01

    To investigate genetic mechanisms of high altitude adaptations of native mammals on the Tibetan Plateau, we compared mitochondrial sequences of the endangered Pantholops hodgsonii with its lowland distant relatives Ovis aries and Capra hircus, as well as other mammals. The complete mitochondrial genome of P. hodgsonii (16,498 bp) revealed a gene order similar to that of other mammals. Because of tandem duplications, the control region of the P. hodgsonii mitochondrial genome is shorter than those of O. aries and C. hircus, but longer than those of Bos species. Phylogenetic analysis based on alignments of the entire cytochrome b genes suggested that P. hodgsonii is more closely related to O. aries and C. hircus than to species of the Antilopinae subfamily. The estimated divergence time between P. hodgsonii and O. aries is about 2.25 million years ago. Further analysis on natural selection indicated that the COXI (cytochrome c oxidase subunit I) gene was under positive selection in P. hodgsonii and Bos grunniens. Considering the same climates and environments shared by these two mammalian species, we proposed that the mitochondrial COXI gene is probably relevant for these native mammals to adapt to the high-altitude environment unique to the Tibetan Plateau.

  14. Genomic Sequence Comparisons, 1987-2003 Final Report

    Energy Technology Data Exchange (ETDEWEB)

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

  15. Genome sequence and analysis of the tuber crop potato

    DEFF Research Database (Denmark)

    Xu, X.; Pan, S.; Cheng, S.

    2011-01-01

    and assemble 86% of the 844-megabase genome. We predict 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin. As the first genome sequence of an asterid, the potato genome reveals 2,642 genes specific to this large angiosperm clade...

  16. Sequencing of diverse mandarin, pummelo and orange genomes reveals complex history of admixture during citrus domestication.

    Science.gov (United States)

    Wu, G Albert; Prochnik, Simon; Jenkins, Jerry; Salse, Jerome; Hellsten, Uffe; Murat, Florent; Perrier, Xavier; Ruiz, Manuel; Scalabrin, Simone; Terol, Javier; Takita, Marco Aurélio; Labadie, Karine; Poulain, Julie; Couloux, Arnaud; Jabbari, Kamel; Cattonaro, Federica; Del Fabbro, Cristian; Pinosio, Sara; Zuccolo, Andrea; Chapman, Jarrod; Grimwood, Jane; Tadeo, Francisco R; Estornell, Leandro H; Muñoz-Sanz, Juan V; Ibanez, Victoria; Herrero-Ortega, Amparo; Aleza, Pablo; Pérez-Pérez, Julián; Ramón, Daniel; Brunel, Dominique; Luro, François; Chen, Chunxian; Farmerie, William G; Desany, Brian; Kodira, Chinnappa; Mohiuddin, Mohammed; Harkins, Tim; Fredrikson, Karin; Burns, Paul; Lomsadze, Alexandre; Borodovsky, Mark; Reforgiato, Giuseppe; Freitas-Astúa, Juliana; Quetier, Francis; Navarro, Luis; Roose, Mikeal; Wincker, Patrick; Schmutz, Jeremy; Morgante, Michele; Machado, Marcos Antonio; Talon, Manuel; Jaillon, Olivier; Ollitrault, Patrick; Gmitter, Frederick; Rokhsar, Daniel

    2014-07-01

    Cultivated citrus are selections from, or hybrids of, wild progenitor species whose identities and contributions to citrus domestication remain controversial. Here we sequence and compare citrus genomes--a high-quality reference haploid clementine genome and mandarin, pummelo, sweet-orange and sour-orange genomes--and show that cultivated types derive from two progenitor species. Although cultivated pummelos represent selections from one progenitor species, Citrus maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species Citrus reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, thus implying that wild mandarins were part of the early breeding germplasm. A Chinese wild 'mandarin' diverges substantially from C. reticulata, thus suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and facilitates sequence-directed genetic improvement.

  17. The genome sequence of taurine cattle: a window to ruminant biology and evolution.

    Science.gov (United States)

    Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C; McEwan, John C; Payne, Gemma M; Raison, Joy M; Junier, Thomas; Kriventseva, Evgenia V; Eyras, Eduardo; Plass, Mireya; Donthu, Ravikiran; Larkin, Denis M; Reecy, James; Yang, Mary Q; Chen, Lin; Cheng, Ze; Chitko-McKown, Carol G; Liu, George E; Matukumalli, Lakshmi K; Song, Jiuzhou; Zhu, Bin; Bradley, Daniel G; Brinkman, Fiona S L; Lau, Lilian P L; Whiteside, Matthew D; Walker, Angela; Wheeler, Thomas T; Casey, Theresa; German, J Bruce; Lemay, Danielle G; Maqbool, Nauman J; Molenaar, Adrian J; Seo, Seongwon; Stothard, Paul; Baldwin, Cynthia L; Baxter, Rebecca; Brinkmeyer-Langford, Candice L; Brown, Wendy C; Childers, Christopher P; Connelley, Timothy; Ellis, Shirley A; Fritz, Krista; Glass, Elizabeth J; Herzig, Carolyn T A; Iivanainen, Antti; Lahmers, Kevin K; Bennett, Anna K; Dickens, C Michael; Gilbert, James G R; Hagen, Darren E; Salih, Hanni; Aerts, Jan; Caetano, Alexandre R; Dalrymple, Brian; Garcia, Jose Fernando; Gill, Clare A; Hiendleder, Stefan G; Memili, Erdogan; Spurlock, Diane; Williams, John L; Alexander, Lee; Brownstein, Michael J; Guan, Leluo; Holt, Robert A; Jones, Steven J M; Marra, Marco 
A; Moore, Richard; Moore, Stephen S; Roberts, Andy; Taniguchi, Masaaki; Waterman, Richard C; Chacko, Joseph; Chandrabose, Mimi M; Cree, Andy; Dao, Marvin Diep; Dinh, Huyen H; Gabisi, Ramatu Ayiesha; Hines, Sandra; Hume, Jennifer; Jhangiani, Shalini N; Joshi, Vandita; Kovar, Christie L; Lewis, Lora R; Liu, Yih-Shin; Lopez, John; Morgan, Margaret B; Nguyen, Ngoc Bich; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Wright, Rita A; Buhay, Christian; Ding, Yan; Dugan-Rocha, Shannon; Herdandez, Judith; Holder, Michael; Sabo, Aniko; Egan, Amy; Goodell, Jason; Wilczek-Boney, Katarzyna; Fowler, Gerald R; Hitchens, Matthew Edward; Lozado, Ryan J; Moen, Charles; Steffen, David; Warren, James T; Zhang, Jingkun; Chiu, Readman; Schein, Jacqueline E; Durbin, K James; Havlak, Paul; Jiang, Huaiyang; Liu, Yue; Qin, Xiang; Ren, Yanru; Shen, Yufeng; Song, Henry; Bell, Stephanie Nicole; Davis, Clay; Johnson, Angela Jolivet; Lee, Sandra; Nazareth, Lynne V; Patel, Bella Mayurkumar; Pu, Ling-Ling; Vattathil, Selina; Williams, Rex Lee; Curry, Stacey; Hamilton, Cerissa; Sodergren, Erica; Wheeler, David A; Barris, Wes; Bennett, Gary L; Eggen, André; Green, Ronnie D; Harhay, Gregory P; Hobbs, Matthew; Jann, Oliver; Keele, John W; Kent, Matthew P; Lien, Sigbjørn; McKay, Stephanie D; McWilliam, Sean; Ratnakumar, Abhirami; Schnabel, Robert D; Smith, Timothy; Snelling, Warren M; Sonstegard, Tad S; Stone, Roger T; Sugimoto, Yoshikazu; Takasuga, Akiko; Taylor, Jeremy F; Van Tassell, Curtis P; Macneil, Michael D; Abatepaulo, Antonio R R; Abbey, Colette A; Ahola, Virpi; Almeida, Iassudara G; Amadio, Ariel F; Anatriello, Elen; Bahadue, Suria M; Biase, Fernando H; Boldt, Clayton R; Carroll, Jeffery A; Carvalho, Wanessa A; Cervelatti, Eliane P; Chacko, Elsa; Chapin, Jennifer E; Cheng, Ye; Choi, Jungwoo; Colley, Adam J; de Campos, Tatiana A; De Donato, Marcos; Santos, Isabel K F de Miranda; de Oliveira, Carlo J F; Deobald, Heather; Devinoy, Eve; Donohue, Kaitlin E; Dovc, Peter; Eberlein, 
Annett; Fitzsimmons, Carolyn J; Franzin, Alessandra M; Garcia, Gustavo R; Genini, Sem; Gladney, Cody J; Grant, Jason R; Greaser, Marion L; Green, Jonathan A; Hadsell, Darryl L; Hakimov, Hatam A; Halgren, Rob; Harrow, Jennifer L; Hart, Elizabeth A; Hastings, Nicola; Hernandez, Marta; Hu, Zhi-Liang; Ingham, Aaron; Iso-Touru, Terhi; Jamis, Catherine; Jensen, Kirsty; Kapetis, Dimos; Kerr, Tovah; Khalil, Sari S; Khatib, Hasan; Kolbehdari, Davood; Kumar, Charu G; Kumar, Dinesh; Leach, Richard; Lee, Justin C-M; Li, Changxi; Logan, Krystin M; Malinverni, Roberto; Marques, Elisa; Martin, William F; Martins, Natalia F; Maruyama, Sandra R; Mazza, Raffaele; McLean, Kim L; Medrano, Juan F; Moreno, Barbara T; Moré, Daniela D; Muntean, Carl T; Nandakumar, Hari P; Nogueira, Marcelo F G; Olsaker, Ingrid; Pant, Sameer D; Panzitta, Francesca; Pastor, Rosemeire C P; Poli, Mario A; Poslusny, Nathan; Rachagani, Satyanarayana; Ranganathan, Shoba; Razpet, Andrej; Riggs, Penny K; Rincon, Gonzalo; Rodriguez-Osorio, Nelida; Rodriguez-Zas, Sandra L; Romero, Natasha E; Rosenwald, Anne; Sando, Lillian; Schmutz, Sheila M; Shen, Libing; Sherman, Laura; Southey, Bruce R; Lutzow, Ylva Strandberg; Sweedler, Jonathan V; Tammen, Imke; Telugu, Bhanu Prakash V L; Urbanski, Jennifer M; Utsunomiya, Yuri T; Verschoor, Chris P; Waardenberg, Ashley J; Wang, Zhiquan; Ward, Robert; Weikard, Rosemarie; Welsh, Thomas H; White, Stephen N; Wilming, Laurens G; Wunderlich, Kris R; Yang, Jianqi; Zhao, Feng-Qi

    2009-04-24

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.

  18. Draft Genome Sequence of Lachancea lanzarotensis CBS 12615T, an Ascomycetous Yeast Isolated from Grapes.

    Science.gov (United States)

    Sarilar, Véronique; Devillers, Hugo; Freel, Kelle C; Schacherer, Joseph; Neuvéglise, Cécile

    2015-04-16

    We report the genome sequencing of the yeast Lachancea lanzarotensis CBS 12615(T). The assembly comprises 24 scaffolds, for a total size of 11.46 Mbp. The annotation revealed 5,058 putative protein-coding genes. Detection of seven centromeres supports a chromosome fusion, which occurred after divergence from Lachancea thermotolerans and Lachancea kluyveri. Copyright © 2015 Sarilar et al.

  19. Genome sequence of the novel marine member of the Gammaproteobacteria strain HTCC5015.

    KAUST Repository

    Thrash, J Cameron

    2010-07-01

    HTCC5015 is a novel, highly divergent marine member of the Gammaproteobacteria, currently without a cultured representative with greater than 89% 16S rRNA gene identity to itself. The organism was isolated from water collected from Hydrostation S south of Bermuda using high-throughput dilution-to-extinction culturing techniques. Here we present the genome sequence of the unique Gammaproteobacterium strain HTCC5015.

  20. The Solanum commersonii Genome Sequence Provides Insights into Adaptation to Stress Conditions and Genome Evolution of Wild Potato Relatives

    Science.gov (United States)

    Aversano, Riccardo; Contaldi, Felice; Ercolano, Maria Raffaella; Grosso, Valentina; Iorizzo, Massimo; Tatino, Filippo; Xumerle, Luciano; Dal Molin, Alessandra; Avanzato, Carla; Ferrarini, Alberto; Delledonne, Massimo; Sanseverino, Walter; Cigliano, Riccardo Aiese; Capella-Gutierrez, Salvador; Gabaldón, Toni; Frusciante, Luigi; Bradeen, James M.; Carputo, Domenico

    2015-01-01

    Here, we report the draft genome sequence of Solanum commersonii, which consists of ∼830 megabases with an N50 of 44,303 bp anchored to 12 chromosomes, using the potato (Solanum tuberosum) genome sequence as a reference. Compared with potato, S. commersonii shows a striking reduction in heterozygosity (1.5% versus 53 to 59%), and differences in genome sizes were mainly due to variations in intergenic sequence length. Gene annotation by ab initio prediction supported by RNA-seq data produced a catalog of 1703 predicted microRNAs, 18,882 long noncoding RNAs of which 20% are shown to target cold-responsive genes, and 39,290 protein-coding genes with a significant repertoire of nonredundant nucleotide binding site-encoding genes and 126 cold-related genes that are lacking in S. tuberosum. Phylogenetic analyses indicate that domesticated potato and S. commersonii lineages diverged ∼2.3 million years ago. Three duplication periods corresponding to genome enrichment for particular gene families related to response to salt stress, water transport, growth, and defense response were discovered. The draft genome sequence of S. commersonii substantially increases our understanding of the domesticated germplasm, facilitating translation of acquired knowledge into advances in crop stability in light of global climate and environmental changes. PMID:25873387

  1. Comparative analysis of two phenologically divergent populations of the pine processionary moth (Thaumetopoea pityocampa) by de novo transcriptome sequencing.

    Science.gov (United States)

    Gschloessl, Bernhard; Vogel, Heiko; Burban, Christian; Heckel, David; Streiff, Réjane; Kerdelhué, Carole

    2014-03-01

    The pine processionary moth Thaumetopoea pityocampa is a Mediterranean lepidopteran defoliator that is experiencing a rapid range expansion towards higher latitudes and altitudes due to the current climate warming. Its phenology - the time of sexual reproduction - is certainly a key trait for the local adaptation of the processionary moth to climatic conditions. Moreover, an exceptional case of allochronic differentiation was discovered ca. 15 years ago in this species. A population with a shifted phenology (the summer population, SP) co-exists near Leiria, Portugal, with a population following the classical cycle (the winter population, WP). The existence of this population is an outstanding opportunity to decipher the genetic bases of phenology. No genomic resources were so far available for T. pityocampa. We developed a high-throughput sequencing approach to build a first reference transcriptome, and to proceed with comparative analyses of the sympatric SP and WP. We pooled RNA extracted from whole individuals of various developmental stages, and performed a transcriptome characterisation for both populations combining Roche 454-FLX and traditional Sanger data. The obtained sequences were clustered into ca. 12,000 transcripts corresponding to 9265 unigenes. The mean transcript coverage was 21.9 reads per bp. Almost 70% of the de novo assembled transcripts displayed significant similarity to previously published proteins and around 50% of the transcripts contained a full-length coding region. Comparative analyses of the population transcriptomes allowed us to investigate genes specifically expressed in only one of the studied populations, and to identify the most divergent homologous SP/WP transcripts. The most divergent pairs of transcripts did not correspond to obvious phenology-related candidate genes, and 43% could not be functionally annotated. This study provides the first comprehensive genome-wide resource for the target species T. pityocampa. Many of the

  2. Genome Sequence of Stachybotrys chartarum Strain 51-11

    OpenAIRE

    Betancourt, Doris A.; Dean, Timothy R.; Kim, Jean; Levy, Josh

    2015-01-01

    The Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina HiSeq 2000 and PacBio technologies. Since S. chartarum has been implicated as having health impacts within water-damaged buildings, any information extracted from the genomic sequence data relating to toxins or the metabolism of the fungus might be useful.

  3. Complete Genome Sequence of Rift Valley Fever Virus Strain Lunyo.

    Science.gov (United States)

    Lumley, Sarah; Horton, Daniel L; Marston, Denise A; Johnson, Nicholas; Ellis, Richard J; Fooks, Anthony R; Hewson, Roger

    2016-04-14

    Using next-generation sequencing technologies, the first complete genome sequence of Rift Valley fever virus strain Lunyo is reported here. Originally reported as an attenuated antigenic variant strain from Uganda, genomic sequence analysis shows that Lunyo clusters together with other Ugandan isolates.

  4. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis

    DEFF Research Database (Denmark)

    Carlton, Jane M.; Hirt, Robert P.; Silva, Joana C.

    2007-01-01

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the approximately 160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion...

  5. Inferring the evolutionary histories of divergences in Hylobates and Nomascus gibbons through multilocus sequence data

    OpenAIRE

    Chan, Y.; Roos, C.; Inoue-Murayama, M.; Inoue, E; Shih, C.; Pei, K.; Vigilant, L.

    2013-01-01

    Background Gibbons (Hylobatidae) are the most diverse group of living apes. They exist as geographically-contiguous species which diverged more rapidly than did their close relatives, the great apes (Hominidae). Of the four extant gibbon genera, the evolutionary histories of two polyspecific genera, Hylobates and Nomascus, have been the particular focus of research but the DNA sequence data used was largely derived from the maternally inherited mitochondrial DNA (mtDNA) locus. Results To inve...

  6. Coevolution between simple sequence repeats (SSRs) and virus genome size

    Directory of Open Access Journals (Sweden)

    Zhao Xiangyan

    2012-08-01

    Full Text Available Abstract Background The relationship between the level of repetitiveness in genomic sequence and genome size has been investigated using complete prokaryotic and eukaryotic genomes, but such studies have rarely been made in virus genomes. Results In this study, a total of 257 viruses were examined, which cover 90% of genera. The results showed that simple sequence repeats (SSRs) are strongly, positively and significantly correlated with genome size. Each repeat class is distributed within a certain range of genome sequence length. Mono-, di- and tri- repeats are widely distributed in all virus genomes; tetra- SSRs are a common component of genomes more than 100 kb in size; in the range of genome  Conclusions We conducted this research from the perspective of viruses as a whole. We concluded that genome size is an important factor affecting the occurrence of SSRs; hosts are also responsible, to a certain degree, for the variance in SSR content.
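    As an illustration of how mono- to tetra-nucleotide SSRs like those counted in this record could be located in a sequence, here is a minimal Python sketch. The function name `find_ssrs` and the thresholds (`max_unit`, `min_repeats`) are assumptions chosen for the example; the record does not state which tool or cut-offs the authors used.

```python
import re


def find_ssrs(seq, max_unit=4, min_repeats=3):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    Hypothetical helper: motifs are 1..max_unit bases long and must occur
    at least min_repeats times in a row. Thresholds are illustrative only.
    """
    seq = seq.upper()
    # The lazy group captures the shortest motif; \1{n,} demands more copies.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

    Calling `find_ssrs` on each genome and dividing the number of hits by genome length would give a simple SSR density that can be compared against genome size.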

  7. The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae

    Directory of Open Access Journals (Sweden)

    David B. Neale

    2017-09-01

    Full Text Available A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceed those of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms.
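    The contig and scaffold N50 values quoted in this record follow the usual definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name `n50` is illustrative and not taken from any assembly tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sorts lengths from longest to shortest and returns the length at which
    the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

    Applied to contig lengths it gives the contig N50; applied to scaffold lengths, the scaffold N50.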

  8. Draft Genome Sequences of Klebsiella variicola Plant Isolates.

    Science.gov (United States)

    Martínez-Romero, Esperanza; Silva-Sanchez, Jesús; Barrios, Humberto; Rodríguez-Medina, Nadia; Martínez-Barnetche, Jesús; Téllez-Sosa, Juan; Gómez-Barreto, Rosa Elena; Garza-Ramos, Ulises

    2015-09-10

    Three endophytic Klebsiella variicola isolates (T29A, 3, and 6A2, obtained from sugar cane stem, maize shoots, and banana leaves, respectively) were used for whole-genome sequencing. Here, we report the draft genome sequences of circular chromosomes and plasmids. The genomes contain plant colonization and cellulase genes. This study will help toward understanding the genomic basis of K. variicola interaction with plant hosts.

  9. Next-generation sequencing strategies for characterizing the turkey genome.

    Science.gov (United States)

    Dalloul, Rami A; Zimin, Aleksey V; Settlage, Robert E; Kim, Sungwon; Reed, Kent M

    2014-02-01

    The turkey genome sequencing project was initiated in 2008 and has relied primarily on next-generation sequencing (NGS) technologies. Our first efforts used a synergistic combination of 2 NGS platforms (Roche/454 and Illumina GAII), detailed bacterial artificial chromosome (BAC) maps, and unique assembly tools to sequence and assemble the genome of the domesticated turkey, Meleagris gallopavo. Since the first release in 2010, efforts to improve the genome assembly, gene annotation, and genomic analyses continue. The initial assembly build (2.01) represented about 89% of the genome sequence with 17X coverage depth (931 Mb). Sequence contigs were assigned to 30 of the 40 chromosomes with approximately 10% of the assembled sequence corresponding to unassigned chromosomes (ChrUn). The sequence has been refined through both genome-wide and area-focused sequencing, including shotgun and paired-end sequencing, and targeted sequencing of chromosomal regions with low or incomplete coverage. These additional efforts have improved the sequence assembly resulting in 2 subsequent genome builds of higher genome coverage (25X/Build3.0 and 30X/Build4.0) with a current sequence totaling 1,010 Mb. Further, BAC with end sequences assigned to the Z/W and MG18 (MHC) chromosomes, ChrUn, or not placed in the previous build were isolated, deeply sequenced (Hi-Seq), and incorporated into the latest build (5.0). To aid in the annotation and to generate a gene expression atlas of major tissues, a comprehensive set of RNA samples was collected at various developmental stages of female and male turkeys. Transcriptome sequencing data (using Illumina Hi-Seq) will provide information to enhance the final assembly and ultimately improve sequence annotation. The most current sequence covers more than 95% of the turkey genome and should yield a much improved gene level of annotation, making it a valuable resource for studying genetic variations underlying economically important traits in poultry.

  10. Reconstructing cancer genomes from paired-end sequencing data

    Directory of Open Access Journals (Sweden)

    Oesper Layla

    2012-04-01

    Full Text Available Abstract Background A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. An implementation of the PREGO algorithm is

  11. Next-generation sequencing and large genome assemblies

    OpenAIRE

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-01-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches ...

  12. Genome sequencing and annotation of Morganella sp. SA36

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report the draft genome sequence of Morganella sp. strain SA36, isolated from a water spring in the Aljouf region, Saudi Arabia. The draft genome size is 2,564,439 bp with a G + C content of 51.1% and contains 6 rRNA sequences (single copies of 5S, 16S & 23S rRNA). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDNQ00000000.
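    The G + C content reported in this and the neighbouring genome announcements is the percentage of guanine and cytosine among the called bases. A minimal Python sketch, assuming a plain nucleotide string with ambiguous bases such as N excluded from the denominator (the function name `gc_content` is illustrative):

```python
def gc_content(seq):
    """Return G + C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / called
```

    Whether ambiguous bases are excluded is a convention that differs between tools, so values may deviate slightly from those in published records.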

  13. Genome sequencing and annotation of Stenotrophomonas sp. SAM8

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report the draft genome sequence of Stenotrophomonas sp. strain SAM8, isolated from environmental water. The draft genome size is 3,665,538 bp with a G + C content of 67.2% and contains 6 rRNA sequences (single copies of 5S, 16S & 23S rRNA). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDAV00000000.

  14. Genome sequencing and annotation of Proteus sp. SAS71

    Directory of Open Access Journals (Sweden)

    Samy Selim

    2015-12-01

    Full Text Available We report the draft genome sequence of Proteus sp. strain SAS71, isolated from a water spring in the Aljouf region, Saudi Arabia. The draft genome size is 3,037,704 bp with a G + C content of 39.3% and contains 6 rRNA sequences (single copies of 5S, 16S & 23S rRNA). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. LDIU00000000.

  15. Complete Genome Sequence of Corynebacterium pseudotuberculosis Viscerotropic Strain N1

    Science.gov (United States)

    Portela, Ricardo W.; Sousa, Thiago J.; Rocha, Flávia; Pereira, Felipe L.; Dorella, Fernanda A.; Carvalho, Alex F.; Menezes, Nildo; Macedo, Eduardo S.; Moura-Costa, Lilia F.; Meyer, Roberto; Leal, Carlos A. G.; Figueiredo, Henrique C.; Azevedo, Vasco

    2016-01-01

    We present the complete genome sequence of Corynebacterium pseudotuberculosis strain N1. The sequencing was performed with the Ion Torrent Personal Genome Machine system. The genome is a circular chromosome with 2,337,845 bp, a G+C content of 52.85%, and a total of 2,045 coding sequences, 12 rRNAs, 49 tRNAs, and 58 pseudogenes. PMID:26823597

  16. Insights from 20 years of bacterial genome sequencing

    DEFF Research Database (Denmark)

    Land, Miriam; Hauser, Loren; Jun, Se-Ran

    2015-01-01

    the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative...... (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident...

  17. Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics’ GemCode Sequencing Data

    Science.gov (United States)

    Coombe, Lauren; Jackman, Shaun D.; Yang, Chen; Vandervalk, Benjamin P.; Moore, Richard A.; Pleasance, Stephen; Coope, Robin J.; Bohlmann, Joerg; Holt, Robert A.; Jones, Steven J. M.; Birol, Inanc

    2016-01-01

    The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads, to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis- P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly. PMID:27632164

  18. Extreme sequence divergence but conserved ligand-binding specificity in Streptococcus pyogenes M protein.

    Directory of Open Access Journals (Sweden)

    Jenny Persson

    2006-05-01

    Full Text Available Many pathogenic microorganisms evade host immunity through extensive sequence variability in a protein region targeted by protective antibodies. In spite of the sequence variability, a variable region commonly retains an important ligand-binding function, reflected in the presence of a highly conserved sequence motif. Here, we analyze the limits of sequence divergence in a ligand-binding region by characterizing the hypervariable region (HVR) of Streptococcus pyogenes M protein. Our studies were focused on HVRs that bind the human complement regulator C4b-binding protein (C4BP), a ligand that confers phagocytosis resistance. A previous comparison of C4BP-binding HVRs identified residue identities that could be part of a binding motif, but the extended analysis reported here shows that no residue identities remain when additional C4BP-binding HVRs are included. Characterization of the HVR in the M22 protein indicated that two relatively conserved Leu residues are essential for C4BP binding, but these residues are probably core residues in a coiled-coil, implying that they do not directly contribute to binding. In contrast, substitution of either of two relatively conserved Glu residues, predicted to be solvent-exposed, had no effect on C4BP binding, although each of these changes had a major effect on the antigenic properties of the HVR. Together, these findings show that HVRs of M proteins have an extraordinary capacity for sequence divergence and antigenic variability while retaining a specific ligand-binding function.

  19. Sequence Divergence of Microsatellites and Phylogeny Analysis in Tetraploid Cotton Species and Their Putative Diploid Ancestors

    Institute of Scientific and Technical Information of China (English)

    Wang-Zhen GUO; Dong FANG; Wen-Duo YU; Tian-Zhen ZHANG

    2005-01-01

    To determine the level of microsatellite sequence differences and to use the information to construct a phylogenetic relationship for cultivated tetraploid cotton (Gossypium spp.) species and their putative diploid ancestors, 10 genome-derived microsatellite primer pairs were used to amplify eight species, including two tetraploid and six diploid species, in Gossypium. A total of 92 unique amplicons were resolved using polyacrylamide gel electrophoresis. Each amplicon was cloned, sequenced, and analyzed using standard phylogenetic software. Allelic diversities were caused mostly by changes in the number of simple sequence repeat (SSR) motif repeats and only a small proportion resulted from interruption of the SSR motif within the locus for the same genome. The frequency of base substitutions was 0.5%-1.0% in different genomes, with only a few indels found. Based on the combined 10 SSR flanking sequence data, the homology of A-genome diploid species averaged 98.9%, even though most of the amplicons were of the same size, and the sequence homology between G. gossypioides (Ulbr.) Standl. and three other D-genome species (G. raimondii Ulbr., G. davidsonii Kell., and G. thurberi Tod.) was 98.5%, 98.6%, and 98.5%, respectively. Phylogenetic trees of the two allotetraploid species and their putative diploid progenitors showed that homoeologous sequences from the A- and D-subgenome were still present in the polyploid subgenomes and that they evolved independently. Meanwhile, homoeologous sequence interaction, in which duplicated loci in the polyploid subgenomes became phylogenetic sisters, was also found in the evolutionary history of tetraploid cotton species. The results of the present study suggest that evaluation of SSR variation at the sequence level can be effective in exploring the evolutionary relationships among Gossypium species.

  20. Inferring the evolutionary histories of divergences in Hylobates and Nomascus gibbons through multilocus sequence data

    Science.gov (United States)

    2013-01-01

    Background Gibbons (Hylobatidae) are the most diverse group of living apes. They exist as geographically-contiguous species which diverged more rapidly than did their close relatives, the great apes (Hominidae). Of the four extant gibbon genera, the evolutionary histories of two polyspecific genera, Hylobates and Nomascus, have been the particular focus of research but the DNA sequence data used was largely derived from the maternally inherited mitochondrial DNA (mtDNA) locus. Results To investigate the evolutionary relationships and divergence processes of gibbon species, particularly those of the Hylobates genus, we produced and analyzed a total of 11.5 kb of DNA sequence at 14 biparentally inherited autosomal loci. We find that gibbon genera have a high average sequence diversity but a lower degree of genetic differentiation as compared to great ape genera. Our multilocus species tree features H. pileatus in a basal position and a grouping of the four Sundaic island species (H. agilis, H. klossii, H. moloch and H. muelleri). We conducted pairwise comparisons based on an isolation-with-migration (IM) model and detected signals of asymmetric gene flow between H. lar and H. moloch, between H. agilis and H. muelleri, and between N. leucogenys and N. siki. Conclusions Our multilocus analyses provide inferences of gibbon evolutionary histories complementary to those based on single gene data. The results of IM analyses suggest that the divergence processes of gibbons may be accompanied by gene flow. Future studies using multi-population model analyses with samples of known provenance for Hylobates and Nomascus species would expand the understanding of histories of gene flow during divergences for these two gibbon genera. PMID:23586586

  1. Genome sequences and comparative genomics of two Lactobacillus ruminis strains from the bovine and human intestinal tracts

    LENUS (Irish Health Repository)

    2011-08-30

    Abstract Background The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity, which recent genomic analyses have further highlighted. However, the choice of species for sequencing has been non-random and unequal in distribution, with only a single representative genome from the L. salivarius clade available to date. Furthermore, there is no data to facilitate a functional genomic analysis of motility in the lactobacilli, a trait that is restricted to the L. salivarius clade. Results The 2.06 Mb genome of the bovine isolate Lactobacillus ruminis ATCC 27782 comprises a single circular chromosome, and has a G+C content of 44.4%. In silico analysis identified 1901 coding sequences, including genes for a pediocin-like bacteriocin, a single large exopolysaccharide-related cluster, two sortase enzymes, two CRISPR loci and numerous IS elements and pseudogenes. A cluster of genes related to a putative pilin was identified, and shown to be transcribed in vitro. A high quality draft assembly of the genome of a second L. ruminis strain, ATCC 25644 isolated from humans, suggested a slightly larger genome of 2.138 Mb, that exhibited a high degree of synteny with the ATCC 27782 genome. In contrast, comparative analysis of L. ruminis and L. salivarius identified a lack of long-range synteny between these closely related species. Comparison of the L. salivarius clade core proteins with those of nine other Lactobacillus species distributed across 4 major phylogenetic groups identified the set of shared proteins, and proteins unique to each group. Conclusions The genome of L. ruminis provides a comparative tool for directing functional analyses of other members of the L. salivarius clade, and it increases understanding of the divergence of this distinct Lactobacillus lineage from other commensal lactobacilli. The genome sequence provides a definitive resource to facilitate investigation of the genetics, biochemistry and host

  2. Comparative genomic analysis identifies divergent genomic features of pathogenic Enterococcus cecorum including a type IC CRISPR-Cas system, a capsule locus, an epa-like locus, and putative host tissue binding proteins.

    Science.gov (United States)

    Borst, Luke B; Suyemoto, M Mitsu; Scholl, Elizabeth H; Fuller, Fredrick J; Barnes, H John

    2015-01-01

    Enterococcus cecorum (EC) is the dominant enteric commensal of adult chickens and contributes to the gut consortia of many avian and mammalian species. While EC infection is an uncommon zoonosis, like other enterococcal species it can cause life-threatening nosocomial infection in people. In contrast to other enterococci which are considered opportunistic pathogens, emerging pathogenic strains of EC cause outbreaks of musculoskeletal disease in broiler chickens. Typical morbidity and mortality is comparable to other important infectious diseases of poultry. In molecular epidemiologic studies, pathogenic EC strains were found to be genetically clonal. These findings suggested acquisition of specific virulence determinants by pathogenic EC. To identify divergent genomic features and acquired virulence determinants in pathogenic EC, comparative genomic analysis was performed on genomes of 3 pathogenic and 3 commensal strains of EC. Pathogenic isolates had smaller genomes with a higher GC content, and they demonstrated large regions of synteny compared to commensal isolates. A molecular phylogenetic analysis demonstrated sequence divergence in pathogenic EC genomes. At a threshold of 98% identity, 414 predicted proteins were identified that were highly conserved in pathogenic EC but not in commensal EC. Among these, divergent CRISPR-cas defense loci were observed. In commensal EC, the type IIA arrangement typical for enterococci was present; however, pathogenic EC had a type IC locus, which is novel in enterococci but commonly observed in streptococci. Potential mediators of virulence identified in this analysis included a polysaccharide capsular locus similar to that recently described for E. faecium, an epa-like locus, and cell wall associated proteins which may bind host extracellular matrix. This analysis identified specific genomic regions, coding sequences, and predicted proteins which may be related to the divergent evolution and increased virulence of emerging

  3. Bayesian molecular clock dating of species divergences in the genomics era.

    Science.gov (United States)

    dos Reis, Mario; Donoghue, Philip C J; Yang, Ziheng

    2016-02-01

    Five decades have passed since the proposal of the molecular clock hypothesis, which states that the rate of evolution at the molecular level is constant through time and among species. This hypothesis has become a powerful tool in evolutionary biology, making it possible to use molecular sequences to estimate the geological ages of species divergence events. With recent advances in Bayesian clock dating methodology and the explosive accumulation of genetic sequence data, molecular clock dating has found widespread applications, from tracking virus pandemics and studying the macroevolutionary process of speciation and extinction to estimating a timescale for life on Earth.

  4. A taste of pineapple evolution through genome sequencing.

    Science.gov (United States)

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution.

  5. A late origin of the extant eukaryotic diversity: divergence time estimates using rare genomic changes

    Directory of Open Access Journals (Sweden)

    Koonin Eugene V

    2011-05-01

    Full Text Available Abstract Background Accurate estimation of the divergence time of the extant eukaryotes is a fundamentally important but extremely difficult problem owing primarily to gross violations of the molecular clock at long evolutionary distances and the lack of appropriate calibration points close to the date of interest. These difficulties are intrinsic to the dating of ancient divergence events and are reflected in the large discrepancies between estimates obtained with different approaches. Estimates of the age of Last Eukaryotic Common Ancestor (LECA) vary approximately twofold, from ~1,100 million years ago (Mya) to ~2,300 Mya. Results We applied the genome-wide analysis of rare genomic changes associated with conserved amino acids (RGC_CAs) and used several independent techniques to obtain date estimates for the divergence of the major lineages of eukaryotes with calibration intervals for insects, land plants and vertebrates. The results suggest an early divergence of monocot and dicot plants, approximately 340 Mya, raising the possibility of plant-insect coevolution. The divergence of bilaterian animal phyla is estimated at ~400-700 Mya, a range of dates that is consistent with cladogenesis immediately preceding the Cambrian explosion. The origin of opisthokonts (the supergroup of eukaryotes that includes metazoa and fungi) is estimated at ~700-1,000 Mya, and the age of LECA at ~1,000-1,300 Mya. We separately analyzed the red algal calibration interval, which is based on a single fossil. This analysis produced time estimates that were systematically older compared to the other estimates. Nevertheless, the majority of the estimates for the age of the LECA using the red algal data fell within the 1,200-1,400 Mya interval. Conclusion The inference of a "young LECA" is compatible with the latest of previously estimated dates and has substantial biological implications.
If these estimates are valid, the approximately 1 to 1.4 billion years of evolution of

  6. A Snapshot of the Emerging Tomato Genome Sequence

    Directory of Open Access Journals (Sweden)

    Lukas A. Mueller

    2009-03-01

    Full Text Available The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for researchers to exploit this new resource. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.

  7. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    DEFF Research Database (Denmark)

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.;

    2005-01-01

    between the species-but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence...

  8. Whole-Genome Sequence Assembly for Mammalian Genomes: Arachne 2

    OpenAIRE

    Jaffe, David B.; Butler, Jonathan; Gnerre, Sante; Mauceli, Evan; Lindblad-Toh, Kerstin; Jill P. Mesirov; Michael C Zody; Lander, Eric S.

    2003-01-01

    We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal changes were simultaneously made and applied to the assembly of the mouse genome, during a six-month period of development: (1) Supercontigs (scaffolds) were iteratively broken and rej...

  9. Insights from twenty years of bacterial genome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Jun, Se Ran [ORNL; Nookaew, Intawat [ORNL; Leuze, Michael Rex [ORNL; Ahn, Tae-Hyuk [ORNL; Karpinets, Tatiana V [ORNL; Lund, Ole [Technical University of Denmark; Kora, Guruprasad H [ORNL; Wassenaar, Trudy [Molecular Microbiology & Genomics Consultants, Zotzenheim, Germany; Poudel, Suresh [ORNL; Ussery, David W [ORNL

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome
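
    The pan- and core-genome calculation mentioned in this abstract reduces, in its simplest form, to a set union and a set intersection over the gene families found in each genome. The sketch below is illustrative only: the function name, strain labels, and gene-family names are made up, and it assumes gene families have already been assigned (the clustering step, which real tools perform on sequence similarity, is not shown).

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a dict of genome -> families.

    pan  = families present in at least one genome (union)
    core = families present in every genome (intersection)
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    family_sets = [set(families) for families in genomes.values()]
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Hypothetical toy data: three strains with made-up family labels.
toy_genomes = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famB", "famE"},
}

pan, core = pan_and_core(toy_genomes)
```

    As more genomes are added, the union can only grow and the intersection can only shrink, which is the pattern behind the large gap between the core and total family counts reported above.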

  10. Genome Microscale Heterogeneity among Wild Potatoes Revealed by Diversity Arrays Technology Marker Sequences

    Directory of Open Access Journals (Sweden)

    Alessandra Traini

    2013-01-01

    Full Text Available Tuber-bearing potato species possess several genes that can be exploited to improve the genetic background of the cultivated potato Solanum tuberosum. Among them, S. bulbocastanum and S. commersonii are well known for their strong resistance to environmental stresses. However, scant information is available for these species in terms of genome organization, gene function, and regulatory networks. Consequently, genomic tools to assist breeding are meager, and efficient exploitation of these species has been limited so far. In this paper, we employed the reference genome sequences from cultivated potato and tomato and a collection of sequences of 1,423 potato Diversity Arrays Technology (DArT) markers that show polymorphic representation across the genomes of S. bulbocastanum and/or S. commersonii genotypes. Our results highlighted microscale genome sequence heterogeneity that may play a significant role in functional and structural divergence between related species. Our analytical approach provides knowledge of genome structural and sequence variability that could not be detected by transcriptome and proteome approaches.

  11. Whole-genome sequencing of uropathogenic Escherichia coli reveals long evolutionary history of diversity and virulence.

    Science.gov (United States)

    Lo, Yancy; Zhang, Lixin; Foxman, Betsy; Zöllner, Sebastian

    2015-08-01

    Uropathogenic Escherichia coli (UPEC) are phenotypically and genotypically very diverse. This diversity makes it challenging to understand the evolution of UPEC adaptations responsible for causing urinary tract infections (UTI). To gain insight into the relationship between evolutionary divergence and adaptive paths to uropathogenicity, we sequenced at deep coverage (190×) the genomes of 19 E. coli strains from urinary tract infection patients from the same geographic area. Our sample consisted of 14 UPEC isolates and 5 non-UTI-causing (commensal) rectal E. coli isolates. After identifying strain variants using de novo assembly-based methods, we clustered the strains based on pairwise sequence differences using a neighbor-joining algorithm. We examined evolutionary signals on the whole-genome phylogeny and contrasted these signals with those found on gene trees constructed based on specific uropathogenic virulence factors. The whole-genome phylogeny showed that the divergence between UPEC and commensal E. coli strains without known UPEC virulence factors happened over 32 million generations ago. Pairwise diversity between any two strains was also high, suggesting multiple genetic origins of uropathogenic strains in a small geographic region. Contrasting the whole-genome phylogeny with three gene trees constructed from common uropathogenic virulence factors, we detected no selective advantage of these virulence genes over other genomic regions. These results suggest that UPEC acquired uropathogenicity a long time ago and used it opportunistically to cause extraintestinal infections.
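
    Clustering by pairwise sequence differences, as described in this abstract, starts from a distance matrix. The sketch below computes only that input, as the proportion of differing sites between aligned sequences; it does not implement neighbor joining itself. The function name, strain labels, and toy sequences are assumptions for illustration, and the abstract does not say which distance measure the authors used.

```python
from itertools import combinations


def pairwise_differences(aligned):
    """Proportion of differing sites for every pair of aligned sequences.

    `aligned` maps a strain name to a sequence; all sequences must be
    non-empty and of equal length (i.e. already aligned).
    """
    distances = {}
    for a, b in combinations(sorted(aligned), 2):
        seq_a, seq_b = aligned[a], aligned[b]
        if not seq_a or len(seq_a) != len(seq_b):
            raise ValueError("sequences must be non-empty and equally long")
        mismatches = sum(x != y for x, y in zip(seq_a, seq_b))
        distances[(a, b)] = mismatches / len(seq_a)
    return distances


# Hypothetical toy alignment of three strains, 8 sites each.
toy_alignment = {
    "strainA": "ACGTACGT",
    "strainB": "ACGTACGA",
    "strainC": "TCGTACGA",
}

distances = pairwise_differences(toy_alignment)
```

    A tree-building routine from a phylogenetics library would take these pairwise values as its starting point.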

  12. Genome sequence of Cronobacter sakazakii BAA-894 and comparative genomic hybridization analysis with other Cronobacter species.

    Directory of Open Access Journals (Sweden)

    Eva Kucerova

    Full Text Available BACKGROUND: The genus Cronobacter (formerly called Enterobacter sakazakii) is composed of five species: C. sakazakii, C. malonaticus, C. turicensis, C. muytjensii, and C. dublinensis. The genus includes opportunistic human pathogens, and the first three species have been associated with neonatal infections. The most severe diseases are caused in neonates and include fatal necrotizing enterocolitis and meningitis. The genetic basis of the diversity within the genus is unknown, and few virulence traits have been identified. METHODOLOGY/PRINCIPAL FINDINGS: We report here the first sequence of a member of this genus, C. sakazakii strain BAA-894. The genome of Cronobacter sakazakii strain BAA-894 comprises a 4.4 Mb chromosome (57% GC content) and two plasmids: 31 kb (51% GC) and 131 kb (56% GC). The genome was used to construct a 387,000 probe oligonucleotide tiling DNA microarray covering the whole genome. Comparative genomic hybridization (CGH) was undertaken on five other C. sakazakii strains, and representatives of the four other Cronobacter species. Among 4,382 annotated genes inspected in this study, about 55% of genes were common to all C. sakazakii strains and 43% were common to all Cronobacter strains, with 10-17% absence of genes. CONCLUSIONS/SIGNIFICANCE: CGH highlighted 15 clusters of genes in C. sakazakii BAA-894 that were divergent or absent in more than half of the tested strains; six of these are of probable prophage origin. Putative virulence factors were identified in these prophage and in other variable regions. A number of genes unique to Cronobacter species associated with neonatal infections (C. sakazakii, C. malonaticus and C. turicensis) were identified. These included a copper and silver resistance system known to be linked to invasion of the blood-brain barrier by neonatal meningitic strains of Escherichia coli. In addition, genes encoding multidrug efflux pumps and adhesins were identified that were unique to C. sakazakii

  13. Genome Project Standards in a New Era of Sequencing

    Energy Technology Data Exchange (ETDEWEB)

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft'; however, these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better

  14. The Complete Chloroplast Genome Sequences of the Medicinal Plant Pogostemon cablin

    Directory of Open Access Journals (Sweden)

    Yang He

    2016-06-01

    Full Text Available Pogostemon cablin, the natural source of patchouli alcohol, is an important herb in the Lamiaceae family. Here, we present the entire chloroplast genome of P. cablin. This genome, with 38.24% GC content, is 152,460 bp in length. The genome presents a typical quadripartite structure with two inverted repeats (each 25,417 bp in length), separated by one small and one large single-copy region (17,652 and 83,974 bp in length, respectively). The chloroplast genome encodes 127 genes, of which 107 genes are single-copy, including 79 protein-coding genes, four rRNA genes, and 24 tRNA genes. The genome structure, GC content, and codon usage of this chloroplast genome are similar to those of other species in the family, except that it encodes fewer protein-coding genes and tRNA genes. Phylogenetic analysis reveals that P. cablin diverged from the Scutellarioideae clade about 29.45 million years ago (Mya). Furthermore, most of the simple sequence repeats (SSRs) are short polyadenine or polythymine repeats that contribute to high AT content in the chloroplast genome. Complete sequences and annotation of P. cablin chloroplast genome will facilitate phylogenetic, population and genetic engineering research investigations involving this particular species.
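
    The region lengths quoted in this abstract can be checked against the reported genome size: in a quadripartite chloroplast genome, the total is two inverted repeats plus the small and large single-copy regions. The short check below uses only the figures from the abstract; the variable names are chosen here for illustration.

```python
# Region lengths in base pairs, as reported in the abstract.
inverted_repeat = 25_417      # each of the two inverted repeats
small_single_copy = 17_652
large_single_copy = 83_974

# Quadripartite structure: 2 x IR + SSC + LSC.
total_length = 2 * inverted_repeat + small_single_copy + large_single_copy

print(total_length)  # 152460, matching the reported 152,460 bp
```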

  15. Comparative genomics reveals surprising divergence of two closely related strains of uncultivated UCYN-A cyanobacteria

    DEFF Research Database (Denmark)

    Bombar, Deniz; Heller, Philip; Sanchez-Baracaldo, Patricia

    2014-01-01

    Marine planktonic cyanobacteria capable of fixing molecular nitrogen (termed 'diazotrophs') are key in biogeochemical cycling, and the nitrogen fixed is one of the major external sources of nitrogen to the open ocean. Candidatus Atelocyanobacterium thalassa (UCYN-A) is a diazotrophic cyanobacterium...... known for its widespread geographic distribution in tropical and subtropical oligotrophic oceans, unusually reduced genome and symbiosis with a single-celled prymnesiophyte alga. Recently a novel strain of this organism was also detected in coastal waters sampled from the Scripps Institute...... and ecological level. Our results suggest that UCYN-A1 and UCYN-A2 had a common ancestor and diverged after genome reduction. These two variants may reflect adaptation of the host to different niches, which could be coastal and open ocean habitats....

  16. Draft Genome Sequence of a Diarrheagenic Morganella morganii Isolate.

    Science.gov (United States)

    Singh, Pallavi; Mosci, Rebekah; Rudrik, James T; Manning, Shannon D

    2015-10-08

    This is a report of the whole-genome draft sequence of a diarrheagenic Morganella morganii isolate from a patient in Michigan, USA. This genome represents an important addition to the limited number of pathogenic M. morganii genomes available.

  17. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    Science.gov (United States)

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  18. Genome sequencing and annotation of Cellulomonas sp. HZM

    Directory of Open Access Journals (Sweden)

    Patric Chua

    2015-09-01

    Full Text Available We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA).

  19. Whole-Genome Sequences of 26 Vibrio cholerae Isolates

    Science.gov (United States)

    Watve, Samit S.; Chande, Aroon T.; Rishishwar, Lavanya; Jordan, I. King

    2016-01-01

    The human pathogen Vibrio cholerae employs several adaptive mechanisms for environmental persistence, including natural transformation and type VI secretion, creating a reservoir for the spread of disease. Here, we report whole-genome sequences of 26 diverse V. cholerae isolates, significantly increasing the sequence diversity of publicly available V. cholerae genomes. PMID:28007852

  20. Complete Genome Sequence of Staphylococcus pseudintermedius Type Strain LMG 22219

    Science.gov (United States)

    Abouelkhair, Mohamed A.; Riley, Matthew C.; Bemis, David A.

    2017-01-01

    ABSTRACT We report the first complete genome sequence of LMG 22219 (=ON 86T = CCUG 49543T), the Staphylococcus pseudintermedius type strain isolated from feline lung tissue. This sequence information will facilitate phylogenetic comparisons of staphylococcal species and other bacteria at the genome level. PMID:28209834

  1. Genome sequence of Kocuria palustris strain W4

    DEFF Research Database (Denmark)

    Herschend, Jakob; Raghupathi, Prem Krishnan; Røder, Henriette Lyng;

    2016-01-01

    We report the 3.09 Mb draft genome sequence of Kocuria palustris W4, isolated from a slaughterhouse in Denmark.

  2. Genome sequence of Kocuria palustris strain W4

    DEFF Research Database (Denmark)

    Herschend, Jakob; Raghupathi, Prem Krishnan; Røder, Henriette Lyng

    2016-01-01

    We report the 3.09 Mb draft genome sequence of Kocuria palustris W4, isolated from a slaughterhouse in Denmark.

  3. Nearly Complete Genome Sequence of Lactobacillus plantarum Strain NIZO2877

    NARCIS (Netherlands)

    Martino, M.E.; Bayjanov, J.R.; Joncour, P.; Hughes, S.; Gillet, B.; Kleerebezem, M; Siezen, R.; Hijum, S.A.F.T. van; Leulier, F.

    2015-01-01

    Lactobacillus plantarum is a versatile bacterial species that is isolated mostly from foods. Here, we present the first genome sequence of L. plantarum strain NIZO2877 isolated from a hot dog in Vietnam. Its two contigs represent a nearly complete genome sequence.

  4. Investigation of genome sequences within the family Pasteurellaceae

    DEFF Research Database (Denmark)

    Angen, Øystein; Ussery, David

    . The homology between genomes ranged from 47.2% to 94.1%. The number of genes found increased steadily for each sequence added to the analysis and the pan-genome of all 20 sequences consisted of around 8500 genes. On the other hand, the number of genes found in all strains steadily decreased when adding...

  5. Full Genome Sequence of Giant Panda Rotavirus Strain CH-1

    Science.gov (United States)

    Guo, Ling; Yang, Shaolin; Wang, Chengdong; Chen, Shijie; Yang, Xiaonong; Hou, Rong; Quan, Zifang; Hao, Zhongxiang

    2013-01-01

    We report here the complete genomic sequence of the giant panda rotavirus strain CH-1. This work is the first to document the complete genomic sequence (segments 1 to 11) of the CH-1 strain, which offers an effective platform for providing authentic research experiences to novice scientists. PMID:23469354

  6. Complete genome sequence of Enterobacter aerogenes KCTC 2190.

    Science.gov (United States)

    Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok

    2012-05-01

    This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.

  7. Draft Genome Sequence of Enterococcus mundtii CRL1656

    OpenAIRE

    2012-01-01

    We report the draft genome sequence of Enterococcus mundtii CRL1656, which was isolated from the stripping milk of a clinically healthy adult Holstein dairy cow from a dairy farm of the northwestern region of Tucumán (Argentina). The 3.10-Mb genome sequence consists of 450 large contigs and contains 2,741 predicted protein-coding genes.

  8. Complete Genome Sequence of the Human Gut Symbiont Roseburia hominis

    DEFF Research Database (Denmark)

    Travis, Anthony J.; Kelly, Denise; Flint, Harry J;

    2015-01-01

    We report here the complete genome sequence of the human gut symbiont Roseburia hominis A2-183(T) (= DSM 16839(T) = NCIMB 14029(T)), isolated from human feces. The genome is represented by a 3,592,125-bp chromosome with 3,405 coding sequences. A number of potential functions contributing to host-...

  9. Initial sequencing and analysis of the human genome.

    Science.gov (United States)

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  10. A gapless genome sequence of the fungus Botrytis cinerea

    NARCIS (Netherlands)

    Kan, Van Jan A.L.; Stassen, Joost H.M.; Mosbach, Andreas; Lee, Van Der Theo A.J.; Faino, Luigi; Farmer, Andrew D.; Papasotiriou, Dimitrios G.; Zhou, Shiguo; Seidl, Michael F.; Cottam, Eleanor; Edel, Dominique; Hahn, Matthias; Schwartz, David C.; Dietrich, Robert A.; Widdison, Stephanie; Scalliet, Gabriel

    2016-01-01

    Following earlier incomplete and fragmented versions of a genome sequence for the grey mould Botrytis cinerea, a gapless, near-finished genome sequence for B. cinerea strain B05.10 is reported. The assembly comprised 18 chromosomes and was confirmed by an optical map and a genetic map based on ap

  11. Draft Genome Sequence of Raoultella planticola, Isolated from River Water.

    Science.gov (United States)

    Jothikumar, Narayanan; Kahler, Amy; Strockbine, Nancy; Gladney, Lori; Hill, Vincent R

    2014-10-16

    We isolated Raoultella planticola from a river water sample, which was phenotypically indistinguishable from Escherichia coli on MI agar. The genome sequence of R. planticola was determined to gain information about its metabolic functions contributing to its false positive appearance of E. coli on MI agar. We report the first whole genome sequence of Raoultella planticola.

  12. Genome sequence of the Chlamydophila abortus variant strain LLG.

    Science.gov (United States)

    Sait, Michelle; Clark, Ewan M; Wheelhouse, Nick; Livingstone, Morag; Spalding, Lucy; Siarkou, Victoria I; Vretou, Evangelia; Smith, David G E; Lainson, F Alex; Longbottom, David

    2011-08-01

    Chlamydophila abortus is a common cause of ruminant abortion. Here we report the genome sequence of strain LLG, which differs genotypically and phenotypically from the wild-type strain S26/3. Genome sequencing revealed differences between LLG and S26/3 to occur in pseudogene content, in transmembrane head/inc family proteins, and in biotin biosynthesis genes.

  13. Draft Genome Sequence of Tannerella forsythia Type Strain ATCC 43037.

    Science.gov (United States)

    Friedrich, Valentin; Pabinger, Stephan; Chen, Tsute; Messner, Paul; Dewhirst, Floyd E; Schäffer, Christina

    2015-06-11

    Tannerella forsythia is an oral pathogen implicated in the development of periodontitis. Here, we report the draft genome sequence of the Tannerella forsythia strain ATCC 43037. The previously available genome of this designation (NCBI reference sequence NC_016610.1) was discovered to be derived from a different strain, FDC 92A2 (= ATCC BAA-2717).

  14. Draft Genome Sequence of Tannerella forsythia Type Strain ATCC 43037

    OpenAIRE

    Friedrich, Valentin; Pabinger, Stephan; Chen, Tsute; Messner, Paul; Dewhirst, Floyd E.; Schäffer, Christina

    2015-01-01

    Tannerella forsythia is an oral pathogen implicated in the development of periodontitis. Here, we report the draft genome sequence of the Tannerella forsythia strain ATCC 43037. The previously available genome of this designation (NCBI reference sequence NC_016610.1) was discovered to be derived from a different strain, FDC 92A2 (= ATCC BAA-2717).

  15. Nearly Complete Genome Sequence of Lactobacillus plantarum Strain NIZO2877

    NARCIS (Netherlands)

    Martino, M.E.; Bayjanov, J.R.; Joncour, P.; Hughes, S.; Gillet, B.; Kleerebezem, M; Siezen, R.; Hijum, S.A.F.T. van; Leulier, F.

    2015-01-01

    Lactobacillus plantarum is a versatile bacterial species that is isolated mostly from foods. Here, we present the first genome sequence of L. plantarum strain NIZO2877 isolated from a hot dog in Vietnam. Its two contigs represent a nearly complete genome sequence.

  16. Complete Genome Sequence of Lactobacillus plantarum CGMCC 8198

    Science.gov (United States)

    Dong, Qing-Qing; Hu, Hai-Jie; Wang, Qiu-Tong; Gu, Xiang-Chao; Zhou, Hao; Zhou, Wen-Juan; Ni, Xiao-Meng

    2017-01-01

    ABSTRACT We report the complete genome sequence of Lactobacillus plantarum CGMCC 8198, a novel probiotic strain isolated from fermented herbage. We have determined the complete genome sequence of strain L. plantarum CGMCC 8198, which consists of genes that are likely to be involved in dairy fermentation and that have probiotic qualities. PMID:28183756

  17. Sequencing of a Cultivated Diploid Cotton Genome - Gossypium arboreum

    Institute of Scientific and Technical Information of China (English)

    WILKINS Thea A

    2008-01-01

    Sequencing the genomes of crop species and model systems contributes significantly to our understanding of the organization, structure and function of plant genomes. In a "white paper" published in 2007, the cotton community set forth a strategic plan for sequencing the AD genome of cultivated upland cotton that initially targets less complex diploid genomes. This strategy banks on the high degree of conservation between diploid progenitors and AD species that will allow information derived from diploid genomes to be directly applied to the tetraploids.

  18. Shared Y chromosome repetitive DNA sequences in stallion and donkey as visualized using whole-genomic comparative hybridization

    Directory of Open Access Journals (Sweden)

    R. Mezzanotte

    2010-01-01

    Full Text Available The genomes of stallion (Spanish breed) and donkey (Spanish endemic Zamorano-Leonés) were compared using the whole comparative genomic in situ hybridization (W-CGH) technique, with special reference to the variability observed in the Y chromosome. Results show that these diverging genomes still share some highly repetitive DNA families localized in pericentromeric regions and, in the particular case of the Y chromosome, a sub-family of highly repeated DNA sequences, greatly expanded in the donkey genome, accounts for a large part of the chromatin in the stallion Y chromosome.

  19. Shared Y chromosome repetitive DNA sequences in stallion and donkey as visualized using whole-genomic comparative hybridization

    Directory of Open Access Journals (Sweden)

    J. Gosalvez

    2010-01-01

    Full Text Available The genomes of stallion (Spanish breed) and donkey (Spanish endemic Zamorano-Leonés) were compared using the whole comparative genomic in situ hybridization (W-CGH) technique, with special reference to the variability observed in the Y chromosome. Results show that these diverging genomes still share some highly repetitive DNA families localized in pericentromeric regions and, in the particular case of the Y chromosome, a sub-family of highly repeated DNA sequences, greatly expanded in the donkey genome, accounts for a large part of the chromatin in the stallion Y chromosome.

  20. Next-generation sequencing reveals phylogeographic structure and a species tree for recent bird divergences

    DEFF Research Database (Denmark)

    McCormack, John E.; Maley, James M.; Hird, Sarah M.

    2012-01-01

    Next generation sequencing (NGS) technologies are revolutionizing many biological disciplines but have been slow to take root in phylogeography. This is partly due to the difficulty of using NGS to sequence orthologous DNA fragments for many individuals at low cost. We explore cases of recent...... divergence in four phylogenetically diverse avian systems using a method for quick and cost-effective generation of primary DNA sequence data using pyrosequencing. NGS data were processed using an analytical pipeline that reduces many reads into two called alleles per locus per individual. Using single...... nucleotide polymorphisms (SNPs) mined from the loci, we detected population differentiation in each of the four bird systems, including: a case of ecological speciation in rails (Rallus); a rapid postglacial radiation in the genus Junco; recent in situ speciation among hummingbirds (Trochilus) in Jamaica...

  1. Comparison of 61 Sequenced Escherichia coli Genomes

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana; Wassenaar, T. M.; Ussery, David

    2010-01-01

    MLST was performed, many of the various strains appear jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or 'accessory' genes thus make up more than 90% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group...
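
    The core/pan-genome split described in this abstract can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the strain names, family labels and the function name pan_and_core are hypothetical and are not taken from the study, which does not describe its clustering pipeline in this snippet.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan-genome, core genome) as sets of gene-family identifiers.

    The pan-genome is the union of all families seen in any genome;
    the core genome is the intersection, i.e. families present in every genome.
    """
    if not genomes:
        return set(), set()
    family_sets = list(genomes.values())
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Toy input with hypothetical strain names and gene-family labels.
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam3", "fam4"},
    "strain_C": {"fam1", "fam5"},
}
pan, core = pan_and_core(toy)
core_fraction = len(core) / len(pan)

# Figures quoted in the abstract: 993 core families out of 15,741 in total.
reported_core_percent = 100 * 993 / 15741
```

    With the counts quoted in the abstract, the same ratio gives the core genome as roughly 6% of the pan-genome, leaving the accessory share above 90%.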

  2. Haplotype-resolved genome sequencing of a Gujarati Indian individual.

    Science.gov (United States)

    Kitzman, Jacob O; Mackenzie, Alexandra P; Adey, Andrew; Hiatt, Joseph B; Patwardhan, Rupali P; Sudmant, Peter H; Ng, Sarah B; Alkan, Can; Qiu, Ruolan; Eichler, Evan E; Shendure, Jay

    2011-01-01

    Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Although individual human genome sequencing is increasingly routine, nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing ∼3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions to specific locations and haplotypes.
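
    The N50 statistic quoted for the haplotype blocks can be sketched as follows. This is the usual length-weighted definition (the block length at which the cumulative sum of lengths, taken from longest to shortest, first reaches half the total); the abstract does not spell out its exact convention, and the function name n50 and the toy lengths are illustrative assumptions.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that blocks of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy block lengths in kilobases (hypothetical values, not from the study).
toy_blocks_kb = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_blocks_kb)
```

    An N50 of 386 kbp therefore means that at least half of the phased sequence lies in haplotype blocks of 386 kbp or longer.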

  3. Unexpected cross-species contamination in genome sequencing projects

    Directory of Open Access Journals (Sweden)

    Samier Merchant

    2014-11-01

    Full Text Available The raw data from a genome sequencing project sometimes contains DNA from contaminating organisms, which may be introduced during sample collection or sequence preparation. In some instances, these contaminants remain in the sequence even after assembly and deposition of the genome into public databases. As a result, searches of these databases may yield erroneous and confusing results. We used efficient microbiome analysis software to scan the draft assembly of domestic cow, Bos taurus, and identify 173 small contigs that appeared to derive from microbial contaminants. In the course of verifying these findings, we discovered that one genome, Neisseria gonorrhoeae TCDC-NG08107, although putatively a complete genome, contained multiple sequences that actually derived from the cow and sheep genomes. Our findings illustrate the need to carefully validate findings of anomalous DNA that rely on comparisons to either draft or finished genomes.

  4. Discovery of a divergent HPIV4 from respiratory secretions using second and third generation metagenomic sequencing

    DEFF Research Database (Denmark)

    Alquezar Planas, David Eugenio; Mourier, Tobias; Bruhn, Christian Anders Wathne

    2013-01-01

    Molecular detection of viruses has been aided by high-throughput sequencing, permitting the genomic characterization of emerging strains. In this study, we comprehensively screened 500 respiratory secretions from children with upper and/or lower respiratory tract infections for viral pathogens. T...

  5. Draft sequences of the radish (Raphanus sativus L.) genome.

    Science.gov (United States)

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-10-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥ 300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified.
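
    The figures in this abstract imply a few derived quantities that are easy to check. The short sketch below uses only numbers quoted above; the variable names are illustrative, and the derived values are back-of-envelope estimates rather than results reported by the authors.

```python
# Values quoted in the abstract.
assembly_mb = 402.0          # draft assembly size
fraction_of_genome = 0.759   # share of the estimated genome covered
anchored_mb = 116.0          # scaffold sequence assigned to the linkage map
map_length_cm = 1166.0       # integrated linkage map length
marker_count = 2553          # DNA markers on the integrated map

# Estimated genome size implied by "402 Mb spanning 75.9%".
estimated_genome_mb = assembly_mb / fraction_of_genome

# Share of the assembly anchored to the linkage map.
anchored_fraction = anchored_mb / assembly_mb

# Average spacing between markers on the integrated map.
cm_per_marker = map_length_cm / marker_count
```

    Under these assumptions the implied genome size is a little under 530 Mb, and well under a third of the assembled sequence is anchored to the linkage map.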

  6. Genome sequence of mungbean and insights into evolution within Vigna species

    Science.gov (United States)

    Kang, Yang Jae; Kim, Sue K.; Kim, Moon Young; Lestari, Puji; Kim, Kil Hyun; Ha, Bo-Keun; Jun, Tae Hwan; Hwang, Won Joo; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Yoon, Min Young; Jang, Young Eun; Han, Kwang Soo; Taeprayoon, Puntaree; Yoon, Na; Somta, Prakit; Tanya, Patcharin; Kim, Kwang Soo; Gwag, Jae-Gyun; Moon, Jung-Kyung; Lee, Yeong-Ho; Park, Beom-Seok; Bombarely, Aureliano; Doyle, Jeffrey J.; Jackson, Scott A.; Schafleitner, Roland; Srinives, Peerasak; Varshney, Rajeev K.; Lee, Suk-Ha

    2014-01-01

    Mungbean (Vigna radiata) is a fast-growing, warm-season legume crop that is primarily cultivated in developing countries of Asia. Here we construct a draft genome sequence of mungbean to facilitate genome research into the subgenus Ceratotropis, which includes several important dietary legumes in Asia, and to enable a better understanding of the evolution of leguminous species. Based on the de novo assembly of additional wild mungbean species, the divergence of what was eventually domesticated and the sampled wild mungbean species appears to have predated domestication. Moreover, the de novo assembly of a tetraploid Vigna species (V. reflexo-pilosa var. glabra) provides genomic evidence of a recent allopolyploid event. The species tree is constructed using de novo RNA-seq assemblies of 22 accessions of 18 Vigna species and protein sets of Glycine max. The present assembly of V. radiata var. radiata will facilitate genome research and accelerate molecular breeding of the subgenus Ceratotropis. PMID:25384727

  7. Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus

    Directory of Open Access Journals (Sweden)

    Boore Jeffrey L

    2007-06-01

    Full Text Available Abstract Background The number of completely sequenced plastid genomes available is growing rapidly. This array of sequences presents new opportunities to perform comparative analyses. In comparative studies, it is often useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (a basal eudicot). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDR), and patterns of nucleotide composition. Results The Nuphar [GenBank:NC_008788] and Ranunculus [GenBank:NC_008796] plastid genomes share characteristics of gene content and organization with many other chloroplast genomes. Like other plastid genomes, these genomes are A+T-rich, except for rRNA and tRNA genes. Detailed comparisons of Nuphar with Nymphaea, another Nymphaeaceae, show that more than two-thirds of these genomes exhibit at least 95% sequence identity and that most SSRs are shared. In broader comparisons, SSRs vary among genomes in terms of abundance and length and most contain repeat motifs based on A and T nucleotides. Conclusion SSR and SDR abundance varies by genome and, for SSRs, is proportional to genome size. Long SDRs are rare in the genomes assessed. SSRs occur less frequently than predicted and, although the majority of the repeat motifs do include A and T nucleotides, the A+T bias in SSRs is less than that predicted from the underlying genomic nucleotide composition. In codon usage third positions show an A+T bias, however variation in codon usage does not correlate with differences in A+T-richness. Thus, although plastome nucleotide composition shows "A

  8. Comparative Genomics of the Extreme Acidophile Acidithiobacillus thiooxidans Reveals Intraspecific Divergence and Niche Adaptation

    Directory of Open Access Journals (Sweden)

    Xian Zhang

    2016-08-01

    Full Text Available Acidithiobacillus thiooxidans known for its ubiquity in diverse acidic and sulfur-bearing environments worldwide was used as the research subject in this study. To explore the genomic fluidity and intraspecific diversity of Acidithiobacillus thiooxidans (A. thiooxidans) species, comparative genomics based on nine draft genomes was performed. Phylogenomic scrutiny provided first insights into the multiple groupings of these strains, suggesting that genetic diversity might be potentially correlated with their geographic distribution as well as geochemical conditions. While these strains shared a large number of common genes, they displayed differences in gene content. Functional assignment indicated that the core genome was essential for microbial basic activities such as energy acquisition and uptake of nutrients, whereas the accessory genome was thought to be involved in niche adaptation. Comprehensive analysis of their predicted central metabolism revealed that few differences were observed among these strains. Further analyses showed evidences of relevance between environmental conditions and genomic diversification. Furthermore, a diverse pool of mobile genetic elements including insertion sequences and genomic islands in all A. thiooxidans strains probably demonstrated the frequent genetic flow (such as lateral gene transfer) in the extremely acidic environments. From another perspective, these elements might endow A. thiooxidans species with capacities to withstand the chemical constraints of their natural habitats. Taken together, our findings bring some valuable data to better understand the genomic diversity and econiche adaptation within A. thiooxidans strains.

  9. Comparative Genomics of the Extreme Acidophile Acidithiobacillus thiooxidans Reveals Intraspecific Divergence and Niche Adaptation.

    Science.gov (United States)

    Zhang, Xian; Feng, Xue; Tao, Jiemeng; Ma, Liyuan; Xiao, Yunhua; Liang, Yili; Liu, Xueduan; Yin, Huaqun

    2016-08-19

    Acidithiobacillus thiooxidans known for its ubiquity in diverse acidic and sulfur-bearing environments worldwide was used as the research subject in this study. To explore the genomic fluidity and intraspecific diversity of Acidithiobacillus thiooxidans (A. thiooxidans) species, comparative genomics based on nine draft genomes was performed. Phylogenomic scrutiny provided first insights into the multiple groupings of these strains, suggesting that genetic diversity might be potentially correlated with their geographic distribution as well as geochemical conditions. While these strains shared a large number of common genes, they displayed differences in gene content. Functional assignment indicated that the core genome was essential for microbial basic activities such as energy acquisition and uptake of nutrients, whereas the accessory genome was thought to be involved in niche adaptation. Comprehensive analysis of their predicted central metabolism revealed that few differences were observed among these strains. Further analyses showed evidences of relevance between environmental conditions and genomic diversification. Furthermore, a diverse pool of mobile genetic elements including insertion sequences and genomic islands in all A. thiooxidans strains probably demonstrated the frequent genetic flow (such as lateral gene transfer) in the extremely acidic environments. From another perspective, these elements might endow A. thiooxidans species with capacities to withstand the chemical constraints of their natural habitats. Taken together, our findings bring some valuable data to better understand the genomic diversity and econiche adaptation within A. thiooxidans strains.

  10. Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo

    DEFF Research Database (Denmark)

    Rasmussen, Morten; Li, Yingrui; Lindgreen, Stinus;

    2010-01-01

    We report here the genome sequence of an ancient human. Obtained from approximately 4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20x, we recover 79% of the diploid genome, an...... for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit....

  11. Genomic Dissection and Expression Profiling Revealed Functional Divergence in Triticum aestivum Leucine Rich Repeat Receptor Like Kinases (TaLRRKs)

    Science.gov (United States)

    Shumayla; Sharma, Shailesh; Kumar, Rohit; Mendu, Venugopal; Singh, Kashmir; Upadhyay, Santosh K.

    2016-01-01

    The leucine rich repeat receptor like kinases (LRRK) constitute the largest subfamily of receptor like kinases (RLK), which play critical roles in plant development and stress responses. Herein, we identified 531 TaLRRK genes in Triticum aestivum (bread wheat), which were distributed throughout the A, B, and D sub-genomes and chromosomes. These were clustered into 233 homologous groups, which were mostly located on either homeologous chromosomes from various sub-genomes or in proximity on the same chromosome. A total of 255 paralogous genes were predicted which depicted the role of duplication events in expansion of this gene family. Majority of TaLRRKs consisted of trans-membrane region and localized on plasma-membrane. The TaLRRKs were further categorized into eight phylogenetic groups with numerous subgroups on the basis of sequence homology. The gene and protein structure in terms of exon/intron ratio, domains, and motifs organization were found to be variably conserved across the different phylogenetic groups/subgroups, which indicated a potential divergence and neofunctionalization during evolution. High-throughput transcriptome data and quantitative real time PCR analyses in various developmental stages, and biotic and abiotic (heat, drought, and salt) stresses provided insight into modus operandi of TaLRRKs during these conditions. Distinct expression of majority of stress responsive TaLRRKs homologous genes suggested their specified role in a particular condition. These results provided a comprehensive analysis of various characteristic features including functional divergence, which may provide the way for future functional characterization of this important gene family in bread wheat.

  12. Genomic Dissection and Expression Profiling Revealed Functional Divergence in Triticum aestivum Leucine Rich Repeat Receptor Like Kinases (TaLRRKs).

    Science.gov (United States)

    Shumayla; Sharma, Shailesh; Kumar, Rohit; Mendu, Venugopal; Singh, Kashmir; Upadhyay, Santosh K

    2016-01-01

    The leucine rich repeat receptor like kinases (LRRK) constitute the largest subfamily of receptor like kinases (RLK), which play critical roles in plant development and stress responses. Herein, we identified 531 TaLRRK genes in Triticum aestivum (bread wheat), which were distributed throughout the A, B, and D sub-genomes and chromosomes. These were clustered into 233 homologous groups, which were mostly located on either homeologous chromosomes from various sub-genomes or in proximity on the same chromosome. A total of 255 paralogous genes were predicted which depicted the role of duplication events in expansion of this gene family. Majority of TaLRRKs consisted of trans-membrane region and localized on plasma-membrane. The TaLRRKs were further categorized into eight phylogenetic groups with numerous subgroups on the basis of sequence homology. The gene and protein structure in terms of exon/intron ratio, domains, and motifs organization were found to be variably conserved across the different phylogenetic groups/subgroups, which indicated a potential divergence and neofunctionalization during evolution. High-throughput transcriptome data and quantitative real time PCR analyses in various developmental stages, and biotic and abiotic (heat, drought, and salt) stresses provided insight into modus operandi of TaLRRKs during these conditions. Distinct expression of majority of stress responsive TaLRRKs homologous genes suggested their specified role in a particular condition. These results provided a comprehensive analysis of various characteristic features including functional divergence, which may provide the way for future functional characterization of this important gene family in bread wheat.

  13. Genomic dissection and expression profiling revealed functional divergence in Triticum aestivum leucine rich repeat receptor like kinases (TaLRRKs)

    Directory of Open Access Journals (Sweden)

    Shumayla

    2016-09-01

    Full Text Available The leucine rich repeat receptor like kinases (LRRK) constitute the largest subfamily of receptor like kinases (RLK), which play critical roles in plant development and stress responses. Herein, we identified 531 TaLRRK genes in Triticum aestivum (bread wheat), which were distributed throughout the A, B, and D sub-genomes and chromosomes. These were clustered into 233 homologous groups, which were mostly located on either homeologous chromosomes from various sub-genomes or in proximity on the same chromosome. A total of 255 paralogous genes were predicted which depicted the role of duplication events in expansion of this gene family. Majority of TaLRRKs consisted of trans-membrane region and localized on plasma-membrane. The TaLRRKs were further categorized into eight phylogenetic groups with numerous subgroups on the basis of sequence homology. The gene and protein structure in terms of exon/intron ratio, domains and motifs organization were found to be variably conserved across the different phylogenetic groups/subgroups, which indicated a potential divergence and neofunctionalization during evolution. High-throughput transcriptome data and quantitative real time PCR analyses in various developmental stages, and biotic and abiotic (heat, drought and salt) stresses provided insight into modus operandi of TaLRRKs during these conditions. Distinct expression of majority of stress responsive TaLRRKs homologous genes suggested their specified role in a particular condition. These results provided a comprehensive analysis of various characteristic features including functional divergence, which may provide the way for future functional characterization of this important gene family in bread wheat.

  14. The genome sequence of Schizosaccharomyces pombe.

    Science.gov (United States)

    Wood, V; Gwilliam, R; Rajandream, M-A; Lyne, M; Lyne, R; Stewart, A; Sgouros, J; Peat, N; Hayles, J; Baker, S; Basham, D; Bowman, S; Brooks, K; Brown, D; Brown, S; Chillingworth, T; Churcher, C; Collins, M; Connor, R; Cronin, A; Davis, P; Feltwell, T; Fraser, A; Gentles, S; Goble, A; Hamlin, N; Harris, D; Hidalgo, J; Hodgson, G; Holroyd, S; Hornsby, T; Howarth, S; Huckle, E J; Hunt, S; Jagels, K; James, K; Jones, L; Jones, M; Leather, S; McDonald, S; McLean, J; Mooney, P; Moule, S; Mungall, K; Murphy, L; Niblett, D; Odell, C; Oliver, K; O'Neil, S; Pearson, D; Quail, M A; Rabbinowitsch, E; Rutherford, K; Rutter, S; Saunders, D; Seeger, K; Sharp, S; Skelton, J; Simmonds, M; Squares, R; Squares, S; Stevens, K; Taylor, K; Taylor, R G; Tivey, A; Walsh, S; Warren, T; Whitehead, S; Woodward, J; Volckaert, G; Aert, R; Robben, J; Grymonprez, B; Weltjens, I; Vanstreels, E; Rieger, M; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Düsterhöft, A; Fritzc, C; Holzer, E; Moestl, D; Hilbert, H; Borzym, K; Langer, I; Beck, A; Lehrach, H; Reinhardt, R; Pohl, T M; Eger, P; Zimmermann, W; Wedler, H; Wambutt, R; Purnelle, B; Goffeau, A; Cadieu, E; Dréano, S; Gloux, S; Lelaure, V; Mottier, S; Galibert, F; Aves, S J; Xiang, Z; Hunt, C; Moore, K; Hurst, S M; Lucas, M; Rochet, M; Gaillardin, C; Tallada, V A; Garzon, A; Thode, G; Daga, R R; Cruzado, L; Jimenez, J; Sánchez, M; del Rey, F; Benito, J; Domínguez, A; Revuelta, J L; Moreno, S; Armstrong, J; Forsburg, S L; Cerutti, L; Lowe, T; McCombie, W R; Paulsen, I; Potashkin, J; Shpakovski, G V; Ussery, D; Barrell, B G; Nurse, P; Cerrutti, L

    2002-02-21

    We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

  15. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  16. Scrutinizing virus genome termini by high-throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Shasha Li

    Full Text Available Analysis of genomic terminal sequences has been a major step in studies on viral DNA replication and packaging mechanisms. However, traditional methods to study genome termini are challenging due to the time-consuming protocols and their inefficiency where critical details are lost easily. Recent advances in next generation sequencing (NGS) have enabled it to be a powerful tool to study genome termini. In this study, using NGS we sequenced one iridovirus genome and twenty phage genomes and confirmed for the first time that the high frequency sequences (HFSs) found in the NGS reads are indeed the terminal sequences of viral genomes. Further, we established a criterion to distinguish the type of termini and the viral packaging mode. We also obtained additional terminal details such as terminal repeats, multi-termini, asymmetric termini. With this approach, we were able to simultaneously detect details of the genome termini as well as obtain the complete sequence of bacteriophage genomes. Theoretically, this application can be further extended to analyze larger and more complicated genomes of plant and animal viruses. This study proposed a novel and efficient method for research on viral replication, packaging, terminase activity, transcription regulation, and metabolism of the host cell.

  17. Genome Sequence of Vaccinia virus Strain Lister-Butantan, a Lister Vaccine Variant Used during a Smallpox Eradication Campaign in Brazil

    Science.gov (United States)

    Assis, Felipe; Trindade, Giliane; Drumond, Betânia; Frace, Mike; Sammons, Scott; Emerson, Ginny; Li, Yu; Carroll, Darin; Batra, Dhwani; Kroon, Erna

    2016-01-01

    Here, we report the 187.8-kb genome sequence of Vaccinia virus Lister-Butantan, which was used in Brazil during the WHO smallpox eradication campaign. Its genome showed an average similarity of 98.18% with the original Lister isolate, highlighting the low divergence among related Vaccinia virus vaccine strains, even after several passages in animals and cell culture. PMID:27340056

  18. Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding

    Science.gov (United States)

    Jiang, Yanliang; Ninwichian, Parichart; Liu, Shikai; Zhang, Jiaren; Kucuktas, Huseyin; Sun, Fanyue; Kaltenboeck, Ludmilla; Sun, Luyang; Bao, Lisui; Liu, Zhanjiang

    2013-01-01

    Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge. PMID:24205335

  19. Generation of physical map contig-specific sequences useful for whole genome sequence scaffolding.

    Directory of Open Access Journals (Sweden)

    Yanliang Jiang

    Full Text Available Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.

  20. The minimum information about a genome sequence (MIGS) specification

    DEFF Research Database (Denmark)

    Field, D; Garrity, G; Gray, T;

    2008-01-01

    the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources...... that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases....

  1. The mitochondrial genome of a Texas outbreak strain of the cattle tick, Rhipicephalus (Boophilus) microplus, derived from whole genome sequencing Pacific Biosciences and Illumina reads.

    Science.gov (United States)

    McCooke, John K; Guerrero, Felix D; Barrero, Roberto A; Black, Michael; Hunter, Adam; Bell, Callum; Schilkey, Faye; Miller, Robert J; Bellgard, Matthew I

    2015-10-15

    The cattle fever tick, Rhipicephalus (Boophilus) microplus is one of the most significant medical veterinary pests in the world, vectoring several serious livestock diseases negatively impacting agricultural economies of tropical and subtropical countries around the world. In our study, we assembled the complete R. microplus mitochondrial genome from Illumina and Pac Bio sequencing reads obtained from the ongoing R. microplus (Deutsch strain from Texas, USA) genome sequencing project. We compared the Deutsch strain mitogenome to the mitogenome from a Brazilian R. microplus and from an Australian cattle tick that has recently been taxonomically designated as Rhipicephalus australis after previously being considered R. microplus. The sequence divergence of the Texas and Australia ticks is much higher than the divergence between the Texas and Brazil ticks. This is consistent with the idea that the Australian ticks are distinct from the R. microplus of the Americas.
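The comparison in this record rests on pairwise sequence divergence between aligned mitogenomes. As a rough illustration only, the sketch below computes an uncorrected p-distance (the fraction of compared sites that differ) between two aligned sequences. The function name `p_distance` and the three toy sequences are hypothetical; they are not data or methods from the study, which may well have used a different distance measure.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise divergence between two aligned sequences.

    Alignment columns with a gap ('-') or ambiguous base ('N') in either
    sequence are skipped, so the result is differences per compared site.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip columns that cannot be compared
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differences / compared


# Toy aligned fragments (hypothetical, for illustration only).
texas = "ACGTACGTAC"
brazil = "ACGTACGTAT"
australia = "ACTTACGAAT"

print("Texas vs Brazil:   ", p_distance(texas, brazil))
print("Texas vs Australia:", p_distance(texas, australia))
```

In this toy setup the Texas and Australia fragments differ at more sites than the Texas and Brazil fragments, which mirrors the qualitative pattern the abstract describes. For sequences that are more than a few percent diverged, a model-based correction for multiple substitutions at the same site is normally preferred over the raw p-distance.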

  2. Genomic libraries: II. Subcloning, sequencing, and assembling large-insert genomic DNA clones.

    Science.gov (United States)

    Quail, Mike A; Matthews, Lucy; Sims, Sarah; Lloyd, Christine; Beasley, Helen; Baxter, Simon W

    2011-01-01

    Sequencing large insert clones to completion is useful for characterizing specific genomic regions, identifying haplotypes, and closing gaps in whole genome sequencing projects. Despite being a standard technique in molecular laboratories, DNA sequencing using the Sanger method can be highly problematic when complex secondary structures or sequence repeats are encountered in genomic clones. Here, we describe methods to isolate DNA from a large insert clone (fosmid or BAC), subclone the sample, and sequence the region to the highest industry standard. Troubleshooting solutions for sequencing difficult templates are discussed.

  3. Genome-wide identification and evolution of ATP-binding cassette transporters in the ciliate Tetrahymena thermophila: A case of functional divergence in a multigene family

    Directory of Open Access Journals (Sweden)

    Yuan Dongxia

    2010-10-01

    Full Text Available Abstract Background In eukaryotes, ABC transporters that utilize the energy of ATP hydrolysis to expel cellular substrates into the environment are responsible for most of the efflux from cells. Many members of the superfamily of ABC transporters have been linked with resistance to multiple drugs or toxins. Owing to their medical and toxicological importance, members of the ABC superfamily have been studied in several model organisms and warrant examination in newly sequenced genomes. Results A total of 165 ABC transporter genes, constituting a highly expanded superfamily relative to its size in other eukaryotes, were identified in the macronuclear genome of the ciliate Tetrahymena thermophila. Based on ortholog comparisons, phylogenetic topologies and intron characterizations, each highly expanded ABC transporter family of T. thermophila was classified into several distinct groups, and hypotheses about their evolutionary relationships are presented. A comprehensive microarray analysis revealed divergent expression patterns among the members of the ABC transporter superfamily during different states of physiology and development. Many of the relatively recently formed duplicate pairs within individual ABC transporter families exhibit significantly different expression patterns. Further analysis showed that multiple mechanisms have led to functional divergence that is responsible for the preservation of duplicated genes. Conclusion Gene duplications have resulted in an extensive expansion of the superfamily of ABC transporters in the Tetrahymena genome, making it the largest example of its kind reported in any organism to date. Multiple independent duplications and subsequent divergence contributed to the formation of different families of ABC transporter genes. Many of the members within a gene family exhibit different expression patterns. The combination of gene duplication followed by both sequence divergence and acquisition of new patterns of

  4. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  5. Human-specific protein isoforms produced by novel splice sites in the human genome after the human-chimpanzee divergence

    Directory of Open Access Journals (Sweden)

    Kim Dong Seon

    2012-11-01

    Full Text Available Abstract Background Evolution of splice sites is a well-known phenomenon that results in transcript diversity during human evolution. Many novel splice sites are derived from repetitive elements and may not contribute to protein products. Here, we analyzed annotated human protein-coding exons and identified human-specific splice sites that arose after the human-chimpanzee divergence. Results We analyzed multiple alignments of the annotated human protein-coding exons and their respective orthologous mammalian genome sequences to identify 85 novel splice sites (50 splice acceptors and 35 donors) in the human genome. The novel protein-coding exons, which are expressed either constitutively or alternatively, produce novel protein isoforms by insertion, deletion, or frameshift. We found three cases in which the human-specific isoform conferred novel molecular function in the human cells: the human-specific IMUP protein isoform induces apoptosis of the trophoblast and is implicated in pre-eclampsia; the intronization of a part of SMOX gene exon produces inactive spermine oxidase; the human-specific NUB1 isoform shows reduced interaction with ubiquitin-like proteins, possibly affecting ubiquitin pathways. Conclusions Although the generation of novel protein isoforms does not equate to adaptive evolution, we propose that these cases are useful candidates for a molecular functional study to identify proteomic changes that might bring about novel phenotypes during human evolution.

  6. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    Science.gov (United States)

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 MB), S. kudriavzevii (11.18 MB) and C. elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
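To make the depth figures in this record concrete, the sketch below applies the standard relation depth = (number of reads × read length) / genome size to estimate how many reads a target depth implies. The genome sizes and the 50X and 100X targets come from the abstract; the 100-bp read length and the function name `reads_for_depth` are assumptions for illustration, since the abstract does not state the read length used.

```python
import math


def reads_for_depth(genome_size_bp: int, depth: float, read_length_bp: int) -> int:
    """Smallest read count such that reads * read_length / genome_size >= depth."""
    return math.ceil(genome_size_bp * depth / read_length_bp)


# Genome sizes as given in the abstract, expressed in base pairs.
genomes_bp = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}

READ_LENGTH_BP = 100  # assumed read length; not stated in the abstract

for name, size_bp in genomes_bp.items():
    at_50x = reads_for_depth(size_bp, 50, READ_LENGTH_BP)
    at_100x = reads_for_depth(size_bp, 100, READ_LENGTH_BP)
    print(f"{name}: {at_50x:,} reads for 50X, {at_100x:,} reads for 100X")
```

Under the assumed read length, the required read count scales linearly with both genome size and target depth, so doubling the depth for an assembler such as Meraculous doubles the sequencing needed.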

  7. A Probabilistic Genome-Wide Gene Reading Frame Sequence Model

    DEFF Research Database (Denmark)

    Have, Christian Theil; Mørk, Søren

    We introduce a new type of probabilistic sequence model, that model the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome-level -- effectively producing a sequential genome annotation...... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem it forms an ideal proving ground for evaluating the explicit modeling of larger scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...... as output. The model can be used to obtain the most probable genome annotation based on a combination of i: a gene finder score of each gene candidate and ii: the sequence of the reading frames of gene candidates through a genome. The model --- as well as a higher order variant --- is developed and tested...

  8. Marsupial Genome Sequences: Providing Insight into Evolution and Disease

    Directory of Open Access Journals (Sweden)

    Janine E. Deakin

    2012-01-01

    Full Text Available Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences.

  9. Insight into the evolution and origin of leprosy bacilli from the genome sequence of Mycobacterium lepromatosis

    Science.gov (United States)

    Singh, Pushpendra; Benjak, Andrej; Schuenemann, Verena J.; Herbig, Alexander; Avanzi, Charlotte; Busso, Philippe; Nieselt, Kay; Krause, Johannes; Vera-Cabrera, Lucio; Cole, Stewart T.

    2015-01-01

    Mycobacterium lepromatosis is an uncultured human pathogen associated with diffuse lepromatous leprosy and a reactional state known as Lucio's phenomenon. By using deep sequencing with and without DNA enrichment, we obtained the near-complete genome sequence of M. lepromatosis present in a skin biopsy from a Mexican patient, and compared it with that of Mycobacterium leprae, which has undergone extensive reductive evolution. The genomes display extensive synteny and are similar in size (∼3.27 Mb). Protein-coding genes share 93% nucleotide sequence identity, whereas pseudogenes are only 82% identical. The events that led to pseudogenization of 50% of the genome likely occurred before divergence from their most recent common ancestor (MRCA), and both M. lepromatosis and M. leprae have since accumulated new pseudogenes or acquired specific deletions. Functional comparisons suggest that M. lepromatosis has lost several enzymes required for amino acid synthesis whereas M. leprae has a defective heme pathway. M. lepromatosis has retained all functions required to infect the Schwann cells of the peripheral nervous system and therefore may also be neuropathogenic. A phylogeographic survey of 227 leprosy biopsies by differential PCR revealed that 221 contained M. leprae whereas only six, all from Mexico, harbored M. lepromatosis. Phylogenetic comparisons indicate that M. lepromatosis is closer than M. leprae to the MRCA, and a Bayesian dating analysis suggests that they diverged from their MRCA approximately 13.9 Mya. Thus, despite their ancient separation, the two leprosy bacilli are remarkably conserved and still cause similar pathologic conditions. PMID:25831531

  10. Genomic sequencing reveals historical, demographic and selective factors associated with the diversification of the fire-associated fungus Neurospora discreta.

    Science.gov (United States)

    Gladieux, Pierre; Wilson, Benjamin A; Perraudeau, Fanny; Montoya, Liliam A; Kowbel, David; Hann-Soden, Christopher; Fischer, Monika; Sylvain, Iman; Jacobson, David J; Taylor, John W

    2015-11-01

    Delineating microbial populations, discovering ecologically relevant phenotypes and identifying migrants, hybrids or admixed individuals have long proved notoriously difficult, thereby limiting our understanding of the evolutionary forces at play during the diversification of microbial species. However, recent advances in sequencing and computational methods have enabled an unbiased approach whereby incipient species and the genetic correlates of speciation can be identified by examining patterns of genomic variation within and between lineages. We present here a population genomic study of a phylogenetic species in the Neurospora discreta species complex, based on the resequencing of full genomes (~37 Mb) for 52 fungal isolates from nine sites in three continents. Population structure analyses revealed two distinct lineages in South-East Asia, and three lineages in North America/Europe with a broad longitudinal and latitudinal range and limited admixture between lineages. Genome scans for selective sweeps and comparisons of the genomic landscapes of diversity and recombination provided no support for a role of selection at linked sites on genomic heterogeneity in levels of divergence between lineages. However, demographic inference indicated that the observed genomic heterogeneity in divergence was generated by varying rates of gene flow between lineages following a period of isolation. Many putative cases of exchange of genetic material between phylogenetically divergent fungal lineages have been discovered, and our work highlights the quantitative importance of genetic exchanges between more closely related taxa to the evolution of fungal genomes. Our study also supports the role of allopatric isolation as a driver of diversification in saprobic microbes.

  11. Conservation and divergence of plant LHP1 protein sequences and expression patterns in angiosperms and gymnosperms.

    Science.gov (United States)

    Guan, Hexin; Zheng, Zhengui; Grey, Paris H; Li, Yuhua; Oppenheimer, David G

    2011-05-01

    Floral transition is a critical and strictly regulated developmental process in plants. Mutations in Arabidopsis LIKE HETEROCHROMATIN PROTEIN 1 (AtLHP1)/TERMINAL FLOWER 2 (TFL2) result in early and terminal flowers. Little is known about the gene expression, function and evolution of plant LHP1 homologs, except for Arabidopsis LHP1. In this study, the conservation and divergence of plant LHP1 protein sequences was analyzed by sequence alignments and phylogeny. LHP1 expression patterns were compared among taxa that occupy pivotal phylogenetic positions. Several relatively conserved new motifs/regions were identified among LHP1 homologs. Phylogeny of plant LHP1 proteins agreed with established angiosperm relationships. In situ hybridization unveiled conserved expression of plant LHP1 in the axillary bud/tiller, vascular bundles, developing stamens, and carpels. Unlike AtLHP1, cucumber CsLHP1-2, sugarcane SoLHP1 and maize ZmLHP1, rice OsLHP1 is not expressed in the shoot apical meristem (SAM) and the OsLHP1 transcript level is consistently low in shoots. "Unequal crossover" might have contributed to the divergence in the N-terminal and hinge region lengths of LHP1 homologs. We propose an "insertion-deletion" model for soybean (Glycine max L.) GmLHP1s evolution. Plant LHP1 homologs are more conserved than previously expected, and may favor vegetative meristem identity and primordia formation. OsLHP1 may not function in rice SAM during floral induction.

  12. Sequencing of chloroplast genomes from wheat, barley, rye and their relatives provides a detailed insight into the evolution of the Triticeae tribe.

    Directory of Open Access Journals (Sweden)

    Christopher P Middleton

    Full Text Available Using Roche/454 technology, we sequenced the chloroplast genomes of 12 Triticeae species, including bread wheat, barley and rye, as well as the diploid progenitors and relatives of bread wheat Triticum urartu, Aegilops speltoides and Ae. tauschii. Two wild tetraploid taxa, Ae. cylindrica and Ae. geniculata, were also included. Additionally, we incorporated wild Einkorn wheat Triticum boeoticum and its domesticated form T. monococcum and two Hordeum spontaneum (wild barley) genotypes. Chloroplast genomes were used for overall sequence comparison, phylogenetic analysis and dating of divergence times. We estimate that barley diverged from rye and wheat approximately 8-9 million years ago (MYA). The genome donors of hexaploid wheat diverged between 2.1-2.9 MYA, while rye diverged from Triticum aestivum approximately 3-4 MYA, more recently than previously estimated. Interestingly, the A genome taxa T. boeoticum and T. urartu were estimated to have diverged approximately 570,000 years ago. As these two have a reproductive barrier, the divergence time estimate also provides an upper limit for the time required for the formation of a species boundary between the two. Furthermore, we conclusively show that the chloroplast genome of hexaploid wheat was contributed by the B genome donor and that this unknown species diverged from Ae. speltoides about 980,000 years ago. Additionally, sequence alignments identified a translocation of a chloroplast segment to the nuclear genome which is specific to the rye/wheat lineage. We propose the presented phylogeny and divergence time estimates as a reference framework for future studies on Triticeae.

  13. Complete genome sequence of Klebsiella pneumoniae phage JD001.

    Science.gov (United States)

    Cui, Zelin; Shen, Wenbin; Wang, Zheng; Zhang, Haotian; Me, Rao; Wang, Yanchun; Zeng, Lingbin; Zhu, Yongzhang; Qin, Jinhong; He, Ping; Guo, Xiaokui

    2012-12-01

    Klebsiella pneumoniae is a member of the family Enterobacteriaceae, opportunistic pathogens that are among the eight most prevalent infectious agents in hospitals. The emergence of multidrug-resistant strains of K. pneumoniae has become a public health problem globally. To develop an effective antimicrobial agent, we isolated a bacteriophage, named JD001, from seawater and sequenced its genome. Comparative genome analysis of phage JD001 with other K. pneumoniae bacteriophages revealed that phage JD001 has little similarity to previously published K. pneumoniae phages KP15, KP32, KP34, and phiKO2. Here we announce the complete genome sequence of JD001 and report major findings from the genomic analysis.

  14. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    Energy Technology Data Exchange (ETDEWEB)

    Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. 
Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Lang, Elke [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    Energy Technology Data Exchange (ETDEWEB)

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidomicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidomicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    Energy Technology Data Exchange (ETDEWEB)

    Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Jando, Marlen [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J C [U.S. Department of Energy, Joint Genome Institute; Brettin, Thomas S [ORNL; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. 
Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  17. Phylogenetics and differentiation of Salmonella Newport lineages by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Guojie Cao

    Full Text Available Salmonella Newport has ranked in the top three Salmonella serotypes associated with foodborne outbreaks from 1995 to 2011 in the United States. In the current study, we selected 26 S. Newport strains isolated from diverse sources and geographic locations and then conducted 454 shotgun pyrosequencing procedures to obtain 16-24× coverage of high quality draft genomes for each strain. Comparative genomic analysis of 28 S. Newport strains (including 2 reference genomes) and 15 outgroup genomes identified more than 140,000 informative SNPs. A resulting phylogenetic tree consisted of four sublineages and indicated that S. Newport had a clear geographic structure. Strains from Asia were divergent from those from the Americas. Our findings demonstrated that analysis using whole genome sequencing data resulted in a more accurate picture of phylogeny compared to that using single genes or small sets of genes. We selected loci around the mutS gene of S. Newport to differentiate distinct lineages, including those between invH and mutS genes at the 3' end of Salmonella Pathogenicity Island 1 (SPI-1), ste fimbrial operon, and Clustered, Regularly Interspaced, Short Palindromic Repeats (CRISPR) associated-proteins (cas). These genes in the outgroup genomes held high similarity with either S. Newport Lineage II or III at the same loci. S. Newport Lineages II and III have different evolutionary histories in this region and our data demonstrated genetic flow and homologous recombination events around mutS. The findings suggested that S. Newport Lineages II and III diverged early in the serotype evolution and have evolved largely independently. Moreover, we identified genes that could delineate sublineages within the phylogenetic tree and that could be used as potential biomarkers for trace-back investigations during outbreaks. Thus, whole genome sequencing data enabled us to better understand the genetic background of pathogenicity and evolutionary history of S

  18. Oxford Nanopore MinION Sequencing and Genome Assembly

    Institute of Scientific and Technical Information of China (English)

    Hengyun Lu; Francesca Giordano; Zemin Ning

    2016-01-01

    The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production make it suitable for real-time applications; the release of the long-read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since longer reads can span repetitive regions more easily, despite their higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

  19. Molecular Characterization of Sec2 Loci in Wheat—Secale africanum Derivatives Demonstrates Genomic Divergence of Secale Species

    Directory of Open Access Journals (Sweden)

    Guangrong Li

    2015-04-01

    Full Text Available The unique 75 K γ-secalins encoded by Sec2 loci in Secale species make up almost half of rye storage proteins. The chromosomal location of the Sec2 loci in the wild species Secale africanum was determined using wheat-S. africanum derivatives, which were identified by genomic in situ hybridization and multi-color fluorescence in situ hybridization. Sec2 gene-specific PCR analysis indicated that the S. cereale Sec2 was located on chromosome 2R, while the S. africanum Sec2 was localized on chromosome 6Rafr of S. africanum. A total of 38 Sec2 gene sequences were isolated from S. africanum, S. cereale and S. sylvestre by PCR-based cloning. Phylogenetic analysis showed that S. africanum Sec2 diverged from S. cereale Sec2 approximately 2–3 million years ago. Illegitimate recombination between chromosomes 2R and 6R involving the Sec2 loci region may accelerate sequence variation during the evolutionary process from wild to cultivated Secale species.

  20. High-throughput sequencing of Astrammina rara: Sampling the giant genome of a giant foraminiferan protist

    Directory of Open Access Journals (Sweden)

    Reilly Andrew A

    2011-03-01

    Full Text Available Abstract Background Foraminiferan protists, which are significant players in most marine ecosystems, are also genetic innovators, harboring unique modifications to proteins that make up the basic eukaryotic cell machinery. Despite their ecological and evolutionary importance, foraminiferan genomes are poorly understood due to the extreme sequence divergence of many genes and the difficulty of obtaining pure samples: exogenous DNA from ingested food or ecto/endo symbionts often vastly exceed the amount of "native" DNA, and foraminiferans cannot be cultured axenically. Few foraminiferal genes have been sequenced from genomic material, although partial sequences of coding regions have been determined by EST studies and mass spectroscopy. The lack of genomic data has impeded evolutionary and cell-biology studies and has also hindered our ability to test ecological hypotheses using genetic tools. Results 454 sequence analysis was performed on a library derived from whole genome amplification of microdissected nuclei of the Antarctic foraminiferan Astrammina rara. Xenogenomic sequence, which was shown not to be of eukaryotic origin, represented only 12% of the sample. The first foraminiferal examples of important classes of genes, such as tRNA genes, are reported, and we present evidence that sequences of mitochondrial origin have been translocated to the nucleus. The recovery of a 3' UTR and downstream sequence from an actin gene suggests that foraminiferal mRNA processing may have some unusual features. Finally, the presence of a co-purified bacterial genome in the library also permitted the first calculation of the size of a foraminiferal genome by molecular methods, and statistical analysis of sequence from different genomic sources indicates that low-complexity tracts of the genome may be endoreplicated in some stages of the foraminiferal life cycle. 
Conclusions These data provide the first window into genomic organization and genetic control in

  1. Genome sequencing and annotation of Aeromonas sp. HZM

    Directory of Open Access Journals (Sweden)

    Patric Chua

    2015-09-01

    Full Text Available We report the draft genome sequence of Aeromonas sp. strain HZM, isolated from tropical peat swamp forest soil. The draft genome size is 4,451,364 bp with a G + C content of 61.7% and contains 10 rRNA sequences (eight copies of 5S rRNA genes, single copy of 16S and 23S rRNA each). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. JEMQ00000000.
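
    A G + C content such as the one quoted above is the percentage of G and C bases among the unambiguous bases of a sequence. A minimal sketch, assuming the sequence is available as a plain string, is given below. The function name and test string are illustrative and are not part of the reported analysis.

```python
def gc_content(sequence):
    """Return the G + C percentage of a DNA string.

    Ambiguous symbols such as N are ignored, so the denominator is the
    number of A, C, G and T characters only (an assumption of this sketch).
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Illustrative string, not real genome data.
print(round(gc_content("GGCCATNN"), 1))
```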

  2. Complete Genome Sequence of Phytopathogenic Pectobacterium atrosepticum Bacteriophage Peat1.

    Science.gov (United States)

    Kalischuk, Melanie; Hachey, John; Kawchuk, Lawrence

    2015-08-13

    Pectobacterium atrosepticum is a common phytopathogen causing significant economic losses worldwide. To develop a biocontrol strategy for this blackleg pathogen of solanaceous plants, P. atrosepticum bacteriophage Peat1 was isolated and its genome completely sequenced. Interestingly, morphological and sequence analyses of the 45,633-bp genome revealed that phage Peat1 is a member of the family Podoviridae and most closely resembles the Klebsiella pneumoniae bacteriophage KP34. This is the first published complete genome sequence of a phytopathogenic P. atrosepticum bacteriophage, and details provide important information for the development of biocontrol by advancing our understanding of phage-phytopathogen interactions.

  3. Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

    Science.gov (United States)

    Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

    2017-07-01

    PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.

  4. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

    Science.gov (United States)

    van der Weide, Robin H; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.

  5. The complete chloroplast genome sequence of the relict woody plant Metasequoia glyptostroboides Hu et Cheng.

    Science.gov (United States)

    Chen, Jinhui; Hao, Zhaodong; Xu, Haibin; Yang, Liming; Liu, Guangxin; Sheng, Yu; Zheng, Chen; Zheng, Weiwei; Cheng, Tielong; Shi, Jisen

    2015-01-01

    Metasequoia glyptostroboides Hu et Cheng is the only species in the genus Metasequoia Miki ex Hu et Cheng, which belongs to the Cupressaceae family. There were around 10 species in the Metasequoia genus, which were widely spread across the Northern Hemisphere during the Cretaceous of the Mesozoic and in the Cenozoic. M. glyptostroboides is the only remaining representative of this genus. Here, we report the complete chloroplast (cp) genome sequence and the cp genomic features of M. glyptostroboides. The M. glyptostroboides cp genome is 131,887 bp in length, with a total of 117 genes comprising 82 protein-coding genes, 31 tRNA genes and four rRNA genes. In this genome, 11 forward repeats, nine palindromic repeats, and 15 tandem repeats were detected. A total of 188 perfect microsatellites were detected through simple sequence repeat (SSR) analysis and these were distributed unevenly within the cp genome. Comparison of the cp genome structure and gene order to those of several other land plants indicated that a copy of the inverted repeat (IR) region, which was found to be IR region A (IRA), was lost in the M. glyptostroboides cp genome. The five most divergent and five most conserved genes were determined and further phylogenetic analysis was performed among plant species, especially for related species in conifers. Finally, phylogenetic analysis demonstrated that M. glyptostroboides is a sister species to Cryptomeria japonica (L. F.) D. Don and to Taiwania cryptomerioides Hayata. The complete cp genome sequence information of M. glyptostroboides will be greatly helpful for further investigations of this endemic relict woody plant and for in-depth understanding of the evolutionary history of the coniferous cp genomes, especially for the position of M. glyptostroboides in plant systematics and evolution.
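
    The abstract does not name the tool or thresholds behind its SSR analysis. The sketch below shows one common way perfect microsatellites can be located with a regular expression: a motif of 1 to 6 bp repeated in tandem at least a minimum number of times. The minimum-repeat table and the function name are assumptions made for illustration, not the study's parameters.

```python
import re

# Assumed minimum repeat counts per motif length (not from the abstract).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_perfect_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n - 1 further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit ("AA", "ATAT"),
            # so each tract is reported once under its shortest motif.
            if any(motif == motif[:j] * (k // j)
                   for j in range(1, k) if k % j == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Illustrative sequence: an (AT)6 tract and an A12 tract.
demo = "GG" + "AT" * 6 + "CC" + "A" * 12 + "GT"
print(find_perfect_ssrs(demo))
```

    Dedicated SSR tools also handle compound and imperfect repeats, which this sketch deliberately leaves out.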

  6. MIPS: a database for genomes and protein sequences.

    Science.gov (United States)

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  7. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    Science.gov (United States)

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.
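
    The abstract quotes a specificity and sensitivity without defining them. Under the standard confusion-matrix definitions, which are assumed here, the two figures could be computed as sketched below. The counts are hypothetical and were chosen only so that the arithmetic lands on the reported percentages; they are not the study's benchmark numbers. In gene-prediction work "specificity" sometimes means TP / (TP + FP) instead, so the authors' definition may differ.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real interrupted coding sequences that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of intact sequences correctly left unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only.
print(f"sensitivity = {sensitivity(82, 18):.0%}")
print(f"specificity = {specificity(95, 5):.0%}")
```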

  8. Composition and organization of active centromere sequences in complex genomes

    Directory of Open Access Journals (Sweden)

    Hayden Karen E

    2012-07-01

    Full Text Available Abstract Background Centromeres are sites of chromosomal spindle attachment during mitosis and meiosis. While the sequence basis for centromere identity remains a subject of considerable debate, one approach is to examine the genomic organization at these active sites that are correlated with epigenetic marks of centromere function. Results We have developed an approach to characterize both satellite and non-satellite centromeric sequences that are missing from current assemblies in complex genomes, using the dog genome as an example. Combining this genomic reference with an epigenetic dataset corresponding to sequences associated with the histone H3 variant centromere protein A (CENP-A), we identify active satellite sequence domains that appear to be both functionally and spatially distinct within the overall definition of satellite families. Conclusions These findings establish a genomic and epigenetic foundation for exploring the functional role of centromeric sequences in the previously sequenced dog genome and provide a model for similar studies within the context of less-characterized genomes.

  9. Divergence of conserved non-coding sequences: rate estimates and relative rate tests.

    Science.gov (United States)

    Wagner, Günter P; Fried, Claudia; Prohaska, Sonja J; Stadler, Peter F

    2004-11-01

    In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein coding DNA harbors important genetic elements directing the development and the physiology of the organisms, like promoters, enhancers, insulators, and micro-RNA genes. The molecular evolution of these genetic elements is difficult to study because their functional significance is hard to deduce from sequence information alone. Here we propose an approach to the study of the rate of evolution of functional non-coding sequences at a macro-evolutionary scale. We identify functionally important non-coding sequences as Conserved Non-Coding Nucleotide (CNCN) sequences from the comparison of two outgroup species. The CNCN sequences so identified are then compared to their homologous sequences in a pair of ingroup species, and we monitor the degree of modification these sequences suffered in the two ingroup lineages. We propose a method to test for rate differences in the modification of CNCN sequences among the two ingroup lineages, as well as a method to estimate their rate of modification. We apply this method to the full sequences of the HoxA clusters from six gnathostome species: a shark, Heterodontus francisci; a basal ray finned fish, Polypterus senegalus; the amphibian, Xenopus tropicalis; as well as three mammalian species, human, rat and mouse. The results show that the evolutionary rate of CNCN sequences is not distinguishable among the three mammalian lineages, while the Xenopus lineage has a significantly increased rate of evolution. Furthermore the estimates of the rate parameters suggest that in the stem lineage of mammals the rate of CNCN sequence evolution was more than twice the rate observed within the placental amniotes clade, suggesting a high rate of evolution of cis-regulatory elements during the origin of amniotes and mammals. 
We conclude that the proposed methods can be used for testing hypotheses about the rate and pattern of evolution of putative

  10. Complete genomic sequence analysis of infectious bronchitis virus Ark DPI strain and its evolution by recombination.

    Science.gov (United States)

    Ammayappan, Arun; Upadhyay, Chitra; Gelb, Jack; Vakharia, Vikram N

    2008-12-22

    An infectious bronchitis virus Arkansas DPI (Ark DPI) virulent strain was sequenced, analyzed and compared with many different IBV strains and coronaviruses. The genome of Ark DPI consists of 27,620 nucleotides, excluding poly (A) tail, and comprises ten open reading frames. Comparative sequence analysis of Ark DPI with other IBV strains shows striking similarity to the Conn, Gray, JMK, and Ark 99, which were circulating during that time period. Furthermore, comparison of the Ark genome with other coronaviruses demonstrates a close relationship to turkey coronavirus. Among non-structural genes, the 5' untranslated region (UTR), 3C-like proteinase (3CLpro) and the polymerase (RdRp) sequences are 100% identical to the Gray strain. Among structural genes, S1 has 97% identity with Ark 99; S2 has 100% identity with JMK and 96% to Conn; 3b 99%, and 3C to N is 100% identical to Conn strain. Possible recombination sites were found at the intergenic region of spike gene, 3' end of S1 and 3a gene. Independent recombination events may have occurred in the entire genome of Ark DPI, involving four different IBV strains, suggesting that genomic RNA recombination may occur in any part of the genome at a number of sites. Hence, we speculate that the Ark DPI strain originated from the Conn strain, but diverged and evolved independently by point mutations and recombination between field strains.

  11. Complete genomic sequence analysis of infectious bronchitis virus Ark DPI strain and its evolution by recombination

    Directory of Open Access Journals (Sweden)

    Gelb Jack

    2008-12-01

    Full Text Available Abstract An infectious bronchitis virus Arkansas DPI (Ark DPI) virulent strain was sequenced, analyzed and compared with many different IBV strains and coronaviruses. The genome of Ark DPI consists of 27,620 nucleotides, excluding poly (A) tail, and comprises ten open reading frames. Comparative sequence analysis of Ark DPI with other IBV strains shows striking similarity to the Conn, Gray, JMK, and Ark 99, which were circulating during that time period. Furthermore, comparison of the Ark genome with other coronaviruses demonstrates a close relationship to turkey coronavirus. Among non-structural genes, the 5' untranslated region (UTR), 3C-like proteinase (3CLpro) and the polymerase (RdRp) sequences are 100% identical to the Gray strain. Among structural genes, S1 has 97% identity with Ark 99; S2 has 100% identity with JMK and 96% to Conn; 3b 99%, and 3C to N is 100% identical to Conn strain. Possible recombination sites were found at the intergenic region of spike gene, 3' end of S1 and 3a gene. Independent recombination events may have occurred in the entire genome of Ark DPI, involving four different IBV strains, suggesting that genomic RNA recombination may occur in any part of the genome at a number of sites. Hence, we speculate that the Ark DPI strain originated from the Conn strain, but diverged and evolved independently by point mutations and recombination between field strains.

  12. Genome-scale validation of deep-sequencing libraries.

    Directory of Open Access Journals (Sweden)

    Dominic Schmidt

    Full Text Available Chromatin immunoprecipitation followed by high-throughput (HTP) sequencing (ChIP-seq) is a powerful tool to establish protein-DNA interactions genome-wide. The primary limitation of its broad application at present is the often-limited access to sequencers. Here we report a protocol, Mab-seq, that generates genome-scale quality evaluations for nucleic acid libraries intended for deep-sequencing. We show how commercially available genomic microarrays can be used to maximize the efficiency of library creation and quickly generate reliable preliminary data on a chromosomal scale in advance of deep sequencing. We also exploit this technique to compare enriched regions identified using microarrays with those identified by sequencing, demonstrating that they agree on a core set of clearly identified enriched regions, while characterizing the additional enriched regions identifiable using HTP sequencing.

  13. Armillaria phylogeny based on tef-1α sequences suggests ongoing divergent speciation within the boreal floristic kingdom

    Science.gov (United States)

    Ned B. Klopfenstein; John W. Hanna; Amy L. Ross-Davis; Jane E. Stewart; Yuko Ota; Rosario Medel-Ortiz; Miguel Armando Lopez-Ramirez; Ruben Damian Elias-Roman; Dionicio Alvarado-Rosales; Mee-Sook Kim

    2013-01-01

    Armillaria plays diverse ecological roles in forests worldwide, which has inspired interest in understanding phylogenetic relationships within and among species of this genus. Previous rDNA sequence-based phylogenetic analyses of Armillaria have shown general relationships among widely divergent taxa, but rDNA sequences were not reliable for separating closely related...

  14. Conservation, Divergence, and Genome-Wide Distribution of PAL and POX A Gene Families in Plants

    Directory of Open Access Journals (Sweden)

    H. C. Rawal

    2013-01-01

    Full Text Available Genome-wide identification and phylogenetic and syntenic comparison were performed for the genes responsible for phenylalanine ammonia lyase (PAL) and peroxidase A (POX A) enzymes in nine plant species representing very diverse groups like legumes (Glycine max and Medicago truncatula), fruits (Vitis vinifera), cereals (Sorghum bicolor, Zea mays, and Oryza sativa), trees (Populus trichocarpa), and model dicot (Arabidopsis thaliana) and monocot (Brachypodium distachyon) species. A total of 87 and 1045 genes in PAL and POX A gene families, respectively, have been identified in these species. The phylogenetic and syntenic comparison along with motif distributions shows a high degree of conservation of PAL genes, suggesting that these genes may predate monocot/eudicot divergence. The POX A family genes, present in clusters at the subtelomeric regions of chromosomes, might be evolving and expanding at a higher rate than the PAL gene family. Our analysis showed that during the expansion of the POX A gene family, many groups and subgroups have evolved, resulting in a high level of functional divergence among monocots and dicots. These results will act as a first step toward the understanding of monocot/eudicot evolution and functional characterization of these gene families in the future.

  15. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Directory of Open Access Journals (Sweden)

    Martijn Staats

    Full Text Available Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more

  16. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  17. Sequencing of chloroplast genome using whole cellular DNA and Solexa sequencing technology

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2012-11-01

    Full Text Available Sequencing of the chloroplast genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the chloroplast genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246 Mb, 362 Mb, and 361 Mb of sequence data were generated for the three accessions Chiifu-401-42, Z16 and FT, respectively. Microreads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8% or 95.5–99.7% of the B. rapa chloroplast genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of the chloroplast genome.
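    The coverage percentages quoted in this abstract can be read as the fraction of reference positions overlapped by at least one aligned read. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name and the 0-based half-open (start, end) interval convention are assumptions made for illustration.

```python
def covered_fraction(ref_len, alignments):
    """Fraction of a reference of length ref_len covered by >= 1 alignment.

    alignments: iterable of (start, end) pairs, 0-based and half-open.
    This interval convention is an assumption of the sketch.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(alignments):
        # Clip each interval to the reference boundaries.
        start, end = max(0, start), min(ref_len, end)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # A gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_len


# Toy example: two overlapping reads plus one read running off the end.
fraction = covered_fraction(100, [(0, 40), (30, 60), (80, 120)])
```

    Merging intervals first keeps overlapping reads from being counted twice, which matters at the high read depths expected for organellar DNA.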

  18. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    Energy Technology Data Exchange (ETDEWEB)

    Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Foster, Brian [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Clum, Alicia [U.S. Department of Energy, Joint Genome Institute; Sun, Hui [U.S. Department of Energy, Joint Genome Institute; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. RNA deep sequencing reveals novel candidate genes and polymorphisms in boar testis and liver tissues with divergent androstenone levels.

    Directory of Open Access Journals (Sweden)

    Asep Gunawan

    Full Text Available Boar taint is an unpleasant smell and taste of pork meat derived from some entire male pigs. The main causes of boar taint are the two compounds androstenone (5α-androst-16-en-3-one) and skatole (3-methylindole). It is crucial to understand the genetic mechanism of boar taint to select pigs for lower androstenone levels and thus reduce boar taint. The aim of the present study was to investigate transcriptome differences in boar testis and liver tissues with divergent androstenone levels using RNA deep sequencing (RNA-Seq). The total number of reads produced for each testis and liver sample ranged from 13,221,550 to 33,206,723 and 12,755,487 to 46,050,468, respectively. In testis samples 46 genes were differentially regulated whereas 25 genes showed differential expression in the liver. The fold change values ranged from -4.68 to 2.90 in testis samples and -2.86 to 3.89 in liver samples. Differentially regulated genes in high androstenone testis and liver samples were enriched in metabolic processes such as lipid metabolism, small molecule biochemistry and molecular transport. This study provides evidence for transcriptome profile and gene polymorphisms of boars with divergent androstenone levels using RNA-Seq technology. Digital gene expression analysis identified candidate genes in the flavin monooxygenase family, cytochrome P450 family and hydroxysteroid dehydrogenase family. Moreover, polymorphism and association analysis revealed that mutations in the IRG6, MX1, IFIT2, CYP7A1, FMO5 and KRT18 genes could be potential candidate markers for androstenone levels in boars. Further studies are required to prove the role of the candidate genes to be used in genomic selection against boar taint in pig breeding programs.

  20. The Release 6 reference sequence of the Drosophila melanogaster genome

    Science.gov (United States)

    Carlson, Joseph W.; Wan, Kenneth H.; Park, Soo; Mendez, Ivonne; Galle, Samuel E.; Booth, Benjamin W.; Pfeiffer, Barret D.; George, Reed A.; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V.; Andreyeva, Evgeniya N.; Boldyreva, Lidiya V.; Marra, Marco; Carvalho, A. Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F.; Rubin, Gerald M.; Karpen, Gary H.

    2015-01-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  1. Host imprints on bacterial genomes--rapid, divergent evolution in individual patients.

    Directory of Open Access Journals (Sweden)

    Jaroslaw Zdziarski

    Full Text Available Bacteria lose or gain genetic material and through selection, new variants become fixed in the population. Here we provide the first, genome-wide example of a single bacterial strain's evolution in different deliberately colonized patients and the surprising insight that hosts appear to personalize their microflora. By first obtaining the complete genome sequence of the prototype asymptomatic bacteriuria strain E. coli 83972 and then resequencing its descendants after therapeutic bladder colonization of different patients, we identified 34 mutations, which affected metabolic and virulence-related genes. Further transcriptome and proteome analysis proved that these genome changes altered bacterial gene expression resulting in unique adaptation patterns in each patient. Our results provide evidence that, in addition to stochastic events, adaptive bacterial evolution is driven by individual host environments. Ongoing loss of gene function supports the hypothesis that evolution towards commensalism rather than virulence is favored during asymptomatic bladder colonization.

  2. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  3. Analysis of Simple Sequence Repeats in Genomes of Rhizobia

    Institute of Scientific and Technical Information of China (English)

    GAO Ya-mei; HAN Yi-qiang; TANG Hui; SUN Dong-mei; WANG Yan-jie; WANG Wei-dong

    2008-01-01

    Simple sequence repeats (SSRs) or microsatellites, as genetic markers, are ubiquitous in genomes of various organisms. The analysis of SSRs in rhizobia genomes provides useful information for a variety of applications in population genetics of rhizobia. We analyzed the occurrences, relative abundance, and relative density of the most common SSRs using the sequenced Bradyrhizobium japonicum, Mesorhizobium loti, and Sinorhizobium meliloti genomes in the microorganisms tandem repeats database, and SSRs in the genomes of the three species were compared with each other. The result showed that there were 1,410, 859, and 638 SSRs in the B. japonicum, M. loti, and S. meliloti genomes, respectively. In the genomes of B. japonicum, M. loti, and S. meliloti, tetranucleotide, pentanucleotide, and hexanucleotide repeats were more abundant, indicating higher mutation rates in these species. Mononucleotide repeats were the least abundant. The SSR types and distribution were similar among these species.
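    As an illustration of the kind of scan this abstract describes, the sketch below finds perfect SSRs with motif lengths of 1 to 6 bp and reports counts and densities. It is not the method used by the authors or by the tandem repeats database: the minimum repeat counts in MIN_REPEATS are hypothetical, and "relative abundance" (SSRs per Mb) and "relative density" (SSR bp per Mb) follow one common convention that is assumed here.

```python
import re

# Hypothetical minimum repeat counts per motif length (1..6 bp); the
# abstract does not state which thresholds were actually used.
MIN_REPEATS = {1: 10, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, m in min_repeats.items():
        # A k-mer followed by at least m - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, m - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AA" or "ACAC", which duplicate shorter motifs.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


def ssr_summary(seq, hits):
    """Count, total SSR length, and per-Mb abundance and density."""
    mb = len(seq) / 1e6
    ssr_bp = sum(len(motif) * count for _, motif, count in hits)
    return {
        "count": len(hits),
        "ssr_bp": ssr_bp,
        "per_mb": len(hits) / mb,    # relative abundance (assumed definition)
        "bp_per_mb": ssr_bp / mb,    # relative density (assumed definition)
    }


# Toy sequence containing a mono-, a di- and a tetranucleotide repeat.
toy = "TTG" + "A" * 12 + "CGT" + "AC" * 7 + "GGC" + "ACGT" * 3 + "C"
toy_hits = find_ssrs(toy)
toy_summary = ssr_summary(toy, toy_hits)
```

    Filtering non-primitive motifs keeps a run such as AAAAAAAAAAAA from also being reported as a di-, tri- or tetranucleotide repeat.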

  4. Perspectives of Integrative Cancer Genomics in Next Generation Sequencing Era

    Directory of Open Access Journals (Sweden)

    So Mee Kwon

    2012-06-01

    Full Text Available The explosive development of genomics technologies including microarrays and next generation sequencing (NGS) has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of the genetic aberrations could reveal the candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish the huge cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding of the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of the structured integration, sharing, and interpretation of the big omics data still remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on the study design and analysis strategies. This might be helpful to understand the current trends and strategies of the rapidly evolving cancer genomics research.

  5. Correction for Measurement Error from Genotyping-by-Sequencing in Genomic Variance and Genomic Prediction Models

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Janss, Luc; Jensen, Just

    Genotyping-by-sequencing (GBSeq) is becoming a cost-effective genotyping platform for species without available SNP arrays. GBSeq considers to sequence short reads from restriction sites covering a limited part of the genome (e.g., 5-10%) with low sequencing depth per individual (e.g., 5-10X per....... In the current work we show how the correction for measurement error in GBSeq can also be applied in whole genome genomic variance and genomic prediction models. Bayesian whole-genome random regression models are proposed to allow implementation of large-scale SNP-based models with a per-SNP correction...... for measurement error. We show correct retrieval of genomic explained variance, and improved genomic prediction when accounting for the measurement error in GBSeq data...

  6. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies, etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect (centromeres, intergenic regions, transposable elements, gene family number) is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

  7. REVISITING MOLECULAR CLONING TO SOLVE GENOME SEQUENCING PROJECT CONFLICTS

    National Research Council Canada - National Science Library

    Hugo A Barrera-Saldaña; Aarón Daniel Ramírez-Sánchez; Tiffany Editth Palacios-Tovar; Dionicio Aguirre-Treviño; Saúl Felipe Karr-de-León

    2017-01-01

    .... Molecular cloning was chosen as the most straight-forward strategy to solve the dilemma. The initial characterization of recombinant plasmids by restriction enzyme digestion confirmed the presence of two genomic sequences...

  8. Complete Genome Sequence of Mycobacterium phlei Type Strain RIVM601174

    KAUST Repository

    Abdallah, A. M.

    2012-05-24

    Mycobacterium phlei is a rapidly growing nontuberculous Mycobacterium species that is typically nonpathogenic, with few reported cases of human disease. Here we report the whole genome sequence of M. phlei type strain RIVM601174.

  9. Draft Genome Sequence of Coprobacter fastidiosus NSB1T

    Science.gov (United States)

    Chaplin, A. V.; Efimov, B. A.; Khokhlova, E. V.; Kafarskaia, L. I.; Tupikin, A. E.; Kabilov, M. R.

    2014-01-01

    Coprobacter fastidiosus is a Gram-negative obligate anaerobic bacterium belonging to the phylum Bacteroidetes. In this work, we report the draft genome sequence of C. fastidiosus strain NSB1T isolated from human infant feces. PMID:24604645

  10. Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines

    Energy Technology Data Exchange (ETDEWEB)

    Li, Lijin [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); Goedegebuure, Peter [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Mardis, Elaine R. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); The Genome Institute at Washington University School of Medicine, St. Louis, MO 63108 (United States); Ellis, Matthew J.C. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110 (United States); Zhang, Xiuli; Herndon, John M. [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); Fleming, Timothy P. [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Carreno, Beatriz M. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110 (United States); Hansen, Ted H. [The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States); Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110 (United States); Gillanders, William E., E-mail: gillandersw@wudosis.wustl.edu [Department of Surgery, Washington University School of Medicine, St. Louis, MO 63110 (United States); The Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine, St. Louis, MO 63110 (United States)

    2011-11-25

    New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

  11. Draft Genome Sequences of Nine Cyanobacterial Strains from Diverse Habitats

    Science.gov (United States)

    Zhu, Tao; Hou, Shengwei

    2017-01-01

    ABSTRACT Here, we report the annotated draft genome sequences of nine different cyanobacteria, which were originally collected from different habitats, including hot springs, terrestrial, freshwater, and marine environments, and cover four of the five morphological subsections of cyanobacteria. PMID:28254973

  12. Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines

    Directory of Open Access Journals (Sweden)

    William E. Gillanders

    2011-11-01

    Full Text Available New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

  13. Genome sequence of vanilla distortion mosaic virus infecting Coriandrum sativum.

    Science.gov (United States)

    Adams, I P; Rai, S; Deka, M; Harju, V; Hodges, T; Hayward, G; Skelton, A; Fox, A; Boonham, N

    2014-12-01

    The 9573-nucleotide genome of a potyvirus was sequenced from a Coriandrum sativum plant from India with viral symptoms. On analysis, this virus was shown to have greater than 85 % nucleotide sequence identity to vanilla distortion mosaic virus (VDMV). Analysis of the putative coat protein sequence confirmed that this virus was in fact VDMV, with greater than 91 % amino acid sequence identity. The genome appears to encode a 3083-amino-acid polyprotein potentially cleaved into the 10 mature proteins expected in potyviruses. Phylogenetic analysis confirmed that VDMV is a distinct but ungrouped member of the genus Potyvirus.
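    The identity figures in this abstract (greater than 85 % at the nucleotide level, greater than 91 % for the coat protein) are pairwise percent identities over an alignment. A minimal sketch of that arithmetic follows; the function name is hypothetical, the inputs are assumed to be two rows of an existing alignment, and counting gapped columns as mismatches is only one of several conventions in use.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity between two rows of a pairwise alignment.

    Columns where both rows are gaps are ignored; a gap opposite a residue
    counts as a mismatch (an assumed convention, others exist).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment: 7 matching columns out of 9 compared.
identity = percent_identity("ACGT-ACGT", "ACGTTACGA")
```

    The same function works for nucleotide or amino acid alignments, since it only compares characters column by column.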

  14. Intra-species sequence comparisons for annotating genomes

    Energy Technology Data Exchange (ETDEWEB)

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom, and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.
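    The idea of flagging low-variation regions in resequenced intervals can be sketched as follows. This is a simplified stand-in, not the authors' mutation-rate model: per-site variability is approximated by the fraction of sequences that differ from the most common base in each alignment column, and the window size and threshold are hypothetical parameters.

```python
from collections import Counter


def site_variability(alignment):
    """Per-column fraction of sequences differing from the most common base.

    alignment: list of equal-length aligned sequences (an assumption).
    """
    n = len(alignment)
    return [
        1 - Counter(column).most_common(1)[0][1] / n
        for column in zip(*alignment)
    ]


def constrained_windows(variability, window, max_mean):
    """Start indices of windows whose mean variability is <= max_mean."""
    starts = []
    for i in range(len(variability) - window + 1):
        if sum(variability[i:i + window]) / window <= max_mean:
            starts.append(i)
    return starts


# Toy alignment of four resequenced individuals over six sites.
toy_alignment = ["ACGTAC", "ACGTTC", "ACGAAG", "ACGTCC"]
toy_variability = site_variability(toy_alignment)
toy_windows = constrained_windows(toy_variability, window=3, max_mean=0.1)
```

    Windows returned by constrained_windows are candidates for functional constraint only; in the study summarized above, such regions were followed up with transgenic assays.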

  15. Genome Sequence of Mycobacterium Phage Waterfoul

    Science.gov (United States)

    Jackson, Paige N.; Embry, Ella K.; Johnson, Christa O.; Watson, Tiara L.; Weast, Sayre K.; DeGraw, Caroline J.; Douglas, Jessica R.; Sellers, J. Michael; D’Angelo, William A.

    2016-01-01

    Waterfoul is a newly isolated temperate siphovirus of Mycobacterium smegmatis mc²155. It was identified as a member of the K5 cluster of Mycobacterium phages and has a 61,248-bp genome with 95 predicted genes. PMID:27856585

  16. Draft genome sequence of the Tibetan antelope

    OpenAIRE

    Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A.; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo

    2013-01-01

    The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in ge...

  17. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Full Text Available Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina) and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X) was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  18. Draft genome sequence of Thermincola potens strain JR

    Energy Technology Data Exchange (ETDEWEB)

    Byrne-Bailey, K.G.; Wrighton, K.C.; Melnyk, R.A.; Agbo, P.; Hazen, T.C.; Coates, J.D.

    2010-07-01

    'Thermincola potens' strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR.

  19. Complete Genome Sequence of Phytopathogenic Pectobacterium atrosepticum Bacteriophage Peat1

    OpenAIRE

    Kalischuk, Melanie; Hachey, John; Kawchuk, Lawrence

    2015-01-01

    Pectobacterium atrosepticum is a common phytopathogen causing significant economic losses worldwide. To develop a biocontrol strategy for this blackleg pathogen of solanaceous plants, P. atrosepticum bacteriophage Peat1 was isolated and its genome completely sequenced. Interestingly, morphological and sequence analyses of the 45,633-bp genome revealed that phage Peat1 is a member of the family Podoviridae and most closely resembles the Klebsiella pneumoniae bacteriophage KP34. This is the fir...

  20. Triplex-forming oligonucleotide target sequences in the human genome

    OpenAIRE

    Goñi, J Ramon; de la Cruz, Xavier; Orozco, Modesto

    2004-01-01

    The existence of sequences in the human genome which can be a target for triplex formation, and accordingly are candidates for anti-gene therapies, has been studied by using bioinformatics tools. It was found that the population of triplex-forming oligonucleotide target sequences (TTS) is much more abundant than that expected from simple random models. The population of TTS is large in all the genome, without major differences between chromosomes. A wide analysis along annotated regions of th...

  1. Draft Genome Sequence of Bacillus tequilensis Strain FJAT-14262a

    OpenAIRE

    Chen, Qian-Qian; Liu, Bo; Liu, Guo-hong; Wang, Jie-ping; Che, Jian-Mei

    2015-01-01

    Bacillus tequilensis FJAT-14262a is a Gram-positive rod-shaped bacterium. Here, we report the 4,038,551-bp genome sequence of B. tequilensis FJAT-14262a, which will provide useful information for genomic taxonomy and phylogenomics of Bacillus.

  2. Draft Genome Sequence of Bacillus tequilensis Strain FJAT-14262a.

    Science.gov (United States)

    Chen, Qian-Qian; Liu, Bo; Liu, Guo-Hong; Wang, Jie-Ping; Che, Jian-Mei

    2015-11-12

    Bacillus tequilensis FJAT-14262a is a Gram-positive rod-shaped bacterium. Here, we report the 4,038,551-bp genome sequence of B. tequilensis FJAT-14262a, which will provide useful information for genomic taxonomy and phylogenomics of Bacillus.

  3. Finished Genome Sequence of Collimonas arenae Cal35

    NARCIS (Netherlands)

    Wu, Je-Jia; de Jager, Victor; Deng, Wen-ling; Leveau, Johan

    2015-01-01

    We announce the finished genome sequence of soil forest isolate Collimonas arenae Cal35, which comprises a 5.6-Mbp chromosome and 41-kb plasmid. The Cal35 genome is the second one published for the bacterial genus Collimonas and represents the first opportunity for high-resolution comparison of geno

  4. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    Science.gov (United States)

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.
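
    The N50 figure quoted in the abstract above is a standard assembly-contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembled bases. A minimal sketch of that calculation follows; the function name and the example lengths are illustrative and are not taken from the silver pomfret assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Illustrative scaffold lengths (arbitrary units), not real data.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

    With the example lengths, the two longest scaffolds (8 + 8) already reach half of the 32 total units, so the N50 is 8. A low N50 relative to total assembly size, as reported above, indicates a highly fragmented draft.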

  5. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  7. Complete Genome Sequence of Bacillus thuringiensis Bacteriophage Smudge.

    Science.gov (United States)

    Cornell, Jessica L; Breslin, Eileen; Schuhmacher, Zachary; Himelright, Madison; Berluti, Cassandra; Boyd, Charles; Carson, Rachel; Del Gallo, Elle; Giessler, Caris; Gilliam, Benjamin; Heatherly, Catherine; Nevin, Julius; Nguyen, Bryan; Nguyen, Justin; Parada, Jocelyn; Sutterfield, Blake; Tukruni, Muruj; Temple, Louise

    2016-08-18

    Smudge, a bacteriophage enriched from soil using Bacillus thuringiensis DSM-350 as the host, had its complete genome sequenced. Smudge is a myovirus with a genome consisting of 292 genes and was identified as belonging to the C1 cluster of Bacillus phages.

  8. Complete Genome Sequence of Bacillus thuringiensis Strain 407 Cry-

    OpenAIRE

    Poehlein, Anja; Liesegang, Heiko

    2013-01-01

    Bacillus thuringiensis is an insect pathogen that has been used widely as a biopesticide. Here, we report the genome sequence of strain 407 Cry-, which is used to study the genetic determinants of pathogenicity. The genome consists of a 5.5-Mb chromosome and nine plasmids, including a novel 502-kb megaplasmid.

  9. Genome sequences of Listeria monocytogenes strains with resistance to arsenic

    Science.gov (United States)

    Listeria monocytogenes frequently exhibits resistance to arsenic. We report here the draft genome sequences of eight genetically diverse arsenic-resistant L. monocytogenes strains from human listeriosis and food-associated environments. Availability of these genomes would help to elucidate the role ...

  10. Complete genome sequence of Bifidobacterium bifidum S17.

    NARCIS (Netherlands)

    Zhurina, D.; Zomer, A.L.; Gleinser, M.; Brancaccio, V.F.; Auchter, M.; Waidmann, M.S.; Westermann, C.; Sinderen, D. van; Riedel, C.U.

    2011-01-01

    Here, we report on the first completely annotated genome sequence of a Bifidobacterium bifidum strain. B. bifidum S17, isolated from feces of a breast-fed infant, was shown to strongly adhere to intestinal epithelial cells and has potent anti-inflammatory activity in vitro and in vivo. The genome se

  11. Draft genome sequences of 10 strains of the genus Exiguobacterium.

    Science.gov (United States)

    Vishnivetskaya, Tatiana A; Chauhan, Archana; Layton, Alice C; Pfiffner, Susan M; Huntemann, Marcel; Copeland, Alex; Chen, Amy; Kyrpides, Nikos C; Markowitz, Victor M; Palaniappan, Krishna; Ivanova, Natalia; Mikhailova, Natalia; Ovchinnikova, Galina; Andersen, Evan W; Pati, Amrita; Stamatis, Dimitrios; Reddy, T B K; Shapiro, Nicole; Nordberg, Henrik P; Cantor, Michael N; Hua, X Susan; Woyke, Tanja

    2014-10-16

    High-quality draft genome sequences were determined for 10 Exiguobacterium strains in order to provide insight into their evolutionary strategies for speciation and environmental adaptation. The selected genomes include psychrotrophic and thermophilic species from a range of habitats, which will allow for a comparison of metabolic pathways and stress response genes.

  12. Complete genome sequence of Aeromonas hydrophila AL06-06

    Science.gov (United States)

    Aeromonas hydrophila occurs in freshwater environments and infects fish and mammals. In this work, we report the complete genome sequence of Aeromonas hydrophila AL06-06, which was isolated from diseased goldfish and is being used for comparative genomic studies with A. hydrophila strains causing ba...

  13. The complete chloroplast genome sequence of Abies nephrolepis (Pinaceae: Abietoideae)

    Directory of Open Access Journals (Sweden)

    Dong-Keun Yi

    2016-06-01

    Full Text Available The plant chloroplast (cp) genome has maintained a relatively conserved structure and gene content throughout evolution. Cp genome sequences have been used widely for resolving evolutionary and phylogenetic issues at various taxonomic levels of plants. Here, we report the complete cp genome of Abies nephrolepis. The A. nephrolepis cp genome is 121,336 base pairs (bp) in length including a pair of short inverted repeat regions (IRa and IRb) of 139 bp each separated by a small single copy region of 54,323 bp (SSC) and a large single copy region of 66,735 bp (LSC). It contains 114 genes, 68 of which are protein coding genes, 35 tRNA and four rRNA genes, six open reading frames, and one pseudogene. Seventeen repeat units and 64 simple sequence repeats (SSR) have been detected in A. nephrolepis cp genome. Large IR sequences locate in 42-kb inversion points (1186 bp). The A. nephrolepis cp genome is identical to Abies koreana’s which is closely related to taxa. Pairwise comparison between two cp genomes revealed 140 polymorphic sites in each. Complete cp genome sequence of A. nephrolepis has a significant potential to provide information on the evolutionary pattern of Abietoideae and valuable data for development of DNA markers for easy identification and classification.

  14. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis

    NARCIS (Netherlands)

    Carlton, Jane M.; Hirt, Robert P.; Silva, Joana C.; Delcher, Arthur L.; Schatz, Michael; Zhao, Qi; Wortman, Jennifer R.; Bidwell, Shelby L.; Alsmark, U. Cecilia M.; Besteiro, Sebastien; Sicheritz-Ponten, Thomas; Noel, Christophe J.; Dacks, Joel B.; Foster, Peter G.; Simillion, Cedric; Van de Peer, Yves; Miranda-Saavedra, Diego; Barton, Geoffrey J.; Westrop, Gareth D.; Mueller, Sylke; Dessi, Daniele; Fiori, Pier Luigi; Ren, Qinghu; Paulsen, Ian; Zhang, Hanbang; Bastida-Corcuera, Felix D.; Simoes-Barbosa, Augusto; Brown, Mark T.; Hayes, Richard D.; Mukherjee, Mandira; Okumura, Cheryl Y.; Schneider, Rachel; Smith, Alias J.; Vanacova, Stepanka; Villalvazo, Maria; Haas, Brian J.; Pertea, Mihaela; Feldblyum, Tamara V.; Utterback, Terry R.; Shu, Chung-Li; Osoegawa, Kazutoyo; de Jong, Pieter J.; Hrdy, Ivan; Horvathova, Lenka; Zubacova, Zuzana; Dolezal, Pavel; Malik, Shehre-Banoo; Logsdon, John M.; Henze, Katrin; Gupta, Arti; Wang, Ching C.; Dunne, Rebecca L.; Upcroft, Jacqueline A.; Upcroft, Peter; White, Owen; Salzberg, Steven L.; Tang, Petrus; Chiu, Cheng-Hsun; Lee, Ying-Shiung; Embley, T. Martin; Coombs, Graham H.; Mottram, Jeremy C.; Tachezy, Jan; Fraser-Liggett, Claire M.; Johnson, Patricia J.

    2007-01-01

    We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction wi

  15. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  16. Complete Genome Sequence of Pediococcus pentosaceus Strain SL4

    DEFF Research Database (Denmark)

    Dantoft, Shruti Harnal; Bielak, Eliza Maria; Seo, Jae-Gu;

    2013-01-01

    Pediococcus pentosaceus SL4 was isolated from a Korean fermented vegetable product, kimchi. We report here the whole-genome sequence (WGS) of P. pentosaceus SL4. The genome consists of a 1.79-Mb circular chromosome (G+C content of 37.3%) and seven distinct plasmids ranging in size from 4 kb to 50...

  17. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    Science.gov (United States)

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  18. Genome-wide divergence of DNA methylation marks in cerebral and cerebellar cortices.

    Directory of Open Access Journals (Sweden)

    Yurong Xin

    Full Text Available BACKGROUND: Emerging evidence suggests that DNA methylation plays an expansive role in the central nervous system (CNS). Large-scale whole genome DNA methylation profiling of the normal human brain offers tremendous potential in understanding the role of DNA methylation in brain development and function. METHODOLOGY/SIGNIFICANT FINDINGS: Using methylation-sensitive SNP chip analysis (MSNP), we performed whole genome DNA methylation profiling of the prefrontal, occipital, and temporal regions of cerebral cortex, as well as cerebellum. These data provide an unbiased representation of CpG sites comprising 377,509 CpG dinucleotides within both the genic and intergenic euchromatic region of the genome. Our large-scale genome DNA methylation profiling reveals that the prefrontal, occipital, and temporal regions of the cerebral cortex compared to cerebellum have markedly different DNA methylation signatures, with the cerebral cortex being hypermethylated and cerebellum being hypomethylated. Such differences were observed in distinct genomic regions, including genes involved in CNS function. The MSNP data were validated for a subset of these genes, by performing bisulfite cloning and sequencing and confirming that prefrontal, occipital, and temporal cortices are significantly more methylated as compared to the cerebellum. CONCLUSIONS: These findings are consistent with known developmental differences in nucleosome repeat lengths in cerebral and cerebellar cortices, with cerebrum exhibiting shorter repeat lengths than cerebellum. Our observed differences in DNA methylation profiles in these regions underscore the potential role of DNA methylation in chromatin structure and organization in CNS, reflecting functional specialization within cortical regions.

  19. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    Energy Technology Data Exchange (ETDEWEB)

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  20. Characterization and Profiling of Liver microRNAs by RNA-sequencing in Cattle Divergently Selected for Residual Feed Intake.

    Science.gov (United States)

    Al-Husseini, Wijdan; Chen, Yizhou; Gondro, Cedric; Herd, Robert M; Gibson, John P; Arthur, Paul F

    2016-10-01

    MicroRNAs (miRNAs) are short non-coding RNAs that post-transcriptionally regulate expression of mRNAs in many biological pathways. Liver plays an important role in the feed efficiency of animals and high and low efficient cattle demonstrated different gene expression profiles by microarray. Here we report comprehensive miRNAs profiles by next-gen deep sequencing in Angus cattle divergently selected for residual feed intake (RFI) and identify miRNAs related to feed efficiency in beef cattle. Two microRNA libraries were constructed from pooled RNA extracted from livers of low and high RFI cattle, and sequenced by Illumina genome analyser. In total, 23,628,103 high quality short sequence reads were obtained and more than half of these reads were matched to the bovine genome (UMD 3.1). We identified 305 known bovine miRNAs. Bta-miR-143, bta-miR-30, bta-miR-122, bta-miR-378, and bta-let-7 were the top five most abundant miRNAs families expressed in liver, representing more than 63% of expressed miRNAs. We also identified 52 homologous miRNAs and 10 novel putative bovine-specific miRNAs, based on precursor sequence and the secondary structure and utilizing the miRBase (v. 21). We compared the miRNAs profile between high and low RFI animals and ranked the most differentially expressed bovine known miRNAs. Bovine miR-143 was the most abundant miRNA in the bovine liver and comprised 20% of total expressed mapped miRNAs. The most highly expressed miRNA in liver of mice and humans, miR-122, was the third most abundant in our cattle liver samples. We also identified 10 putative novel bovine-specific miRNA candidates. Differentially expressed miRNAs between high and low RFI cattle were identified with 18 miRNAs being up-regulated and 7 other miRNAs down-regulated in low RFI cattle. Our study has identified comprehensive miRNAs expressed in bovine liver. Some of the expressed miRNAs are novel in cattle. The differentially expressed miRNAs between high and low RFI give some

  1. Characterization and Profiling of Liver microRNAs by RNA-sequencing in Cattle Divergently Selected for Residual Feed Intake

    Directory of Open Access Journals (Sweden)

    Wijdan Al-Husseini

    2016-10-01

    Full Text Available MicroRNAs (miRNAs) are short non-coding RNAs that post-transcriptionally regulate expression of mRNAs in many biological pathways. Liver plays an important role in the feed efficiency of animals and high and low efficient cattle demonstrated different gene expression profiles by microarray. Here we report comprehensive miRNAs profiles by next-gen deep sequencing in Angus cattle divergently selected for residual feed intake (RFI) and identify miRNAs related to feed efficiency in beef cattle. Two microRNA libraries were constructed from pooled RNA extracted from livers of low and high RFI cattle, and sequenced by Illumina genome analyser. In total, 23,628,103 high quality short sequence reads were obtained and more than half of these reads were matched to the bovine genome (UMD 3.1). We identified 305 known bovine miRNAs. Bta-miR-143, bta-miR-30, bta-miR-122, bta-miR-378, and bta-let-7 were the top five most abundant miRNAs families expressed in liver, representing more than 63% of expressed miRNAs. We also identified 52 homologous miRNAs and 10 novel putative bovine-specific miRNAs, based on precursor sequence and the secondary structure and utilizing the miRBase (v. 21). We compared the miRNAs profile between high and low RFI animals and ranked the most differentially expressed bovine known miRNAs. Bovine miR-143 was the most abundant miRNA in the bovine liver and comprised 20% of total expressed mapped miRNAs. The most highly expressed miRNA in liver of mice and humans, miR-122, was the third most abundant in our cattle liver samples. We also identified 10 putative novel bovine-specific miRNA candidates. Differentially expressed miRNAs between high and low RFI cattle were identified with 18 miRNAs being up-regulated and 7 other miRNAs down-regulated in low RFI cattle. Our study has identified comprehensive miRNAs expressed in bovine liver. Some of the expressed miRNAs are novel in cattle. The differentially expressed miRNAs between high and low RFI

  2. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    Science.gov (United States)

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis, by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence, to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species.

  3. Reference genome-independent assessment of mutation density using restriction enzyme-phased sequencing

    Directory of Open Access Journals (Sweden)

    Monson-Miller Jennifer

    2012-02-01

    Full Text Available Abstract Background The availability of low cost sequencing has spurred its application to discovery and typing of variation, including variation induced by mutagenesis. Mutation discovery is challenging as it requires a substantial amount of sequencing and analysis to detect very rare changes and distinguish them from noise. Also challenging are the cases when the organism of interest has not been sequenced or is highly divergent from the reference. Results We describe the development of a simple method for reduced representation sequencing. Input DNA was digested with a single restriction enzyme and ligated to Y adapters modified to contain a sequence barcode and to provide a compatible overhang for ligation. We demonstrated the efficiency of this method at SNP discovery using rice and arabidopsis. To test its suitability for the discovery of very rare SNP, one control and three mutagenized rice individuals (1, 5 and 10 mM sodium azide) were used to prepare genomic libraries for Illumina sequencers by ligating barcoded adapters to NlaIII restriction sites. For genome-dependent discovery 15-30 million of 80 base reads per individual were aligned to the reference sequence achieving individual sequencing coverage from 7 to 15×. We identified high-confidence base changes by comparing sequences across individuals and identified instances consistent with mutations, i.e. changes that were found in a single treated individual and were solely GC to AT transitions. For genome-independent discovery 70-mers were extracted from the sequence of the control individual and single-copy sequence was identified by comparing the 70-mers across samples to evaluate copy number and variation. This de novo "genome" was used to align the reads and identify mutations as above. Covering approximately 1/5 of the 380 Mb genome of rice we detected mutation densities ranging from 0.6 to 4 per Mb of diploid DNA depending on the mutagenic treatment. Conclusions The

  4. Comparative genomics of a Helicobacter pylori isolate from a Chinese Yunnan Naxi ethnic aborigine suggests high genetic divergence and phage insertion.

    Directory of Open Access Journals (Sweden)

    Yuanhai You

    Full Text Available Helicobacter pylori is a common pathogen correlated with several severe digestive diseases. It has been reported that isolates associated with different geographic areas, different diseases and different individuals might have variable genomic features. Here, we describe draft genomic sequences of H. pylori strains YN4-84 and YN1-91 isolated from patients with gastritis from the Naxi and Han populations of Yunnan, China, respectively. The draft sequences were compared to 45 other publicly available genomes, and a total of 1059 core genes were identified. Genes involved in restriction modification systems, type four secretion system three (TFS3) and type four secretion system four (TFS4) were identified as highly divergent. Both YN4-84 and YN1-91 harbor intact cag pathogenicity island (cagPAI) and have EPIYA-A/B/D type at the carboxyl terminal of cagA. The vacA gene type is s1m2i1. Another major finding was a 32.5-kb prophage integrated in the YN4-84 genome. The prophage shares most of its genes (30/33) with Helicobacter pylori prophage KHP30. Moreover, a 1,886 bp transposable sequence (IS605) was found in the prophage. Our results imply that the Naxi ethnic minority isolate YN4-84 and Han isolate YN1-91 belong to the hspEAsia subgroup and have diverse genome structure. The genome has been extensively modified in several regions involved in horizontal DNA transfer. The important roles played by phages in the ecology and microevolution of H. pylori were further emphasized. The current data will provide valuable information regarding the H. pylori genome based on historic human migrations and population structure.

  5. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

    Directory of Open Access Journals (Sweden)

    Cronn Richard

    2009-12-01

    Full Text Available Abstract Background Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? Results We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2), highlighting their unusual evolutionary properties. Conclusion Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling

  6. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the 'tsunami' of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  7. Canine fibroblast growth factor receptor 3 sequence is conserved across dogs of divergent skeletal size

    Directory of Open Access Journals (Sweden)

    Grossman Deborah I

    2008-10-01

    Full Text Available Abstract Background Fibroblast growth factor receptor 3 (FGFR3) is expressed in the growth plate of endochondral bones and serves as a negative regulator of linear bone elongation. Activating mutations severely limit bone growth, resulting in dwarfism, while inactivating mutations significantly enhance bone elongation and overall skeletal size. Domesticated dogs exhibit the greatest skeletal size diversity of any species and, given the regulatory role of FGFR3 on growth plate proliferation, we asked whether sequence differences in FGFR3 could account for some of the size differences. Methods All exons, the promoter region, and 60 bp of the 3' flanking region of the canine FGFR3 gene were sequenced for nine different dog breeds representing a spectrum of skeletal size. The resultant sequences were compared to the reference Boxer genome sequence. Results There was no variation in sequence for any FGFR3 exons, promoter region, or 3' flanking sequence across all breeds evaluated. Conclusion The results suggest that, regardless of domestication selection pressure to develop breeds having extreme differences in skeletal size, the FGFR3 gene is conserved. This implies a critical role for this gene in normal skeletal integrity and indicates that other genes account for size variability in dogs.

  8. Complete mitochondrial genome sequence from an endangered Indian snake, Python molurus molurus (Serpentes, Pythonidae).

    Science.gov (United States)

    Dubey, Bhawna; Meganathan, P R; Haque, Ikramul

    2012-07-01

    This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17258 bp length, comprising 37 genes (13 protein-coding genes, 22 tRNA genes, and 2 ribosomal RNA genes) along with duplicate control regions, is described herein. The P. molurus molurus mt genome is relatively similar to other snake mt genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows a higher A-C% than T-G% on the positive strand, as revealed by positive AT and CG skews. Comparison of individual protein-coding genes with other snake genomes suggests that ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference of NNC codons over NNG codons in the mt genome of P. molurus. Also, the ratios of non-synonymous to synonymous substitution rates (Ka/Ks) suggest that most of the protein-coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein-coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
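
    The strand skews mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract does not state which formula the authors used, and the function name base_skews is hypothetical. Under this convention an excess of C over G gives a negative GC skew, which appears to be what the abstract calls a positive CG skew.

```python
from collections import Counter


def base_skews(seq: str) -> dict:
    """Return AT and GC skew for one strand of a nucleotide sequence.

    Assumed definitions (not taken from the abstract):
      AT skew = (A - T) / (A + T)
      GC skew = (G - C) / (G + C)
    Characters other than A, C, G, T are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {"AT_skew": at_skew, "GC_skew": gc_skew}


if __name__ == "__main__":
    # Toy sequence, purely illustrative: 3 A, 1 T, 3 C, 1 G.
    print(base_skews("AAATCCCG"))
```

    On the toy sequence the A excess yields a positive AT skew and the C excess a negative GC skew, mirroring the A-C bias described for the positive strand.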

  9. Phased genotyping-by-sequencing enhances analysis of genetic diversity and reveals divergent copy number variants in maize

    Science.gov (United States)

    High-throughput sequencing of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken fr...

  10. Assessing divergence time of Spirulida and Sepiida (Cephalopoda) based on hemocyanin sequences.

    Science.gov (United States)

    Warnke, Kerstin Martina; Meyer, Achim; Ebner, Bettina; Lieb, Bernhard

    2011-02-01

    The phylogenetic position of the mesopelagic decabrachian cephalopod Spirula is still a matter of debate. Since hemocyanin has successfully been used to calibrate a molecular clock for many molluscan species, a molecular clock was calculated based on this gene with special attention to the cephalopod genera Spirula and Sepia. The obtained partial sequence, comprising ca. one third (3567 bp) of the complete gene, is similar to that of Sepia officinalis. The molecular clock was calibrated using the splits of Gastropoda-Cephalopoda (ca. 550 ± 50 mya) and Heterobranchia-Vetigastropoda (ca. 380 ± 10 mya). The resulting hemocyanin-based molecular clock is stable, and the estimated divergence time of Spirulida and Sepiida, some 150 ± 30 million years ago, can be deemed reliable.
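
    To make the idea of calibrating a molecular clock concrete, here is a minimal sketch under a strict-clock assumption: a dated split gives a per-lineage rate r = d / (2T), and an undated split is then placed at T = d / (2r). The abstract does not describe the authors' actual dating procedure, so this is not their method; the function names and the numbers in the usage line are hypothetical and are not data from the study.

```python
def clock_rate(calibration_distance: float, calibration_age_myr: float) -> float:
    """Per-lineage substitution rate (substitutions/site/Myr) from one dated split.

    Assumes a strict clock: the pairwise distance accumulates along two
    lineages, hence the factor of 2.
    """
    if calibration_age_myr <= 0:
        raise ValueError("calibration age must be positive")
    return calibration_distance / (2.0 * calibration_age_myr)


def divergence_time(distance: float, rate: float) -> float:
    """Age (Myr) of a split with the given pairwise distance under that rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


if __name__ == "__main__":
    # Illustrative values only: a split dated at 100 Myr with distance 0.2
    # calibrates the rate; a pair with distance 0.1 is then dated.
    rate = clock_rate(0.2, 100.0)
    print(divergence_time(0.1, rate))
```

    With more than one calibration point, as in the abstract, the separate rate estimates could be compared to judge how stable the clock is.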

  11. Untangling the hybrid nature of modern pig genomes: a mosaic derived from biogeographically distinct and highly divergent Sus scrofa populations

    NARCIS (Netherlands)

    Bosse, M.; Megens, H.J.W.C.; Madsen, O.; Frantz, L.A.F.; Paudel, Y.; Crooijmans, R.P.M.A.; Groenen, M.

    2014-01-01

    The merging of populations after an extended period of isolation and divergence is a common phenomenon, in natural settings as well as due to human interference. Individuals with such hybrid origins contain genomes that essentially form a mosaic of different histories and demographies. Pigs are an e

  12. The Integrated Genomic Architecture and Evolution of Dental Divergence in East African Cichlid Fishes (Haplochromis chilotes x H. nyererei).

    Science.gov (United States)

    Hulsey, C Darrin; Machado-Schiaffino, Gonzalo; Keicher, Lara; Ellis-Soto, Diego; Henning, Frederico; Meyer, Axel

    2017-09-07

    The independent evolution of the two toothed jaws of cichlid fishes is thought to have promoted their unparalleled ecological divergence and species richness. However, dental divergence in cichlids could exhibit substantial genetic covariance and this could dictate how traits like tooth numbers evolve in different African Lakes and on their two jaws. To test this hypothesis, we used a hybrid mapping cross of two trophically divergent Lake Victoria species (Haplochromis chilotes × Haplochromis nyererei) to examine genomic regions associated with cichlid tooth diversity. Surprisingly, a similar genomic region was found to be associated with oral jaw tooth numbers in cichlids from both Lake Malawi and Lake Victoria. Likewise, this same genomic location was associated with variation in pharyngeal jaw tooth numbers. Similar relationships between tooth numbers on the two jaws in both our Victoria hybrid population and across the phylogenetic diversity of Malawi cichlids additionally suggest that tooth numbers on the two jaws of haplochromine cichlids might generally coevolve owing to shared genetic underpinnings. Integrated, rather than independent, genomic architectures could be key to the incomparable evolutionary divergence and convergence in cichlid tooth numbers. Copyright © 2017 Hulsey et al.

  13. The Integrated Genomic Architecture and Evolution of Dental Divergence in East African Cichlid Fishes (Haplochromis chilotes x H. nyererei)

    Directory of Open Access Journals (Sweden)

    C. Darrin Hulsey

    2017-09-01

    Full Text Available The independent evolution of the two toothed jaws of cichlid fishes is thought to have promoted their unparalleled ecological divergence and species richness. However, dental divergence in cichlids could exhibit substantial genetic covariance and this could dictate how traits like tooth numbers evolve in different African Lakes and on their two jaws. To test this hypothesis, we used a hybrid mapping cross of two trophically divergent Lake Victoria species (Haplochromis chilotes × Haplochromis nyererei) to examine genomic regions associated with cichlid tooth diversity. Surprisingly, a similar genomic region was found to be associated with oral jaw tooth numbers in cichlids from both Lake Malawi and Lake Victoria. Likewise, this same genomic location was associated with variation in pharyngeal jaw tooth numbers. Similar relationships between tooth numbers on the two jaws in both our Victoria hybrid population and across the phylogenetic diversity of Malawi cichlids additionally suggest that tooth numbers on the two jaws of haplochromine cichlids might generally coevolve owing to shared genetic underpinnings. Integrated, rather than independent, genomic architectures could be key to the incomparable evolutionary divergence and convergence in cichlid tooth numbers.

  14. Complete sequence of heterogenous-composition mitochondrial genome (Brassica napus) and its exogenous source

    Directory of Open Access Journals (Sweden)

    Wang Juan

    2012-11-01

    Full Text Available Abstract Background Unlike maternal inheritance of mitochondria in sexual reproduction, somatic hybrids follow no obvious pattern. The introgressed segment orf138 from the mitochondrial genome of radish (Raphanus sativus) to its counterpart in rapeseed (Brassica napus) demonstrates that this inheritance mode derives from the cytoplasm of both parents. Sequencing of the complete mitochondrial genome of five species from the Brassica family allowed the prediction of other extraneous sources of the cybrids from the radish parent, and the determination of their mitochondrial rearrangement. Results We obtained the complete mitochondrial genome of Ogura-cms-cybrid (oguC) rapeseed. To date, this is the first time that a heterogeneously composed mitochondrial genome was sequenced. The 258,473 bp master circle consisted of 33 protein-coding genes, 3 rRNA sequences, and 23 tRNA sequences. This mitotype noticeably holds two copies of atp9 and is devoid of cox2-2. Relative to the nap mitochondrial genome, 40 point mutations were scattered in the 23 protein-coding genes. atp6 even has an abnormal start locus whereas tatC has an abnormal end locus. The rearrangement of the 22 syntenic regions that comprised 80.11% of the genome was influenced by short repeats. A pair of large repeats (9731 bp) was responsible for the multipartite structure. Nine unique regions were detected when compared with other published Brassica mitochondrial genome sequences. We also found six homologous chloroplast segments (Brassica napus). Conclusions The mitochondrial genome of oguC is quite divergent from nap and pol, which are more similar to each other. We analyzed the unique regions of every genome of the Brassica family, and found that very few segments were specific for these six mitotypes, especially cam, jun, and ole, which have no specific segments at all. Therefore, we conclude that the most specific regions of oguC possibly came from radish. Compared with the chloroplast genome

  15. Mitogenome Sequencing in the Genus Camelus Reveals Evidence for Purifying Selection and Long-term Divergence between Wild and Domestic Bactrian Camels.

    Science.gov (United States)

    Mohandesan, Elmira; Fitak, Robert R; Corander, Jukka; Yadamsuren, Adiya; Chuluunbat, Battsetseg; Abdelhadi, Omer; Raziq, Abdul; Nagy, Peter; Stalder, Gabrielle; Walzer, Chris; Faye, Bernard; Burger, Pamela A

    2017-08-30

    The genus Camelus is an interesting model to study adaptive evolution in the mitochondrial genome, as the three extant Old World camel species inhabit hot and low-altitude as well as cold and high-altitude deserts. We sequenced 24 camel mitogenomes and combined them with three previously published sequences to study the role of natural selection under different environmental pressure, and to advance our understanding of the evolutionary history of the genus Camelus. We confirmed the heterogeneity of divergence across different components of the electron transport system. Lineage-specific analysis of mitochondrial protein evolution revealed a significant effect of purifying selection in the concatenated protein-coding genes in domestic Bactrian camels. The estimated dN/dS < 1 in the concatenated protein-coding genes suggested purifying selection as driving force for shaping mitogenome diversity in camels. Additional analyses of the functional divergence in amino acid changes between species-specific lineages indicated fixed substitutions in various genes, with radical effects on the physicochemical properties of the protein products. The evolutionary time estimates revealed a divergence between domestic and wild Bactrian camels around 1.1 [0.58-1.8] million years ago (mya). This has major implications for the conservation and management of the critically endangered wild species, Camelus ferus.

  16. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    Science.gov (United States)

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
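
    The two assembly statistics quoted in this abstract, contig N50 and sequencing depth, can be sketched as follows. N50 is computed here under its usual definition (the length of the contig at which the running total of contigs, sorted longest first, reaches half of the assembly length). The depth helper simply divides sequenced bases by genome size; the roughly 3 Gbp genome size used in the usage lines is an assumption inferred from the quoted 181 Gbp and 60× figures, not a value given in the abstract. Function names are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative sum first reaches half the total length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size


if __name__ == "__main__":
    # Toy contig set, purely illustrative.
    print(n50([2, 3, 4, 5, 6]))
    # 181 Gbp of reads over an ASSUMED ~3 Gbp genome.
    print(round(mean_coverage(181e9, 3.0e9), 1))
```

    A doubled N50, as reported for the resequenced assembly, means that half of the assembled bases now sit in contigs at least twice as long as before.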

  17. Complete genome sequence of Ferroglobus placidus AEDII12DO

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Risso, Carla [University of Massachusetts, Amherst; Holmes, Dawn [University of Massachusetts, Amherst; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Brettin, Thomas S [ORNL; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Larimer, Frank W [ORNL; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Lovley, Derek [University of Massachusetts, Amherst; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute

    2011-01-01

    Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryarchaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemolithoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and annotation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was sequenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project.

  18. Complete genome sequence of Serratia plymuthica strain AS12

    Energy Technology Data Exchange (ETDEWEB)

    Neupane, Saraswoti [Uppsala University, Uppsala, Sweden; Finlay, Roger D. [Uppsala University, Uppsala, Sweden; Alstrom, Sadhna [Uppsala University, Uppsala, Sweden; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Peters, Lin [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Chertkov, Olga [Los Alamos National Laboratory (LANL); Han, James [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Hogberg, Nils [Uppsala University, Uppsala, Sweden

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  19. Insights into the dynamics of genome size and chromosome evolution in the early diverging angiosperm lineage Nymphaeales (water lilies).

    Science.gov (United States)

    Pellicer, J; Kelly, L J; Magdalena, C; Leitch, I J

    2013-08-01

    Nymphaeales are the most species-rich lineage of the earliest diverging angiosperms known as the ANA grade (Amborellales, Nymphaeales, Austrobaileyales), and they have received considerable attention from morphological, physiological, and ecological perspectives. Although phylogenetic relationships between these three lineages of angiosperms are mainly well resolved, insights at the whole genome level are still limited because of a dearth of information. To address this, genome sizes and chromosome numbers in 34 taxa, comprising 28 species were estimated and analysed together with previously published data to provide an overview of genome size and chromosome diversity in Nymphaeales. Overall, genome sizes were shown to vary 10-fold and chromosome numbers and ploidy levels ranged from 2n = 2x = 18 to 2n = 16x = ∼224. Distinct patterns of genome diversity were apparent, reflecting the differential incidence of polyploidy, changes in repetitive DNA content, and chromosome rearrangements within and between genera. Using model-based approaches, ancestral genome size and basic chromosome numbers were reconstructed to provide insights into the dynamics of genome size and chromosome number evolution. Finally, by combining additional data from Amborellales and Austrobaileyales, a comprehensive overview of genome sizes and chromosome numbers in these early diverging angiosperms is presented.

  20. Comparative Genomics of a Plant-Pathogenic Fungus, Pyrenophora tritici-repentis, Reveals Transduplication and the Impact of Repeat Elements on Pathogenicity and Population Divergence

    Energy Technology Data Exchange (ETDEWEB)

    Manning, Viola A.; Pandelova, Iovanna; Dhillon, Braham; Wilhelm, Larry J.; Goodwin, Stephen B.; Berlin, Aaron M.; Figueroa, Melania; Freitag, Michael; Hane, James K.; Henrissat, Bernard; Holman, Wade H.; Kodira, Chinnappa D.; Martin, Joel; Oliver, Richard P.; Robbertse, Barbara; Schackwitz, Wendy; Schwartz, David C.; Spatafora, Joseph W.; Turgeon, B. Gillian; Yandava, Chandri; Young, Sarah; Zhou, Shiguo; Zeng, Qiandong; Grigoriev, Igor V.; Ma, Li-Jun; Ciuffetti, Lynda M.

    2012-08-16

    Pyrenophora tritici-repentis is a necrotrophic fungus causal to the disease tan spot of wheat, whose contribution to crop loss has increased significantly during the last few decades. Pathogenicity by this fungus is attributed to the production of host-selective toxins (HST), which are recognized by their host in a genotype-specific manner. To better understand the mechanisms that have led to the increase in disease incidence related to this pathogen, we sequenced the genomes of three P. tritici-repentis isolates. A pathogenic isolate that produces two known HSTs was used to assemble a reference nuclear genome of approximately 40 Mb composed of 11 chromosomes that encode 12,141 predicted genes. Comparison of the reference genome with those of a pathogenic isolate that produces a third HST, and a nonpathogenic isolate, showed the nonpathogen genome to be more diverged than those of the two pathogens. Examination of gene-coding regions has provided candidate pathogen-specific proteins and revealed gene families that may play a role in a necrotrophic lifestyle. Analysis of transposable elements suggests that their presence in the genome of pathogenic isolates contributes to the creation of novel genes, effector diversification, possible horizontal gene transfer events, identified copy number variation, and the first example of transduplication by DNA transposable elements in fungi. Overall, comparative analysis of these genomes provides evidence that pathogenicity in this species arose through an influx of transposable elements, which created a genetically flexible landscape that can easily respond to environmental changes.

  1. Genome size evolution in pufferfish: an insight from BAC clone-based Diodon holocanthus genome sequencing

    Directory of Open Access Journals (Sweden)

    Gan Xiaoni

    2010-06-01

    Full Text Available Abstract Background Variations in genome size within and between species have been observed since the 1950s in diverse taxonomic groups. Serving as model organisms, smooth pufferfish possess the smallest vertebrate genomes. Interestingly, spiny pufferfish from their sister family have genomes twice as large as those of smooth pufferfish. Therefore, comparative genomic analysis between smooth pufferfish and spiny pufferfish is useful for our understanding of genome size evolution in pufferfish. Results Ten BAC clones of a spiny pufferfish Diodon holocanthus were randomly selected and shotgun sequenced. In total, 776 kb of non-redundant sequences without gaps representing 0.1% of the D. holocanthus genome were identified, and 77 distinct genes were predicted. In the sequenced D. holocanthus genome, 364 kb is homologous with 265 kb of the Takifugu rubripes genome, and 223 kb is homologous with 148 kb of the Tetraodon nigroviridis genome. The repetitive DNA accounts for 8% of the sequenced D. holocanthus genome, which is higher than that in the T. rubripes genome (6.89%) and that in the Te. nigroviridis genome (4.66%). In the repetitive DNA, 76% is retroelements which account for 6% of the sequenced D. holocanthus genome and belong to known families of transposable elements. More than half of retroelements were distributed within genes. In the non-homologous regions, repeat element proportion in D. holocanthus genome increased to 10.6% compared with T. rubripes and increased to 9.19% compared with Te. nigroviridis. A comparison of 10 well-defined orthologous genes showed that the average intron size (566 bp) in D. holocanthus genome is significantly longer than that in the smooth pufferfish genome (435 bp). Conclusion Compared with the smooth pufferfish, D. holocanthus has a low gene density and repeat-element-rich genome. Genome size variation between D. holocanthus and the smooth pufferfish exhibits as length variation between homologous region and different

  2. Monitoring Genomic Sequences during SELEX Using High-Throughput Sequencing: Neutral SELEX

    Science.gov (United States)

    Chen, Doris; Lorenz, Christina; Schroeder, Renée

    2010-01-01

    Background SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. Methodology/Principal Findings To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Conclusions/Significance Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers. PMID:20161784

  3. Monitoring genomic sequences during SELEX using high-throughput sequencing: neutral SELEX.

    Directory of Open Access Journals (Sweden)

    Bob Zimmermann

    Full Text Available BACKGROUND: SELEX is a well established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. METHODOLOGY/PRINCIPAL FINDINGS: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. CONCLUSIONS/SIGNIFICANCE: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

  4. Specialized microbial databases for inductive exploration of microbial genome sequences

    Directory of Open Access Journals (Sweden)

    Cabau Cédric

    2005-02-01

    Full Text Available Abstract Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore (http://bioinfo.hku.hk/genochore.html), a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya), has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated with related organisms for comparison.

  5. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol

    2010-01-01

    preferentially selected for sequencing. In accordance with the Bermuda and Fort Lauderdale agreements and the more recent Toronto Statement the data have been released into public sequence repositories (Genbank/EMBL, NCBI/Ensembl trace repositories) in a timely manner and in advance of publication. CONCLUSIONS...

  6. Convergent and divergent evolution of genomic imprinting in the marsupial Monodelphis domestica

    Directory of Open Access Journals (Sweden)

    Das Radhika

    2012-08-01

    Full Text Available Abstract Background Genomic imprinting is an epigenetic phenomenon resulting in parent-of-origin specific monoallelic gene expression. It is postulated to have evolved in placental mammals to modulate intrauterine resource allocation to the offspring. In this study, we determined the imprint status of metatherian orthologues of eutherian imprinted genes. Results L3MBTL and HTR2A were shown to be imprinted in Monodelphis domestica (the gray short-tailed opossum). MEST expressed a monoallelic and a biallelic transcript, as in eutherians. In contrast, IMPACT, COPG2, and PLAGL1 were not imprinted in the opossum. Differentially methylated regions (DMRs) involved in regulating imprinting in eutherians were not found at any of the new imprinted loci in the opossum. Interestingly, a novel DMR was identified in intron 11 of the imprinted IGF2R gene, but this was not conserved in eutherians. The promoter regions of the imprinted genes in the opossum were enriched for the activating histone modification H3 Lysine 4 dimethylation. Conclusions The phenomenon of genomic imprinting is conserved in Therians, but the marked difference in the number and location of imprinted genes and DMRs between metatherians and eutherians indicates that imprinting is not fully conserved between the two Therian infra-classes. The identification of a novel DMR at a non-conserved location as well as the first demonstration of histone modifications at imprinted loci in the opossum suggest that genomic imprinting may have evolved in a common ancestor of these two Therian infra-classes with subsequent divergence of regulatory mechanisms in the two lineages.

  7. Open access to sequence: Browsing the Pichia pastoris genome

    Directory of Open Access Journals (Sweden)

    Graf Alexandra

    2009-10-01

    Full Text Available Abstract The first genome sequences of the important yeast protein production host Pichia pastoris have been released into the public domain this spring. In order to provide the scientific community with easy and versatile access to the sequence, two websites have been set up as a resource for genomic sequence, gene and protein information for P. pastoris: a GBrowse-based genome browser at http://www.pichiagenome.org and a genome portal with gene annotation and browsing functionality at http://bioinformatics.psb.ugent.be/webtools/bogas. Both websites offer information on gene annotation and function, regulation and structure. In addition, a wiki-based platform allows all users to contribute additional information on genes, proteins, physiology and other items of P. pastoris research, so that the Pichia community can benefit from the exchange of knowledge, data and materials.

  8. Complete genome sequence of Shewanella putrefaciens. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Heidelberg, John F.

    2001-04-01

    Seventy percent of the costs for genome sequencing Shewanella putrefaciens (oneidensis) were requested. These funds were expected to allow completion of the low-pass (5-fold) random sequencing and complete closure and annotation of the 200 kbp plasmid. Because of cost reduction that occurred during the period of this grant, these goals have been far exceeded. Currently, the S. putrefaciens genome is very nearly completely closed, even though the genome was significantly larger than expected and extremely repetitive. The entire genome sequence has been made BLAST searchable on the TIGR web page, and an extensive effort has been made to make data and analyses available to all researchers working on S. putrefaciens (oneidensis).

  9. Long-read sequence assembly of the gorilla genome

    Science.gov (United States)

    Gordon, David; Huddleston, John; Chaisson, Mark J. P.; Hill, Christopher M.; Kronenberg, Zev N.; Munson, Katherine M.; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W.; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K.; Haussler, David; Chin, Chen-Shan; Eichler, Evan E.

    2016-01-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. PMID:27034376

  10. The genome sequence of the model ascomycete fungus Podospora anserina

    NARCIS (Netherlands)

    Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne Gj; Henrissat, Bernard; Khoury, Riyad El; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

    2008-01-01

    BACKGROUND: The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. RESULTS: We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed

  11. Genome sequence of Stachybotrys chartarum Strain 51-11

    Science.gov (United States)

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  12. Sequencing and analysis of an Irish human genome.

    LENUS (Irish Health Repository)

    Tong, Pin

    2010-01-01

    Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.

  14. Draft Genome Sequence of Lactobacillus fermentum NB-22

    Science.gov (United States)

    Shkoporov, A. N.; Efimov, B. A.; Pikina, A. P.; Borisova, O. Y.; Gladko, I. A.; Postnikova, E. A.; Lordkipanidze, A. E.; Kafarskaia, L. I.

    2015-01-01

    We announce here a draft genome sequence of Lactobacillus fermentum NB-22, a strain isolated from human vaginal microbiota. The assembled sequence consists of 190 contigs, joined into 137 scaffolds, and the total size is 2.01 Mb. PMID:26272572

  15. Complete Genome Sequence of Kocuria palustris MU14/1.

    Science.gov (United States)

    Calcutt, Michael J; Foecking, Mark F

    2015-10-15

    Presented here is the first completely assembled genome sequence of Kocuria palustris, an actinobacterial species with broad ecological distribution. The single, circular chromosome of K. palustris MU14/1 comprises 2,854,447 bp, has a G+C content of 70.5%, and contains a deduced gene set of 2,521 coding sequences.
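
    The G+C content quoted in this record is the share of G and C bases among all called bases. The Python sketch below shows one way to compute it; the function name gc_content and the choice to ignore ambiguous bases such as N are assumptions, not details taken from the announcement.

```python
def gc_content(sequence):
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    # Ambiguous symbols such as N are excluded from the denominator (an assumption).
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Hypothetical toy sequence, not taken from the K. palustris chromosome.
print(gc_content("ATGC"))
```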

  16. Complete Genome Sequence of Kocuria palustris MU14/1

    OpenAIRE

    2015-01-01

    Presented here is the first completely assembled genome sequence of Kocuria palustris, an actinobacterial species with broad ecological distribution. The single, circular chromosome of K. palustris MU14/1 comprises 2,854,447 bp, has a G+C content of 70.5%, and contains a deduced gene set of 2,521 coding sequences.

  17. Complete Genome Sequence of Zika Virus Isolated from Semen

    Science.gov (United States)

    Graham, Victoria; Lewandowski, Kuiama; Dowall, Stuart D.; Pullan, Steven T.; Hewson, Roger

    2016-01-01

    Zika virus (ZIKV) is an emerging pathogenic flavivirus currently circulating in numerous countries in South America, the Caribbean, and the Western Pacific Region. Using an unbiased metagenomic sequencing approach, we report here the first complete genome sequence of ZIKV isolated from a clinical semen sample. PMID:27738033

  18. Complete Genomic Sequence of Issyk-Kul Virus.

    Science.gov (United States)

    Atkinson, Barry; Marston, Denise A; Ellis, Richard J; Fooks, Anthony R; Hewson, Roger

    2015-07-02

    Issyk-Kul virus (ISKV) is an ungrouped virus tentatively assigned to the Bunyaviridae family and is associated with an acute febrile illness in several central Asian countries. Using next-generation sequencing technologies, we report here the full-genome sequence for this novel unclassified arboviral pathogen circulating in central Asia.

  19. Complete genome sequence of a new maize-associated cytorhabdovirus

    Science.gov (United States)

    A new 11,877 nt cytorhabdovirus sequence with 6 open reading frames has been identified in a maize sample. It shares 50 and 51% genome-wide nucleotide sequence identity with northern cereal mosaic cytorhabdovirus (NCMV) and barley yellow striate mosaic cytorhabdovirus (BYSMV), respectively....

  20. Single nucleus genome sequencing reveals high similarity among nuclei of an endomycorrhizal fungus.

    Directory of Open Access Journals (Sweden)

    Kui Lin

    2014-01-01

    Full Text Available Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya.

  1. Single Nucleus Genome Sequencing Reveals High Similarity among Nuclei of an Endomycorrhizal Fungus

    Science.gov (United States)

    Zhang, Zhonghua; Ivanov, Sergey; Saunders, Diane G. O.; Mu, Desheng; Pang, Erli; Cao, Huifen; Cha, Hwangho; Lin, Tao; Zhou, Qian; Shang, Yi; Li, Ying; Sharma, Trupti; van Velzen, Robin; de Ruijter, Norbert; Aanen, Duur K.; Win, Joe; Kamoun, Sophien; Bisseling, Ton; Geurts, René; Huang, Sanwen

    2014-01-01

    Nuclei of arbuscular endomycorrhizal fungi have been described as highly diverse due to their asexual nature and absence of a single cell stage with only one nucleus. This has raised fundamental questions concerning speciation, selection and transmission of the genetic make-up to next generations. Although this concept has become textbook knowledge, it is only based on studying a few loci, including 45S rDNA. To provide a more comprehensive insight into the genetic makeup of arbuscular endomycorrhizal fungi, we applied de novo genome sequencing of individual nuclei of Rhizophagus irregularis. This revealed a surprisingly low level of polymorphism between nuclei. In contrast, within a nucleus, the 45S rDNA repeat unit turned out to be highly diverged. This finding demystifies a long-lasting hypothesis on the complex genetic makeup of arbuscular endomycorrhizal fungi. Subsequent genome assembly resulted in the first draft reference genome sequence of an arbuscular endomycorrhizal fungus. Its length is 141 Mbps, representing over 27,000 protein-coding gene models. We used the genomic sequence to reinvestigate the phylogenetic relationships of Rhizophagus irregularis with other fungal phyla. This unambiguously demonstrated that Glomeromycota are more closely related to Mucoromycotina than to its postulated sister Dikarya. PMID:24415955

  2. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.

  3. Genomic Sequencing of Single Microbial Cells from Environmental Samples

    Energy Technology Data Exchange (ETDEWEB)

    Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

    2008-02-01

    Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

  4. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Full Text Available Abstract Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs on the zebrafish genome, which demonstrated the high similarity between zebrafish and common carp genomes, indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3

  5. Combining two technologies for full genome sequencing of human.

    Science.gov (United States)

    Skryabin, K G; Prokhortchouk, E B; Mazur, A M; Boulygina, E S; Tsygankova, S V; Nedoluzhko, A V; Rastorguev, S M; Matveev, V B; Chekanov, N N; D A, Goranskaya; Teslyuk, A B; Gruzdeva, N M; Velikhov, V E; Zaridze, D G; Kovalchuk, M V

    2009-10-01

    At present, new DNA sequencing technologies are developing rapidly, allowing quick and efficient characterisation of organisms at the level of genome structure. In this study, the whole genome of a human (a Russian man) was sequenced using two technologies currently on the market: Sequencing by Oligonucleotide Ligation and Detection (SOLiD™) (Applied Biosystems) and sequencing of molecular clusters using fluorescently labeled precursors (Illumina). The data generated totaled 108.3 billion base pairs (60.2 billion from Illumina technology and 48.1 billion from SOLiD technology). Statistics performed on reads generated by GAII and SOLiD showed that they covered 75% and 96% of the genome, respectively. Short polymorphic regions were detected with comparable accuracy; however, the absolute number revealed by SOLiD was several times lower than by GAII. An optimal algorithm for using the latest sequencing methods was established for the analysis of individual human genomes. The study is the first Russian effort towards whole human genome sequencing.

  6. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1,050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.
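
    The N50 and genome-coverage figures in this record follow from simple arithmetic, sketched below in Python. The n50 function uses the common definition (the length at which the running total of lengths, sorted longest first, reaches half of the assembly size); the authors' exact procedure is not stated in the abstract, and the function name and toy lengths are assumptions.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths, not the carnation assembly.
print(n50([2, 3, 4, 5, 6]))

# Coverage of the estimated genome, using the two figures quoted in the abstract.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000
coverage_pct = 100 * assembled_bp / estimated_genome_bp
print(round(coverage_pct))  # 91
```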

  7. Genome sequencing of a single tardigrade Hypsibius dujardini individual.

    Science.gov (United States)

    Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru

    2016-08-16

    Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies.

  8. Biased distribution of DNA uptake sequences towards genome maintenance genes

    DEFF Research Database (Denmark)

    Davidsen, T.; Rodland, E.A.; Lagesen, K.

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group...
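
    A search for the most frequent 9-10mers, as described in this record, amounts to counting every overlapping k-mer in a sequence. The Python sketch below illustrates the idea only; the function name most_common_kmers, the single-strand counting, and the toy input are assumptions, and the authors' actual search procedure is not given in the abstract.

```python
from collections import Counter


def most_common_kmers(sequence, k, top=3):
    """Count overlapping k-mers on the given strand and return the most frequent."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts.most_common(top)


# Hypothetical toy sequence; a real analysis would use k = 9 or 10 on whole genomes.
print(most_common_kmers("ACGACGACG", 3, top=1))  # [('ACG', 3)]
```

    A fuller analysis would presumably count both strands and restrict the window to annotated coding regions before comparing gene groups.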

  9. P elements and MITE relatives in the whole genome sequence of Anopheles gambiae

    Directory of Open Access Journals (Sweden)

    Nouaud Danielle

    2006-08-01

    Full Text Available Abstract Background Miniature Inverted-repeat Terminal Elements (MITEs), which are particular class-II transposable elements (TEs), play an important role in genome evolution, because they have very high copy numbers and display recurrent bursts of transposition. The 5' and 3' subterminal regions of a given MITE family often show a high sequence similarity with the corresponding regions of an autonomous Class-II TE family. However, the sustained presence over a prolonged evolutionary time of MITEs and TE master copies able to promote their mobility has been rarely reported within the same genome, and this raises fascinating evolutionary questions. Results We report here the presence of P transposable elements with related MITE families in the Anopheles gambiae genome. Using a TE annotation pipeline we have identified and analyzed all the P sequences in the sequenced A. gambiae PEST strain genome. More than 0.49% of the genome consists of P elements and derivatives. P elements can be divided into 9 different subfamilies, separated by more than 30% of nucleotide divergence. Seven of them present full length copies. Ten MITE families are associated with 6 out of the 9 P subfamilies. Comparing their intra-element nucleotide diversities and their structures allows us to propose the putative dynamics of their emergence. In particular, one MITE family which has a hybrid structure, with ends each of which is related to a different P-subfamily, suggests a new mechanism for their emergence and their mobility. Conclusion This work contributes to a greater understanding of the relationship between full-length class-II TEs and MITEs, in this case P elements and their derivatives in the genome of A. gambiae. Moreover, it provides the most comprehensive catalogue to date of P-like transposons in this genome and provides convincing yet indirect evidence that some of the subfamilies have been recently active.

  10. Genomic insights into Wnt signaling in an early diverging metazoan, the ctenophore Mnemiopsis leidyi.

    Science.gov (United States)

    Pang, Kevin; Ryan, Joseph F; Mullikin, James C; Baxevanis, Andreas D; Martindale, Mark Q

    2010-10-04

    Intercellular signaling pathways are a fundamental component of the integrating cellular behavior required for the evolution of multicellularity. The genomes of three of the four early branching animal phyla (Cnidaria, Placozoa and Porifera) have been surveyed for key components, but not the fourth (Ctenophora). Genomic data from ctenophores could be particularly relevant, as ctenophores have been proposed to be one of the earliest branching metazoan phyla. A preliminary assembly of the lobate ctenophore Mnemiopsis leidyi genome generated using next-generation sequencing technologies was searched for components of a developmentally important signaling pathway, the Wnt/β-catenin pathway. Molecular phylogenetic analysis shows four distinct Wnt ligands (MlWnt6, MlWnt9, MlWntA and MlWntX), and most, but not all components of the receptor and intracellular signaling pathway were detected. In situ hybridization of the four Wnt ligands showed that they are expressed in discrete regions associated with the aboral pole, tentacle apparati and apical organ. Ctenophores show a minimal (but not obviously simple) complement of Wnt signaling components. Furthermore, it is difficult to compare the Mnemiopsis Wnt expression patterns with those of other metazoans. mRNA expression of Wnt pathway components appears later in development than expected, and zygotic gene expression does not appear to play a role in early axis specification. Notably absent in the Mnemiopsis genome are most major secreted antagonists, which suggests that complex regulation of this secreted signaling pathway probably evolved later in animal evolution.

  11. Genome-wide Expansion and Expression Divergence of the Basic Leucine Zipper Transcription Factors in Higher Plants with an Emphasis on Sorghum

    Institute of Scientific and Technical Information of China (English)

    Jizhou Wang; Junxia Zhou; Baolan Zhang; Jeevanandam Vanitha; Srinivasan Ramachandran; Shu-Ye Jiang

    2011-01-01

    Plant bZIP transcription factors play crucial roles in multiple biological processes. However, little is known about the sorghum bZIP gene family although the sorghum genome has been completely sequenced. In this study, we have carried out a genome-wide identification and characterization of this gene family in sorghum. Our data show that the genome encodes at least 92 bZIP transcription factors. These bZIP genes have been expanded mainly by segmental duplication. Such an expansion mechanism has also been observed in rice, Arabidopsis and many other plant organisms, suggesting a common expansion mode of this gene family in plants. Further investigation shows that most of the bZIP members were present in the most recent common ancestor of sorghum and rice and that the major expansion occurred before the sorghum-rice split. Although these bZIP genes have a long history of duplication, they exhibited limited functional divergence as shown by nonsynonymous substitutions (Ka)/synonymous substitutions (Ks) analyses. Their retention was mainly due to the high percentages of expression divergence. Our data also showed that this gene family might play a role in multiple developmental stages and tissues and might be regarded as important regulators of various abiotic stresses and sugar signaling.
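
    The Ka/Ks analysis mentioned in this record compares nonsynonymous (Ka) with synonymous (Ks) substitution rates: a ratio well below 1 is conventionally read as purifying selection, near 1 as neutral evolution, and above 1 as positive selection. The Python sketch below only classifies a ratio from precomputed rates; the function name, the tolerance of 0.05 around 1 and the example values are assumptions, and estimating Ka and Ks from aligned codons is outside its scope.

```python
def classify_ka_ks(ka, ks, tolerance=0.05):
    """Label the selection regime implied by a Ka/Ks ratio.

    The tolerance band around 1 is an arbitrary illustrative choice,
    not a threshold taken from the study.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        return "purifying"
    if ratio > 1 + tolerance:
        return "positive"
    return "neutral"


# Hypothetical rates for one duplicated gene pair.
print(classify_ka_ks(0.1, 0.5))  # purifying
```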

  12. Next-generation sequencing and large genome assemblies.

    Science.gov (United States)

    Henson, Joseph; Tischler, German; Ning, Zemin

    2012-06-01

    The next-generation sequencing (NGS) revolution has drastically reduced time and cost requirements for sequencing of large genomes, and also qualitatively changed the problem of assembly. This article reviews the state of the art in de novo genome assembly, paying particular attention to mammalian-sized genomes. The strengths and weaknesses of the main sequencing platforms are highlighted, leading to a discussion of assembly and the new challenges associated with NGS data. Current approaches to assembly are outlined and the various software packages available are introduced and compared. The question of whether quality assemblies can be produced using short-read NGS data alone, or whether it must be combined with more expensive sequencing techniques, is considered. Prospects for future assemblers and tests of assembly performance are also discussed.

  13. DNA sequencing leads to genomics progress in China

    Institute of Scientific and Technical Information of China (English)

    WU JiaYan; XIAO JingFa; ZHANG RuoSi; YU Jun

    2011-01-01

    1 Science in the large-scale sequencing era Ten years ago, the first draft sequence assembly of the human genome was completed [1], bringing biomedical research one step closer toward the goal of revolutionizing diagnosis, prevention, and treatment of human diseases. Recently, journalists from the journal Nature surveyed more than 1000 life scientists regarding this laudable aim [2], obtaining substantially negative responses [3]. However, almost all of those surveyed had been influenced, in one way or another, by the availability of the human genome sequence, and they also agreed with the notion that the "sequence is the start." The complexity of genome biology and almost every aspect of human biology is far greater than previously thought [4].

  14. Correlating Traits of Gene Retention, Sequence Divergence, Duplicability and Essentiality in Vertebrates, Arthropods, and Fungi

    Science.gov (United States)

    Waterhouse, Robert M.; Zdobnov, Evgeny M.; Kriventseva, Evgenia V.

    2011-01-01

    Delineating ancestral gene relations among a large set of sequenced eukaryotic genomes allowed us to rigorously examine links between evolutionary and functional traits. We classified 86% of over 1.36 million protein-coding genes from 40 vertebrates, 23 arthropods, and 32 fungi into orthologous groups and linked over 90% of them to Gene Ontology or InterPro annotations. Quantifying properties of ortholog phyletic retention, copy-number variation, and sequence conservation, we examined correlations with gene essentiality and functional traits. More than half of vertebrate, arthropod, and fungal orthologs are universally present across each lineage. These universal orthologs are preferentially distributed in groups with almost all single-copy or all multicopy genes, and sequence evolution of the predominantly single-copy orthologous groups is markedly more constrained. Essential genes from representative model organisms, Mus musculus, Drosophila melanogaster, and Saccharomyces cerevisiae, are significantly enriched in universal orthologs within each lineage, and essential-gene-containing groups consistently exhibit greater sequence conservation than those without. This study of eukaryotic gene repertoire evolution identifies shared fundamental principles and highlights lineage-specific features; it also confirms that essential genes are highly retained and conclusively supports the “knockout-rate prediction” of stronger constraints on essential gene sequence evolution. However, the distinction between sequence conservation of single- versus multicopy orthologs is quantitatively more prominent than between orthologous groups with and without essential genes. The previously underappreciated difference in the tolerance of gene duplications and contrasting evolutionary modes of “single-copy control” versus “multicopy license” may reflect a major evolutionary mechanism that allows extended exploration of gene sequence space. PMID:21148284

  15. Genomic multiple sequence alignments: refinement using a genetic algorithm

    Directory of Open Access Journals (Sweden)

    Lefkowitz Elliot J

    2005-08-01

    Full Text Available Abstract Background Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only

  16. Sequencing and Analysis of Neanderthal Genomic DNA

    OpenAIRE

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K; Rubin, Edward M.

    2006-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library a...

  17. Inconsistencies in Neanderthal genomic DNA sequences.

    Directory of Open Access Journals (Sweden)

    Jeffrey D Wall

    2007-10-01

    Full Text Available Two recently published papers describe nuclear DNA sequences that were obtained from the same Neanderthal fossil. Our reanalyses of the data from these studies show that they are not consistent with each other and point to serious problems with the data quality in one of the studies, possibly due to modern human DNA contaminants and/or a high rate of sequencing errors.

  18. Complete genome sequences of three strains of coxsackievirus a7.

    Science.gov (United States)

    Ylä-Pelto, Jani; Koskinen, Satu; Karelehto, Eveliina; Sittig, Eleonora; Roivainen, Merja; Hyypiä, Timo; Susi, Petri

    2013-04-11

    Genomes of three strains (Parker, USSR, and 275/58) of coxsackievirus A7 (CV-A7) were amplified by the long reverse transcription (RT)-PCR method and sequenced. While the sequences of Parker and USSR were identical, the similarities of 275/58 to the CV-A7 reference sequence, accession no. AY421765, were 82.6% and 96.2% for nucleotides and amino acids, respectively.

  19. Novel microsatellite marker development from the unassembled genome sequence data of the marbled flounder Pseudopleuronectes yokohamae.

    Science.gov (United States)

    Minegishi, Yuki; Ikeda, Minoru; Kijima, Akihiro

    2015-12-01

    Various genome-scale data have been increasingly published in diverged species, but they can be reused for other purposes by re-analyzing in other ways. As a case study to utilize the published genome data, we developed microsatellite markers from the genome sequence data (assembled contigs and unassembled reads) of the marbled flounder Pseudopleuronectes yokohamae. No microsatellites were identified in the contig sequences, whereas the computer software found 781,773 sequences containing microsatellites with di- to hexa-nucleotide motif in the unassembled reads. For 86,732 unique sequences among them, a total of 331,368 primer pairs were designed. Screening based on PCR amplification, polymorphisms and accurate genotyping resulted in sixteen primer sets, which were later characterized using 45 samples collected in Onagawa Bay, Miyagi, Japan. The presence of null alleles was suggested at four loci in the studied population but no evidence of allelic dropout was found. The observed number of alleles and heterozygosity was 2-20 and 0-0.88889, respectively, indicating polymorphisms and usefulness for population genetic analyses of this species. In addition, a large number of the microsatellite primers developed in this study are potentially applicable also for kinship estimation, individual fingerprint and linkage map construction.
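
    The abstract above does not name the software used to find microsatellites in the unassembled reads. As a rough illustration only, the following Python sketch shows one common way to scan a read for di- to hexa-nucleotide tandem repeats with regular expressions. The function name `find_microsatellites` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for the example, not parameters reported by the authors.

```python
import re

# Assumed thresholds (not from the abstract): minimum number of tandem
# copies required for each motif length, from di- (2) to hexa-nucleotide (6).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated
            # (e.g. "AA" or "ACAC"), so each repeat is reported once
            # and mononucleotide runs are excluded.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    read = "GGG" + "AC" * 8 + "CCC" + "GAT" * 5 + "CC"
    print(find_microsatellites(read))
```

    A real pipeline would also deduplicate reads and keep enough flanking sequence for primer design, as the screening steps in the abstract imply; the sketch covers only the motif search.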

  20. Genome sequencing identifies Listeria fleischmannii subsp. coloradonensis subsp. nov., isolated from a ranch.

    Science.gov (United States)

    den Bakker, Henk C; Manuel, Clyde S; Fortes, Esther D; Wiedmann, Martin; Nightingale, Kendra K

    2013-09-01

    Twenty Listeria-like isolates were obtained from environmental samples collected on a cattle ranch in northern Colorado; all of these isolates were found to share an identical partial sigB sequence, suggesting close relatedness. The isolates were similar to members of the genus Listeria in that they were Gram-stain-positive, short rods, oxidase-negative and catalase-positive; the isolates were similar to Listeria fleischmannii because they were non-motile at 25 °C. 16S rRNA gene sequencing for representative isolates and whole genome sequencing for one isolate was performed. The genome of the type strain of Listeria fleischmannii (strain LU2006-1(T)) was also sequenced. The draft genomes were very similar in size and the average MUMmer nucleotide identity across 91% of the genomes was 95.16%. Genome sequence data were used to design primers for a six-gene multi-locus sequence analysis (MLSA) scheme. Phylogenies based on (i) the near-complete 16S rRNA gene, (ii) 31 core genes and (iii) six housekeeping genes illustrated the close relationship of these Listeria-like isolates to Listeria fleischmannii LU2006-1(T). Sufficient genetic divergence of the Listeria-like isolates from the type strain of Listeria fleischmannii and differing phenotypic characteristics warrant these isolates to be classified as members of a distinct infraspecific taxon, for which the name Listeria fleischmannii subsp. coloradonensis subsp. nov. is proposed. The type strain is TTU M1-001(T) ( =BAA-2414(T) =DSM 25391(T)). The isolates of Listeria fleischmannii subsp. coloradonensis subsp. nov. differ from the nominate subspecies by the inability to utilize melezitose, turanose and sucrose, and the ability to utilize inositol. The results also demonstrate the utility of whole genome sequencing to facilitate identification of novel taxa within a well-described genus. The genomes of both subspecies of Listeria fleischmannii contained putative enhancin genes; the Listeria fleischmannii subsp

  1. Sequence divergence in two tandemly located pilin genes of Eikenella corrodens.

    Science.gov (United States)

    Tønjum, T; Weir, S; Bøvre, K; Progulske-Fox, A; Marrs, C F

    1993-05-01

    Eikenella corrodens normally inhabits the human respiratory and gastrointestinal tracts but is frequently the cause of abscesses at various sites. Using the N-terminal portion of the Moraxella nonliquefaciens pilin gene as a hybridization probe, we cloned two tandemly located pilin genes of E. corrodens 31745, ecpC and ecpD, and expressed the two pilin genes separately in Escherichia coli. A comparison of the predicted amino acid sequences of E. corrodens 31745 EcpC and EcpD revealed considerable divergence between the sequences of these two pilins and even less similarity to EcpA and EcpB of E. corrodens type strain ATCC 23834. EcpC from E. corrodens 31745 displayed high degrees of homology to the pilins of Neisseria gonorrhoeae and Pseudomonas aeruginosa. EcpD from E. corrodens 31745 showed the highest homologies with the pilin of one of the three P. aeruginosa classes, whereas EcpA and EcpB of strain ATCC 23834 most closely resemble Moraxella bovis pilins. These findings raise interesting questions about potential genetic transfer between different bacterial species, as opposed to convergent evolution.

  2. Spontaneous and divergent hexaploid triticales derived from common wheat × rye by complete elimination of D-genome chromosomes.

    Directory of Open Access Journals (Sweden)

    Hao Li

    Full Text Available Hexaploid triticale could be either synthesized by crossing tetraploid wheat with rye, or developed by crossing hexaploid wheat with a hexaploid triticale or an octoploid triticale. Here two hexaploid triticales with great morphologic divergence derived from common wheat cultivar M8003 (Triticum aestivum L.) × Austrian rye (Secale cereale L.) were reported, exhibiting high resistance for powdery mildew and stripe rust and potential for wheat improvement. Sequential fluorescence in situ hybridization (FISH) and genomic in situ hybridization (GISH) karyotyping revealed that D-genome chromosomes were completely eliminated and the whole A-genome, B-genome and R-genome chromosomes were retained in both lines. Furthermore, plentiful alterations of wheat chromosomes including 5A and 7B were detected in both triticales and additionally altered 5B, 7A chromosome and restructured chromosome 2A was assayed in N9116H and N9116M, respectively, even after selfing for several decades. Besides, meiotic asynchrony was displayed and a variety of storage protein variations were assayed, especially in the HMW/LMW-GS region and secalins region in both triticales. This study confirms that whole D-genome chromosomes could be preferentially eliminated in the hybrid of common wheat × rye, "genome shock" was accompanying the allopolyploidization of nascent triticales, and great morphologic divergence might result from the genetic variations. Moreover, new hexaploid triticale lines contributing potential resistance resources for wheat improvement were produced.

  3. Draft genome sequence of the Tibetan antelope.

    Science.gov (United States)

    Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo; Ou, Xiaohua; Xu, Jiaohui; Zhang, Yongfen; Wu, Kui; Geng, Chunyu; Zhou, Weiping; Zhou, Taicheng; Irwin, David M; Yang, Yingzhong; Ying, Liu; Bao, Haihua; Kim, Jaebum; Larkin, Denis M; Ma, Jian; Lewin, Harris A; Xing, Jinchuan; Platt, Roy N; Ray, David A; Auvil, Loretta; Capitanu, Boris; Zhang, Xiufeng; Zhang, Guojie; Murphy, Robert W; Wang, Jun; Zhang, Ya-Ping; Wang, Jian

    2013-01-01

    The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation.

  4. A genome-wide BAC end-sequence survey of sugarcane elucidates genome composition, and identifies BACs covering much of the euchromatin.

    Science.gov (United States)

    Kim, Changsoo; Lee, Tae-Ho; Compton, Rosana O; Robertson, Jon S; Pierce, Gary J; Paterson, Andrew H

    2013-01-01

    BAC-end sequences (BESs) of hybrid sugarcane cultivar R570 are presented. A total of 66,990 informative BESs were obtained from 43,874 BAC clones. Similarity search using a variety of public databases revealed that 13.5 and 42.8 % of BESs match known gene-coding and repeat regions, respectively. That 11.7 % of BESs are still unmatched to any nucleotide sequences in the current public databases despite the fact that a close relative, sorghum, is fully sequenced, indicates that there may be many sugarcane-specific or lineage-specific sequences. We found 1,742 simple sequence repeat motifs in 1,585 BESs, spanning 27,383 bp in length. As simple sequence repeat markers derived from BESs have some advantages over randomly generated markers, these may be particularly useful for comparing BAC-based physical maps with genetic maps. BES and overgo hybridization information was used for anchoring sugarcane BAC clones to the sorghum genome sequence. While sorghum and sugarcane have extensive similarity in terms of genomic structure, only 2,789 BACs (6.4 %) could be confidently anchored to the sorghum genome at the stringent threshold of having both-end information (BESs or overgos) within 300 Kb. This relatively low rate of anchoring may have been caused in part by small- or large-scale genomic rearrangements in the Saccharum genus after two rounds of whole genome duplication since its divergence from the sorghum lineage about 7.8 million years ago. Limiting consideration to only low-copy matches, 1,245 BACs were placed to 1,503 locations, covering ~198 Mb of the sorghum genome or about 78 % of the estimated 252 Mb of euchromatin. BESs and their analyses presented here may provide an early profile of the sugarcane genome as well as a basis for BAC-by-BAC sequencing of much of the basic gene set of sugarcane.

  5. A distinct and divergent lineage of genomic island-associated Type IV Secretion Systems in Legionella.

    Science.gov (United States)

    Wee, Bryan A; Woolfit, Megan; Beatson, Scott A; Petty, Nicola K

    2013-01-01

    Legionella encodes multiple classes of Type IV Secretion Systems (T4SSs), including the Dot/Icm protein secretion system that is essential for intracellular multiplication in amoebal and human hosts. Other T4SSs not essential for virulence are thought to facilitate the acquisition of niche-specific adaptation genes including the numerous effector genes that are a hallmark of this genus. Previously, we identified two novel gene clusters in the draft genome of Legionella pneumophila strain 130b that encode homologues of a subtype of T4SS, the genomic island-associated T4SS (GI-T4SS), usually associated with integrative and conjugative elements (ICE). In this study, we performed genomic analyses of 14 homologous GI-T4SS clusters found in eight publicly available Legionella genomes and show that this cluster is unusually well conserved in a region of high plasticity. Phylogenetic analyses show that Legionella GI-T4SSs are substantially divergent from other members of this subtype of T4SS and represent a novel clade of GI-T4SSs only found in this genus. The GI-T4SS was found to be under purifying selection, suggesting it is functional and may play an important role in the evolution and adaptation of Legionella. Like other GI-T4SSs, the Legionella clusters are also associated with ICEs, but lack the typical integration and replication modules of related ICEs. The absence of complete replication and DNA pre-processing modules, together with the presence of Legionella-specific regulatory elements, suggest the Legionella GI-T4SS-associated ICE is unique and may employ novel mechanisms of regulation, maintenance and excision. The Legionella GI-T4SS cluster was found to be associated with several cargo genes, including numerous antibiotic resistance and virulence factors, which may confer a fitness benefit to the organism. The in-silico characterisation of this new T4SS furthers our understanding of the diversity of secretion systems involved in the frequent horizontal gene

  6. Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.

    Science.gov (United States)

    Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao

    2017-04-01

    Chinese clearhead icefish, Protosalanx hyalocranius, is a representative icefish species with economic importance and special appearance. Due to its great economic value in China, the fish was introduced into Lake Dianchi and several other lakes from the Lake Taihu half a century ago. Similar to the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as transparent body and nearly scaleless skin. Here, we provide the whole genome sequence of this surface-dwelling fish and generated a draft genome assembly, aiming at exploring molecular mechanisms for the biological interests. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with the scaffold N50 reaching 1.163 Mb. The genome completeness was estimated to be 98.39 % by using the CEGMA evaluation. Finally, we annotated 19 884 protein-coding genes and observed that repeat sequences account for 24.43 % of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms for the cavefish-like characters.
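
    The scaffold N50 quoted above is a standard assembly contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The short Python sketch below computes it from a list of scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions; the icefish scaffold lengths themselves are not given in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold at which the cumulative length reaches
        # half of the assembly defines the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


if __name__ == "__main__":
    # Toy lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

    N50 describes contiguity, not correctness or completeness, which is presumably why the abstract reports a separate CEGMA completeness estimate alongside it.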

  7. Genome-wide sequence variations among Mycobacterium avium subspecies paratuberculosis.

    Directory of Open Access Journals (Sweden)

    Chung-Yi Hsu

    2011-12-01

    Full Text Available Mycobacterium avium subspecies paratuberculosis (M. ap), the causative agent of Johne’s disease (JD), infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium) strains isolated from various hosts and environments. Using next-generation sequencing technology, all 6 M. ap isolates showed a high percentage of homology (98%) to the reference genome sequence of M. ap K-10 isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77) showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolates, genomic rearrangements (insertions/deletions, Indels) were not detected, and only unique single nucleotide polymorphisms (SNPs) were observed among the 6 M. ap strains. While most of the SNPs (~100) in M. ap genomes were non-synonymous, a total of ~6000 SNPs were detected among M. avium genomes, most of them synonymous, suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNPs-based phylo-genomic analysis showed that isolates from goat and Oryx are closely related to the cattle (K-10) strain while the human isolate (M. ap 4B) is closely related to the environmental strains, indicating environmental source to human infections. Overall, SNPs were the most common variations among M. ap isolates while SNPs in addition to Indels were prevalent among M. avium isolates. Genomic variations will be useful in designing host-specific markers for the analysis of mycobacterial evolution and for developing novel diagnostics directed against Johne’s disease in animals.

  8. The diploid genome sequence of an individual human.

    Directory of Open Access Journals (Sweden)

    Samuel Levy

    2007-09-01

    Full Text Available Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
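
    The 22% and 74% figures in this abstract can be reproduced from the counts it lists. The Python sketch below is a back-of-the-envelope check, not the authors' calculation: it assumes the event total is the sum of the five enumerated variant classes (the "numerous" segmental duplications and copy number regions are not counted in the abstract), that each SNP contributes exactly one variant base, and that the 12.3 Mb total is exact. All variable names are illustrative.

```python
# Variant counts as listed in the abstract.
counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}

# Assumption: the event total is the sum of the enumerated classes only.
total_events = sum(counts.values())
non_snp_events = total_events - counts["snp"]
non_snp_event_share = non_snp_events / total_events

# Assumption: one variant base per SNP; 12.3 Mb of variant bases in total.
total_variant_bases = 12_300_000
non_snp_base_share = (total_variant_bases - counts["snp"]) / total_variant_bases

if __name__ == "__main__":
    print(f"events: {total_events}, non-SNP share: {non_snp_event_share:.1%}")
    print(f"non-SNP share of variant bases: {non_snp_base_share:.1%}")
```

    Under these assumptions the enumerated classes sum to just over 4.1 million events, and the two shares round to the 22% and 74% stated in the abstract.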

  9. Comparison of methods for genomic localization of gene trap sequences

    Directory of Open Access Journals (Sweden)

    Ferrin Thomas E

    2006-09-01

    Full Text Available Abstract Background Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.

  10. Complete mitochondrial genome sequence of three Tetrahymena species reveals mutation hot spots and accelerated nonsynonymous substitutions in Ymf genes.

    Directory of Open Access Journals (Sweden)

    Mike M Moradian

    Full Text Available The ciliate Tetrahymena, a model organism, contains divergent mitochondrial (Mt) genome with unusual properties, where half of its 44 genes still remain without a definitive function. These genes could be categorized into two major groups of KPC (known protein coding) and Ymf (genes without an identified function). To gain insights into the mechanisms underlying gene divergence and molecular evolution of Tetrahymena (T.) Mt genomes, we sequenced three Mt genomes of T. paravorax, T. pigmentosa, and T. malaccensis. These genomes were aligned and the analyses were carried out using several programs that calculate distance, nucleotide substitution (dn/ds), and their rate ratios (omega) on individual codon sites and via a sliding window approach. Comparative genomic analysis indicated a conserved putative transcription control sequence, a GC box, in a region where presumably transcription and replication initiate. We also found distinct features in Mt genome of T. paravorax despite similar genome organization among these approximately 47 kb long linear genomes. Another significant finding was the presence of at least one or more highly variable regions in Ymf genes where majority of substitutions were concentrated. These regions were mutation hotspots where elevated distances and the dn/ds ratios were primarily due to an increase in the number of nonsynonymous substitutions, suggesting relaxed selective constraint. However, in a few Ymf genes, accelerated rates of nonsynonymous substitutions may be due to positive selection. Similarly, on protein level the majority of amino acid replacements occurred in these regions. Ymf genes comprise half of the genes in Tetrahymena Mt genomes, so understanding why they have not been assigned definitive functions is an important aspect of molecular evolution. Importantly, nucleotide substitution types and rates suggest possible reasons for not being able to find homologues for Ymf genes. Additionally, comparative genomic

  11. Projector 2 : contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies

    NARCIS (Netherlands)

    van Hijum, SAFT; Zomer, AL; Kuipers, OP; Kok, J

    2005-01-01

    With genome sequencing efforts increasing exponentially, valuable information accumulates on genomic content of the various organisms sequenced. Projector 2 uses (un)finished genomic sequences of an organism as a template to infer linkage information for a genome sequence assembly of a related orga

  12. Complete genome sequence of Nitrobacter hamburgensis X14 and comparative genomic analysis of species within the genus Nitrobacter.

    Energy Technology Data Exchange (ETDEWEB)

    Starkenburg, Shawn R [Oregon State University; Larimer, Frank W [ORNL; Stein, Lisa Y [University of California, Riverside; Klotz, Martin G [University of Louisville, Louisville; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Sayavedra-Soto, LA [Oregon State University; Poret-Peterson, Amisha T. [University of Louisville, Louisville; Gentry, ME [University of Louisville, Louisville; Arp, D J [Oregon State University; Ward, Bess B. [Princeton University; Bottomley, Peter J [Oregon State University

    2008-05-01

    The alphaproteobacterium Nitrobacter hamburgensis X14 is a gram-negative facultative chemolithoautotroph that conserves energy from the oxidation of nitrite to nitrate. Sequencing and analysis of the Nitrobacter hamburgensis X14 genome revealed four replicons comprised of one chromosome (4.4 Mbp) and three plasmids (294, 188, and 121 kbp). Over 20% of the genome is composed of pseudogenes and paralogs. Whole-genome comparisons were conducted between N. hamburgensis and the finished and draft genome sequences of Nitrobacter winogradskyi and Nitrobacter sp. strain Nb-311A, respectively. Most of the plasmid-borne genes were unique to N. hamburgensis and encode a variety of functions (central metabolism, energy conservation, conjugation, and heavy metal resistance), yet approximately 21 kb of an approximately 28-kb "autotrophic" island on the largest plasmid was conserved in the chromosomes of Nitrobacter winogradskyi Nb-255 and Nitrobacter sp. strain Nb-311A. The N. hamburgensis chromosome also harbors many unique genes, including those for heme-copper oxidases, cytochrome b(561), and putative pathways for the catabolism of aromatic, organic, and one-carbon compounds, which help verify and extend its mixotrophic potential. A Nitrobacter "subcore" genome was also constructed by removing homologs found in strains of the closest evolutionary relatives, Bradyrhizobium japonicum and Rhodopseudomonas palustris. Among the Nitrobacter subcore inventory (116 genes), copies of genes or gene clusters for nitrite oxidoreductase (NXR), cytochromes associated with a dissimilatory nitrite reductase (NirK), PII-like regulators, and polysaccharide formation were identified. Many of the subcore genes have diverged significantly from, or have origins outside, the alphaproteobacterial lineage and may indicate some of the unique genetic requirements for nitrite oxidation in Nitrobacter.

  13. Deep whole-genome sequencing of 100 southeast Asian Malays.

    Science.gov (United States)

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity in detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with a larger number of individuals for common SNPs. For lower-frequency [...] population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

  14. Complete genome sequence of Nakamurella multipartita type strain (Y-104).

    Science.gov (United States)

    Tice, Hope; Mayilraj, Shanmugam; Sims, David; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Copeland, Alex; Cheng, Jan-Fang; Meincke, Linda; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-03-30

    Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  15. Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575T)

    Energy Technology Data Exchange (ETDEWEB)

    Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Schroder, Maren [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Gleim, Dorothea [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Sims, David [Los Alamos National Laboratory (LANL); Meincke, Linda [Los Alamos National Laboratory (LANL); Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Brettin, Tom [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Han, Cliff [Los Alamos National Laboratory (LANL)

    2009-01-01

    Desulfotomaculum acetoxidans Widdel and Pfennig 1977 was one of the first sulfate-reducing bacteria known to grow with acetate as sole energy and carbon source. It is able to oxidize substrates completely to carbon dioxide with sulfate as the electron acceptor, which is reduced to hydrogen sulfide. All available data about this species are based on strain 5575T, isolated from piggery waste in Germany. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a Desulfotomaculum species with a validly published name. The 4,545,624 bp long single replicon genome with its 4370 protein-coding and 100 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  16. Draft genome sequence of the rubber tree Hevea brasiliensis

    Directory of Open Access Journals (Sweden)

    Rahman Ahmad Yamin Abdul

    2013-02-01

    Full Text Available Abstract Background Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever-increasing need for natural rubber.

  17. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  18. Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Gronow, Sabine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Chertkov, Olga [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2010-01-01

    Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the epsilonproteobacterial family Campylobacteraceae. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel. roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment, in contrast to many other Arcobacter species, which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  19. The complete plastid genome sequence of Bomarea edulis (Alstroemeriaceae: Liliales).

    Science.gov (United States)

    Kim, Jung Sung; Kim, Hyoung Tae; Yoon, Chang Young; Kim, Joo-Hwan

    2016-05-01

    Bomarea, a member of the family Alstroemeriaceae, is distributed from Chile to Mexico and includes approximately 120 species. Recent molecular phylogenetic studies have clarified the monophyly of the family within the order Liliales and its sister relationship with the family Colchicaceae. At this time, five plastid genomes of Liliales have been analyzed at the familial level. To examine plastid genome variation at the generic level, we sequenced the plastid genome of Bomarea edulis, which is the most widely distributed species in the genus, and compared it with that of Alstroemeria aurea. The plastid genome sequence of B. edulis was 154,925 bp in length, with a structure similar to that of A. aurea, excluding the IR-LSC junction. Ycf68 and infA were pseudogenes caused by frameshift mutations, and the ycf15 gene was deleted, similar to A. aurea.

  20. Complete genome sequence of Tsukamurella paurometabola type strain (no. 33).

    Science.gov (United States)

    Munk, A Christine; Lapidus, Alla; Lucas, Susan; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Del Rio, Tijana Glavina; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Huntemann, Marcel; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Tapia, Roxanne; Han, Cliff; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Yasawong, Montri; Brambilla, Evelyne-Marie; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Detter, John C; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2011-07-01

    Tsukamurella paurometabola corrig. (Steinhaus 1941) Collins et al. 1988 is the type species of the genus Tsukamurella, which is the type genus of the family Tsukamurellaceae. The species is of interest not only because of its isolated phylogenetic location, but also because it is a human opportunistic pathogen, with some strains of the species reported to cause lung infection, lethal meningitis, and necrotizing tenosynovitis. This is the first completed genome sequence of a member of the genus Tsukamurella and the first genome sequence of a member of the family Tsukamurellaceae. The 4,479,724 bp long genome contains a 99,806 bp long plasmid and a total of 4,335 protein-coding and 56 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.