WorldWideScience

Sample records for partial genome assembly

  1. Hapsembler: An Assembler for Highly Polymorphic Genomes

    Science.gov (United States)

    Donmez, Nilgun; Brudno, Michael

    As whole genome sequencing has become a routine biological experiment, algorithms for assembly of whole genome shotgun data have become a topic of extensive research, with a plethora of off-the-shelf methods that can reconstruct the genomes of many organisms. Simultaneously, several recently sequenced genomes exhibit very high polymorphism rates. For these organisms genome assembly remains a challenge as most assemblers are unable to handle highly divergent haplotypes in a single individual. In this paper we describe Hapsembler, an assembler for highly polymorphic genomes, which makes use of paired reads. Our experiments show that Hapsembler produces accurate and contiguous assemblies of highly polymorphic genomes, while performing on par with the leading tools on haploid genomes. Hapsembler is available for download at http://compbio.cs.toronto.edu/hapsembler.

  2. V-GAP: Viral genome assembly pipeline

    KAUST Repository

    Nakamura, Yoji

    2015-10-22

    Next-generation sequencing technologies have allowed the rapid determination of the complete genomes of many organisms. Although it is still difficult to reconstruct shotgun sequences from large-genome organisms into perfect contigs, each of which represents a full chromosome, those from small genomes have been assembled successfully into a very small number of contigs. In this study, we show that shotgun reads from phage genomes can be reconstructed into a single contig by controlling the number of read sequences used in de novo assembly. We have developed a pipeline to assemble small viral genomes with good reliability using a resampling method from shotgun data. This pipeline, named V-GAP (Viral Genome Assembly Pipeline), will contribute to the rapid genome typing of viruses, which are highly divergent, and thus will meet the increasing need for viral genome comparisons in metagenomic studies.
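
    The abstract says only that the number of reads is controlled through resampling; it does not describe the procedure. The following is a minimal sketch of one generic way to draw fixed-size random subsets of reads (reservoir sampling) before handing each subset to an assembler. The function names `subsample_reads` and `resample` are hypothetical and are not taken from V-GAP.

```python
import random


def subsample_reads(reads, n, seed=0):
    """Pick n reads uniformly at random, without replacement, in one pass.

    Uses reservoir sampling, so `reads` can be any iterable (for example a
    generator over a large FASTQ file) and never has to fit in memory.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, read in enumerate(reads):
        if i < n:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # both ends inclusive
            if j < n:
                reservoir[j] = read
    return reservoir


def resample(reads, n, replicates, seed=0):
    """Draw several independent subsets; each one would be assembled separately."""
    return [subsample_reads(reads, n, seed=seed + r) for r in range(replicates)]


# Hypothetical usage: three subsets of 10 reads from a pool of 100.
reads = [f"read{i}" for i in range(100)]
subsets = resample(reads, 10, 3)
```

    Each subset would then be assembled on its own, and the assemblies compared to find a read count that yields a single contig; that selection step is not shown here.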

  3. V-GAP: Viral genome assembly pipeline

    KAUST Repository

    Nakamura, Yoji; Yasuike, Motoshige; Nishiki, Issei; Iwasaki, Yuki; Fujiwara, Atushi; Kawato, Yasuhiko; Nakai, Toshihiro; Nagai, Satoshi; Kobayashi, Takanori; Gojobori, Takashi; Ototake, Mitsuru

    2015-01-01

    Next-generation sequencing technologies have allowed the rapid determination of the complete genomes of many organisms. Although it is still difficult to reconstruct shotgun sequences from large-genome organisms into perfect contigs, each of which represents a full chromosome, those from small genomes have been assembled successfully into a very small number of contigs. In this study, we show that shotgun reads from phage genomes can be reconstructed into a single contig by controlling the number of read sequences used in de novo assembly. We have developed a pipeline to assemble small viral genomes with good reliability using a resampling method from shotgun data. This pipeline, named V-GAP (Viral Genome Assembly Pipeline), will contribute to the rapid genome typing of viruses, which are highly divergent, and thus will meet the increasing need for viral genome comparisons in metagenomic studies.

  4. Genome Sequence Databases (Overview): Sequencing and Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different lengths (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1 error per 2,000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (approximately 1 error per 10,000 bp), validated through a number of computer and laboratory experiments.

  5. Human Contamination in Public Genome Assemblies.

    Science.gov (United States)

    Kryukov, Kirill; Imanishi, Tadashi

    2016-01-01

    Contamination in a genome assembly can lead to wrong or confusing results when such a genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that, with high confidence, originate from contamination with human DNA. The majority of contaminating human sequences were present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes should be revised to remove contaminated sequence, and that new assemblies should be thoroughly checked for presence of human DNA before submitting them to public databases.

  6. Extreme-Scale De Novo Genome Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Georganas, Evangelos [Intel Corporation, Santa Clara, CA (United States); Hofmeyr, Steven [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Egan, Rob [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.; Rokhsar, Daniel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Yelick, Katherine [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.

    2017-09-26

    De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different components of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and communication costs in detail. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.

  7. QUAST: quality assessment tool for genome assemblies.

    Science.gov (United States)

    Gurevich, Alexey; Saveliev, Vladislav; Vyahhi, Nikolay; Tesler, Glenn

    2013-04-15

    Limitations of genome sequencing techniques have led to dozens of assembly algorithms, none of which is perfect. A number of methods for comparing assemblers have been developed, but none is yet a recognized benchmark. Further, most existing methods for comparing assemblies are only applicable to new assemblies of finished genomes; the problem of evaluating assemblies of previously unsequenced species has not been adequately considered. Here, we present QUAST, a quality assessment tool for evaluating and comparing genome assemblies. This tool improves on leading assembly comparison software with new ideas and quality metrics. QUAST can evaluate assemblies both with and without a reference genome. QUAST produces many reports, summary tables and plots to help scientists in their research and in their publications. In this study, we used QUAST to compare several genome assemblers on three datasets. QUAST tables and plots for all of them are available in the Supplementary Material, and interactive versions of these reports are on the QUAST website, http://bioinf.spbau.ru/quast. Supplementary data are available at Bioinformatics online.
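
    As an illustration of the kind of reference-free quality metric such tools report, the sketch below computes N50, a widely used contiguity statistic: the largest length L such that contigs of length at least L together cover at least half of the assembly. This is a generic definition written for this listing, not code from QUAST; the function name `n50` is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            # This contig is the one that pushes the cumulative
            # length past half of the total assembly size.
            return length
    return 0


# Total length is 290; the two longest contigs (80 + 70 = 150) already
# cover more than half (145), so N50 is 70.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))
```

    N50 rewards long contigs regardless of whether they are correct, which is one reason it is normally read alongside misassembly counts rather than on its own.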

  8. Pseudo Boolean Programming for Partially Ordered Genomes

    Science.gov (United States)

    Angibaud, Sébastien; Fertin, Guillaume; Thévenin, Annelyse; Vialette, Stéphane

    Comparing genomes of different species is a crucial problem in comparative genomics. Different measures have been proposed to compare two genomes: number of common intervals, number of adjacencies, number of reversals, etc. These measures are classically used between two totally ordered genomes. However, genetic mapping techniques often give rise to different maps with some unordered genes. Starting from a partial order between genes of a genome, one method to find a total order consists in optimizing a given measure between a linear extension of this partial order and a given total order of a close and well-known genome. However, for most common measures, the problem turns out to be NP-hard. In this paper, we propose a (0,1)-linear programming approach to compute a linear extension of one genome that maximizes the number of common intervals (resp. the number of adjacencies) between this linear extension and a given total order. Next, we propose an algorithm to find linear extensions of two partial orders that maximize the number of adjacencies.
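
    To make the problem statement concrete, here is a brute-force sketch that enumerates every linear extension of a small partial order and keeps the one sharing the most adjacencies with a reference total order. It is exhaustive search, not the (0,1)-linear programming formulation the abstract proposes, and it uses a simplified notion of adjacency (two genes that are neighbours in both orders, ignoring orientation). All names are hypothetical.

```python
from itertools import permutations


def adjacencies(order):
    """Set of unordered neighbouring gene pairs in a total order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def is_linear_extension(order, precedences):
    """True if every constraint (a, b), meaning a before b, holds in `order`."""
    pos = {gene: i for i, gene in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in precedences)


def best_linear_extension(genes, precedences, reference):
    """Exhaustively search linear extensions; feasible only for a handful of genes."""
    ref_adj = adjacencies(reference)
    best, best_score = None, -1
    for perm in permutations(genes):
        if not is_linear_extension(perm, precedences):
            continue
        score = len(adjacencies(perm) & ref_adj)
        if score > best_score:
            best, best_score = list(perm), score
    return best, best_score


# Genes b and c are unordered relative to each other in the partial order.
genes = ["a", "b", "c", "d"]
precedences = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
reference = ["a", "c", "b", "d"]
order, score = best_linear_extension(genes, precedences, reference)
```

    The search space grows factorially with the number of genes, which is consistent with the NP-hardness noted above and is why an optimization formulation is needed for realistic inputs.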

  9. Quality Assessment of Domesticated Animal Genome Assemblies

    DEFF Research Database (Denmark)

    Seemann, Stefan E; Anthon, Christian; Palasca, Oana

    2015-01-01

    affected by the lack of genomic sequence. Herein, we quantify the quality of the genome assemblies of 20 domesticated animals and related species by assessing a range of measurable parameters, and we show that there is a positive correlation between the fraction of mappable reads from RNAseq data...... domesticated animal genomes still need to be sequenced deeper in order to produce high-quality assemblies. In the meanwhile, ironically, the extent to which RNAseq and other next-generation data is produced frequently far exceeds that of the genomic sequence. Furthermore, basic comparative analysis is often...

  10. Genomics using the Assembly of the Mink Genome

    DEFF Research Database (Denmark)

    Guldbrandtsen, Bernt; Cai, Zexi; Sahana, Goutam

    2018-01-01

    The American Mink’s (Neovison vison) genome has recently been sequenced. This opens numerous avenues of research both for studying the basic genetics and physiology of the mink as well as genetic improvement in mink. Using genotyping-by-sequencing (GBS) generated marker data for 2,352 Danish farm...... mink runs of homozygosity (ROH) were detected in mink genomes. Detectable ROH made up on average 1.7% of the genome indicating the presence of at most a moderate level of genomic inbreeding. The fraction of genome regions found in ROH varied. Ten percent of the included regions were never found in ROH....... The ability to detect ROH in the mink genome also demonstrates the general reliability of the new mink genome assembly. Keywords: American mink, run of homozygosity, genome, selection, genomic inbreeding...
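
    The record does not say how ROH were called. As a rough illustration of the idea only, the sketch below scans ordered markers on one chromosome and reports stretches of consecutive homozygous genotypes that reach a minimum marker count. Real ROH callers typically also tolerate occasional heterozygous or missing calls and apply length thresholds in base pairs; none of that is modelled here, and the function name and input layout are assumptions.

```python
def runs_of_homozygosity(genotypes, min_markers):
    """Find runs of consecutive homozygous markers.

    genotypes: list of (position, is_homozygous) tuples sorted by position.
    Returns a list of (start_position, end_position) for every run that
    contains at least `min_markers` markers.
    """
    runs, current = [], []
    for position, is_homozygous in genotypes:
        if is_homozygous:
            current.append(position)
            continue
        # A heterozygous marker ends the current run.
        if len(current) >= min_markers:
            runs.append((current[0], current[-1]))
        current = []
    if len(current) >= min_markers:
        runs.append((current[0], current[-1]))
    return runs


# Hypothetical marker data for one animal on one scaffold.
markers = [(100, True), (200, True), (300, True), (400, False),
           (500, True), (600, True), (700, True), (800, True)]
roh = runs_of_homozygosity(markers, min_markers=3)
```

    Summing the lengths of the reported runs and dividing by the assembled genome length would give a per-animal ROH fraction comparable in spirit to the 1.7% average quoted above.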

  11. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  12. SAGE: String-overlap Assembly of GEnomes.

    Science.gov (United States)

    Ilie, Lucian; Haider, Bahlul; Molnar, Michael; Solis-Oba, Roberto

    2014-09-15

    De novo genome assembly of next-generation sequencing data is one of the most important current problems in bioinformatics, essential in many biological applications. In spite of significant amount of work in this area, better solutions are still very much needed. We present a new program, SAGE, for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graph and maximum likelihood assembly, bringing an important number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers. SAGE benefits from innovations in almost every aspect of the assembly process: error correction of input reads, string-overlap graph construction, read copy counts estimation, overlap graph analysis and reduction, contig extraction, and scaffolding. We hope that these new ideas will help advance the current state-of-the-art in an essential area of research in genomics.
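
    One of the steps named above, transitive reduction of the overlap graph, can be illustrated with a deliberately simple rule: drop an edge u -> w whenever some intermediate read v has both u -> v and v -> w. This sketch ignores overlap lengths and is quadratic per node, so it is not the efficient computation SAGE describes; the graph layout (a dict mapping each read to the set of reads it overlaps into) and the function name are assumptions.

```python
def transitive_reduction(graph):
    """Remove edges u -> w that are implied by a two-step path u -> v -> w.

    graph: dict mapping each read to the set of reads it overlaps into.
    Returns a new dict; the input is left unchanged.
    """
    reduced = {u: set(successors) for u, successors in graph.items()}
    for u, successors in graph.items():
        for v in successors:
            if v == u:
                continue  # ignore self-loops as intermediates
            for w in graph.get(v, ()):
                if w != v and w in successors:
                    reduced[u].discard(w)
    return reduced


# r1 overlaps r2 and r3, and r2 overlaps r3, so r1 -> r3 is redundant.
overlaps = {"r1": {"r2", "r3"}, "r2": {"r3"}, "r3": set()}
reduced = transitive_reduction(overlaps)
```

    After reduction the three reads form a simple chain r1 -> r2 -> r3, which is the shape a contig-extraction step can walk directly.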

  13. Assembly of viral genomes from metagenomes

    Directory of Open Access Journals (Sweden)

    Saskia L Smits

    2014-12-01

    Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used in the detection of novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow rapid phylogenetic characterization of these new viruses. Often, however, complete viral genomes are not recovered, but rather several distinct contigs derived from a single entity, some of which have no sequence homology to any known proteins. De novo assembly of single viruses from a metagenome is challenging, not only because of the lack of a reference genome, but also because of intrapopulation variation and uneven or insufficient coverage. Here we explored different assembly algorithms, remote homology searches, genome-specific sequence motifs, k-mer frequency ranking, and coverage profile binning to detect and obtain viral target genomes from metagenomes. All methods were tested on 454-generated sequencing datasets containing three recently described RNA viruses with a relatively large genome which were divergent to previously known viruses from the viral families Rhabdoviridae and Coronaviridae. Depending on specific characteristics of the target virus and the metagenomic community, different assembly and in silico gap closure strategies were successful in obtaining near complete viral genomes.

  14. The Amaranth Genome: Genome, Transcriptome, and Physical Map Assembly

    Directory of Open Access Journals (Sweden)

    J. W. Clouse

    2016-03-01

    Amaranth (Amaranthus hypochondriacus L.) is an emerging pseudocereal native to the New World that has garnered increased attention in recent years because of its nutritional quality, in particular its seed protein and more specifically its high levels of the essential amino acid lysine. It belongs to the Amaranthaceae family, is an ancient paleopolyploid that shows disomic inheritance (2n = 32), and has an estimated genome size of 466 Mb. Here we present a high-quality draft genome sequence of the grain amaranth. The genome assembly consisted of 377 Mb in 3518 scaffolds with an N50 of 371 kb. Repetitive element analysis predicted that 48% of the genome is comprised of repeat sequences, of which Copia-like elements were the most commonly classified retrotransposon. A de novo transcriptome consisting of 66,370 contigs was assembled from eight different amaranth tissue and abiotic stress libraries. Annotation of the genome identified 23,059 protein-coding genes. Seven grain amaranths (A. hypochondriacus, A. caudatus, and A. cruentus) and their putative progenitor (A. hybridus) were resequenced. A single nucleotide polymorphism (SNP) phylogeny supported the classification of A. hybridus as the progenitor species of the grain amaranths. Lastly, we generated a de novo physical map for A. hypochondriacus using the BioNano Genomics’ Genome Mapping platform. The physical map spanned 340 Mb and a hybrid assembly using the BioNano physical maps nearly doubled the N50 of the assembly to 697 kb. Moreover, we analyzed synteny between amaranth and sugar beet (Beta vulgaris L.) and estimated, using Ks analysis, the age of the most recent polyploidization event in amaranth.

  15. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    Energy Technology Data Exchange (ETDEWEB)

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  16. Enabling Graph Appliance for Genome Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Singh, Rina [ORNL; Graves, Jeffrey A [ORNL; Lee, Sangkeun (Matt) [ORNL; Sukumar, Sreenivas R [ORNL; Shankar, Mallikarjun [ORNL

    2015-01-01

    In recent years, there has been a huge growth in the amount of genomic data available as reads generated from various genome sequencers. The number of reads generated can be huge, ranging from hundreds to billions of nucleotides, each varying in size. Assembling such large amounts of data is one of the challenging computational problems for both biomedical and data scientists. Most of the genome assemblers developed have used de Bruijn graph techniques. A de Bruijn graph represents a collection of read sequences by billions of vertices and edges, which require large amounts of memory and computational power to store and process. This is the major drawback to de Bruijn graph assembly. Massively parallel, multi-threaded, shared memory systems can be leveraged to overcome some of these issues. The objective of our research is to investigate the feasibility and scalability issues of de Bruijn graph assembly on Cray's Urika-GD system; Urika-GD is a high performance graph appliance with a large shared memory and massively multithreaded custom processor designed for executing SPARQL queries over large-scale RDF data sets. However, to the best of our knowledge, there is no research on representing a de Bruijn graph as an RDF graph or finding Eulerian paths in RDF graphs using SPARQL for potential genome discovery. In this paper, we address the issues involved in representing de Bruijn graphs as RDF graphs and propose an iterative querying approach for finding Eulerian paths in large RDF graphs. We evaluate the performance of our implementation on real world Ebola genome datasets and illustrate how genome assembly can be accomplished with Urika-GD using iterative SPARQL queries.
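
    For readers unfamiliar with the underlying technique, the sketch below builds a small de Bruijn graph from reads and walks an Eulerian path through it with Hierholzer's algorithm, entirely in memory. It does not use RDF or SPARQL and is not the iterative querying approach of the paper; it assumes error-free reads, a non-empty graph, and that an Eulerian path exists. All function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph with one edge per distinct k-mer.

    Nodes are (k-1)-mers; each k-mer adds an edge from its prefix to its suffix.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; returns the list of nodes along the path."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_degree[v] += 1
    nodes = set(graph) | set(in_degree)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes)
         if len(graph.get(n, ())) - in_degree.get(n, 0) == 1),
        min(nodes),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Concatenate a node path back into a sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Three overlapping reads that together cover a 10-base toy sequence.
reads = ["ATGGCGT", "GGCGTGC", "GTGCA"]
assembled = spell(eulerian_path(de_bruijn(reads, 4)))
```

    In this toy case every 3-mer occurs once, so the path is unique; with repeats the graph branches and several Eulerian paths may exist, which is where most of the practical difficulty lies.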

  17. A partial grid for a nuclear reactor fuel assembly

    International Nuclear Information System (INIS)

    Demario, E.E.

    1985-01-01

    The invention relates to a nuclear-reactor fuel assembly including fuel-rod supporting transverse grids. The fuel assembly includes at least one additional transverse grid which is disposed between two fuel-rod supporting grids and consists of at least one partial grid structure extending across only a portion of the fuel assembly and having fuel rods and control-rod guide thimbles of only said portion extending therethrough. The partial grid structure includes means for providing lateral support of the fuel rods and/or means for laterally deflecting coolant flow, and it is formed of interleaved inner straps and border straps, the interleaved inner straps preferably being of substantially smaller height than the border straps to reduce the amount of material capable of parasitically absorbing neutrons. The additional transverse grid may comprise several partial grid structures associated with different groups of fuel rods of the fuel assembly.

  18. De novo assembly of a haplotype-resolved human genome.

    Science.gov (United States)

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang; Huang, Shujia; Sun, Yuhui; Tong, Xin; Xie, Yinlong; Liu, Binghang; Yang, Hailong; Zheng, Hancheng; Li, Jian; Li, Bo; Wang, Yu; Yang, Fang; Sun, Peng; Liu, Siyang; Gao, Peng; Huang, Haodong; Sun, Jing; Chen, Dan; He, Guangzhu; Huang, Weihua; Huang, Zheng; Li, Yue; Tellier, Laurent C A M; Liu, Xiao; Feng, Qiang; Xu, Xun; Zhang, Xiuqing; Bolund, Lars; Krogh, Anders; Kristiansen, Karsten; Drmanac, Radoje; Drmanac, Snezana; Nielsen, Rasmus; Li, Songgang; Wang, Jian; Yang, Huanming; Li, Yingrui; Wong, Gane Ka-Shu; Wang, Jun

    2015-06-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-genome shotgun strategies, based solely on next-generation sequencing and hierarchical assembly methods. We applied our sequencing method to the genome of an Asian individual and generated a 5.15-Gb assembled genome with a haplotype N50 of 484 kb. Our analysis identified previously undetected indels and 7.49 Mb of novel coding sequences that could not be aligned to the human reference genome, which include at least six predicted genes. This haplotype-resolved genome represents the most complete de novo human genome assembly to date. Application of our approach to identify individual haplotype differences should aid in translating genotypes to phenotypes for the development of personalized medicine.

  19. Improved de novo genomic assembly for the domestic donkey

    DEFF Research Database (Denmark)

    Renaud, Gabriel; Petersen, Bent; Seguin-Orlando, Andaine

    2018-01-01

    Donkeys and horses share a common ancestor dating back to about 4 million years ago. Although a high-quality genome assembly at the chromosomal level is available for the horse, current assemblies available for the donkey are limited to moderately sized scaffolds. The absence of a better......-quality assembly for the donkey has hampered studies involving the characterization of patterns of genetic variation at the genome-wide scale. These range from the application of genomic tools to selective breeding and conservation to the more fundamental characterization of the genomic loci underlying speciation...... and domestication. We present a new high-quality donkey genome assembly obtained using the Chicago HiRise assembly technology, providing scaffolds of subchromosomal size. We make use of this new assembly to obtain more accurate measures of heterozygosity for equine species other than the horse, both genome...

  20. Automated ensemble assembly and validation of microbial genomes

    Science.gov (United States)

    2014-01-01

    Background: The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. Results: To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Conclusions: Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to

  1. GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research

    NARCIS (Netherlands)

    Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D; van der Lee, Theo A J; Waalwijk, Cees; de Hoog, G Sybren

    GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assembling specific genomic regions from NGS data. This approach is especially useful when dealing with multi-copy regions, such as the mitochondrial genome and the rDNA repeat region, parts of the genome that are often

  2. The A, C, G, and T of Genome Assembly

    Directory of Open Access Journals (Sweden)

    Bilal Wajid

    2016-01-01

    Genome assembly in its two decades of history has produced significant research, in terms of both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research.

  3. The A, C, G, and T of Genome Assembly.

    Science.gov (United States)

    Wajid, Bilal; Sohail, Muhammad U; Ekti, Ali R; Serpedin, Erchin

    2016-01-01

    Genome assembly in its two decades of history has produced significant research, in terms of both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research.

  4. Genome Assembly Forensics: Metrics for Assessing Assembly Correctness (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Pop, Mihai

    2011-10-13

    University of Maryland's Mihai Pop on Genome Assembly Forensics: Metrics for Assessing Assembly Correctness at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  5. An efficient approach to BAC based assembly of complex genomes.

    Science.gov (United States)

    Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

    2016-01-01

    There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data, which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, with variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.

  6. Towards accurate de novo assembly for genomes with repeats

    NARCIS (Netherlands)

    Bucur, Doina

    2017-01-01

    De novo genome assemblers designed for short k-mer length or using short raw reads are unlikely to recover complex features of the underlying genome, such as repeats hundreds of bases long. We implement a stochastic machine-learning method which obtains accurate assemblies with repeats and

  7. Programming biological operating systems: genome design, assembly and activation.

    Science.gov (United States)

    Gibson, Daniel G

    2014-05-01

    The DNA technologies developed over the past 20 years for reading and writing the genetic code converged when the first synthetic cell was created 4 years ago. An outcome of this work has been an extraordinary set of tools for synthesizing, assembling, engineering and transplanting whole bacterial genomes. Technical progress, options and applications for bacterial genome design, assembly and activation are discussed.

  8. A comparative evaluation of genome assembly reconciliation tools.

    Science.gov (United States)

    Alhakami, Hind; Mirebrahim, Hamid; Lonardi, Stefano

    2017-05-18

    The majority of eukaryotic genomes are unfinished due to the algorithmic challenges of assembling them. A variety of assembly and scaffolding tools are available, but it is not always obvious which tool or parameters to use for a specific genome size and complexity. It is, therefore, common practice to produce multiple assemblies using different assemblers and parameters, then select the best one for public release. A more compelling approach would allow one to merge multiple assemblies with the intent of producing a higher quality consensus assembly, which is the objective of assembly reconciliation. Several assembly reconciliation tools have been proposed in the literature, but their strengths and weaknesses have never been compared on a common dataset. We fill this need with this work, in which we report on an extensive comparative evaluation of several tools. Specifically, we evaluate contiguity, correctness, coverage, and the duplication ratio of the merged assembly compared to the individual assemblies provided as input. None of the tools we tested consistently improved the quality of the input GAGE and synthetic assemblies. Our experiments show an increase in contiguity in the consensus assembly when the original assemblies already have high quality. In terms of correctness, the quality of the results depends on the specific tool, as well as on the quality and the ranking of the input assemblies. In general, the number of misassemblies ranges from being comparable to the best of the input assembly to being comparable to the worst of the input assembly.

  9. Mind the gap; seven reasons to close fragmented genome assemblies.

    Science.gov (United States)

    Thomma, Bart P H J; Seidl, Michael F; Shi-Kunne, Xiaoqian; Cook, David E; Bolton, Melvin D; van Kan, Jan A L; Faino, Luigi

    2016-05-01

    Like other domains of life, research into the biology of filamentous microbes has greatly benefited from the advent of whole-genome sequencing. Next-generation sequencing (NGS) technologies have revolutionized sequencing, making genomic sciences accessible to many academic laboratories including those that study non-model organisms. Thus, hundreds of fungal genomes have been sequenced and are publicly available today, although these initiatives have typically yielded considerably fragmented genome assemblies that often lack large contiguous genomic regions. Many important genomic features are contained in intergenic DNA that is often missing in current genome assemblies, and recent studies underscore the significance of non-coding regions and repetitive elements for the life style, adaptability and evolution of many organisms. The study of particular types of genetic elements, such as telomeres, centromeres, repetitive elements, effectors, and clusters of co-regulated genes, but also of phenomena such as structural rearrangements, genome compartmentalization and epigenetics, greatly benefits from having a contiguous and high-quality, preferably even complete and gapless, genome assembly. Here we discuss a number of important reasons to produce gapless, finished, genome assemblies to help answer important biological questions.

  10. Improved de novo genomic assembly for the domestic donkey

    Science.gov (United States)

    Newton, Richard; Paillot, Romain; Bryant, Neil; Vaudin, Mark

    2018-01-01

    Donkeys and horses share a common ancestor dating back to about 4 million years ago. Although a high-quality genome assembly at the chromosomal level is available for the horse, current assemblies available for the donkey are limited to moderately sized scaffolds. The absence of a better-quality assembly for the donkey has hampered studies involving the characterization of patterns of genetic variation at the genome-wide scale. These range from the application of genomic tools to selective breeding and conservation to the more fundamental characterization of the genomic loci underlying speciation and domestication. We present a new high-quality donkey genome assembly obtained using the Chicago HiRise assembly technology, providing scaffolds of subchromosomal size. We make use of this new assembly to obtain more accurate measures of heterozygosity for equine species other than the horse, both genome-wide and locally, and to detect runs of homozygosity potentially pertaining to positive selection in domestic donkeys. Finally, this new assembly allowed us to identify fine-scale chromosomal rearrangements between the horse and the donkey that likely played an active role in their divergence and, ultimately, speciation. PMID:29740610

  11. Improved de novo genomic assembly for the domestic donkey.

    Science.gov (United States)

    Renaud, Gabriel; Petersen, Bent; Seguin-Orlando, Andaine; Bertelsen, Mads Frost; Waller, Andrew; Newton, Richard; Paillot, Romain; Bryant, Neil; Vaudin, Mark; Librado, Pablo; Orlando, Ludovic

    2018-04-01

    Donkeys and horses share a common ancestor dating back to about 4 million years ago. Although a high-quality genome assembly at the chromosomal level is available for the horse, current assemblies available for the donkey are limited to moderately sized scaffolds. The absence of a better-quality assembly for the donkey has hampered studies involving the characterization of patterns of genetic variation at the genome-wide scale. These range from the application of genomic tools to selective breeding and conservation to the more fundamental characterization of the genomic loci underlying speciation and domestication. We present a new high-quality donkey genome assembly obtained using the Chicago HiRise assembly technology, providing scaffolds of subchromosomal size. We make use of this new assembly to obtain more accurate measures of heterozygosity for equine species other than the horse, both genome-wide and locally, and to detect runs of homozygosity potentially pertaining to positive selection in domestic donkeys. Finally, this new assembly allowed us to identify fine-scale chromosomal rearrangements between the horse and the donkey that likely played an active role in their divergence and, ultimately, speciation.

  12. AutoAssemblyD: a graphical user interface system for several genome assemblers.

    Science.gov (United States)

    Veras, Adonney Allan de Oliveira; de Sá, Pablo Henrique Caracciolo Gomes; Azevedo, Vasco; Silva, Artur; Ramos, Rommel Thiago Jucá

    2013-01-01

    Next-generation sequencing technologies have increased the amount of biological data generated. Thus, bioinformatics has become important because new methods and algorithms are necessary to manipulate and process such data. However, certain challenges have emerged, such as genome assembly using short reads and high-throughput platforms. In this context, several algorithms have been developed, such as Velvet, Abyss, Euler-SR, Mira, Edna, Maq, SHRiMP, Newbler, ALLPATHS, Bowtie and BWA. However, most such assemblers do not have a graphical interface, which makes their use difficult for users without computing experience given the complexity of the assembler syntax. Thus, to make the operation of such assemblers accessible to users without a computing background, we developed AutoAssemblyD, which is a graphical tool for genome assembly submission and remote management by multiple assemblers through XML templates. AutoAssemblyD is freely available at https://sourceforge.net/projects/autoassemblyd. It requires Sun JDK 6 or higher.

  13. GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes.

    Science.gov (United States)

    Yuan, Lina; Yu, Yang; Zhu, Yanmin; Li, Yulai; Li, Changqing; Li, Rujiao; Ma, Qin; Siu, Gilman Kit-Hang; Yu, Jun; Jiang, Taijiao; Xiao, Jingfa; Kang, Yu

    2017-01-25

    Next-generation sequencing (NGS) technologies have greatly promoted the genomic study of prokaryotes. However, highly fragmented assemblies due to short reads from NGS are still a limiting factor in gaining insights into genome biology. Reference-assisted tools are promising in genome assembly, but tend to produce false assemblies when the assigned reference has extensive rearrangements. Herein, we present GAAP, a genome assembly pipeline for scaffolding based on the core-gene-defined Genome Organizational Framework (cGOF) described in our previous study. Instead of assigning references, we use the multiple-reference-derived cGOFs as indexes to assist in ordering and orienting the scaffolds and to build a skeleton structure, and then use read pairs to extend scaffolds, called local scaffolding, and to distinguish between true and chimeric adjacencies in the scaffolds. In our performance tests using both empirical and simulated data of 15 genomes in six species with diverse genome size, complexity, and all three categories of cGOFs, GAAP outcompetes or achieves comparable results when compared to three other reference-assisted programs, AlignGraph, Ragout and MeDuSa. GAAP uses both cGOF and paired-end reads to create assemblies at genomic scale, and performs better than the currently available reference-assisted assembly tools as it recovers more assemblies and makes fewer false placements, especially for species with extensively rearranged genomes. Our method is a promising solution for reconstruction of genome sequence from short reads of NGS.

  14. An Improved Genome Assembly of Azadirachta indica A. Juss.

    Directory of Open Access Journals (Sweden)

    Neeraja M. Krishnan

    2016-07-01

    Neem (Azadirachta indica A. Juss.), an evergreen tree of the Meliaceae family, is known for its medicinal, cosmetic, pesticidal and insecticidal properties. We had previously sequenced and published the draft genome of a neem plant, using mainly short read sequencing data. In this report, we present an improved genome assembly generated using additional short reads from Illumina and long reads from Pacific Biosciences SMRT sequencer. We assembled short reads and error-corrected long reads using Platanus, an assembler designed to perform well for heterozygous genomes. The updated genome assembly (v2.0) yielded 3- and 3.5-fold increase in N50 and N75, respectively; 2.6-fold decrease in the total number of scaffolds; 1.25-fold increase in the number of valid transcriptome alignments; 13.4-fold less misassembly and 1.85-fold increase in the percentage repeat, over the earlier assembly (v1.0). The current assembly also maps better to the genes known to be involved in the terpenoid biosynthesis pathway. Together, the data represent an improved assembly of the A. indica genome.

  15. Evaluation of nine popular de novo assemblers in microbial genome assembly.

    Science.gov (United States)

    Forouzan, Esmaeil; Maleki, Masoumeh Sadat Mousavi; Karkhane, Ali Asghar; Yakhchali, Bagher

    2017-12-01

    Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. The effects of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate.

  16. Assembling draft genomes using contiBAIT

    OpenAIRE

    O'Neill, Kieran; Hills, Mark; Gottlieb, Mike; Borkowski, Matthew; Karsan, Aly; Lansdorp, Peter M.

    2017-01-01

    Summary: Massively parallel sequencing is now widely used, but data interpretation is only as good as the reference assembly to which it is aligned. While the number of reference assemblies has rapidly expanded, most of these remain at intermediate stages of completion, either as scaffold builds, or as chromosome builds (consisting of correctly ordered, but not necessarily correctly oriented scaffolds separated by gaps). Completion of de novo assemblies remains difficult, as regions that ar...

  17. Dramatic improvement in genome assembly achieved using doubled-haploid genomes.

    Science.gov (United States)

    Zhang, Hong; Tan, Engkong; Suzuki, Yutaka; Hirose, Yusuke; Kinoshita, Shigeharu; Okano, Hideyuki; Kudoh, Jun; Shimizu, Atsushi; Saito, Kazuyoshi; Watabe, Shugo; Asakawa, Shuichi

    2014-10-27

    Improvement in de novo assembly of large genomes is still to be desired. Here, we improved draft genome sequence quality by employing doubled-haploid individuals. We sequenced wildtype and doubled-haploid Takifugu rubripes genomes, under the same conditions, using the Illumina platform and assembled contigs with SOAPdenovo2. We observed 5.4-fold and 2.6-fold improvement in the sizes of the N50 contig and scaffold of doubled-haploid individuals, respectively, compared to the wildtype, indicating that the use of a doubled-haploid genome aids in accurate genome analysis.
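
    The fold improvements above are stated in terms of N50. As a minimal sketch of how that statistic is usually computed (the contig lengths below are invented toy values, not data from the study), N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assemblies of the same total size (20 bases each).
fragmented = [2, 2, 3, 3, 4, 6]
contiguous = [10, 6, 4]

fold_change = n50(contiguous) / n50(fragmented)
print(n50(fragmented), n50(contiguous), fold_change)
```

    Because N50 depends only on the length distribution, two assemblies of the same total size can differ widely in N50, which is why it is normally reported alongside correctness metrics.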

  18. Ten steps to get started in Genome Assembly and Annotation

    Science.gov (United States)

    Dominguez Del Angel, Victoria; Hjerde, Erik; Sterck, Lieven; Capella-Gutierrez, Salvadors; Notredame, Cederic; Vinnere Pettersson, Olga; Amselem, Joelle; Bouri, Laurent; Bocs, Stephanie; Klopp, Christophe; Gibrat, Jean-Francois; Vlasova, Anna; Leskosek, Brane L.; Soler, Lucile; Binzer-Panchal, Mahesh; Lantz, Henrik

    2018-01-01

    As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR). PMID:29568489

  19. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data.

    Science.gov (United States)

    Al-Nakeeb, Kosai; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-11-21

    Whole-genome sequencing (WGS) projects provide short read nucleotide sequences from nuclear and possibly organelle DNA depending on the source of origin. Mitochondrial DNA is present in animals and fungi, while plants contain DNA from both mitochondria and chloroplasts. Current techniques for separating organelle reads from nuclear reads in WGS data require full reference or partial seed sequences for assembling. Norgal (de Novo ORGAneLle extractor) avoids this requirement by identifying a high frequency subset of k-mers that are predominantly of mitochondrial origin and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences in the range from 98.5 to 99.5%. We also assembled the chloroplasts of grape vines and cucumbers using Norgal together with seed-based de novo assemblers. Norgal is a pipeline that can extract and assemble full or partial mitochondrial and chloroplast genomes from WGS short reads without prior knowledge. The program is available at: https://bitbucket.org/kosaidtu/norgal .
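
    The core idea described above, keeping only reads that contain high-frequency k-mers, can be sketched as follows. This is an illustration under stated assumptions rather than Norgal's implementation: the function names, the fixed `min_count` threshold and the toy reads are all hypothetical, and the abstract does not say how the frequency cutoff is chosen or how reverse complements are handled.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def select_high_depth_reads(reads, k, min_count):
    """Keep reads that contain at least one k-mer seen min_count or more times.

    Organelle DNA is typically present at much higher copy number than
    nuclear DNA, so its k-mers are over-represented in WGS reads.
    """
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    # Assumption: a simple fixed cutoff stands in for the real threshold.
    frequent = {kmer for kmer, count in counts.items() if count >= min_count}
    return [r for r in reads if any(km in frequent for km in kmers(r, k))]


# Hypothetical toy reads: three overlap heavily, one is unrelated.
reads = ["ACGTAC", "CGTACG", "GTACGT", "TTTTGG"]
selected = select_high_depth_reads(reads, k=4, min_count=2)
print(selected)
```

    The selected subset would then be passed to a de novo assembler; that step is outside the scope of this sketch.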

  20. Assembly of viral genomes from metagenomes

    NARCIS (Netherlands)

    S.L. Smits (Saskia); R. Bodewes (Rogier); A. Ruiz-Gonzalez (Aritz); V. Baumgärtner (Volkmar); M.P.G. Koopmans D.V.M. (Marion); A.D.M.E. Osterhaus (Albert); A. Schürch (Anita)

    2014-01-01

    Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used in the detection of novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow

  1. Oxford Nanopore MinION Sequencing and Genome Assembly

    Directory of Open Access Journals (Sweden)

    Hengyun Lu

    2016-10-01

    The revolution of genome sequencing is continuing after the successful second-generation sequencing (SGS) technology. The third-generation sequencing (TGS) technology, led by Pacific Biosciences (PacBio), is progressing rapidly, moving from a technology once only capable of providing data for small genome analysis, or for performing targeted screening, to one that promises high quality de novo assembly and structural variation detection for human-sized genomes. In 2014, the MinION, the first commercial sequencer using nanopore technology, was released by Oxford Nanopore Technologies (ONT). MinION identifies DNA bases by measuring the changes in electrical conductivity generated as DNA strands pass through a biological pore. Its portability, affordability, and speed in data production make it suitable for real-time applications; the release of the long-read sequencer MinION has thus generated much excitement and interest in the genomics community. While de novo genome assemblies can be cheaply produced from SGS data, assembly continuity is often relatively poor, due to the limited ability of short reads to handle long repeats. Assembly quality can be greatly improved by using TGS long reads, since longer reads can more easily span repetitive regions, despite having higher error rates at the base level. The potential of nanopore sequencing has been demonstrated by various studies in genome surveillance at locations where rapid and reliable sequencing is needed, but where resources are limited.

  2. Cyprinus carpio Genome sequencing and assembly

    NARCIS (Netherlands)

    Kolder, I.C.R.M.; Plas-Duivesteijn, van der Suzanne J.; Tan, G.; Wiegertjes, G.; Forlenza, M.; Guler, A.T.; Travin, D.Y.; Nakao, M.; Moritomo, T.; Irnazarow, I.; Jansen, H.J.

    2013-01-01

    Sequencing of the common carp (Cyprinus carpio carpio Linnaeus, 1758) genome, with the objective of establishing carp as a model organism to supplement the closely related zebrafish (Danio rerio). The sequenced individual is a homozygous female (by gynogenesis) of R3 x R8 carp, the heterozygous

  3. De novo assembly and phasing of a Korean human genome.

    Science.gov (United States)

    Seo, Jeong-Sun; Rhie, Arang; Kim, Junsoo; Lee, Sangjin; Sohn, Min-Hwan; Kim, Chang-Uk; Hastie, Alex; Cao, Han; Yun, Ji-Young; Kim, Jihye; Kuk, Junho; Park, Gun Hwa; Kim, Juhyeok; Ryu, Hanna; Kim, Jongbum; Roh, Mira; Baek, Jeonghun; Hunkapiller, Michael W; Korlach, Jonas; Shin, Jong-Yeon; Kim, Changhoon

    2016-10-13

    Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of

  4. Genome Assembly and Computational Analysis Pipelines for Bacterial Pathogens

    KAUST Repository

    Rangkuti, Farania Gama Ardhina

    2011-06-01

    Pathogens lie behind the deadliest pandemics in history. To date, the AIDS pandemic has resulted in more than 25 million fatal cases, while tuberculosis and malaria annually claim more than 2 million lives. Comparative genomic analyses are needed to gain insights into the molecular mechanisms of pathogens, but the abundance of biological data dictates that such studies cannot be performed without the assistance of computational approaches. This explains the significant need for computational pipelines for genome assembly and analyses. The aim of this research is to develop such pipelines. This work utilizes various bioinformatics approaches to analyze the high-throughput genomic sequence data that has been obtained from several strains of bacterial pathogens. A pipeline has been compiled for quality control for sequencing and assembly, and several protocols have been developed to detect contaminations. Visualizations of genomic data have been generated in various formats, in addition to alignment, homology detection and sequence variant detection. We have also implemented a metaheuristic algorithm that significantly improves bacterial genome assemblies compared to other known methods. Experiments on Mycobacterium tuberculosis H37Rv data showed that our method resulted in improvement of N50 value of up to 9697% while consistently maintaining high accuracy, covering around 98% of the published reference genome. Other improvement efforts were also implemented, consisting of iterative local assemblies and iterative correction of contiguated bases. Our result expedites the genomic analysis of virulent genes up to single base pair resolution. It is also applicable to virtually every pathogenic microorganism, propelling further research in the control of and protection from pathogen-associated diseases.

  5. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes

    DEFF Research Database (Denmark)

    Nielsen, Henrik Bjørn; Almeida, Mathieu; Juncker, Agnieszka

    2014-01-01

    of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify...

  6. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

    Science.gov (United States)

    Swain, Martin T; Tsai, Isheng J; Assefa, Samual A; Newbold, Chris; Berriman, Matthew; Otto, Thomas D

    2012-06-07

    Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) in order to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ∼24 h: it doubles the average contig size and annotates over 4,300 gene models.

  7. Draft Genome Assembly of a Wolbachia Endosymbiont of Plutella australiana.

    Science.gov (United States)

    Ward, Christopher M; Baxter, Simon W

    2017-10-26

    Wolbachia spp. are endosymbiotic bacteria that infect around 50% of arthropods and cause a broad range of effects, including manipulating host reproduction. Here, we present the annotated draft genome assembly of Wolbachia strain wAus, which infects Plutella australiana, a cryptic ally of the major Brassica pest Plutella xylostella (diamondback moth).

  8. Genomes correction and assembling: present methods and tools

    Science.gov (United States)

    Wojcieszek, Michał; Pawełkowicz, Magdalena; Nowak, Robert; Przybecki, Zbigniew

    2014-11-01

    The recent rapid development of next generation sequencing (NGS) technologies has had a significant impact on genomics, enabling many de novo sequencing projects for new species that were previously constrained by cost. Along with the advancement of NGS came a need to adapt assembly programs. New algorithms must cope with massive amounts of data within reasonable time limits, and processing power and hardware are also important factors. In this paper, we address the de novo genome assembly pipeline as provided by programs presently available to scientists, both as commercial and as open-source software. The main focus of our discussion is the implementation of four different approaches (Greedy, Overlap-Layout-Consensus (OLC), De Bruijn and Integrated), which results in variation in performance, with additional insight into the correction of short and long reads.

  9. SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

    Energy Technology Data Exchange (ETDEWEB)

    Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

    2016-08-16

    In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with the size of sequencing data ranging from terabytes to petabytes. According to the performance analysis results, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For the input parallelization, the input data is divided into virtual fragments with nearly equal size, and the start position and end position of each fragment are automatically separated at the beginning of the reads. In k-mer graph construction, in order to improve the communication efficiency, the message size is kept constant between any two processes by proportionally increasing the number of nucleotides to the number of processes in the input parallelization step for each round. The memory usage is also decreased because only a small part of the input data is processed in each round. With graph simplification, the communication protocol reduces the number of communication loops from four to two loops and decreases the idle communication time. The optimized assembler is denoted as SWAP-Assembler 2 (SWAP2). In our experiments using a 1000 Genomes project dataset of 4 terabytes (the largest dataset ever used for assembling) on the supercomputer Mira, the results show that SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
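
    The input parallelization step described above, splitting the input into nearly equal fragments whose boundaries fall at the beginning of reads, can be illustrated with a short sketch. This is not SWAP2's code: the function name is hypothetical, and it assumes a FASTA-style file in which every record starts with a `>` at the beginning of a line (FASTQ needs more careful boundary detection, since `@` can also appear in quality strings).

```python
import os


def record_aligned_chunks(path, n_chunks, marker=b">"):
    """Split a file into up to n_chunks byte ranges that start at record headers.

    Each tentative boundary is placed at an equal fraction of the file size
    and then moved forward to the next line beginning with `marker`.
    """
    size = os.path.getsize(path)
    starts = []
    with open(path, "rb") as fh:
        for i in range(n_chunks):
            target = size * i // n_chunks
            if target > 0:
                # Re-synchronise to a line start at or after the target byte.
                fh.seek(target - 1)
                fh.readline()
            else:
                fh.seek(0)
            # Advance line by line until a record header (or end of file).
            while True:
                pos = fh.tell()
                line = fh.readline()
                if not line or line.startswith(marker):
                    break
            starts.append(pos)
    starts.append(size)
    # Drop empty ranges that arise when two targets fall in the same record.
    return [(a, b) for a, b in zip(starts, starts[1:]) if a < b]
```

    Each worker process would then read only its own `(start, end)` byte range, so no record is split between workers and no coordination is needed while reading.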

  10. Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor; Kohler, Annegret; Martin, Francis

    2013-03-08

    Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232831 genes and ~15011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.

  11. SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores.

    Science.gov (United States)

    Meng, Jintao; Wang, Bingqiang; Wei, Yanjie; Feng, Shengzhong; Balaji, Pavan

    2014-01-01

    There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze these sequencing data. Traditional assembly methods require long execution times and large amounts of memory on a single workstation, which limits their use on such massive data. This paper presents a highly scalable assembler named SWAP-Assembler for processing massive sequencing data using thousands of cores, where SWAP is an acronym for Small World Asynchronous Parallel model. In the paper, a mathematical description of the multi-step bi-directed graph (MSG) is provided to resolve the computational interdependence on merging edges, and a highly scalable computational framework for SWAP is developed to automatically perform the parallel computation of all operations. Graph cleaning and contig extension are also included for generating contigs with high quality. Experimental results show that SWAP-Assembler scales up to 2048 cores on the Yanhuang dataset, taking only 26 minutes, which is better than several other parallel assemblers, such as ABySS, Ray, and PASHA. Results also show that SWAP-Assembler can generate high quality contigs with good N50 size and low error rate; in particular, it generated the longest N50 contig sizes for the Fish and Yanhuang datasets. In this paper, we presented a highly scalable and efficient genome assembly software, SWAP-Assembler. Compared with several other assemblers, it showed very good performance in terms of scalability and contig quality. This software is available at: https://sourceforge.net/projects/swapassembler.
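
    Since the abstract reports contig quality in terms of N50, a short sketch of the usual definition may help: N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name is illustrative and is not part of SWAP-Assembler.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

    In the example the lengths sum to 54, and the three longest contigs (10 + 9 + 8 = 27) reach half of that, so the N50 is 8.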

  12. Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures

    KAUST Repository

    Kleftogiannis, Dimitrios A.; Kalnis, Panos; Bajic, Vladimir B.

    2013-01-01

    methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment

  13. A draft genome assembly of the army worm, Spodoptera frugiperda.

    Science.gov (United States)

    Kakumani, Pavan Kumar; Malhotra, Pawan; Mukherjee, Sunil K; Bhatnagar, Raj K

    2014-08-01

    Spodoptera is an agriculturally important pest insect, and studies of its biology have been limited by the unavailability of its genome. In the present study, the genomic DNA was sequenced and assembled into 37,243 scaffolds with a total size of 358 Mb and an N50 of 53.7 kb. Based on degree of identity, we could anchor 305 Mb of the genome onto all the 28 chromosomes of Bombyx mori. Repeat elements were identified and account for 20.28% of the total genome. Further, we predicted 11,595 genes, with an average intron length of 726 bp. The genes were annotated, and domain analysis revealed that Sf genes share a significant homology and expression pattern with B. mori, despite differences in KOG gene categories and representation of certain protein families. The present study on the Sf genome would help in the characterization of cellular pathways to understand its biology, and in comparative evolutionary studies among lepidopteran family members to help annotate their genomes. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. An Efficient Genome Fragment Assembling Using GA with Neighborhood Aware Fitness Function

    Directory of Open Access Journals (Sweden)

    Satoko Kikuchi

    2012-01-01

    To decode a long genome sequence, shotgun sequencing is the state-of-the-art technique. It needs to properly sequence a very large number, sometimes as large as millions, of short partially readable strings (fragments). Arranging those fragments in the correct sequence is known as fragment assembling, which is an NP-problem. Presently used methods require enormous computational cost. In this work, we have shown how our modified genetic algorithm (GA) could solve this problem efficiently. In the proposed GA, the length of the chromosome, which represents the volume of the search space, is reduced with advancing generations, thereby improving search efficiency. We also introduced a greedy mutation, by swapping nearby fragments using some heuristics, to improve the fitness of chromosomes. We compared results with Parsons’ algorithm, which is also based on GA. We used fragments with partial reads on both sides, mimicking fragments in the real genome assembling process. In Parsons’ work the base-pair array of the whole fragment is known. Even then, we could obtain much better results, and we succeeded in restructuring contigs covering 100% of the genome sequences.

  15. GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research.

    Directory of Open Access Journals (Sweden)

    Balázs Brankovics

    2016-06-01

    GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assembling specific genomic regions from NGS data. This approach is especially useful when dealing with multi-copy regions, such as the mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled although they contain interesting information from phylogenetic or epidemiologic perspectives; single-copy regions can also be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data, based on homology, like sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species, must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated, such as: exhaustive assembly (rDNA region and mitochondrial genome), extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), as well as extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).
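
    The general idea of baiting can be shown with a toy sketch: reads that share at least one k-mer with a known bait sequence are recruited, and their k-mers then extend the bait for the next round. This is an illustration under simplifying assumptions (exact k-mer matches, no reverse complements, no assembly step between rounds); the function names and parameters are hypothetical and do not reflect GRAbB's actual implementation.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait_reads(bait: str, reads: list[str], k: int = 5,
               max_rounds: int = 10) -> list[str]:
    """Iteratively recruit reads that share a k-mer with the growing bait."""
    bait_kmers = kmers(bait, k)
    selected: set[int] = set()
    for _ in range(max_rounds):
        new = {i for i, read in enumerate(reads)
               if i not in selected and kmers(read, k) & bait_kmers}
        if not new:
            break  # no further reads can be recruited
        selected |= new
        for i in new:
            bait_kmers |= kmers(reads[i], k)  # recruited reads extend the bait
    return [reads[i] for i in sorted(selected)]


reads = ["GTTGCAAGGC", "AAGGCTTAAC", "TTTTTTTTTT"]
print(bait_reads("ACGTTGCA", reads))  # ['GTTGCAAGGC', 'AAGGCTTAAC']
```

    The second read does not overlap the bait directly; it is recruited only in the second round, through the k-mers contributed by the first read. The unrelated third read is never selected.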

  16. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    Science.gov (United States)

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to the lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.
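
    The depth figures above follow the usual relation depth = (number of reads × read length) / genome size. The sketch below applies it to the 50X recommendation; the 100 bp read length is an assumption chosen for illustration and is not stated in the abstract.

```python
def sequencing_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(genome_size: int, depth: int, read_length: int) -> int:
    """Smallest number of reads that reaches the requested depth."""
    total_bases = genome_size * depth
    return -(-total_bases // read_length)  # integer ceiling division


# E. coli (4.6 Mb) at 50X with assumed 100 bp reads
print(reads_needed(4_600_000, 50, 100))  # 2300000
```

    Under the same read-length assumption, the 100 Mb C. elegans genome would need 50 million reads for 50X, which is the kind of estimate used when deciding how many samples to multiplex on one run.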

  17. A chromosomal genomics approach to assess and validate the desi and kabuli draft chickpea genome assemblies

    Czech Academy of Sciences Publication Activity Database

    Ruperao, P.; Chan, C.K.K.; Azam, S.; Karafiátová, Miroslava; Hayashi, S.; Čížková, Jana; Šimková, Hana; Vrána, Jan; Doležel, Jaroslav; Varshney, R.K.; Edwards, D.

    2014-01-01

    Vol. 12, No. 6 (2014), pp. 778-786. ISSN 1467-7644. R&D Projects: GA ČR GBP501/12/G090; GA MŠk(CZ) LO1204. Institutional support: RVO:61389030. Keywords: chickpea; genome assembly; cytogenetics. Subject RIV: EB - Genetics; Molecular Biology. Impact factor: 5.752 (2014).

  18. Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies.

    Science.gov (United States)

    Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A

    2014-01-01

    As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.

  19. Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies

    Science.gov (United States)

    Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthew K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.

    2014-01-01

    As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (~3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.

  20. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes.

    Science.gov (United States)

    Nielsen, H Bjørn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska; Rasmussen, Simon; Li, Junhua; Sunagawa, Shinichi; Plichta, Damian R; Gautier, Laurent; Pedersen, Anders G; Le Chatelier, Emmanuelle; Pelletier, Eric; Bonde, Ida; Nielsen, Trine; Manichanh, Chaysavanh; Arumugam, Manimozhiyan; Batto, Jean-Michel; Quintanilha Dos Santos, Marcelo B; Blom, Nikolaj; Borruel, Natalia; Burgdorf, Kristoffer S; Boumezbeur, Fouad; Casellas, Francesc; Doré, Joël; Dworzynski, Piotr; Guarner, Francisco; Hansen, Torben; Hildebrand, Falk; Kaas, Rolf S; Kennedy, Sean; Kristiansen, Karsten; Kultima, Jens Roat; Léonard, Pierre; Levenez, Florence; Lund, Ole; Moumen, Bouziane; Le Paslier, Denis; Pons, Nicolas; Pedersen, Oluf; Prifti, Edi; Qin, Junjie; Raes, Jeroen; Sørensen, Søren; Tap, Julien; Tims, Sebastian; Ussery, David W; Yamada, Takuji; Renault, Pierre; Sicheritz-Ponten, Thomas; Bork, Peer; Wang, Jun; Brunak, Søren; Ehrlich, S Dusko

    2014-08-01

    Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
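
    Binning by co-abundance can be illustrated with a deliberately simple sketch: genes whose abundance profiles across samples are strongly correlated are placed in the same group. The greedy seed-based grouping, the Pearson correlation measure and the 0.9 threshold are all assumptions made for illustration; the abstract does not specify the clustering procedure the authors used.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / sqrt(vx * vy)


def group_coabundant(profiles: dict[str, list[float]],
                     threshold: float = 0.9) -> list[list[str]]:
    """Greedily assign each gene to the first group whose seed profile
    it correlates with at or above the threshold; otherwise seed a new group."""
    groups: list[tuple[list[float], list[str]]] = []
    for gene, profile in profiles.items():
        for seed, members in groups:
            if pearson(seed, profile) >= threshold:
                members.append(gene)
                break
        else:
            groups.append((profile, [gene]))
    return [members for _, members in groups]


profiles = {
    "g1": [1, 2, 3, 4],      # abundance in four samples
    "g2": [2, 4, 6, 8.5],    # rises with g1
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4, 2],      # falls with g3
}
print(group_coabundant(profiles))  # [['g1', 'g2'], ['g3', 'g4']]
```

    Genes that originate from the same organism are expected to rise and fall together across samples, which is why correlated profiles are a usable signal without any reference genome.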

  1. Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231.

    Science.gov (United States)

    Baptista, Rodrigo P; Reis-Cunha, Joao Luis; DeBarry, Jeremy D; Chiari, Egler; Kissinger, Jessica C; Bartholomeu, Daniella C; Macedo, Andrea M

    2018-02-14

    Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.

  2. From NGS assembly challenges to instability of fungal mitochondrial genomes: A case study in genome complexity.

    Science.gov (United States)

    Misas, Elizabeth; Muñoz, José Fernando; Gallo, Juan Esteban; McEwen, Juan Guillermo; Clay, Oliver Keatinge

    2016-04-01

    The presence of repetitive or non-unique DNA persisting over sizable regions of a eukaryotic genome can hinder the genome's successful de novo assembly from short reads: ambiguities in assigning genome locations to the non-unique subsequences can result in premature termination of contigs and thus overfragmented assemblies. Fungal mitochondrial (mtDNA) genomes are compact (typically less than 100 kb), yet often contain short non-unique sequences that can be shown to impede their successful de novo assembly in silico. Such repeats can also confuse processes in the cell in vivo. A well-studied example is ectopic (out-of-register, illegitimate) recombination associated with repeat pairs, which can lead to deletion of functionally important genes that are located between the repeats. Repeats that remain conserved over micro- or macroevolutionary timescales despite such risks may indicate functionally or structurally (e.g., for replication) important regions. This principle could form the basis of a mining strategy for accelerating discovery of function in genome sequences. We present here our screening of a sample of 11 fully sequenced fungal mitochondrial genomes by observing where exact k-mer repeats occurred several times; initial analyses motivated us to focus on 17-mers occurring more than three times. Based on the diverse repeats we observe, we propose that such screening may serve as an efficient expedient for gaining a rapid but representative first insight into the repeat landscapes of sparsely characterized mitochondrial chromosomes. Our matching of the flagged repeats to previously reported regions of interest supports the idea that systems of persisting, non-trivial repeats in genomes can often highlight features meriting further attention. Copyright © 2016 Elsevier Ltd. All rights reserved.
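
    The screening criterion described above (exact 17-mers occurring more than three times) can be sketched as a k-mer count. This minimal version treats the sequence as linear and single-stranded, so it ignores reverse complements and the circularity of mitochondrial chromosomes; the function name and defaults are illustrative and are not the authors' code.

```python
from collections import Counter


def repeated_kmers(seq: str, k: int = 17, min_count: int = 4) -> dict[str, int]:
    """Return every exact k-mer that occurs at least min_count times.

    Overlapping occurrences are counted separately.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Small k so the toy example stays readable
print(repeated_kmers("AAAAAACGT", k=3, min_count=4))  # {'AAA': 4}
```

    With the defaults, `min_count=4` corresponds to "more than three times" for 17-mers, as in the abstract.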

  3. GAViT: Genome Assembly Visualization Tool for Short Read Data

    Energy Technology Data Exchange (ETDEWEB)

    Syed, Aijazuddin; Shapiro, Harris; Tu, Hank; Pangilinan, Jasmyn; Trong, Stephan

    2008-03-14

    It is a challenging job for genome analysts to accurately debug, troubleshoot, and validate genome assembly results. Genome analysts rely on visualization tools to help validate and troubleshoot assembly results, including such problems as mis-assemblies, low-quality regions, and repeats. Short read data adds further complexity and makes it extremely challenging for the visualization tools to scale and to view all needed assembly information. As a result, there is a need for a visualization tool that can scale to display assembly data from the new sequencing technologies. We present Genome Assembly Visualization Tool (GAViT), a highly scalable and interactive assembly visualization tool developed at the DOE Joint Genome Institute (JGI).

  4. Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine

    Directory of Open Access Journals (Sweden)

    Wenming Xiao

    2016-04-01

    Even though each of us shares more than 99% of the DNA sequences in our genome, there are millions of sequence codes or structure in small regions that differ between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived with the DNA from multiple individuals. As a result of this, the reference genome is incomplete and may misrepresent the sequence variants of the general population. The more reliable solution is to compare sequences of diseased tissue with its own genome sequence derived from tissue in a normal state. As the price to sequence the human genome has dropped dramatically to around $1000, it shows a promising future of documenting the personal genome for every individual. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, till now, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend that a combined hybrid assembly with long and short reads would be a promising way to generate good quality human genome assemblies and specify parameters for the quality assessment of assembly outcomes. We provide a perspective view of the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. Finally, we discuss the usage of the personal genome in aiding vaccine design and development, monitoring host

  5. Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

    Science.gov (United States)

    Evans, Teri; Johnson, Andrew D; Loose, Matthew

    2018-01-12

    Large repeat rich genomes present challenges for assembly using short read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .

  6. Large-scale parallel genome assembler over cloud computing environment.

    Science.gov (United States)

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model, and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure rather than a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of a traditional HPC cluster.
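
    For readers unfamiliar with the de Bruijn graph that GiGA is built around, the following single-machine sketch shows the basic construction: every k-mer in a read becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It is only meant to illustrate the data structure; it is not GiGA's distributed Hadoop/Giraph implementation, and the function name is hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Build a de Bruijn graph as an adjacency map of (k-1)-mers."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


print(de_bruijn(["ACGT", "CGTA"], k=3))
# {'AC': {'CG'}, 'CG': {'GT'}, 'GT': {'TA'}}
```

    The two overlapping reads share the k-mer CGT, so they collapse onto a single path AC -> CG -> GT -> TA. A distributed assembler partitions these nodes across machines and performs this kind of path compaction in parallel.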

  7. De novo assembly of a haplotype-resolved human genome

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Wu, Honglong; Luo, Ruibang

    2015-01-01

    The human genome is diploid, and knowledge of the variants on each chromosome is important for the interpretation of genomic information. Here we report the assembly of a haplotype-resolved diploid genome without using a reference genome. Our pipeline relies on fosmid pooling together with whole-...

  8. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  9. Genomic characterization of large heterochromatic gaps in the human genome assembly.

    Directory of Open Access Journals (Sweden)

    Nicolas Altemose

    2014-05-01

    The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.

  10. GABenchToB: a genome assembly benchmark tuned on bacteria and benchtop sequencers.

    Directory of Open Access Journals (Sweden)

    Sebastian Jünemann

    De novo genome assembly is the process of reconstructing a complete genomic sequence from countless small sequencing reads. Due to the complexity of this task, numerous genome assemblers have been developed to cope with different requirements and the different kinds of data provided by sequencers within the fast evolving field of next-generation sequencing technologies. In particular, the recently introduced generation of benchtop sequencers, like Illumina's MiSeq and Ion Torrent's Personal Genome Machine (PGM), popularized the easy, fast, and cheap sequencing of bacterial organisms to a broad range of academic and clinical institutions. With a strong pragmatic focus, here, we give a novel insight into the line of assembly evaluation surveys as we benchmark popular de novo genome assemblers based on bacterial data generated by benchtop sequencers. Therefore, single-library assemblies were generated, assembled, and compared to each other by metrics describing assembly contiguity and accuracy, and also by practice-oriented criteria such as computing time. In addition, we extensively analyzed the effect of the depth of coverage on the genome assemblies within reasonable ranges and the k-mer optimization problem of de Bruijn Graph assemblers. Our results show that, although both MiSeq and PGM allow for good genome assemblies, they require different approaches. They not only pair with different assembler types, but also affect assemblies differently regarding the depth of coverage, where oversampling can become problematic. Assemblies vary greatly with respect to contiguity and accuracy but also in their requirements on computing power. Consequently, no assembler can be rated best for all preconditions. Instead, the given kind of data, the demands on assembly quality, and the available computing infrastructure determine which assembler suits best.
The data sets, scripts and all additional information needed to replicate our results are freely

  11. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...

  12. Reducing assembly complexity of microbial genomes with single-molecule sequencing

    Science.gov (United States)

    Genome assembly algorithms cannot fully reconstruct microbial chromosomes from the DNA reads output by first or second-generation sequencing instruments. Therefore, most genomes are left unfinished due to the significant resources required to manually close gaps left in the draft assemblies. Single-...

  13. Extensive error in the number of genes inferred from draft genome assemblies.

    Directory of Open Access Journals (Sweden)

    James F Denton

    2014-12-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

  14. Evaluation of the Cow Rumen Metagenome: Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Sczyrba, Alex

    2011-10-13

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  15. Improvement of the Threespine Stickleback Genome Using a Hi-C-Based Proximity-Guided Assembly.

    Science.gov (United States)

    Peichel, Catherine L; Sullivan, Shawn T; Liachko, Ivan; White, Michael A

    2017-09-01

    Scaffolding genomes into complete chromosome assemblies remains challenging even with the rapidly increasing sequence coverage generated by current next-generation sequence technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C based proximity-guided assembly (PGA) to perform a de novo genome assembly from relatively short contigs. Using Hi-C based PGA, we generated complete chromosome assemblies from a distribution of short contigs (20-100 kb). We found that 96.40% of contigs were correctly assigned to linkage groups (LGs), with ordering nearly identical to the previous genome assembly. Using available bacterial artificial chromosome (BAC) end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due to structural variation between the populations used for the 2 assemblies or errors in the existing assembly. This Hi-C assembly also allowed us to improve the existing assembly, assigning over 60% (13.35 Mb) of the previously unassigned (~21.7 Mb) contigs to LGs. Together, our results highlight the potential of the Hi-C based PGA method to be used in combination with short read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain high molecular weight DNA required for other scaffolding methods. © The American Genetic Association 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  16. Comparing Memory-Efficient Genome Assemblers on Stand-Alone and Cloud Infrastructures

    KAUST Repository

    Kleftogiannis, Dimitrios A.

    2013-09-27

    A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.

  17. Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures.

    Science.gov (United States)

    Kleftogiannis, Dimitrios; Kalnis, Panos; Bajic, Vladimir B

    2013-01-01

    A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.

  18. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...

  19. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference

    DEFF Research Database (Denmark)

    Maretty, Lasse; Jensen, Jacob Malte; Petersen, Bent

    2017-01-01

    Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set...

  20. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    Energy Technology Data Exchange (ETDEWEB)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; Harmon-Smith, Miranda; Doud, Devin; Reddy, T. B. K.; Schulz, Frederik; Jarett, Jessica; Rivers, Adam R.; Eloe-Fadrosh, Emiley A.; Tringe, Susannah G.; Ivanova, Natalia N.; Copeland, Alex; Clum, Alicia; Becraft, Eric D.; Malmstrom, Rex R.; Birren, Bruce; Podar, Mircea; Bork, Peer; Weinstock, George M.; Garrity, George M.; Dodsworth, Jeremy A.; Yooseph, Shibu; Sutton, Granger; Glöckner, Frank O.; Gilbert, Jack A.; Nelson, William C.; Hallam, Steven J.; Jungbluth, Sean P.; Ettema, Thijs J. G.; Tighe, Scott; Konstantinidis, Konstantinos T.; Liu, Wen-Tso; Baker, Brett J.; Rattei, Thomas; Eisen, Jonathan A.; Hedlund, Brian; McMahon, Katherine D.; Fierer, Noah; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Tyson, Gene W.; Rinke, Christian; Kyrpides, Nikos C.; Schriml, Lynn; Garrity, George M.; Hugenholtz, Philip; Sutton, Granger; Yilmaz, Pelin; Meyer, Folker; Glöckner, Frank O.; Gilbert, Jack A.; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Lapidus, Alla; Meyer, Folker; Yilmaz, Pelin; Parks, Donovan H.; Eren, A. M.; Schriml, Lynn; Banfield, Jillian F.; Hugenholtz, Philip; Woyke, Tanja

    2017-08-08

    We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.

  1. Analysis Of Transcriptomes In A Porcine Tissue Collection Using RNA-Seq And Genome Assembly 10

    DEFF Research Database (Denmark)

    Hornshøj, Henrik; Thomsen, Bo; Hedegaard, Jakob

    2011-01-01

    The release of Sus scrofa genome assembly 10 supports improvement of the pig genome annotation and in depth transcriptome analyses using next-generation sequencing technologies. In this study we analyze RNA-seq reads from a tissue collection, including 10 separate tissues from Duroc boars and 10...... short read alignment software we mapped the reads to the genome assembly 10. We extracted contig sequences of gene transcripts using the Cufflinks software. Based on this information we identified expressed genes that are present in the genome assembly. The portion of these genes being previously known...... was roughly estimated by sequence comparison to known genes. Similarly, we searched for genes that are expressed in the tissues but not present in the genome assembly by aligning the non-genome-mapped reads to known gene transcripts. For the genes predicted to have alternative transcript variants by Cufflinks...

  2. De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis)

    Directory of Open Access Journals (Sweden)

    Si Lok

    2017-02-01

    The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon–gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly was demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology.
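
Several records in this listing report contig or scaffold N50 values. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are assumptions made for illustration and do not come from any of the cited studies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("need at least one contig length")


# Toy contig lengths in bp, invented for this example.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstracts above pair it with accuracy or completeness measures such as BUSCO.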

  3. De novo assembly of human genomes with massively parallel short read sequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue

    2010-01-01

    genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities...... for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way....

  4. Detection and correction of false segmental duplications caused by genome mis-assembly

    Science.gov (United States)

    2010-01-01

    Diploid genomes with divergent chromosomes present special problems for assembly software as two copies of especially polymorphic regions may be mistakenly constructed, creating the appearance of a recent segmental duplication. We developed a method for identifying such false duplications and applied it to four vertebrate genomes. For each genome, we corrected mis-assemblies, improved estimates of the amount of duplicated sequence, and recovered polymorphisms between the sequenced chromosomes. PMID:20219098

  5. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

    Science.gov (United States)

    Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

    2016-07-01

    The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  6. A high-quality carrot genome assembly provides new insights into carotenoid accumulation and asterid genome evolution

    Science.gov (United States)

    We report a chromosome-scale assembly and analysis of the Daucus carota genome, an important source of provitamin A in the human diet and the first sequenced genome among members of the Euasterid II clade. We characterized two new polyploidization events, both occurring after the divergence of carro...

  7. Genome-wide engineering of an infectious clone of herpes simplex virus type 1 using synthetic genomics assembly methods.

    Science.gov (United States)

    Oldfield, Lauren M; Grzesik, Peter; Voorhies, Alexander A; Alperovich, Nina; MacMath, Derek; Najera, Claudia D; Chandra, Diya Sabrina; Prasad, Sanjana; Noskov, Vladimir N; Montague, Michael G; Friedman, Robert M; Desai, Prashant J; Vashee, Sanjay

    2017-10-17

    Here, we present a transformational approach to genome engineering of herpes simplex virus type 1 (HSV-1), which has a large DNA genome, using synthetic genomics tools. We believe this method will enable more rapid and complex modifications of HSV-1 and other large DNA viruses than previous technologies, facilitating many useful applications. Yeast transformation-associated recombination was used to clone 11 fragments comprising the HSV-1 strain KOS 152 kb genome. Using overlapping sequences between the adjacent pieces, we assembled the fragments into a complete virus genome in yeast, transferred it into an Escherichia coli host, and reconstituted infectious virus following transfection into mammalian cells. The virus derived from this yeast-assembled genome, KOS YA, replicated with kinetics similar to wild-type virus. We demonstrated the utility of this modular assembly technology by making numerous modifications to a single gene, making changes to two genes at the same time and, finally, generating individual and combinatorial deletions to a set of five conserved genes that encode virion structural proteins. While the ability to perform genome-wide editing through assembly methods in large DNA virus genomes raises dual-use concerns, we believe the incremental risks are outweighed by potential benefits. These include enhanced functional studies, generation of oncolytic virus vectors, development of delivery platforms of genes for vaccines or therapy, as well as more rapid development of countermeasures against potential biothreats.

  8. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

    Science.gov (United States)

    Schneider, Valerie A; Graves-Lindsay, Tina; Howe, Kerstin; Bouk, Nathan; Chen, Hsiu-Chuan; Kitts, Paul A; Murphy, Terence D; Pruitt, Kim D; Thibaud-Nissen, Françoise; Albracht, Derek; Fulton, Robert S; Kremitzki, Milinn; Magrini, Vincent; Markovic, Chris; McGrath, Sean; Steinberg, Karyn Meltz; Auger, Kate; Chow, William; Collins, Joanna; Harden, Glenn; Hubbard, Timothy; Pelan, Sarah; Simpson, Jared T; Threadgold, Glen; Torrance, James; Wood, Jonathan M; Clarke, Laura; Koren, Sergey; Boitano, Matthew; Peluso, Paul; Li, Heng; Chin, Chen-Shan; Phillippy, Adam M; Durbin, Richard; Wilson, Richard K; Flicek, Paul; Eichler, Evan E; Church, Deanna M

    2017-05-01

    The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health. © 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.

  9. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

    Science.gov (United States)

    Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

    2013-06-01

    We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
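
The consensus accuracy quoted in this abstract is often expressed on the Phred scale, where Q = -10 log10(error probability). The conversion below is a small sketch of that standard formula; the function names are chosen here for illustration and are not part of HGAP.

```python
import math


def accuracy_to_phred(accuracy):
    """Convert a per-base accuracy in [0, 1) to a Phred-scaled quality value."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_error(q):
    """Convert a Phred-scaled quality value back to an error probability."""
    return 10.0 ** (-q / 10.0)


# 99.999% accuracy is one expected error per 100,000 bases, i.e. about Q50.
print(round(accuracy_to_phred(0.99999), 3))
print(phred_to_error(50))
```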

  10. Assembly of the Complete Sitka Spruce Chloroplast Genome Using 10X Genomics' GemCode Sequencing Data.

    Directory of Open Access Journals (Sweden)

    Lauren Coombe

    The linked read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using the Illumina short read sequencing technology. In this new approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads, to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from that of the chloroplast, which resulted in the simplification of the de Bruijn graphs used at the various stages of assembly.

  11. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    Energy Technology Data Exchange (ETDEWEB)

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives

  12. Multiple hybrid de novo genome assembly of finger millet, an orphan allotetraploid crop.

    Science.gov (United States)

    Hatakeyama, Masaomi; Aluri, Sirisha; Balachadran, Mathi Thumilan; Sivarajan, Sajeevan Radha; Patrignani, Andrea; Grüter, Simon; Poveda, Lucy; Shimizu-Inatsugi, Rie; Baeten, John; Francoijs, Kees-Jan; Nataraja, Karaba N; Reddy, Yellodu A Nanja; Phadnis, Shamprasad; Ravikumar, Ramapura L; Schlapbach, Ralph; Sreeman, Sheshshayee M; Shimizu, Kentaro K

    2017-09-05

    Finger millet (Eleusine coracana (L.) Gaertn) is an important crop for food security because of its tolerance to drought, which is expected to be exacerbated by global climate changes. Nevertheless, it is often classified as an orphan/underutilized crop because of the paucity of scientific attention. Among several small millets, finger millet is considered as an excellent source of essential nutrient elements, such as iron and zinc; hence, it has potential as an alternate coarse cereal. However, high-quality genome sequence data of finger millet are currently not available. One of the major problems encountered in the genome assembly of this species was its polyploidy, which hampers genome assembly compared with a diploid genome. To overcome this problem, we sequenced its genome using diverse technologies with sufficient coverage and assembled it via a novel multiple hybrid assembly workflow that combines next-generation with single-molecule sequencing, followed by whole-genome optical mapping using the Bionano Irys® system. The total number of scaffolds was 1,897 with an N50 length >2.6 Mb and detection of 96% of the universal single-copy orthologs. The majority of the homeologs were assembled separately. This indicates that the proposed workflow is applicable to the assembly of other allotetraploid genomes. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  13. Genome sequencing of bacteria: sequencing, de novo assembly and rapid analysis using open source tools.

    Science.gov (United States)

    Kisand, Veljo; Lettieri, Teresa

    2013-04-01

    De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (tools for processing NGS data are increasingly free and open source and are often adopted for both their high quality and role in promoting academic freedom. The error rate of pyrosequencing the Alcanivorax borkumensis genome was such that thousands of insertions and deletions were artificially introduced into the finished genome. Despite a high coverage (~30 fold), it did not allow the reference genome to be fully mapped. Reads from regions with errors had low quality, low coverage, or were missing. The main defect of the reference mapping was the introduction of artificial indels into contigs through lower than 100% consensus and distracting gene calling due to artificial stop codons. No assembler was able to perform de novo assembly comparable to reference mapping. Automated annotation tools performed similarly on reference mapped and de novo draft genomes, and annotated most CDSs in the de novo assembled draft genomes. Free and open source software (FOSS) tools for assembly and annotation of NGS data are being developed rapidly to provide accurate results with less computational effort. Usability is not high priority and these tools currently do not allow the data to be processed without manual intervention. Despite this, genome assemblers now readily assemble medium short reads into long contigs (>97-98% genome coverage). A notable gap in pyrosequencing technology is the quality of base pair calling and conflicting base pairs between single reads at the same nucleotide position. Regardless, using draft whole genomes that are not finished and remain fragmented into tens of contigs allows one to characterize

  14. Microsatellite marker development by partial sequencing of the sour passion fruit genome (Passiflora edulis Sims).

    Science.gov (United States)

    Araya, Susan; Martins, Alexandre M; Junqueira, Nilton T V; Costa, Ana Maria; Faleiro, Fábio G; Ferreira, Márcio E

    2017-07-21

    The Passiflora genus comprises hundreds of wild and cultivated species of passion fruit used for food, industrial, ornamental and medicinal purposes. Efforts to develop genomic tools for genetic analysis of P. edulis, the most important commercial Passiflora species, are still incipient. In spite of many recognized applications of microsatellite markers in genetics and breeding, their availability for passion fruit research remains restricted. Microsatellite markers in P. edulis are usually limited in number, show reduced polymorphism, and are mostly based on compound or imperfect repeats. Furthermore, they are confined to only a few Passiflora species. We describe the use of NGS technology to partially assemble the P. edulis genome in order to develop hundreds of new microsatellite markers. A total of 14.11 Gbp of Illumina paired-end sequence reads were analyzed to detect simple sequence repeat sites in the sour passion fruit genome. A sample of 1300 contigs containing perfect repeat microsatellite sequences was selected for PCR primer development. Panels of di- and tri-nucleotide repeat markers were then tested in P. edulis germplasm accessions for validation. DNA polymorphism was detected in 74% of the markers (PIC = 0.16 to 0.77; number of alleles/locus = 2 to 7). A core panel of highly polymorphic markers (PIC = 0.46 to 0.77) was used to cross-amplify PCR products in 79 species of Passiflora (including P. edulis), belonging to four subgenera (Astrophea, Decaloba, Distephana and Passiflora). Approximately 71% of the marker/species combinations resulted in positive amplicons in all species tested. DNA polymorphism was detected in germplasm accessions of six closely related Passiflora species (P. edulis, P. alata, P. maliformis, P. nitida, P. quadrangularis and P. setacea) and the data used for accession discrimination and species assignment. A database of P. edulis DNA sequences obtained by NGS technology was examined to identify microsatellite repeats in

  15. SEQUENCING AND DE NOVO DRAFT ASSEMBLIES OF A FATHEAD MINNOW (Pimephales promelas) reference genome

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset provides the URLs for accessing the genome sequence data and two draft assemblies as well as fathead minnow genotyping data associated with estimating...

  16. The sequence and de novo assembly of the giant panda genome

    Science.gov (United States)

    Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

    2013-01-01

    Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809

  17. Assembly and Multiplex Genome Integration of Metabolic Pathways in Yeast Using CasEMBLR

    DEFF Research Database (Denmark)

    Jakočiūnas, Tadas; Jensen, Emil D.; Jensen, Michael Krogh

    2018-01-01

    Genome integration is a vital step for implementing large biochemical pathways to build a stable microbial cell factory. Although traditional strain construction strategies are well established for the model organism Saccharomyces cerevisiae, recent advances in CRISPR/Cas9-mediated genome engineering allow much higher throughput and robustness in terms of strain construction. In this chapter, we describe CasEMBLR, a highly efficient and marker-free genome engineering method for one-step integration of in vivo assembled expression cassettes in multiple genomic sites simultaneously. Cas... and marker-free integration of the carotenoid pathway from 15 exogenously supplied DNA parts into three targeted genomic loci. As a second proof-of-principle, a total of ten DNA parts were assembled and integrated in two genomic loci to construct a tyrosine production strain, and at the same time knocking...

  18. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

    Science.gov (United States)

    Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C

    2012-01-01

    The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and potential sources of error in them should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  19. A practical comparison of de novo genome assembly software tools for next-generation sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Wenyu Zhang

    Full Text Available The advent of next-generation sequencing technologies has been accompanied by the development of many whole-genome sequence assembly methods and software tools, especially for de novo fragment assembly. Because the applicability and performance of these tools are poorly understood, choosing a suitable assembler is difficult. Here, we provide information on the adaptivity of each program and, above all, compare the performance of eight distinct tools against eight groups of simulated datasets from the Solexa sequencing platform. Considering computational time, maximum random access memory (RAM) occupancy, and assembly accuracy and integrity, our study indicates that string-based assemblers and overlap-layout-consensus (OLC) assemblers are well suited to very short reads and to longer reads of small genomes, respectively. For large datasets of more than a hundred million short reads, De Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones, of which SOAPdenovo is complex with respect to the creation of its configuration file. Our comparison study will assist researchers in selecting a well-suited assembler and offers essential information for the improvement of existing assemblers or the development of novel ones.
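
    As a rough illustration of the De Bruijn graph idea mentioned in the record above, the sketch below builds the graph for a toy read set. It is not taken from any of the compared assemblers; the function name `de_bruijn_graph`, the reads and the value of k are assumptions chosen for illustration only, and real assemblers add error correction, graph simplification and compact k-mer storage on top of this step.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph from reads (hypothetical helper, illustration only).

    Nodes are (k-1)-mers; each distinct k-mer in the reads contributes one
    edge from its (k-1)-mer prefix to its (k-1)-mer suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy, made-up reads that overlap by four bases.
toy_reads = ["ACGTC", "CGTCA"]
toy_graph = de_bruijn_graph(toy_reads, k=3)
for prefix in sorted(toy_graph):
    print(prefix, "->", sorted(toy_graph[prefix]))
```

    Because identical k-mers from different reads collapse onto the same edge, the graph size depends on the number of distinct k-mers rather than on the number of reads, which is one common explanation for why this representation scales to very large short-read datasets.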

  20. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

    Science.gov (United States)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

    2012-11-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N(50) = 694 kb, including contigs with N(50) = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (K(s)) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.
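
    The N(50) values quoted in the record above are a standard contiguity statistic: the length L such that sequences of length L or longer contain at least half of the total assembled bases. The sketch below shows one common way to compute it; the function name `n50` and the contig lengths are made up for illustration and are not the flax data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (illustrative helper)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (total 300, so the halfway point is 150).
example_contigs = [100, 80, 60, 40, 20]
print(n50(example_contigs))  # prints 80
```

    N50 says nothing about correctness: joining contigs wrongly raises it, which is why the abstract pairs the statistic with checks against independently sequenced BACs and fosmids.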

  1. A hybrid reference-guided de novo assembly approach for generating Cyclospora mitochondrion genomes.

    Science.gov (United States)

    Gopinath, G R; Cinar, H N; Murphy, H R; Durigan, M; Almeria, M; Tall, B D; DaSilva, A J

    2018-01-01

    Cyclospora cayetanensis is a coccidian parasite associated with large and complex foodborne outbreaks worldwide. Linking samples from cyclosporiasis patients during foodborne outbreaks with suspected contaminated food sources, using conventional epidemiological methods, has been a persistent challenge. To address this issue, development of new methods based on potential genomically-derived markers for strain-level identification has been a priority for the food safety research community. The absence of reference genomes to identify nucleotide and structural variants with a high degree of confidence has limited the application of using sequencing data for source tracking during outbreak investigations. In this work, we determined the quality of a high resolution, curated, public mitochondrial genome assembly to be used as a reference genome by applying bioinformatic analyses. Using this reference genome, three new mitochondrial genome assemblies were built starting with metagenomic reads generated by sequencing DNA extracted from oocysts present in stool samples from cyclosporiasis patients. Nucleotide variants were identified in the new and other publicly available genomes in comparison with the mitochondrial reference genome. A consolidated workflow, presented here, to generate new mitochondrion genomes using our reference-guided de novo assembly approach could be useful in facilitating the generation of other mitochondrion sequences, and in their application for subtyping C. cayetanensis strains during foodborne outbreak investigations.

  2. Three invariant Hi-C interaction patterns: Applications to genome assembly.

    Science.gov (United States)

    Oddes, Sivan; Zelig, Aviv; Kaplan, Noam

    2018-06-01

    Assembly of reference-quality genomes from next-generation sequencing data is a key challenge in genomics. Recently, we and others have shown that Hi-C data can be used to address several outstanding challenges in the field of genome assembly. This principle has since been developed in academia and industry, and has been used in the assembly of several major genomes. In this paper, we explore the central principles underlying Hi-C-based assembly approaches, by quantitatively defining and characterizing three invariant Hi-C interaction patterns on which these approaches can build: Intrachromosomal interaction enrichment, distance-dependent interaction decay and local interaction smoothness. Specifically, we evaluate to what degree each invariant pattern holds on a single locus level in different species, cell types and Hi-C map resolutions. We find that these patterns are generally consistent across species and cell types but are affected by sequencing depth, and that matrix balancing improves consistency of loci with all three invariant patterns. Finally, we overview current Hi-C-based assembly approaches in light of these invariant patterns and demonstrate how local interaction smoothness can be used to easily detect scaffolding errors in extremely sparse Hi-C maps. We suggest that simultaneously considering all three invariant patterns may lead to better Hi-C-based genome assembly methods. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. Ten steps to get started in Genome Assembly and Annotation [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Victoria Dominguez Del Angel

    2018-02-01

    Full Text Available As a part of the ELIXIR-EXCELERATE efforts in capacity building, we present here 10 steps to facilitate researchers getting started in genome assembly and genome annotation. The guidelines given are broadly applicable, intended to be stable over time, and cover all aspects from start to finish of a general assembly and annotation project. Intrinsic properties of genomes are discussed, as is the importance of using high quality DNA. Different sequencing technologies and generally applicable workflows for genome assembly are also detailed. We cover structural and functional annotation and encourage readers to also annotate transposable elements, something that is often omitted from annotation workflows. The importance of data management is stressed, and we give advice on where to submit data and how to make your results Findable, Accessible, Interoperable, and Reusable (FAIR).

  4. Haplotype assembly in polyploid genomes and identical by descent shared tracts.

    Science.gov (United States)

    Aguiar, Derek; Istrail, Sorin

    2013-07-01

    Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups interested in the genomics of disease, phylogenetics, botany and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction. In this work, we present a number of results, extensions and generalizations of compass graphs and our HapCompass framework. We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. Furthermore, we present graph theory-based algorithms for the problem of haplotype assembly using our previously developed HapCompass framework for (i) novel implementations of haplotype assembly optimizations (minimum error correction), (ii) assembly of a pair of individuals sharing a haplotype tract identical by descent and (iii) assembly of polyploid genomes. We evaluate our methods on 1000 Genomes Project, Pacific Biosciences and simulated sequence data. HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/. Supplementary data are available at Bioinformatics online.

  5. BAUM: Improving genome assembly by adaptive unique mapping and local overlap-layout-consensus approach.

    Science.gov (United States)

    Wang, Anqi; Wang, Zhanyu; Li, Zheng; Li, Lei M

    2018-01-15

    It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using Second Generation Sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though Single-Molecule Real-Time sequencing technology is very promising for overcoming the uncertainty issue, its relatively high cost and error rate add to the burden on budget or computation. Many long-read assemblers adopt the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Aiming at minimizing uncertainty, the proposed method, BAUM, breaks the whole genome into regions by adaptive unique mapping; the local OLC is then used to assemble each region in parallel. BAUM can (1) perform reference-assisted assembly based on the genome of a close species, or (2) improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement in genome size and continuity. In addition, BAUM reconstructed a considerable amount of repetitive regions that existing short-read assemblers failed to assemble. We also propose statistical approaches to control the uncertainty in different steps of BAUM. Availability: http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Contact: lilei@amss.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  6. Assembly and Multiplex Genome Integration of Metabolic Pathways in Yeast Using CasEMBLR.

    Science.gov (United States)

    Jakočiūnas, Tadas; Jensen, Emil D; Jensen, Michael K; Keasling, Jay D

    2018-01-01

    Genome integration is a vital step for implementing large biochemical pathways to build a stable microbial cell factory. Although traditional strain construction strategies are well established for the model organism Saccharomyces cerevisiae, recent advances in CRISPR/Cas9-mediated genome engineering allow much higher throughput and robustness in terms of strain construction. In this chapter, we describe CasEMBLR, a highly efficient and marker-free genome engineering method for one-step integration of in vivo assembled expression cassettes in multiple genomic sites simultaneously. CasEMBLR capitalizes on the CRISPR/Cas9 technology to generate double-strand breaks in genomic loci, thus prompting native homologous recombination (HR) machinery to integrate exogenously derived homology templates. As proof-of-principle for microbial cell factory development, CasEMBLR was used for one-step assembly and marker-free integration of the carotenoid pathway from 15 exogenously supplied DNA parts into three targeted genomic loci. As a second proof-of-principle, a total of ten DNA parts were assembled and integrated in two genomic loci to construct a tyrosine production strain, and at the same time knocking out two genes. This new method complements and improves the field of genome engineering in S. cerevisiae by providing a more flexible platform for rapid and precise strain building.

  7. A stochastic de novo assembly algorithm for viral-sized genomes obtains correct genomes and builds consensus

    NARCIS (Netherlands)

    Bucur, Doina

    2017-01-01

    A genetic algorithm with stochastic macro mutation operators which merge, split, move, reverse and align DNA contigs on a scaffold is shown to accurately and consistently assemble raw DNA reads from an accurately sequenced single-read library into a contiguous genome. A candidate solution is a

  8. Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data.

    Science.gov (United States)

    Nishito, Yukari; Osana, Yasunori; Hachiya, Tsuyoshi; Popendorf, Kris; Toyoda, Atsushi; Fujiyama, Asao; Itaya, Mitsuhiro; Sakakibara, Yasubumi

    2010-04-16

    Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for gamma-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacked a polyketide synthesis operon, disrupted the plipastatin production operon, and possesses previously unidentified transposases. The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks

  9. Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data

    Directory of Open Access Journals (Sweden)

    Fujiyama Asao

    2010-04-01

    Full Text Available Abstract. Background: Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. Results: We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for γ-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacked a polyketide synthesis operon, disrupted the plipastatin production operon, and possesses previously unidentified transposases. Conclusions: The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B

  10. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea

    Energy Technology Data Exchange (ETDEWEB)

    Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; Harmon-Smith, Miranda; Doud, Devin; Reddy, T. B. K.; Schulz, Frederik; Jarett, Jessica; Rivers, Adam R.; Eloe-Fadrosh, Emiley A.; Tringe, Susannah G.; Ivanova, Natalia N.; Copeland, Alex; Clum, Alicia; Becraft, Eric D.; Malmstrom, Rex R.; Birren, Bruce; Podar, Mircea; Bork, Peer; Weinstock, George M.; Garrity, George M.; Dodsworth, Jeremy A.; Yooseph, Shibu; Sutton, Granger; Glöckner, Frank O.; Gilbert, Jack A.; Nelson, William C.; Hallam, Steven J.; Jungbluth, Sean P.; Ettema, Thijs J. G.; Tighe, Scott; Konstantinidis, Konstantinos T.; Liu, Wen-Tso; Baker, Brett J.; Rattei, Thomas; Eisen, Jonathan A.; Hedlund, Brian; McMahon, Katherine D.; Fierer, Noah; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Tyson, Gene W.; Rinke, Christian; Kyrpides, Nikos C.; Schriml, Lynn; Garrity, George M.; Hugenholtz, Philip; Sutton, Granger; Yilmaz, Pelin; Meyer, Folker; Glöckner, Frank O.; Gilbert, Jack A.; Knight, Rob; Finn, Rob; Cochrane, Guy; Karsch-Mizrachi, Ilene; Lapidus, Alla; Meyer, Folker; Yilmaz, Pelin; Parks, Donovan H.; Eren, A. M.; Schriml, Lynn; Banfield, Jillian F.; Hugenholtz, Philip; Woyke, Tanja

    2017-08-08

    The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as a supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.

  11. Efficient assembly of de novo human artificial chromosomes from large genomic loci

    Directory of Open Access Journals (Sweden)

    Stromberg Gregory

    2005-07-01

    Full Text Available Abstract. Background: Human Artificial Chromosomes (HACs) are potentially useful vectors for gene transfer studies and for functional annotation of the genome because of their suitability for cloning, manipulating and transferring large segments of the genome. However, development of HACs for the transfer of large genomic loci into mammalian cells has been limited by difficulties in manipulating high-molecular weight DNA, as well as by the low overall frequencies of de novo HAC formation. Indeed, to date, only a small number of large (>100 kb) genomic loci have been reported to be successfully packaged into de novo HACs. Results: We have developed novel methodologies to enable efficient assembly of HAC vectors containing any genomic locus of interest. We report here the creation of a novel, bimolecular system based on bacterial artificial chromosomes (BACs) for the construction of HACs incorporating any defined genomic region. We have utilized this vector system to rapidly design, construct and validate multiple de novo HACs containing large (100–200 kb) genomic loci including therapeutically significant genes for human growth hormone (HGH), polycystic kidney disease (PKD1) and β-globin. We report significant differences in the ability of different genomic loci to support de novo HAC formation, suggesting possible effects of cis-acting genomic elements. Finally, as a proof of principle, we have observed sustained β-globin gene expression from HACs incorporating the entire 200 kb β-globin genomic locus for over 90 days in the absence of selection. Conclusion: Taken together, these results are significant for the development of HAC vector technology, as they enable high-throughput assembly and functional validation of HACs containing any large genomic locus. We have evaluated the impact of different genomic loci on the frequency of HAC formation and identified segments of genomic DNA that appear to facilitate de novo HAC formation. These genomic loci

  12. Common genetic variation and susceptibility to partial epilepsies: a genome-wide association study.

    Science.gov (United States)

    Kasperaviciūte, Dalia; Catarino, Claudia B; Heinzen, Erin L; Depondt, Chantal; Cavalleri, Gianpiero L; Caboclo, Luis O; Tate, Sarah K; Jamnadas-Khoda, Jenny; Chinthapalli, Krishna; Clayton, Lisa M S; Shianna, Kevin V; Radtke, Rodney A; Mikati, Mohamad A; Gallentine, William B; Husain, Aatif M; Alhusaini, Saud; Leppert, David; Middleton, Lefkos T; Gibson, Rachel A; Johnson, Michael R; Matthews, Paul M; Hosford, David; Heuser, Kjell; Amos, Leslie; Ortega, Marcos; Zumsteg, Dominik; Wieser, Heinz-Gregor; Steinhoff, Bernhard J; Krämer, Günter; Hansen, Jörg; Dorn, Thomas; Kantanen, Anne-Mari; Gjerstad, Leif; Peuralinna, Terhi; Hernandez, Dena G; Eriksson, Kai J; Kälviäinen, Reetta K; Doherty, Colin P; Wood, Nicholas W; Pandolfo, Massimo; Duncan, John S; Sander, Josemir W; Delanty, Norman; Goldstein, David B; Sisodiya, Sanjay M

    2010-07-01

    Partial epilepsies have a substantial heritability. However, the actual genetic causes are largely unknown. In contrast to many other common diseases for which genetic association-studies have successfully revealed common variants associated with disease risk, the role of common variation in partial epilepsies has not yet been explored in a well-powered study. We undertook a genome-wide association-study to identify common variants which influence risk for epilepsy shared amongst partial epilepsy syndromes, in 3445 patients and 6935 controls of European ancestry. We did not identify any genome-wide significant association. A few single nucleotide polymorphisms may warrant further investigation. We exclude common genetic variants with effect sizes above a modest 1.3 odds ratio for a single variant as contributors to genetic susceptibility shared across the partial epilepsies. We show that, at best, common genetic variation can only have a modest role in predisposition to the partial epilepsies when considered across syndromes in Europeans. The genetic architecture of the partial epilepsies is likely to be very complex, reflecting genotypic and phenotypic heterogeneity. Larger meta-analyses are required to identify variants of smaller effect sizes (odds ratio<1.3) or syndrome-specific variants. Further, our results suggest research efforts should also be directed towards identifying the multiple rare variants likely to account for at least part of the heritability of the partial epilepsies. Data emerging from genome-wide association-studies will be valuable during the next serious challenge of interpreting all the genetic variation emerging from whole-genome sequencing studies.

  13. Annotation of a hybrid partial genome of the Coffee Rust (Hemileia vastatrix) contributes to the gene repertoire catalogue of the Pucciniales

    Directory of Open Access Journals (Sweden)

    Marco Aurelio Cristancho

    2014-10-01

    Coffee leaf rust caused by the fungus Hemileia vastatrix is the most damaging disease to coffee worldwide. The pathogen has recently appeared in multiple outbreaks in coffee producing countries resulting in significant yield losses and increases in costs related to its control. New races/isolates are constantly emerging as evidenced by the presence of the fungus in plants that were previously resistant. Genomic studies are opening new avenues for the study of the evolution of pathogens, the detailed description of plant-pathogen interactions and the development of molecular techniques for the identification of individual isolates. For this purpose we sequenced 8 different H. vastatrix isolates using NGS technologies and gathered partial genome assemblies due to the large repetitive content in the coffee rust hybrid genome; 74.4% of the assembled contigs harbor repetitive sequences. A hybrid assembly of 333 Mb was built based on the 8 isolates; this assembly was used for subsequent analyses. Analysis of the conserved gene space showed that the hybrid H. vastatrix genome, though highly fragmented, had a satisfactory level of completion with 91.94% of core protein-coding orthologous genes present. RNA-Seq from urediniospores was used to guide the de novo annotation of the H. vastatrix gene complement. In total, 14,445 genes organized in 3,921 families were uncovered; a considerable proportion of the predicted proteins (73.8%) were homologous to other Pucciniales species genomes. Several gene families related to the fungal lifestyle were identified, particularly 483 predicted secreted proteins that represent candidate effector genes and will provide interesting hints to decipher virulence in the coffee rust fungus. The genome sequence of Hva will serve as a template to understand the molecular mechanisms used by this fungus to attack the coffee plant, to study the diversity of this species and for the development of molecular markers to distinguish

  14. Accurate DNA assembly and genome engineering with optimized uracil excision cloning

    DEFF Research Database (Denmark)

    Cavaleiro, Mafalda; Kim, Se Hyeuk; Seppala, Susanna

    2015-01-01

    Simple and reliable DNA editing by uracil excision (a.k.a. USER cloning) has been described by several research groups, but the optimal design of cohesive DNA ends for multigene assembly remains elusive. Here, we use two model constructs based on expression of gfp and a four-gene pathway that produces β-carotene to optimize assembly junctions and the uracil excision protocol. By combining uracil excision cloning with a genomic integration technology, we demonstrate that up to six DNA fragments can be assembled in a one-tube reaction for direct genome integration with high accuracy, greatly facilitating the advanced engineering of robust cell factories.

  15. Interactions Between HIV-1 Gag and Viral RNA Genome Enhance Virion Assembly

    DEFF Research Database (Denmark)

    Dilley, Kari A; Nikolaitchik, Olga A; Galli, Andrea

    2017-01-01

    between Gag and viral RNA are required for the enhancement of particle production. Taken together, these studies are consistent with our previous hypothesis that specific dimeric viral RNA:Gag interactions are the nucleation event of infectious virion assembly, ensuring that one RNA dimer is packaged......Most HIV-1 virions contain two copies of full-length viral RNA, indicating that genome packaging is efficient and tightly regulated. However, the structural protein Gag is the only component required for the assembly of noninfectious virus-like particles and the viral RNA is dispensable...... in this process. The mechanism that allows HIV-1 to achieve such high efficiency of genome packaging when a packageable viral RNA is not required for virus assembly is currently unknown. In this report, we examined the role of HIV-1 RNA in virus assembly and found that packageable HIV-1 RNA enhances particle...

  16. Modular assembly of transposable element arrays by microsatellite targeting in the guayule and rice genomes.

    Science.gov (United States)

    Valdes Franco, José A; Wang, Yi; Huo, Naxin; Ponciano, Grisel; Colvin, Howard A; McMahan, Colleen M; Gu, Yong Q; Belknap, William R

    2018-04-19

    Guayule (Parthenium argentatum A. Gray) is a rubber-producing desert shrub native to Mexico and the United States. Guayule represents an alternative to Hevea brasiliensis as a source for commercial natural rubber. The efficient application of modern molecular/genetic tools to guayule improvement requires characterization of its genome. The 1.6 Gb guayule genome was sequenced, assembled and annotated. The final 1.5 Gb assembly, while fragmented (N50 = 22 kb), maps >95% of the shotgun reads and is essentially complete. Approximately 40,000 transcribed, protein encoding genes were annotated on the assembly. Further characterization of this genome revealed 15 families of small, microsatellite-associated, transposable elements (TEs) with unexpected chromosomal distribution profiles. These SaTar (Satellite Targeted) elements, which are non-autonomous Mu-like elements (MULEs), were frequently observed in multimeric linear arrays of unrelated individual elements within which no individual element is interrupted by another. This uniformly non-nested TE multimer architecture has not been previously described in either eukaryotic or prokaryotic genomes. Five families of similarly distributed non-autonomous MULEs (microsatellite associated, modularly assembled) were characterized in the rice genome. Families of TEs with similar structures and distribution profiles were identified in sorghum and citrus. The sequencing and assembly of the guayule genome provides a foundation for application of current crop improvement technologies to this plant. In addition, characterization of this genome revealed SaTar elements with distribution profiles unique among TEs. SaTar targeting appears based on an alternative MULE recombination mechanism with the potential to impact gene evolution.
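
    The N50 statistic quoted above is the length of the contig (or scaffold) at which the running total, taken from longest to shortest, first reaches half of the assembly length. A minimal Python sketch of that definition follows; the function name and the example lengths are illustrative assumptions and are not taken from the guayule assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # half of the assembly is now covered
            return length
    raise ValueError("lengths must be non-empty")


# Illustrative lengths only (arbitrary units), not real assembly data.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

    A fragmented assembly can therefore still be "essentially complete": N50 measures contiguity, not how much of the genome is represented.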

  17. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds.

    Science.gov (United States)

    Dudchenko, Olga; Batra, Sanjit S; Omer, Arina D; Nyquist, Sarah K; Hoeger, Marie; Durand, Neva C; Shamim, Muhammad S; Machol, Ido; Lander, Eric S; Aiden, Aviva Presser; Aiden, Erez Lieberman

    2017-04-07

    The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective way. Here we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67× coverage). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Ae. aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that almost all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species. Copyright © 2017, American Association for the Advancement of Science.

  18. Improved Genome Assembly and Annotation for the Rock Pigeon (Columba livia).

    Science.gov (United States)

    Holt, Carson; Campbell, Michael; Keays, David A; Edelman, Nathaniel; Kapusta, Aurélie; Maclary, Emily; T Domyan, Eric; Suh, Alexander; Warren, Wesley C; Yandell, Mark; Gilbert, M Thomas P; Shapiro, Michael D

    2018-05-04

    The domestic rock pigeon (Columba livia) is among the most widely distributed and phenotypically diverse avian species. C. livia is broadly studied in ecology, genetics, physiology, behavior, and evolutionary biology, and has recently emerged as a model for understanding the molecular basis of anatomical diversity, the magnetic sense, and other key aspects of avian biology. Here we report an update to the C. livia genome reference assembly and gene annotation dataset. Greatly increased scaffold lengths in the updated reference assembly, along with an updated annotation set, provide improved tools for evolutionary and functional genetic studies of the pigeon, and for comparative avian genomics in general. Copyright © 2018 Holt et al.

  19. Tips and tricks for the assembly of a Corynebacterium pseudotuberculosis genome using a semiconductor sequencer

    DEFF Research Database (Denmark)

    Ramos, Rommel Thiago Jucá; Carneiro, Adriana Ribeiro; Soares, Siomar de Castro

    2013-01-01

    that enable reference-based assembly, such as the one used in the present study, Corynebacterium pseudotuberculosis biovar equi, which causes high economic losses in the US equine industry. The quality treatment strategy incorporated into the assembly pipeline enabled a 16-fold greater use of the sequencing...... was validated by comparative genomics with other species of the genus Corynebacterium. The present study presents a modus operandi that enables a greater and better use of data obtained from semiconductor sequencing for obtaining the complete genome from a prokaryotic microorganism, C. pseudotuberculosis, which...

  20. Meraculous: De Novo Genome Assembly with Short Paired-End Reads

    Energy Technology Data Exchange (ETDEWEB)

    Chapman, Jarrod A.; Ho, Isaac; Sunkara, Sirisha; Luo, Shujun; Schroth, Gary P.; Rokhsar, Daniel S.; Salzberg, Steven L.

    2011-08-18

    We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (de Bruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ~280 bp or ~3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
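
    The idea of walking only through k-mers with a unique extension can be illustrated with a small Python sketch. This is not the Meraculous implementation: it ignores reverse complements, base qualities and the memory-efficient hashing scheme, and the names right_extensions, extend_right and the min_count threshold are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def right_extensions(reads, k, min_count=2):
    """Map each k-mer to the bases that follow it at least min_count times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k):
            counts[read[i:i + k + 1]] += 1      # count (k+1)-mers
    ext = defaultdict(set)
    for kplus1, n in counts.items():
        if n >= min_count:                      # drop weakly supported extensions
            ext[kplus1[:k]].add(kplus1[k])
    return ext


def extend_right(seed, ext):
    """Extend a seed k-mer to the right while the next base is unambiguous."""
    contig = seed
    current = seed
    seen = {seed}
    while len(ext.get(current, ())) == 1:       # stop at dead ends and forks
        base = next(iter(ext[current]))
        current = current[1:] + base
        if current in seen:                     # stop if the walk starts to cycle
            break
        seen.add(current)
        contig += base
    return contig


# Toy reads (assumed data): two well-supported reads plus one read with an error.
reads = ["ATGGCGT", "ATGGCGT", "GGCGTACA", "GGCGTACA", "ATGGCTT"]
print(extend_right("ATG", right_extensions(reads, k=3)))
```

    In this toy input the extension introduced by the erroneous read is seen only once, so it falls below the threshold and never enters the walk, which is the sense in which a conservative traversal can avoid a separate error correction step.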

  1. Assembling networks of microbial genomes using linear programming.

    Science.gov (United States)

    Holloway, Catherine; Beiko, Robert G

    2010-11-20

    Microbial genomes exhibit complex sets of genetic affinities due to lateral genetic transfer. Assessing the relative contributions of parent-to-offspring inheritance and gene sharing is a vital step in understanding the evolutionary origins and modern-day function of an organism, but recovering and showing these relationships is a challenging problem. We have developed a new approach that uses linear programming to find between-genome relationships, by treating tables of genetic affinities (here, represented by transformed BLAST e-values) as an optimization problem. Validation trials on simulated data demonstrate the effectiveness of the approach in recovering and representing vertical and lateral relationships among genomes. Application of the technique to a set comprising Aquifex aeolicus and 75 other thermophiles showed an important role for large genomes as 'hubs' in the gene sharing network, and suggested that genes are preferentially shared between organisms with similar optimal growth temperatures. We were also able to discover distinct and common genetic contributors to each sequenced representative of genus Pseudomonas. The linear programming approach we have developed can serve as an effective inference tool in its own right, and can be an efficient first step in a more-intensive phylogenomic analysis.

  2. Impact of genome assembly status on ChIP-Seq and ChIP-PET data mapping

    Directory of Open Access Journals (Sweden)

    Sachs Laurent

    2009-12-01

    Background: ChIP-Seq and ChIP-PET can potentially be used with any genome for genome-wide profiling of protein-DNA interaction sites. Unfortunately, it is probable that most genome assemblies will never reach the quality of the human genome assembly. Therefore, it remains to be determined whether ChIP-Seq and ChIP-PET are practicable with genome sequences other than a few (e.g. human and mouse). Findings: Here, we used in silico simulations to assess the impact of completeness or fragmentation of genome assemblies on ChIP-Seq and ChIP-PET data mapping. Conclusions: Most currently published genome assemblies are suitable for mapping the short sequence tags produced by ChIP-Seq or ChIP-PET.

  3. Herbarium genomics

    DEFF Research Database (Denmark)

    Bakker, Freek T.; Lei, Di; Yu, Jiaying

    2016-01-01

    Herbarium genomics is proving promising as next-generation sequencing approaches are well suited to deal with the usually fragmented nature of archival DNA. We show that routine assembly of partial plastome sequences from herbarium specimens is feasible, from total DNA extracts and with specimens...... up to 146 years old. We use genome skimming and an automated assembly pipeline, Iterative Organelle Genome Assembly, that assembles paired-end reads into a series of candidate assemblies, the best one of which is selected based on likelihood estimation. We used 93 specimens from 12 different...... correlation between plastome coverage and nuclear genome size (C value) in our samples, but the range of C values included is limited. Finally, we conclude that routine plastome sequencing from herbarium specimens is feasible and cost-effective (compared with Sanger sequencing or plastome...

  4. Alignment of 1000 Genomes Project reads to reference assembly GRCh38.

    Science.gov (United States)

    Zheng-Bradley, Xiangqun; Streeter, Ian; Fairley, Susan; Richardson, David; Clarke, Laura; Flicek, Paul

    2017-07-01

    The 1000 Genomes Project produced more than 100 trillion basepairs of short read sequence from more than 2600 samples in 26 populations over a period of five years. In its final phase, the project released over 85 million genotyped and phased variants on human reference genome assembly GRCh37. An updated reference assembly, GRCh38, was released in late 2013, but there was insufficient time for the final phase of the project analysis to change to the new assembly. Although it is possible to lift the coordinates of the 1000 Genomes Project variants to the new assembly, this is a potentially error-prone process as coordinate remapping is most appropriate only for non-repetitive regions of the genome and those that did not see significant change between the two assemblies. It will also miss variants in any region that was newly added to GRCh38. Thus, to produce the highest quality variants and genotypes on GRCh38, the best strategy is to realign the reads and recall the variants based on the new alignment. As the first step of variant calling for the 1000 Genomes Project data, we have finished remapping all of the 1000 Genomes sequence reads to GRCh38 with alternative scaffold-aware BWA-MEM. The resulting alignments are available as CRAM, a reference-based sequence compression format. The data have been released on our FTP site and are also available from European Nucleotide Archive to facilitate researchers discovering variants on the primary sequences and alternative contigs of GRCh38. © The Authors 2017. Published by Oxford University Press.

  5. The mitochondrial genomes of Atlas Geckos (Quedenfeldtia): mitogenome assembly from transcriptomes and anchored hybrid enrichment datasets

    OpenAIRE

    Lyra, Mariana L.; Joger, Ulrich; Schulte, Ulrich; Slimani, Tahar; El Mouden, El Hassan; Bouazza, Abdellah; Künzel, Sven; Lemmon, Alan R.; Moriarty Lemmon, Emily; Vences, Miguel

    2017-01-01

    The nearly complete mitogenomes of the two species of North African Atlas geckos, Quedenfeldtia moerens and Q. trachyblepharus, were assembled from anchored hybrid enrichment data and RNAseq data. Congruent assemblies were obtained for four samples included in both datasets. We recovered the 13 protein-coding genes, 22 tRNA genes, and two rRNA genes for both species, including a partial control region. The order of genes agrees with that of other geckos.

  6. The human genome: Some assembly required. Final report

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1994-12-31

    The Human Genome Project promises to be one of the most rewarding endeavors in modern biology. The cost and the ethical and social implications, however, have made this project the source of considerable debate both in the scientific community and in the public at large. The 1994 Graduate Student Symposium addresses the scientific merits of the project, the technical issues involved in accomplishing the task, as well as the medical and social issues which stem from the wealth of knowledge which the Human Genome Project will help create. To this end, speakers were brought together who represent the diverse areas of expertise characteristic of this multidisciplinary project. The keynote speaker addresses the project's motivations and goals in the larger context of biological and medical sciences. The first two sessions address relevant technical issues, data collection with a focus on high-throughput sequencing methods and data analysis with an emphasis on identification of coding sequences. The third session explores recent advances in the understanding of genetic diseases and possible routes to treatment. Finally, the last session addresses some of the ethical, social and legal issues which will undoubtedly arise from having a detailed knowledge of the human genome.

  7. Long-read sequencing and de novo assembly of a Chinese genome

    Science.gov (United States)

    Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arr...

  8. Assembly of the Boechera retrofracta Genome and Evolutionary Analysis of Apomixis-Associated Genes

    Directory of Open Access Journals (Sweden)

    Sergei Kliver

    2018-03-01

    Closely related to the model plant Arabidopsis thaliana, the genus Boechera is known to contain both sexual and apomictic species or accessions. Boechera retrofracta is a diploid sexually reproducing species and is thought to be an ancestral parent species of apomictic species. Here we report the de novo assembly of the B. retrofracta genome using short Illumina and Roche reads from 1 paired-end and 3 mate pair libraries. The distribution of 23-mers from the paired-end library indicated a low level of heterozygosity and the presence of detectable duplications and triplications. The genome size was estimated to be 227 Mb. N50 of the assembled scaffolds was 2.3 Mb. Using a hybrid approach that combines homology-based and de novo methods, 27,048 protein-coding genes were predicted. Repeats, transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were also annotated. Finally, genes of B. retrofracta and 6 other Brassicaceae species were used for phylogenetic tree reconstruction. In addition, we explored the histidine exonuclease APOLLO locus, related to apomixis in Boechera, and proposed a model of its evolution through a series of duplications. An assembled genome of B. retrofracta will help in the challenging assembly of the highly heterozygous genomes of hybrid apomictic species.
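
    The abstract does not say how the 227 Mb estimate was derived. One common heuristic that works from a k-mer distribution like the 23-mer spectrum mentioned above divides the number of counted k-mers by the depth of the main peak. The Python sketch below illustrates that heuristic only; the function names, the min_depth cutoff for discarding likely error k-mers, and the histogram values are assumptions, not the authors' method or data.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {depth: number of distinct k-mers observed at that depth}."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as (total k-mers) / (depth of the main peak)."""
    # Ignore very low depths, which are usually dominated by sequencing errors.
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)        # most common depth
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Made-up histogram: an error peak at depth 1, a main peak at depth 10,
# and a small shoulder at depth 20 of the kind produced by duplicated sequence.
toy_histogram = {1: 1000, 9: 10, 10: 100, 11: 10, 20: 5}
print(estimate_genome_size(toy_histogram))
```

    In a real spectrum, a secondary peak at half the main depth would point to heterozygosity, while peaks at two or three times the main depth would point to duplications and triplications.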

  9. Norgal: Extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

    DEFF Research Database (Denmark)

    Al-Nakeeb, Kosai Ali Ahmed; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-01-01

    and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences...

  10. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    Science.gov (United States)

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as a cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  11. Sequencing and De novo Draft Assemblies of the Fathead Minnow (Pimphales promelas) Reference Genome

    Science.gov (United States)

    This study was undertaken to develop genome-scale resources for the fathead minnow (Pimphales promelas) an important model organism widely used in both aquatic ecotoxicology research and in regulatory toxicity testing. We report on the first sequencing and two draft assemblies fo...

  12. De novo assembling and primary analysis of genome and transcriptome of gray whale Eschrichtius robustus.

    Science.gov (United States)

    Moskalev, Alexey A; Kudryavtseva, Anna V; Graphodatsky, Alexander S; Beklemisheva, Violetta R; Serdyukova, Natalya A; Krutovsky, Konstantin V; Sharov, Vadim V; Kulakovskiy, Ivan V; Lando, Andrey S; Kasianov, Artem S; Kuzmin, Dmitry A; Putintseva, Yuliya A; Feranchuk, Sergey I; Shaposhnikov, Mikhail V; Fraifeld, Vadim E; Toren, Dmitri; Snezhkina, Anastasia V; Sitnik, Vasily V

    2017-12-28

    The gray whale, Eschrichtius robustus (E. robustus), is the sole member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. The gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will enable further studies of whale evolution, longevity, and resistance to extreme environments. In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is 55% complete in terms of total genome length, but only 24% complete in terms of the BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. This preliminary study of the gray whale genome and transcriptome provides new data to better understand whale evolution and the mechanisms of whale adaptation to hypoxic conditions.

  13. Modeling biological problems in computer science: a case study in genome assembly.

    Science.gov (United States)

    Medvedev, Paul

    2018-01-30

    As computer scientists working in bioinformatics/computational biology, we often face the challenge of coming up with an algorithm to answer a biological question. This occurs in many areas, such as variant calling, alignment and assembly. In this tutorial, we use the example of the genome assembly problem to demonstrate how to go from a question in the biological realm to a solution in the computer science realm. We show the modeling process step-by-step, including all the intermediate failed attempts. Please note this is not an introduction to how genome assembly algorithms work and, if treated as such, would be incomplete and unnecessarily long-winded. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. Assembly and diploid architecture of an individual human genome via single-molecule technologies.

    Science.gov (United States)

    Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali

    2015-08-01

    We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.

  15. Numerical simulation of fuel assembly thermohydraulics of fast reactors with the partial blockage of cross section under the coolant

    International Nuclear Information System (INIS)

    Zhukov, A.V.; Sorokin, A.P.

    2000-01-01

    Problems in the numerical modeling of thermohydraulics in fast-reactor fuel assemblies with partial blockage of the coolant flow cross-section are considered. Information is presented on existing codes based on the subchannel technique and on the porous-body model, together with calculation results obtained with these codes. (author)

  16. An advanced draft genome assembly of a desi type chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Parween, Sabiha; Nawaz, Kashif; Roy, Riti; Pole, Anil K; Venkata Suresh, B; Misra, Gopal; Jain, Mukesh; Yadav, Gitanjali; Parida, Swarup K; Tyagi, Akhilesh K; Bhatia, Sabhyata; Chattopadhyay, Debasis

    2015-08-11

    Chickpea (Cicer arietinum L.) is an important pulse legume crop. We previously reported a draft genome assembly of the desi chickpea cultivar ICC 4958. Here we report an advanced version of the ICC 4958 genome assembly (version 2.0) generated using additional sequence data and an improved genetic map. This resulted in a 2.7-fold increase in the length of the pseudomolecules and a substantial reduction of sequence gaps. The genome assembly covered more than 94% of the estimated gene space and predicted the presence of 30,257 protein-coding genes including 2230 and 133 genes encoding potential transcription factors (TF) and resistance gene homologs, respectively. Gene expression analysis identified several TF and chickpea-specific genes with tissue-specific expression and displayed functional diversification of the paralogous genes. Pairwise comparison of pseudomolecules in the desi (ICC 4958) and the earlier reported kabuli (CDC Frontier) chickpea assemblies showed an extensive local collinearity with incongruity in the placement of large sequence blocks along the linkage groups, apparently due to use of different genetic maps. Single nucleotide polymorphism (SNP)-based mining of intra-specific polymorphism identified more than four thousand SNPs differentiating a desi group and a kabuli group of chickpea genotypes.

  17. Sequencing, de novo assembling, and annotating the genome of the endangered Chinese crocodile lizard Shinisaurus crocodilurus.

    Science.gov (United States)

    Gao, Jian; Li, Qiye; Wang, Zongji; Zhou, Yang; Martelli, Paolo; Li, Fang; Xiong, Zijun; Wang, Jian; Yang, Huanming; Zhang, Guojie

    2017-07-01

    The Chinese crocodile lizard, Shinisaurus crocodilurus, is the only living representative of the monotypic family Shinisauridae under the order Squamata. It is an obligate semi-aquatic, viviparous, diurnal species restricted to specific portions of mountainous locations in southwestern China and northeastern Vietnam. However, in the past several decades, this species has undergone a rapid decrease in population size due to illegal poaching and habitat disruption, making this unique reptile species endangered and listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora Appendix II since 1990. A proposal to uplist it to Appendix I was passed at the Convention on International Trade in Endangered Species of Wild Fauna and Flora Seventeenth meeting of the Conference of the Parties in 2016. To promote the conservation of this species, we sequenced the genome of a male Chinese crocodile lizard using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform. In total, we generated ∼291 Gb of raw sequencing data (×149 depth) from 13 libraries with insert sizes ranging from 250 bp to 40 kb. After filtering for polymerase chain reaction-duplicated and low-quality reads, ∼137 Gb of clean data (×70 depth) were obtained for genome assembly. We obtained a draft genome assembly with a total length of 2.24 Gb and an N50 scaffold size of 1.47 Mb. The assembled genome was predicted to contain 20 150 protein-coding genes and up to 1114 Mb (49.6%) of repetitive elements. The genomic resource of the Chinese crocodile lizard will contribute to deciphering the biology of this organism and provides an essential tool for conservation efforts. It also provides a valuable resource for future study of squamate evolution. © The Authors 2017. Published by Oxford University Press.
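
    The N50 scaffold size quoted above is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that definition follows; the function name and the toy lengths are illustrative assumptions and are not taken from the lizard assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical values for illustration only).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

    With real data the input would be the list of scaffold lengths read from the assembly FASTA; the statistic says nothing about correctness, only about contiguity.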

  18. High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly.

    Science.gov (United States)

    Bartholomé, Jérôme; Mandrou, Eric; Mabiala, André; Jenkins, Jerry; Nabihoudine, Ibouniyamine; Klopp, Christophe; Schmutz, Jeremy; Plomion, Christophe; Gion, Jean-Marc

    2015-06-01

    Genetic maps are key tools in genetic research as they constitute the framework for many applications, such as quantitative trait locus analysis, and support the assembly of genome sequences. The resequencing of the two parents of a cross between Eucalyptus urophylla and Eucalyptus grandis was used to design a single nucleotide polymorphism (SNP) array of 6000 markers evenly distributed along the E. grandis genome. The genotyping of 1025 offspring enabled the construction of two high-resolution genetic maps containing 1832 and 1773 markers with an average marker interval of 0.45 and 0.5 cM for E. grandis and E. urophylla, respectively. The comparison between genetic maps and the reference genome highlighted 85% of collinear regions. A total of 43 noncollinear regions and 13 nonsyntenic regions were detected and corrected in the new genome assembly. This improved version contains 4943 scaffolds totalling 691.3 Mb of which 88.6% were captured by the 11 chromosomes. The mapping data were also used to investigate the effect of population size and number of markers on linkage mapping accuracy. This study provides the most reliable linkage maps for Eucalyptus and version 2.0 of the E. grandis genome. © 2014 CIRAD. New Phytologist © 2014 New Phytologist Trust.

  19. DNA sequence explains seemingly disordered methylation levels in partially methylated domains of Mammalian genomes.

    Directory of Open Access Journals (Sweden)

    Dimos Gaidatzis

    2014-02-01

    For the most part metazoan genomes are highly methylated and harbor only small regions with low or absent methylation. In contrast, partially methylated domains (PMDs), recently discovered in a variety of cell lines and tissues, do not fit this paradigm as they show partial methylation for large portions (20%-40%) of the genome. While in PMDs methylation levels are reduced on average, we found that at single CpG resolution, they show extensive variability along the genome outside of CpG islands and DNase I hypersensitive sites (DHS). Methylation levels range from 0% to 100% in a roughly uniform fashion with only little similarity between neighboring CpGs. A comparison of various PMD-containing methylomes showed that these seemingly disordered states of methylation are strongly conserved across cell types for virtually every PMD. Comparative sequence analysis suggests that DNA sequence is a major determinant of these methylation states. This is further substantiated by a purely sequence based model which can predict 31% (R²) of the variation in methylation. The model revealed CpG density as the main driving feature promoting methylation, opposite to what has been shown for CpG islands, followed by various dinucleotides immediately flanking the CpG and a minor contribution from sequence preferences reflecting nucleosome positioning. Taken together we provide a reinterpretation for the nucleotide-specific methylation levels observed in PMDs, demonstrate their conservation across tissues and suggest that they are mainly determined by specific DNA sequence features.

  20. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

    DEFF Research Database (Denmark)

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo

    2012-01-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp...... these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species....

  1. Comparison of carnivore, omnivore, and herbivore mammalian genomes with a new leopard assembly.

    Science.gov (United States)

    Kim, Soonok; Cho, Yun Sung; Kim, Hak-Min; Chung, Oksung; Kim, Hyunho; Jho, Sungwoong; Seomun, Hong; Kim, Jeongho; Bang, Woo Young; Kim, Changmu; An, Junghwa; Bae, Chang Hwan; Bhak, Youngjune; Jeon, Sungwon; Yoon, Hyejun; Kim, Yumi; Jun, JeHoon; Lee, HyeJin; Cho, Suan; Uphyrkina, Olga; Kostyria, Aleksey; Goodrich, John; Miquelle, Dale; Roelke, Melody; Lewis, John; Yurchenko, Andrey; Bankevich, Anton; Cho, Juok; Lee, Semin; Edwards, Jeremy S; Weber, Jessica A; Cook, Jo; Kim, Sangsoo; Lee, Hang; Manica, Andrea; Lee, Ilbeum; O'Brien, Stephen J; Bhak, Jong; Yeo, Joo-Hong

    2016-10-11

    There are three main dietary groups in mammals: carnivores, omnivores, and herbivores. Currently, there is limited comparative genomics insight into the evolution of dietary specializations in mammals. Due to recent advances in sequencing technologies, we were able to perform in-depth whole genome analyses of representatives of these three dietary groups. We investigated the evolution of carnivory by comparing 18 representative genomes from across Mammalia with carnivorous, omnivorous, and herbivorous dietary specializations, focusing on Felidae (domestic cat, tiger, lion, cheetah, and leopard), Hominidae, and Bovidae genomes. We generated a new high-quality leopard genome assembly, as well as two wild Amur leopard whole genomes. In addition to a clear contraction in gene families for starch and sucrose metabolism, the carnivore genomes showed evidence of shared evolutionary adaptations in genes associated with diet, muscle strength, agility, and other traits responsible for successful hunting and meat consumption. Additionally, an analysis of highly conserved regions at the family level revealed molecular signatures of dietary adaptation in each of Felidae, Hominidae, and Bovidae. However, unlike carnivores, omnivores and herbivores showed fewer shared adaptive signatures, indicating that carnivores are under strong selective pressure related to diet. Finally, felids showed recent reductions in genetic diversity associated with decreased population sizes, which may be due to the inflexible nature of their strict diet, highlighting their vulnerability and critical conservation status. Our study provides a large-scale family level comparative genomic analysis to address genomic changes associated with dietary specialization. Our genomic analyses also provide useful resources for diet-related genetic and health research.

  2. The Whole Genome Assembly and Comparative Genomic Research of Thellungiella parvula (Extremophile Crucifer) Mitochondrion

    Directory of Open Access Journals (Sweden)

    Xuelin Wang

    2016-01-01

    The complete nucleotide sequence of the mitochondrial (mt) genome of the extremophile species Thellungiella parvula (T. parvula) has been determined, with a length of 255,773 bp. The T. parvula mt genome is a circular sequence and contains 32 protein-coding genes, 19 tRNA genes, and three ribosomal RNA genes, with an 11.5% coding sequence. The base composition of 27.5% A, 27.5% T, 22.7% C, and 22.3% G in descending order shows a slight bias of 55% AT. Fifty-three repeats were identified in the mitochondrial genome of T. parvula, including 24 direct repeats, 28 tandem repeats (TRs), and one palindromic repeat. Furthermore, a total of 199 perfect microsatellites with a high A/T content (83.1%) were mined through simple sequence repeat (SSR) analysis, and they were distributed unevenly within this mitochondrial genome. We also analyzed the evolution of other plant mitochondrial genomes in general, providing clues for understanding the evolution of organelle genomes in plants. Compared with other Brassicaceae species, T. parvula is related to Arabidopsis thaliana, whose low-temperature resistance has been well documented. This study will provide important genetic tools for research on other Brassicaceae species and for improving the yields of economically important plants.
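
    Mining perfect microsatellites, as described above, amounts to scanning the sequence for short motifs repeated in tandem without interruption. The sketch below shows one simple way to do this with regular expressions. The abstract does not name the tool or thresholds used, so the function name, the motif sizes (1 to 3 bp) and the minimum repeat counts are all assumptions chosen for illustration.

```python
import re

# Minimum number of tandem copies per motif length. These thresholds are
# illustrative assumptions, not the settings used in the study.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for size, copies in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least copies-1 exact repeats.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # For motif sizes 2 and 3, the only motifs that are repeats of a
            # shorter unit are homopolymers; those are reported at size 1.
            if size > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Hypothetical test sequence: an (AT)6 repeat and an (A)12 repeat.
toy_seq = "GGC" + "AT" * 6 + "CCG" + "A" * 12 + "TTG"
toy_hits = find_perfect_ssrs(toy_seq)
print(toy_hits)
```

    The A/T content of the mined repeats could then be computed from the reported motifs and copy numbers.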

  3. Assembling the Setaria italica L. Beauv. genome into nine chromosomes and insights into regions affecting growth and drought tolerance.

    Science.gov (United States)

    Tsai, Kevin J; Lu, Mei-Yeh Jade; Yang, Kai-Jung; Li, Mengyun; Teng, Yuchuan; Chen, Shihmay; Ku, Maurice S B; Li, Wen-Hsiung

    2016-10-13

    The diploid C4 plant foxtail millet (Setaria italica L. Beauv.) is an important crop in many parts of Africa and Asia for the vast consumption of its grain and ability to grow in harsh environments, but remains understudied in terms of complete genomic architecture. To date, there have been only two genome assembly and annotation efforts with neither assembly reaching over 86% of the estimated genome size. We have combined de novo assembly with custom reference-guided improvements on a popular cultivar of foxtail millet and have achieved a genome assembly of 477 Mbp in length, which represents over 97% of the estimated 490 Mbp. The assembly anchors over 98% of the predicted genes to the nine assembled nuclear chromosomes and contains more functional annotation gene models than previous assemblies. Our annotation has identified a large number of unique gene ontology terms related to metabolic activities, a region of chromosome 9 with several growth factor proteins, and regions syntenic with pearl millet or maize genomic regions that have been previously shown to affect growth. The new assembly and annotation for this important species can be used for detailed investigation and future innovations in growth for millet and other grains.

  4. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    Science.gov (United States)

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  5. Evaluation and Validation of Assembling Corrected PacBio Long Reads for Microbial Genome Completion via Hybrid Approaches.

    Science.gov (United States)

    Lin, Hsin-Hung; Liao, Yu-Chieh

    2015-01-01

    Despite the ever-increasing output of next-generation sequencing data along with developing assemblers, dozens to hundreds of gaps still exist in de novo microbial assemblies due to uneven coverage and large genomic repeats. Third-generation single-molecule, real-time (SMRT) sequencing technology avoids amplification artifacts and generates kilobase-long reads with the potential to complete microbial genome assembly. However, due to the low accuracy (~85%) of third-generation sequences, a considerable amount of long reads (>50X) are required for self-correction and for subsequent de novo assembly. Recently-developed hybrid approaches, using next-generation sequencing data and as few as 5X long reads, have been proposed to improve the completeness of microbial assembly. In this study we have evaluated the contemporary hybrid approaches and demonstrated that assembling corrected long reads (by runCA) produced the best assembly compared to long-read scaffolding (e.g., AHA, Cerulean and SSPACE-LongRead) and gap-filling (SPAdes). For generating corrected long reads, we further examined long-read correction tools, such as ECTools, LSC, LoRDEC, PBcR pipeline and proovread. We have demonstrated that three microbial genomes including Escherichia coli K12 MG1655, Meiothermus ruber DSM1279 and Pedobacter heparinus DSM2366 were successfully hybrid assembled by runCA into near-perfect assemblies using ECTools-corrected long reads. In addition, we developed a tool, Patch, which implements corrected long reads and pre-assembled contigs as inputs, to enhance microbial genome assemblies. With the additional 20X long reads, short reads of S. cerevisiae W303 were hybrid assembled into 115 contigs using the verified strategy, ECTools + runCA. Patch was subsequently applied to upgrade the assembly to a 35-contig draft genome. Our evaluation of the hybrid approaches shows that assembling the ECTools-corrected long reads via runCA generates near complete microbial genomes, suggesting

  6. High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps.

    Science.gov (United States)

    Georges, Arthur; Li, Qiye; Lian, Jinmin; O'Meally, Denis; Deakin, Janine; Wang, Zongji; Zhang, Pei; Fujita, Matthew; Patel, Hardip R; Holleley, Clare E; Zhou, Yang; Zhang, Xiuwen; Matsubara, Kazumi; Waters, Paul; Graves, Jennifer A Marshall; Sarre, Stephen D; Zhang, Guojie

    2015-01-01

    The lizards of the family Agamidae are one of the most prominent elements of the Australian reptile fauna. Here, we present a genomic resource built on the basis of a wild-caught male ZZ central bearded dragon Pogona vitticeps. The genomic sequence for P. vitticeps, generated on the Illumina HiSeq 2000 platform, comprised 317 Gbp (179X raw read depth) from 13 insert libraries ranging from 250 bp to 40 kbp. After filtering for low-quality and duplicated reads, 146 Gbp of data (83X) was available for assembly. Exceptionally high levels of heterozygosity (0.85 % of single nucleotide polymorphisms plus sequence insertions or deletions) complicated assembly; nevertheless, 96.4 % of reads mapped back to the assembled scaffolds, indicating that the assembly included most of the sequenced genome. Length of the assembly was 1.8 Gbp in 545,310 scaffolds (69,852 longer than 300 bp), the longest being 14.68 Mbp. N50 was 2.29 Mbp. Genes were annotated on the basis of de novo prediction, similarity to the green anole Anolis carolinensis, Gallus gallus and Homo sapiens proteins, and P. vitticeps transcriptome sequence assemblies, to yield 19,406 protein-coding genes in the assembly, 63 % of which had intact open reading frames. Our assembly captured 99 % (246 of 248) of core CEGMA genes, with 93 % (231) being complete. The quality of the P. vitticeps assembly is comparable or superior to that of other published squamate genomes, and the annotated P. vitticeps genome can be accessed through a genome browser available at https://genomics.canberra.edu.au.
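
    The depth and completeness figures above can be cross-checked with simple arithmetic. The sketch below assumes that read depth is defined as total sequenced bases divided by genome size, so dividing the data volume by the stated depth gives the genome size implied by the authors' figures. The abstract does not state which genome size estimate was used, and the function name is hypothetical.

```python
def implied_genome_size_gbp(total_gbp, depth):
    """Genome size implied by a data volume (Gbp) and a stated read depth (X)."""
    return total_gbp / depth


# Figures quoted in the abstract.
raw_size = implied_genome_size_gbp(317, 179)    # raw reads
clean_size = implied_genome_size_gbp(146, 83)   # after filtering
print(f"implied genome size: {raw_size:.2f} Gbp (raw), {clean_size:.2f} Gbp (filtered)")

# CEGMA completeness: 246 of 248 core genes found, 231 of them complete.
cegma_found_pct = 100 * 246 / 248
cegma_complete_pct = 100 * 231 / 248
print(f"CEGMA: {cegma_found_pct:.0f}% found, {cegma_complete_pct:.0f}% complete")
```

    Both implied sizes are close to the 1.8 Gbp assembly length reported in the abstract, and the percentages reproduce the quoted 99 % and 93 %.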

  7. Draft Sequencing of the Heterozygous Diploid Genome of Satsuma (Citrus unshiu Marc.) Using a Hybrid Assembly Approach

    Directory of Open Access Journals (Sweden)

    Tokurou Shimizu

    2017-12-01

    Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma (“Miyagawa Wase”) was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb and an N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNA; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffold and predicted transcripts, and another analysis by BAC end sequence mapping indicated the assembled genome consistency was close to those of the haploid Clementine, pummelo, and sweet orange genomes. The number of repeat elements and long terminal repeat retrotransposons were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat region. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome.

  8. Partial digestion with restriction enzymes of ultraviolet-irradiated human genomic DNA: a method for identifying restriction site polymorphisms

    International Nuclear Information System (INIS)

    Nobile, C.; Romeo, G.

    1988-01-01

    A method for partial digestion of total human DNA with restriction enzymes has been developed on the basis of a principle already utilized by P.A. Whittaker and E. Southern for the analysis of phage lambda recombinants. Total human DNA irradiated with UV light of 254 nm is partially digested by restriction enzymes that recognize sequences containing adjacent thymidines because of TT dimer formation. The products resulting from partial digestion of specific genomic regions are detected in Southern blots by genomic-unique DNA probes with high reproducibility. This procedure is rapid and simple to perform because the same conditions of UV irradiation are used for different enzymes and probes. It is shown that restriction site polymorphisms occurring in the genomic regions analyzed are recognized by the allelic partial digest patterns they determine.

  9. Experimental Assessment of a New Passive Neutron Multiplication Counter for Partial Defect Verification of LWR Fuel Assemblies

    International Nuclear Information System (INIS)

    LaFleur, A.; Menlove, H.; Park, S.-H.; Lee, S. K.; Oh, J.-M.; Kim, H.-D.

    2015-01-01

    The development of non-destructive assay (NDA) capabilities to improve partial defect verification of spent fuel assemblies is needed to improve the timely detection of the diversion of significant quantities of fissile material. This NDA capability is important to the implementation of integrated safeguards for spent fuel verification by the International Atomic Energy Agency (IAEA) and would improve deterrence of possible diversions by increasing the risk of early detection. A new NDA technique called Passive Neutron Multiplication Counter (PNMC) is currently being developed at Los Alamos National Laboratory (LANL) to improve safeguards measurements of Light Water Reactor (LWR) fuel assemblies. The PNMC uses the ratio of the fast-neutron emission rate to the thermal-neutron emission rate to quantify the neutron multiplication of the item. The fast neutrons versus thermal neutrons are measured using fission chambers (FC) that have differential shielding to isolate fast and thermal energies. The fast-neutron emission rate is directly proportional to the neutron multiplication in the spent fuel assembly, whereas the thermal-neutron leakage is suppressed by the fissile material absorption in the assembly. These FCs are already implemented in the basic Self-Interrogation Neutron Resonance Densitometry (SINRD) detector package. Experimental measurements of fresh and spent PWR fuel assemblies were performed at LANL and the Korea Atomic Energy Research Institute (KAERI), respectively, using a hybrid PNMC and SINRD detector. The results from these measurements provide valuable experimental data that directly support safeguards research and development (R&D) efforts on the viability of passive neutron NDA techniques and detector designs for partial defect verification of spent fuel assemblies. (author)

  10. Draft Genome Sequence of a “Candidatus Liberibacter europaeus” Strain Assembled from Broom Psyllids (Arytainilla spartiophila) from New Zealand

    Science.gov (United States)

    Thompson, Sarah M.; Kalamorz, Falk; David, Charles; Addison, Shea M.; Smith, Grant R.

    2018-01-01

    ABSTRACT Here, we report the draft genome sequence of “Candidatus Liberibacter europaeus” ASNZ1, assembled from broom psyllids (Arytainilla spartiophila) from New Zealand. The assembly comprises 15 contigs, with a total length of 1.33 Mb and a G+C content of 33.5%. PMID:29773636

  11. Assembly and comparative analysis of complete mitochondrial genome sequence of an economic plant Salix suchowensis

    Directory of Open Access Journals (Sweden)

    Ning Ye

    2017-03-01

    Willow is a widely used dioecious woody plant of the Salicaceae family in China. Due to their high biomass yields, willows are promising sources for bioenergy crops. In this study, we assembled the complete mitochondrial (mt) genome sequence of S. suchowensis, with a length of 644,437 bp, using Roche-454 GS FLX Titanium sequencing technologies. The base composition of the S. suchowensis mt genome is A (27.43%), T (27.59%), C (22.34%), and G (22.64%), giving a GC content similar to that of other angiosperms. This long circular mt genome encodes 58 unique genes (32 protein-coding genes, 23 tRNA genes and 3 rRNA genes), and 9 of the 32 protein-coding genes contain 17 introns. A phylogenetic analysis of 35 species based on 23 protein-coding genes supports Salix as sister to Populus. With the detailed phylogenetic information and the identification of phylogenetic position, some ribosomal protein genes and succinate dehydrogenase genes are found to be frequently lost during evolution. As S. suchowensis is a native shrub willow species, this study of its mt genome will provide useful information for better understanding genomic breeding and the missing pieces of sex determination evolution in the future.
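
    Base composition figures like those above are obtained by counting each nucleotide and dividing by the sequence length. The following minimal sketch shows the calculation on a short made-up sequence and then derives the GC content from the percentages quoted in the abstract; the function name and the toy sequence are assumptions for illustration.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T among the unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100 * counts[base] / total for base in "ACGT"}


# Hypothetical 10-base sequence: 3 A, 3 T, 2 G, 2 C.
toy_comp = base_composition("AATTTGCCGA")
print(toy_comp)

# Percentages quoted in the abstract for the S. suchowensis mt genome.
reported = {"A": 27.43, "T": 27.59, "C": 22.34, "G": 22.64}
reported_gc = round(reported["C"] + reported["G"], 2)
print(reported_gc)
```

    The four reported percentages sum to 100%, and the GC content they imply is just under 45%.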

  12. Reference quality assembly of the 3.5 Gb genome of Capsicum annuum from a single linked-read library

    Science.gov (United States)

    Linked-Read sequencing technology has recently been employed successfully for de novo assembly of multiple human genomes, however the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5 gigabase (Gb) diploid pepper (Cap...

  13. De novo Transcriptome Assemblies of Rana (Lithobates) catesbeiana and Xenopus laevis Tadpole Livers for Comparative Genomics without Reference Genomes.

    Directory of Open Access Journals (Sweden)

    Inanc Birol

    In this work we studied the liver transcriptomes of two frog species, the American bullfrog (Rana (Lithobates) catesbeiana) and the African clawed frog (Xenopus laevis). We used high throughput RNA sequencing (RNA-seq) data to assemble and annotate these transcriptomes, and compared how their baseline expression profiles change when tadpoles of the two species are exposed to thyroid hormone. We generated more than 1.5 billion RNA-seq reads in total for the two species under two conditions as treatment/control pairs. We de novo assembled these reads using Trans-ABySS to reconstruct reference transcriptomes, obtaining over 350,000 and 130,000 putative transcripts for R. catesbeiana and X. laevis, respectively. Using available genomics resources for X. laevis, we annotated over 97% of our X. laevis transcriptome contigs, demonstrating the utility and efficacy of our methodology. Leveraging this validated analysis pipeline, we also annotated the assembled R. catesbeiana transcriptome. We used the expression profiles of the annotated genes of the two species to examine the similarities and differences between the tadpole liver transcriptomes. We also compared the gene ontology terms of expressed genes to measure how the animals react to a challenge by thyroid hormone. Our study reports three main conclusions. First, de novo assembly of RNA-seq data is a powerful method for annotating and establishing transcriptomes of non-model organisms. Second, the liver transcriptomes of the two frog species, R. catesbeiana and X. laevis, show many common features, and the distributions of their gene ontology profiles are statistically indistinguishable. Third, although they broadly respond the same way to the presence of thyroid hormone in their environment, their receptor/signal transduction pathways display marked differences.

  14. Assembling large genomes: analysis of the stick insect (Clitarchus hookeri) genome reveals a high repeat content and sex-biased genes associated with reproduction.

    Science.gov (United States)

    Wu, Chen; Twort, Victoria G; Crowhurst, Ross N; Newcomb, Richard D; Buckley, Thomas R

    2017-11-16

    Stick insects (Phasmatodea) have a high incidence of parthenogenesis and other alternative reproductive strategies, yet the genetic basis of reproduction is poorly understood. Phasmatodea includes nearly 3000 species, yet only the genome of Timema cristinae has been published to date. Clitarchus hookeri is a geographical parthenogenetic stick insect distributed across New Zealand. Sexual reproduction dominates in northern habitats but is replaced by parthenogenesis in the south. Here, we present a de novo genome assembly of a female C. hookeri and use it to detect candidate genes associated with gamete production and development in females and males. We also explore the factors underlying large genome size in stick insects. The C. hookeri genome assembly was 4.2 Gb, similar to the flow cytometry estimate, making it the second largest insect genome sequenced and assembled to date. Like the large genome of Locusta migratoria, the genome of C. hookeri is also highly repetitive and the predicted gene models are much longer than those from most other sequenced insect genomes, largely due to longer introns. Miniature inverted repeat transposable elements (MITEs), absent in the much smaller T. cristinae genome, are the most abundant repeat type in the C. hookeri genome assembly. Mapping RNA-Seq reads from female and male gonadal transcriptomes onto the genome assembly resulted in the identification of 39,940 gene loci, 15.8% and 37.6% of which showed female-biased and male-biased expression, respectively. The genes that were over-expressed in females were mostly associated with molecular transportation, developmental process, oocyte growth and reproductive process, whereas the male-biased genes were enriched in rhythmic process, molecular transducer activity and synapse. Several genes involved in the juvenile hormone synthesis pathway were also identified. The evolution of large insect genomes such as L. migratoria and C. hookeri genomes is most likely due to the

  15. HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly

    Directory of Open Access Journals (Sweden)

    Md Mahfuzer Rahman

    2017-01-01

    Background. The rapid advancement of sequencing technologies has made it possible to regularly produce millions of high-quality reads from DNA samples in sequencing laboratories. To this end, the de Bruijn graph is a popular data structure in the genome assembly literature for efficient representation and processing of data. Because of the large number of nodes in a de Bruijn graph, the main barriers are memory and runtime. Therefore, this area has received significant attention in contemporary literature. Results. In this paper, we present an approach called HaVec that attempts to achieve a balance between memory consumption and running time. HaVec uses a hash table along with an auxiliary vector data structure to store the de Bruijn graph, thereby improving the total memory usage and the running time. A critical and noteworthy feature of HaVec is that it exhibits no false positive error. Conclusions. In general, the graph construction procedure takes the major share of the time involved in an assembly process. HaVec can be seen as a significant advancement in this aspect. We anticipate that HaVec will be extremely useful in de Bruijn graph-based genome assembly.
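
    For readers unfamiliar with the data structure, the sketch below builds a de Bruijn graph in the textbook way: every k-mer in the reads contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It uses an ordinary Python dictionary of sets and is not HaVec's hash-table-plus-vector layout, whose details the abstract does not give; the function name and the toy reads are assumptions.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])   # edge: prefix -> suffix
    return dict(graph)


# Two overlapping toy reads (hypothetical) from the sequence ACGTCA.
toy_reads = ["ACGTC", "CGTCA"]
toy_graph = build_de_bruijn(toy_reads, k=3)
print(toy_graph)
```

    Storing every node as a string key, as here, is exactly the kind of memory cost that compact representations such as the one described in the abstract aim to reduce.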

  16. Functional Annotation, Genome Organization and Phylogeny of the Grapevine (Vitis vinifera) Terpene Synthase Gene Family Based on Genome Assembly, FLcDNA Cloning, and Enzyme Assays

    Directory of Open Access Journals (Sweden)

    Toub Omid

    2010-10-01

    Background. Terpenoids are among the most important constituents of grape flavour and wine bouquet, and serve as useful metabolite markers in viticulture and enology. Based on the initial 8-fold sequencing of a nearly homozygous Pinot noir inbred line, 89 putative terpenoid synthase genes (VvTPS) were predicted by in silico analysis of the grapevine (Vitis vinifera) genome assembly [1]. The finding of this very large VvTPS family, combined with the importance of terpenoid metabolism for the organoleptic properties of grapevine berries and finished wines, prompted a detailed examination of this gene family at the genomic level as well as an investigation into VvTPS biochemical functions. Results. We present findings from the analysis of the updated 12-fold sequencing and assembly of the grapevine genome that place the number of predicted VvTPS genes at 69 putatively functional VvTPS, 20 partial VvTPS, and 63 VvTPS probable pseudogenes. Gene discovery and annotation included information about gene architecture and chromosomal location. A dense cluster of 45 VvTPS is localized on chromosome 18. Extensive FLcDNA cloning, gene synthesis, and protein expression enabled functional characterization of 39 VvTPS; this is the largest number of functionally characterized TPS for any species reported to date. Of these enzymes, 23 have unique functions and/or phylogenetic locations within the plant TPS gene family. Phylogenetic analyses of the TPS gene family showed that while most VvTPS form species-specific gene clusters, there are several examples of gene orthology with TPS of other plant species, representing perhaps more ancient VvTPS, which have maintained functions independent of speciation. Conclusions. The highly expanded VvTPS gene family underpins the prominence of terpenoid metabolism in grapevine. We provide a detailed experimental functional annotation of 39 members of this important gene family in grapevine and comprehensive information

  17. FANTOM5 CAGE profiles of human and mouse reprocessed for GRCh38 and GRCm38 genome assemblies.

    Science.gov (United States)

    Abugessaisa, Imad; Noguchi, Shuhei; Hasegawa, Akira; Harshbarger, Jayson; Kondo, Atsushi; Lizio, Marina; Severin, Jessica; Carninci, Piero; Kawaji, Hideya; Kasukawa, Takeya

    2017-08-29

    The FANTOM5 consortium described the promoter-level expression atlas of human and mouse by using CAGE (Cap Analysis of Gene Expression) with single-molecule sequencing. In the original publications, the GRCh37/hg19 and NCBI37/mm9 assemblies were used as the reference genomes of human and mouse, respectively; later, the Genome Reference Consortium released the newer genome assemblies GRCh38/hg38 and GRCm38/mm10. To increase the utility of the atlas in future research, we reprocessed the data to make them available on the recent genome assemblies. The data include observed frequencies of transcription start sites (TSSs) based on the realignment of CAGE reads, and TSS peaks that are converted from those based on the previous reference. Annotations of the peak names were also updated based on the latest public databases. The reprocessed results enable us to examine frequencies of transcription initiation on the recent genome assemblies and to refer to promoters with updated information consistently across the genome assemblies.

  18. Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya; Zucker, Jeremy D.; Brislawn, Colin J.; Nicora, Carrie D.; Fansler, Sarah J.; Glaesemann, Kurt R.; Glass, Kevin; Jansson, Janet K.; Langille, Morgan

    2016-06-28

    ABSTRACT

    Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “Candidatus Pseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities.

    IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their

  19. Information-optimal genome assembly via sparse read-overlap graphs.

    Science.gov (United States)

    Shomorony, Ilan; Kim, Samuel H; Courtade, Thomas A; Tse, David N C

    2016-09-01

    In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph, and, under most formulations, the assembly problem becomes NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside, and seeking an algorithm that targets information limits. In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Based on insights from this information feasibility question, we present an algorithm, the Not-So-Greedy algorithm, to construct a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, which can thus be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of accuracy of the resulting read-overlap graph and contig N50. Availability: github.com/samhykim/nsg. Contact: courtade@eecs.berkeley.edu or dntse@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
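
    The abstract above contrasts the NP-hard Hamiltonian path formulation with an Eulerian path problem that is solvable in linear time. As a rough illustration of that second step only, the sketch below applies Hierholzer's algorithm to a small directed graph. It is not the Not-So-Greedy algorithm and does not build a read-overlap graph; the function name eulerian_path and the toy edge list are assumptions made for this example.

```python
from collections import defaultdict


def eulerian_path(edges):
    """Return the node sequence of an Eulerian path in a directed
    multigraph given as (source, target) pairs, or None if none exists.
    Uses Hierholzer's algorithm; runs in time linear in the edge count."""
    if not edges:
        return []

    out_edges = defaultdict(list)   # adjacency lists, consumed as edges are used
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        out_edges[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1
        nodes.update((u, v))

    # Degree condition: at most one node with out - in == 1 (the start),
    # at most one with in - out == 1 (the end), all others balanced.
    starts = [n for n in nodes if out_deg[n] - in_deg[n] == 1]
    ends = [n for n in nodes if in_deg[n] - out_deg[n] == 1]
    balanced = [n for n in nodes if in_deg[n] == out_deg[n]]
    if len(starts) > 1 or len(starts) != len(ends):
        return None
    if len(starts) + len(ends) + len(balanced) != len(nodes):
        return None

    start = starts[0] if starts else edges[0][0]
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges[node]:
            # Follow an unused edge out of the current node.
            stack.append(out_edges[node].pop())
        else:
            # Dead end: this node is final in the remaining path.
            path.append(stack.pop())
    path.reverse()

    # If some edges were never reached, the graph is not connected enough.
    if len(path) != len(edges) + 1:
        return None
    return path


# Toy graph (hypothetical): a cycle A -> B -> C -> A plus a tail A -> D.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
toy_path = eulerian_path(toy_edges)
```

    Each edge is pushed and popped once, which is where the linear running time comes from; by contrast, no comparably efficient general procedure is known for the Hamiltonian path formulation.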

  20. Genomics of Compositae crops: reference transcriptome assemblies and evidence of hybridization with wild relatives.

    Science.gov (United States)

    Hodgins, Kathryn A; Lai, Zhao; Oliveira, Luiz O; Still, David W; Scascitelli, Moira; Barker, Michael S; Kane, Nolan C; Dempewolf, Hannes; Kozik, Alex; Kesseli, Richard V; Burke, John M; Michelmore, Richard W; Rieseberg, Loren H

    2014-01-01

    Although the Compositae harbours only two major food crops, sunflower and lettuce, many other species in this family are utilized by humans and have experienced various levels of domestication. Here, we have used next-generation sequencing technology to develop 15 reference transcriptome assemblies for Compositae crops or their wild relatives. These data allow us to gain insight into the evolutionary and genomic consequences of plant domestication. Specifically, we performed Illumina sequencing of Cichorium endivia, Cichorium intybus, Echinacea angustifolia, Iva annua, Helianthus tuberosus, Dahlia hybrida, Leontodon taraxacoides and Glebionis segetum, as well as 454 sequencing of Guizotia scabra, Stevia rebaudiana, Parthenium argentatum and Smallanthus sonchifolius. Illumina reads were assembled using Trinity, and 454 reads were assembled using MIRA and CAP3. We evaluated the coverage of the transcriptomes using BLASTX analysis of a set of ultra-conserved orthologs (UCOs) and recovered most of these genes (88-98%). We found a correlation between contig length and read length for the 454 assemblies, and greater contig lengths for the 454 compared with the Illumina assemblies. This suggests that longer reads can aid in the assembly of more complete transcripts. Finally, we compared the divergence of orthologs at synonymous sites (Ks) between Compositae crops and their wild relatives and found greater divergence when the progenitors were self-incompatible. We also found greater divergence between pairs of taxa that had some evidence of postzygotic isolation. For several more distantly related congeners, such as chicory and endive, we identified a signature of introgression in the distribution of Ks values. © 2013 John Wiley & Sons Ltd.

  1. Retroviral Gag protein-RNA interactions: Implications for specific genomic RNA packaging and virion assembly.

    Science.gov (United States)

    Olson, Erik D; Musier-Forsyth, Karin

    2018-03-31

    Retroviral Gag proteins are responsible for coordinating many aspects of virion assembly. Gag possesses two distinct nucleic acid binding domains, matrix (MA) and nucleocapsid (NC). One of the critical functions of Gag is to specifically recognize, bind, and package the retroviral genomic RNA (gRNA) into assembling virions. Gag interactions with cellular RNAs have also been shown to regulate aspects of assembly. Recent results have shed light on the role of MA and NC domain interactions with nucleic acids, and how they jointly function to ensure packaging of the retroviral gRNA. Here, we will review the literature regarding RNA interactions with NC, MA, as well as overall mechanisms employed by Gag to interact with RNA. The discussion focuses on human immunodeficiency virus type-1, but other retroviruses will also be discussed. A model is presented combining all of the available data summarizing the various factors and layers of selection Gag employs to ensure specific gRNA packaging and correct virion assembly. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.

    Science.gov (United States)

    Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza

    2013-10-01

    Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.

  3. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment.

    Science.gov (United States)

    Baichoo, Shakuntala; Ouzounis, Christos A

    A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Partial replicas of uv-irradiated bacteriophage T4 genomes and their role in multiplicity reactivation

    International Nuclear Information System (INIS)

    Rayssiguier, C.; Kozinski, A.W.; Doermann, A.H.

    1980-01-01

    A physicochemical study was made of the replication and transmission of uv-irradiated T4 genomes. The data presented in this paper justify the following conclusions. (i) For both low and high multiplicity of infection there was abundant replication from uv-irradiated parental templates. It exceeded by far the efficiency predicted by the hypothesis that a single lethal hit completely prevents replication of the killed phage DNA: i.e., some dead phage particles must replicate parts of their DNA. (ii) Replication of the uv-irradiated DNA was repetitive as shown by density reversal experiments. (iii) Newly synthesized progeny DNA originating from uv-irradiated templates appeared as significantly shorter segments of the genomes than progeny DNA produced from non-uv-irradiated templates. A good correlation existed between the number of uv hits and the number of random cuts that would be needed to reduce replication fragments to the length observed. (iv) The contribution of uv-irradiated parental DNA among progeny phage in multiplicity reactivation was disposed in shorter subunits than was the DNA from unirradiated parental phage. It is important to emphasize that it was mainly in the form of replicative hybrid. These conclusions appear to justify excluding interparental recombination as a prerequisite for multiplicity reactivation. They lead directly to some form of partial replica hypothesis for multiplicity reactivation.

  5. ATLAS (Automatic Tool for Local Assembly Structures) - A Comprehensive Infrastructure for Assembly, Annotation, and Genomic Binning of Metagenomic and Metatranscriptomic Data

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard A.; Brown, Joseph M.; Colby, Sean M.; Overall, Christopher C.; Lee, Joon-Yong; Zucker, Jeremy D.; Glaesemann, Kurt R.; Jansson, Georg C.; Jansson, Janet K.

    2017-03-02

    ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein-coding open reading frames rolled up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy to install through bioconda, maintained as open source on GitHub, and implemented in Snakemake for modular, customizable workflows.

  6. Finding Nemo's Genes: A chromosome-scale reference assembly of the genome of the orange clownfish Amphiprion percula

    KAUST Repository

    Lehmann, Robert; Lightfoot, Damien J; Schunter, Celia Marei; Michell, Craig T; Ohyanagi, Hajime; Mineta, Katsuhiko; Foret, Sylvain; Berumen, Michael L.; Miller, David J; Aranda, Manuel; Gojobori, Takashi; Munday, Philip L; Ravasi, Timothy

    2018-01-01

    The iconic orange clownfish, Amphiprion percula, is a model organism for studying the ecology and evolution of reef fishes, including patterns of population connectivity, sex change, social organization, habitat selection and adaptation to climate change. Notably, the orange clownfish is the only reef fish for which a complete larval dispersal kernel has been established and was the first fish species for which it was demonstrated that anti-predator responses of reef fishes could be impaired by ocean acidification. Despite its importance, molecular resources for this species remain scarce and until now it lacked a reference genome assembly. Here we present a de novo chromosome-scale assembly of the genome of the orange clownfish Amphiprion percula. We utilized single-molecule real-time sequencing technology from Pacific Biosciences to produce an initial polished assembly comprising 1,414 contigs, with a contig N50 length of 1.86 Mb. Using Hi-C based chromatin contact maps, 98% of the genome assembly was placed into 24 chromosomes, resulting in a final assembly of 908.8 Mb in length with contig and scaffold N50s of 3.12 and 38.4 Mb, respectively. This makes it one of the most contiguous and complete fish genome assemblies currently available. The genome was annotated with 26,597 protein coding genes and contains 96% of the core set of conserved actinopterygian orthologs. The availability of this reference genome assembly as a community resource will further strengthen the role of the orange clownfish as a model species for research on the ecology and evolution of reef fishes.
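
    Several records in this listing report contig or scaffold N50 values as a measure of assembly contiguity. As a minimal sketch of how that statistic is commonly defined (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length), the helper below computes it from a list of lengths. The function name n50 and the example lengths are assumptions for illustration and are not taken from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), total = 32.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

    Because N50 depends only on the length distribution, it says nothing about correctness; a misassembled scaffold that joins unrelated sequence raises N50 just as a correct join does, which is why the abstracts pair it with completeness measures such as conserved ortholog recovery.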

  7. Finding Nemo's Genes: A chromosome-scale reference assembly of the genome of the orange clownfish Amphiprion percula

    KAUST Repository

    Lehmann, Robert

    2018-03-08

    The iconic orange clownfish, Amphiprion percula, is a model organism for studying the ecology and evolution of reef fishes, including patterns of population connectivity, sex change, social organization, habitat selection and adaptation to climate change. Notably, the orange clownfish is the only reef fish for which a complete larval dispersal kernel has been established and was the first fish species for which it was demonstrated that anti-predator responses of reef fishes could be impaired by ocean acidification. Despite its importance, molecular resources for this species remain scarce and until now it lacked a reference genome assembly. Here we present a de novo chromosome-scale assembly of the genome of the orange clownfish Amphiprion percula. We utilized single-molecule real-time sequencing technology from Pacific Biosciences to produce an initial polished assembly comprising 1,414 contigs, with a contig N50 length of 1.86 Mb. Using Hi-C based chromatin contact maps, 98% of the genome assembly was placed into 24 chromosomes, resulting in a final assembly of 908.8 Mb in length with contig and scaffold N50s of 3.12 and 38.4 Mb, respectively. This makes it one of the most contiguous and complete fish genome assemblies currently available. The genome was annotated with 26,597 protein coding genes and contains 96% of the core set of conserved actinopterygian orthologs. The availability of this reference genome assembly as a community resource will further strengthen the role of the orange clownfish as a model species for research on the ecology and evolution of reef fishes.

  8. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis.

    Directory of Open Access Journals (Sweden)

    Rami A Dalloul

    2010-09-01

    A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.

  9. Highly precise and developmentally programmed genome assembly in Paramecium requires ligase IV-dependent end joining.

    Directory of Open Access Journals (Sweden)

    Aurélie Kapusta

    2011-04-01

    During the sexual cycle of the ciliate Paramecium, assembly of the somatic genome includes the precise excision of tens of thousands of short, non-coding germline sequences (Internal Eliminated Sequences or IESs), each one flanked by two TA dinucleotides. It has been reported previously that these genome rearrangements are initiated by the introduction of developmentally programmed DNA double-strand breaks (DSBs), which depend on the domesticated transposase PiggyMac. These DSBs all exhibit a characteristic geometry, with 4-base 5' overhangs centered on the conserved TA, and may readily align and undergo ligation with minimal processing. However, the molecular steps and actors involved in the final and precise assembly of somatic genes have remained unknown. We demonstrate here that Ligase IV and Xrcc4p, core components of the non-homologous end-joining pathway (NHEJ), are required both for the repair of IES excision sites and for the circularization of excised IESs. The transcription of LIG4 and XRCC4 is induced early during the sexual cycle and a Lig4p-GFP fusion protein accumulates in the developing somatic nucleus by the time IES excision takes place. RNAi-mediated silencing of either gene results in the persistence of free broken DNA ends, apparently protected against extensive resection. At the nucleotide level, controlled removal of the 5'-terminal nucleotide occurs normally in LIG4-silenced cells, while nucleotide addition to the 3' ends of the breaks is blocked, together with the final joining step, indicative of a coupling between NHEJ polymerase and ligase activities. Taken together, our data indicate that IES excision is a "cut-and-close" mechanism, which involves the introduction of initiating double-strand cleavages at both ends of each IES, followed by DSB repair via highly precise end joining. This work broadens our current view on how the cellular NHEJ pathway has cooperated with domesticated transposases for the emergence of new

  10. Highly precise and developmentally programmed genome assembly in Paramecium requires ligase IV-dependent end joining.

    Science.gov (United States)

    Kapusta, Aurélie; Matsuda, Atsushi; Marmignon, Antoine; Ku, Michael; Silve, Aude; Meyer, Eric; Forney, James D; Malinsky, Sophie; Bétermier, Mireille

    2011-04-01

    During the sexual cycle of the ciliate Paramecium, assembly of the somatic genome includes the precise excision of tens of thousands of short, non-coding germline sequences (Internal Eliminated Sequences or IESs), each one flanked by two TA dinucleotides. It has been reported previously that these genome rearrangements are initiated by the introduction of developmentally programmed DNA double-strand breaks (DSBs), which depend on the domesticated transposase PiggyMac. These DSBs all exhibit a characteristic geometry, with 4-base 5' overhangs centered on the conserved TA, and may readily align and undergo ligation with minimal processing. However, the molecular steps and actors involved in the final and precise assembly of somatic genes have remained unknown. We demonstrate here that Ligase IV and Xrcc4p, core components of the non-homologous end-joining pathway (NHEJ), are required both for the repair of IES excision sites and for the circularization of excised IESs. The transcription of LIG4 and XRCC4 is induced early during the sexual cycle and a Lig4p-GFP fusion protein accumulates in the developing somatic nucleus by the time IES excision takes place. RNAi-mediated silencing of either gene results in the persistence of free broken DNA ends, apparently protected against extensive resection. At the nucleotide level, controlled removal of the 5'-terminal nucleotide occurs normally in LIG4-silenced cells, while nucleotide addition to the 3' ends of the breaks is blocked, together with the final joining step, indicative of a coupling between NHEJ polymerase and ligase activities. Taken together, our data indicate that IES excision is a "cut-and-close" mechanism, which involves the introduction of initiating double-strand cleavages at both ends of each IES, followed by DSB repair via highly precise end joining. This work broadens our current view on how the cellular NHEJ pathway has cooperated with domesticated transposases for the emergence of new mechanisms

  11. Analysis of the initiating events in HIV-1 particle assembly and genome packaging.

    Directory of Open Access Journals (Sweden)

    Sebla B Kutluay

    2010-11-01

    HIV-1 Gag drives a number of events during the genesis of virions and is the only viral protein required for the assembly of virus-like particles in vitro and in cells. Although a reasonable understanding of the processes that accompany the later stages of HIV-1 assembly has accrued, events that occur at the initiation of assembly are less well defined. In this regard, important uncertainties include where in the cell Gag first multimerizes and interacts with the viral RNA, and whether Gag-RNA interaction requires or induces Gag multimerization in a living cell. To address these questions, we developed assays in which protein crosslinking and RNA/protein co-immunoprecipitation were coupled with membrane flotation analyses in transfected or infected cells. We found that interaction between Gag and viral RNA occurred in the cytoplasm and was independent of the ability of Gag to localize to the plasma membrane. However, Gag:RNA binding was stabilized by the C-terminal domain (CTD) of capsid (CA), which participates in Gag-Gag interactions. We also found that Gag was present as monomers and low-order multimers (e.g. dimers) but did not form higher-order multimers in the cytoplasm. Rather, high-order multimers formed only at the plasma membrane and required the presence of a membrane-binding signal, but not a Gag domain (the CA-CTD) that is essential for complete particle assembly. Finally, sequential RNA-immunoprecipitation assays indicated that at least a fraction of Gag molecules can form multimers on viral genomes in the cytoplasm. Taken together, our results suggest that HIV-1 particle assembly is initiated by the interaction between Gag and viral RNA in the cytoplasm and that this initial Gag-RNA encounter involves Gag monomers or low order multimers. These interactions per se do not induce or require high-order Gag multimerization in the cytoplasm. Instead, membrane interactions are necessary for higher order Gag multimerization and subsequent

  12. Driving Forces of the Self-Assembly of Supramolecular Systems: Partially Ordered Mesophases

    Science.gov (United States)

    Shcherbina, M. A.; Chvalun, S. N.

    2018-06-01

    The main aspects are considered of the self-organization of a new class of liquid crystalline compounds, rigid sector-shaped and cone-shaped dendrons. Theoretical approaches to the self-assembly of different amphiphilic compounds (lipids, bolaamphiphiles, block copolymers, and polyelectrolytes) are described. Particular attention is given to the mesophase structures that emerge during the self-organization of mesophases characterized by intermediate degrees of ordering, e.g., plastic crystals, the rotation-crystalline phase in polymers, ordered and disordered two-dimensional columnar phases, and bicontinuous cubic phases of different symmetry.

  13. Annotated Draft Genome Assemblies for the Northern Bobwhite (Colinus virginianus) and the Scaled Quail (Callipepla squamata) Reveal Disparate Estimates of Modern Genome Diversity and Historic Effective Population Size

    Directory of Open Access Journals (Sweden)

    David L. Oldeschulte

    2017-09-01

    Northern bobwhite (Colinus virginianus; hereafter bobwhite) and scaled quail (Callipepla squamata) populations have suffered precipitous declines across most of their US ranges. Illumina-based first- (v1.0) and second- (v2.0) generation draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, thereby producing a 45-fold improvement in contiguity over the existing bobwhite assembly, and ≥90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb) was ∼20% smaller than the bobwhite (v2.0 = 1.254 Gb), which was supported by kmer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%), genome-wide repetitive content (10.40%; 10.43%), and MAKER-predicted protein coding genes (17,131; 17,165) were similar for the scaled quail (v1.0) and bobwhite (v2.0) assemblies, respectively. BUSCO analyses utilizing 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8%) and the bobwhite (v2.0; 82.5%), as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0), and 711 in the bobwhite genome (v2.0), including some that were shared among both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0) and bobwhite (v2.0) genomes, respectively, and estimates of historic effective population size were uniformly higher for the bobwhite across all time points in a coalescent model. However, large-scale declines were predicted for both species beginning ∼15–20 KYA.

  14. Annotated Draft Genome Assemblies for the Northern Bobwhite (Colinus virginianus) and the Scaled Quail (Callipepla squamata) Reveal Disparate Estimates of Modern Genome Diversity and Historic Effective Population Size.

    Science.gov (United States)

    Oldeschulte, David L; Halley, Yvette A; Wilson, Miranda L; Bhattarai, Eric K; Brashear, Wesley; Hill, Joshua; Metz, Richard P; Johnson, Charles D; Rollins, Dale; Peterson, Markus J; Bickhart, Derek M; Decker, Jared E; Sewell, John F; Seabury, Christopher M

    2017-09-07

    Northern bobwhite (Colinus virginianus; hereafter bobwhite) and scaled quail (Callipepla squamata) populations have suffered precipitous declines across most of their US ranges. Illumina-based first- (v1.0) and second- (v2.0) generation draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, thereby producing a 45-fold improvement in contiguity over the existing bobwhite assembly, and ≥90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb) was ∼20% smaller than the bobwhite (v2.0 = 1.254 Gb), which was supported by kmer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%), genome-wide repetitive content (10.40%; 10.43%), and MAKER-predicted protein coding genes (17,131; 17,165) were similar for the scaled quail (v1.0) and bobwhite (v2.0) assemblies, respectively. BUSCO analyses utilizing 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8%) and the bobwhite (v2.0; 82.5%), as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0), and 711 in the bobwhite genome (v2.0), including some that were shared among both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0) and bobwhite (v2.0) genomes, respectively, and estimates of historic effective population size were uniformly higher for the bobwhite across all time points in a coalescent model. However, large-scale declines were predicted for both species beginning ∼15-20 KYA. Copyright © 2017 Oldeschulte et al.

  15. A multi-platform draft de novo genome assembly and comparative analysis for the Scarlet Macaw (Ara macao).

    Directory of Open Access Journals (Sweden)

    Christopher M Seabury

    Data deposition to NCBI Genomes: This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AMXX00000000 (SMACv1.0, unscaffolded genome assembly). The version described in this paper is the first version (AMXX01000000). The scaffolded assembly (SMACv1.1) has been deposited at DDBJ/EMBL/GenBank under the accession AOUJ00000000, and is also the first version (AOUJ01000000). Strong biological interest in traits such as the acquisition and utilization of speech, cognitive abilities, and longevity catalyzed the utilization of two next-generation sequencing platforms to provide the first-draft de novo genome assembly for the large, new world parrot Ara macao (Scarlet Macaw). Despite the challenges associated with genome assembly for an outbred avian species, including 951,507 high-quality putative single nucleotide polymorphisms, the final genome assembly (>1.035 Gb) includes more than 997 Mb of unambiguous sequence data (excluding N's). Cytogenetic analyses including ZooFISH revealed complex rearrangements associated with two scarlet macaw macrochromosomes (AMA6, AMA7), which supports the hypothesis that translocations, fusions, and intragenomic rearrangements are key factors associated with karyotype evolution among parrots. In silico annotation of the scarlet macaw genome provided robust evidence for 14,405 nuclear gene annotation models, their predicted transcripts and proteins, and a complete mitochondrial genome. Comparative analyses involving the scarlet macaw, chicken, and zebra finch genomes revealed high levels of nucleotide-based conservation as well as evidence for overall genome stability among the three highly divergent species. Application of a new whole-genome analysis of divergence involving all three species yielded prioritized candidate genes and noncoding regions for parrot traits of interest (i.e., speech, intelligence, longevity) which were independently supported by the results of previous human GWAS

  16. Physical mapping of 20 unmapped fragments of the btau_4.0 genome assembly in cattle, sheep and river buffalo.

    Science.gov (United States)

    De Lorenzi, L; Genualdo, V; Perucatti, A; Iannuzzi, A; Iannuzzi, L; Parma, P

    2013-01-01

    The recent advances in sequencing technology and bioinformatics have revolutionized genomic research, making the decoding of the genome an easier task. Genome sequences are currently available for many species, including cattle, sheep and river buffalo. The available reference genomes are very accurate, and they represent the best possible order of loci at this time. In cattle, despite the great accuracy achieved, a part of the genome has been sequenced but not yet assembled: these genome fragments are called unmapped fragments. In the present study, 20 unmapped fragments belonging to the Btau_4.0 reference genome have been mapped by FISH in cattle (Bos taurus, 2n = 60), sheep (Ovis aries, 2n = 54) and river buffalo (Bubalus bubalis, 2n = 50). Our results confirm the accuracy of the available reference genome, though there are some discrepancies between the expected localization and the observed localization. Moreover, the available data in the literature regarding genomic homologies between cattle, sheep and river buffalo are confirmed. Finally, the results presented here suggest that FISH was, and still is, a useful technology to validate the data produced by genome sequencing programs. Copyright © 2013 S. Karger AG, Basel.

  17. Ebola virus VP24 interacts with NP to facilitate nucleocapsid assembly and genome packaging.

    Science.gov (United States)

    Banadyga, Logan; Hoenen, Thomas; Ambroggio, Xavier; Dunham, Eric; Groseth, Allison; Ebihara, Hideki

    2017-08-09

    Ebola virus causes devastating hemorrhagic fever outbreaks for which no approved therapeutic exists. The viral nucleocapsid, which is minimally composed of the proteins NP, VP35, and VP24, represents an attractive target for drug development; however, the molecular determinants that govern the interactions and functions of these three proteins are still unknown. Through a series of mutational analyses, in combination with biochemical and bioinformatics approaches, we identified a region on VP24 that was critical for its interaction with NP. Importantly, we demonstrated that the interaction between VP24 and NP was required for both nucleocapsid assembly and genome packaging. Not only does this study underscore the critical role that these proteins play in the viral replication cycle, but it also identifies a key interaction interface on VP24 that may serve as a novel target for antiviral therapeutic intervention.

  18. New Clasp Assembly for Distal Extension Removable Partial Dentures: The Reverse RPA Clasp.

    Science.gov (United States)

    Hakkoum, Mohammad Ayham

    2016-07-01

    Several clasp types are used in distal extension removable partial dentures. In some cases the terminal abutments have only distal retentive undercuts that can be occupied by bar clasps; however, bar clasps may be contraindicated with no suitable alternative. This article presents a reasonable solution by introducing a new clasp design as a modification to the well-known RPA clasp. The design includes a mesial rest, proximal plate, and buccal retentive arm arising from the rest and extending to reach the distal retentive undercut. © 2015 by the American College of Prosthodontists.

  19. Comparison of bacterial genome assembly software for MinION data and their applicability to medical microbiology.

    Science.gov (United States)

    Judge, Kim; Hunt, Martin; Reuter, Sandra; Tracey, Alan; Quail, Michael A; Parkhill, Julian; Peacock, Sharon J

    2016-09-01

    Translating the Oxford Nanopore MinION sequencing technology into medical microbiology requires on-going analysis that keeps pace with technological improvements to the instrument and release of associated analysis software. Here, we use a multidrug-resistant Enterobacter kobei isolate as a model organism to compare open source software for the assembly of genome data, and relate this to the time taken to generate actionable information. Three software tools (PBcR, Canu and miniasm) were used to assemble MinION data and a fourth (SPAdes) was used to combine MinION and Illumina data to produce a hybrid assembly. All four had a similar number of contigs and were more contiguous than the assembly using Illumina data alone, with SPAdes producing a single chromosomal contig. Evaluation of the four assemblies to represent the genome structure revealed a single large inversion in the SPAdes assembly, which also incorrectly integrated a plasmid into the chromosomal contig. Almost 50 %, 80 % and 90 % of MinION pass reads were generated in the first 6, 9 and 12 h, respectively. Using data from the first 6 h alone led to a less accurate, fragmented assembly, but data from the first 9 or 12 h generated similar assemblies to that from 48 h sequencing. Assemblies were generated in 2 h using Canu, indicating that going from isolate to assembled data is possible in less than 48 h. MinION data identified that genes responsible for resistance were carried by two plasmids encoding resistance to carbapenem and to sulphonamides, rifampicin and aminoglycosides, respectively.

  20. De novo assembly of the zucchini genome reveals a whole-genome duplication associated with the origin of the Cucurbita genus.

    Science.gov (United States)

    Montero-Pau, Javier; Blanca, José; Bombarely, Aureliano; Ziarsolo, Peio; Esteras, Cristina; Martí-Gómez, Carlos; Ferriol, María; Gómez, Pedro; Jamilena, Manuel; Mueller, Lukas; Picó, Belén; Cañizares, Joaquín

    2017-11-07

    The Cucurbita genus (squashes, pumpkins and gourds) includes important domesticated species such as C. pepo, C. maxima and C. moschata. In this study, we present a high-quality draft of the zucchini (C. pepo) genome. The assembly has a size of 263 Mb, a scaffold N50 of 1.8 Mb and 34 240 gene models. It includes 92% of the conserved BUSCO core gene set, and it is estimated to cover 93.0% of the genome. The genome is organized in 20 pseudomolecules that represent 81.4% of the assembly, and it is integrated with a genetic map of 7718 SNPs. Despite the small genome size, three independent lines of evidence support that the C. pepo genome is the result of a whole-genome duplication: the topology of the gene family phylogenies, the karyotype organization and the distribution of 4DTv distances. Additionally, 40 transcriptomes of 12 species of the genus were assembled and analysed together with all the other published genomes of the Cucurbitaceae family. The duplication was detected in all the Cucurbita species analysed, including C. maxima and C. moschata, but not in the more distant cucurbits belonging to the Cucumis and Citrullus genera, and it is likely to have occurred 30 ± 4 Mya in the ancestral species that gave rise to the genus. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  1. In vivo Assembly in Escherichia coli of Transformation Vectors for Plastid Genome Engineering

    Directory of Open Access Journals (Sweden)

    Yuyong Wu

    2017-08-01

    Plastid transformation for the expression of recombinant proteins and entire metabolic pathways has become a promising tool for plant biotechnology. However, large-scale application of this technology has been hindered by some technical bottlenecks, including lack of routine transformation protocols for agronomically important crop plants like rice or maize. Currently, there are no standard or commercial plastid transformation vectors available for the scientific community. Construction of a plastid transformation vector usually requires tedious and time-consuming cloning steps. In this study, we describe the adoption of an in vivo Escherichia coli cloning (iVEC) technology to quickly assemble a plastid transformation vector. The method enables simple and seamless build-up of a complete plastid transformation vector from five DNA fragments in a single step. The vector assembled for demonstration purposes contains an enhanced green fluorescent protein (GFP) expression cassette, in which the gfp transgene is driven by the tobacco plastid ribosomal RNA operon promoter fused to the 5′ untranslated region (UTR) from gene10 of bacteriophage T7 and the transcript-stabilizing 3′UTR from the E. coli ribosomal RNA operon rrnB. Successful transformation of the tobacco plastid genome was verified by Southern blot analysis and seed assays. High-level expression of the GFP reporter in the transplastomic plants was visualized by confocal microscopy and Coomassie staining, and GFP accumulation was ~9% of the total soluble protein. The iVEC method represents a simple and efficient approach for construction of plastid transformation vectors, and offers great potential for the assembly of increasingly complex vectors for synthetic biology applications in plastids.

  2. Insight into structure and assembly of the nuclear pore complex by utilizing the genome of a eukaryotic thermophile

    DEFF Research Database (Denmark)

    Amlacher, Stefan; Sarges, Phillip; Flemming, Dirk

    2011-01-01

    is composed of two large Nups, Nup192 and Nup170, which are flexibly bridged by short linear motifs made up of linker Nups, Nic96 and Nup53. This assembly illustrates how Nup interactions can generate structural plasticity within the NPC scaffold. Our findings therefore demonstrate the utility of the genome...

  3. Construction of high resolution genetic linkage maps to improve the soybean genome sequence assembly Glyma1.01

    Science.gov (United States)

    A landmark in soybean research, Glyma1.01, the first whole genome sequence of variety Williams 82 (Glycine max L. Merr.) was completed in 2010 and is widely used. However, because the assembly was primarily built based on the linkage maps constructed with a limited number of markers and recombinant...

  4. Nanopore Long-Read Guided Complete Genome Assembly of Hydrogenophaga intermedia, and Genomic Insights into 4-Aminobenzenesulfonate, p-Aminobenzoic Acid and Hydrogen Metabolism in the Genus Hydrogenophaga.

    Science.gov (United States)

    Gan, Han M; Lee, Yin P; Austin, Christopher M

    2017-01-01

    We improved upon the previously reported draft genome of Hydrogenophaga intermedia strain PBC, a 4-aminobenzenesulfonate-degrading bacterium, by supplementing the assembly with Nanopore long reads, which enabled the reconstruction of the genome as a single contig. From the complete genome, major genes responsible for the catabolism of 4-aminobenzenesulfonate in strain PBC are clustered in two distinct genomic regions. Although the catabolic genes for 4-sulfocatechol, the deaminated product of 4-aminobenzenesulfonate, are only found in H. intermedia, the sad operon responsible for the first deamination step of 4-aminobenzenesulfonate is conserved in various Hydrogenophaga strains. The absence of the pabB gene in the complete genome of H. intermedia PBC is consistent with its p-aminobenzoic acid (pABA) auxotrophy, but surprisingly, comparative genomics analysis of 14 Hydrogenophaga genomes indicates that pABA auxotrophy is not an uncommon feature among members of this genus. Of even more interest, several Hydrogenophaga strains do not possess the genomic potential for hydrogen oxidation, calling for a revision to the taxonomic description of Hydrogenophaga as "hydrogen eating bacteria."

  5. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    Directory of Open Access Journals (Sweden)

    Rodrigo Pessôa

    BACKGROUND: Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. METHODS: Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. RESULTS: A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. CONCLUSIONS: This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data

  6. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    Science.gov (United States)

    Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed

    2014-01-01

    Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data will add to our current understanding of the

  7. De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data

    Directory of Open Access Journals (Sweden)

    Yeonhwa Jo

    2017-10-01

    Soybean is the most important legume crop in the world. Several diseases in soybean lead to serious yield losses in major soybean-producing countries. Moreover, soybean can be infected by diverse viruses. Recently, we carried out a large-scale screening to identify viruses infecting soybean using available soybean transcriptome data. Of the screened transcriptomes, a soybean transcriptome for soybean seed development analysis contains several virus-associated sequences. In this study, we identified five viruses, including soybean mosaic virus (SMV), infecting soybean by de novo transcriptome assembly followed by BLAST search. We assembled a nearly complete consensus genome sequence of SMV China using transcriptome data. Based on phylogenetic analysis, the consensus genome sequence of SMV China was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) for SMVs in the soybean seed transcriptome, revealing 780 SNVs, which were evenly distributed on the SMV genome. Four SNVs, C-U, U-C, A-G, and G-A, were frequently identified. This result demonstrated the quasispecies variation of the SMV genome. Taken together, this study carried out bioinformatics analyses to identify viruses using soybean transcriptome data. In addition, we demonstrated the application of soybean transcriptome data for virus genome assembly and SNV analysis.

  8. Sequence assembly

    DEFF Research Database (Denmark)

    Scheibye-Alsing, Karsten; Hoffmann, S.; Frankel, Annett Maria

    2009-01-01

    Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational assembly of large-scale sequencing data have remained unresolved. Computational assembly is crucial in large genome projects as well as for the evolving high-throughput technologies and...... in genomic DNA, highly expressed genes and alternative transcripts in EST sequences. We summarize existing comparisons of different assemblers and provide detailed descriptions and directions for download of assembly programs at: http://genome.ku.dk/resources/assembly/methods.html....

  9. Genome Sequence, Assembly and Characterization of Two Metschnikowia fructicola Strains Used as Biocontrol Agents of Postharvest Diseases

    Directory of Open Access Journals (Sweden)

    Edoardo Piombo

    2018-04-01

    The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercial formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens were suggested, including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using the PacBio, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally high when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO) term and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue.

  10. Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study.

    Science.gov (United States)

    Cerdeira, Louise Teixeira; Carneiro, Adriana Ribeiro; Ramos, Rommel Thiago Jucá; de Almeida, Sintia Silva; D'Afonseca, Vivian; Schneider, Maria Paula Cruz; Baumbach, Jan; Tauch, Andreas; McCulloch, John Anthony; Azevedo, Vasco Ariston Carvalho; Silva, Artur

    2011-08-01

    Due to the advent of the so-called Next-Generation Sequencing (NGS) technologies the amount of monetary and temporal resources for whole-genome sequencing has been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly), or can be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence. However, depending on the NGS platform used, the shortness of read lengths causes tremendous problems in the subsequent genome assembly phase, impeding closing of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium with immense importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that with the availability of a reference genome, it pays off to use the hybrid de novo strategy, rather than a classical reference assembly, because more genome sequences are preserved using the former. Copyright © 2011 Elsevier B.V. All rights reserved.
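
For readers unfamiliar with the De Bruijn graph approach named in the abstract above, the following is a minimal illustrative sketch of how such a graph is built from short reads. It is not the authors' pipeline: the function name `de_bruijn_edges`, the toy reads, and the choice of k = 3 are all assumptions made for illustration, and real assemblers add error correction, graph simplification, and contig extraction on top of this step.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a De Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; every k-mer found in a read adds one edge
    from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k across the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy reads that overlap on "GTC".
toy_reads = ["ACGTC", "GTCA"]
graph = de_bruijn_edges(toy_reads, 3)
for node in sorted(graph):
    print(node, "->", graph[node])
```

Repeated edges (here the two copies of the edge from `GT` to `TC`) record how often a k-mer was observed, which is the kind of multiplicity information assemblers use when deciding which paths to trust.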

  11. Genome-wide association mapping of partial resistance to Phytophthora sojae in soybean plant introductions from the Republic of Korea.

    Science.gov (United States)

    Schneider, Rhiannon; Rolling, William; Song, Qijian; Cregan, Perry; Dorrance, Anne E; McHale, Leah K

    2016-08-11

    Phytophthora root and stem rot is one of the most yield-limiting diseases of soybean [Glycine max (L.) Merr.], caused by the oomycete Phytophthora sojae. Partial resistance is controlled by several genes and, compared to single gene (Rps gene) resistance to P. sojae, places less selection pressure on P. sojae populations. Thus, partial resistance provides a more durable resistance against the pathogen. In previous work, plant introductions (PIs) originating from the Republic of Korea (S. Korea) have been shown to be excellent sources for high levels of partial resistance against P. sojae. Resistance to two highly virulent P. sojae isolates was assessed in 1395 PIs from S. Korea via a greenhouse layer test. Lines exhibiting possible Rps gene immunity or rot due to other pathogens were removed and the remaining 800 lines were used to identify regions of quantitative resistance using genome-wide association mapping. Sixteen SNP markers on chromosomes 3, 13 and 19 were significantly associated with partial resistance to P. sojae and were grouped into seven quantitative trait loci (QTL) by linkage disequilibrium blocks. Two QTL on chromosome 3 and three QTL on chromosome 19 represent possible novel loci for partial resistance to P. sojae. While candidate genes at QTL varied in their predicted functions, the coincidence of QTLs 3-2 and 13-1 on chromosomes 3 and 13, respectively, with Rps genes and resistance gene analogs provided support for the hypothesized mechanism of partial resistance involving weak R-genes. QTL contributing to partial resistance towards P. sojae in soybean germplasm originating from S. Korea were identified. The QTL identified in this study coincide with previously reported QTL, Rps genes, as well as novel loci for partial resistance. Molecular markers associated with these QTL can be used in the marker-assisted introgression of these alleles into elite cultivars. Annotations of genes within QTL allow hypotheses on the possible mechanisms of partial

  12. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.

    Science.gov (United States)

    Han, Mira V; Thomas, Gregg W C; Lugo-Martinez, Jose; Hahn, Matthew W

    2013-08-01

    Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.

  13. Gene-enriched draft genome of the cattle tick Rhipicephalus microplus: assembly by the hybrid Pacific Biosciences/Illumina approach enabled analysis of the highly repetitive genome.

    Science.gov (United States)

    Barrero, Roberto A; Guerrero, Felix D; Black, Michael; McCooke, John; Chapman, Brett; Schilkey, Faye; Pérez de León, Adalberto A; Miller, Robert J; Bruns, Sara; Dobry, Jason; Mikhaylenko, Galina; Stormo, Keith; Bell, Callum; Tao, Quanzhou; Bogden, Robert; Moolhuijzen, Paula M; Hunter, Adam; Bellgard, Matthew I

    2017-08-01

    The genome of the cattle tick Rhipicephalus microplus, an ectoparasite with global distribution, is estimated to be 7.1 Gbp in length and consists of approximately 70% repetitive DNA. We report the draft assembly of a tick genome that utilized a hybrid sequencing and assembly approach to capture the repetitive fractions of the genome. Our hybrid approach produced an assembly consisting of 2.0 Gbp represented in 195,170 scaffolds with an N50 of 60,284 bp. The Rmi v2.0 assembly is 51.46% repetitive with a large fraction of unclassified repeats, short interspersed elements, long interspersed elements and long terminal repeats. We identified 38,827 putative R. microplus gene loci, of which 24,758 were protein coding genes (≥100 amino acids). OrthoMCL comparative analysis against 11 selected species including insects and vertebrates identified 10,835 and 3,423 protein coding gene loci that are unique to R. microplus or common to both R. microplus and Ixodes scapularis ticks, respectively. We identified 191 microRNA loci, of which 168 have similarity to known miRNAs and 23 represent novel miRNA families. We identified the genomic loci of several highly divergent R. microplus esterases with sequence similarity to acetylcholinesterase. Additionally, we report the finding of a novel cytochrome P450 CYP41 homolog that shows similar protein folding structures to CYP41 proteins known to be involved in acaricide resistance. Copyright © 2017 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.
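
Several records in this listing report a scaffold N50 as a contiguity measure. As a hedged illustration of what that statistic means, the sketch below computes N50 from a list of sequence lengths under the common definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly. The function name `n50` and the toy lengths are assumptions for illustration; they are not data from any of the assemblies above.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), total = 300.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150 reaches half of 300
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misjoined assembly can have a high N50, which is why the abstracts pair it with completeness measures such as BUSCO.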

  14. DNA damage response and spindle assembly checkpoint function throughout the cell cycle to ensure genomic integrity.

    Directory of Open Access Journals (Sweden)

    Katherine S Lawrence

    2015-04-01

    Errors in replication or segregation lead to DNA damage, mutations, and aneuploidies. Consequently, cells monitor these events and delay progression through the cell cycle so repair precedes division. The DNA damage response (DDR), which monitors DNA integrity, and the spindle assembly checkpoint (SAC), which responds to defects in spindle attachment/tension during metaphase of mitosis and meiosis, are critical for preventing genome instability. Here we show that the DDR and SAC function together throughout the cell cycle to ensure genome integrity in C. elegans germ cells. Metaphase defects result in enrichment of SAC and DDR components to chromatin, and both SAC and DDR are required for metaphase delays. During persistent metaphase arrest following establishment of bi-oriented chromosomes, stability of the metaphase plate is compromised in the absence of DDR kinases ATR or CHK1 or SAC components, MAD1/MAD2, suggesting SAC functions in metaphase beyond its interactions with APC activator CDC20. In response to DNA damage, MAD2 and the histone variant CENPA become enriched at the nuclear periphery in a DDR-dependent manner. Further, depletion of either MAD1 or CENPA results in loss of peripherally associated damaged DNA. In contrast to a SAC-insensitive CDC20 mutant, germ cells deficient for SAC or CENPA cannot efficiently repair DNA damage, suggesting that SAC mediates DNA repair through CENPA interactions with the nuclear periphery. We also show that replication perturbations result in relocalization of MAD1/MAD2 in human cells, suggesting that the role of SAC in DNA repair is conserved.

  15. Packaging signals in two single-stranded RNA viruses imply a conserved assembly mechanism and geometry of the packaged genome.

    Science.gov (United States)

    Dykeman, Eric C; Stockley, Peter G; Twarock, Reidun

    2013-09-09

    The current paradigm for assembly of single-stranded RNA viruses is based on a mechanism involving non-sequence-specific packaging of genomic RNA driven by electrostatic interactions. Recent experiments, however, provide compelling evidence for sequence specificity in this process both in vitro and in vivo. The existence of multiple RNA packaging signals (PSs) within viral genomes has been proposed, which facilitates assembly by binding coat proteins in such a way that they promote the protein-protein contacts needed to build the capsid. The binding energy from these interactions enables the confinement or compaction of the genomic RNAs. Identifying the nature of such PSs is crucial for a full understanding of assembly, which is an as yet untapped potential drug target for this important class of pathogens. Here, for two related bacterial viruses, we determine the sequences and locations of their PSs using Hamiltonian paths, a concept from graph theory, in combination with bioinformatics and structural studies. Their PSs have a common secondary structure motif but distinct consensus sequences and positions within the respective genomes. Despite these differences, the distributions of PSs in both viruses imply defined conformations for the packaged RNA genomes in contact with the protein shell in the capsid, consistent with a recent asymmetric structure determination of the MS2 virion. The PS distributions identified moreover imply a preferred, evolutionarily conserved assembly pathway with respect to the RNA sequence with potentially profound implications for other single-stranded RNA viruses known to have RNA PSs, including many animal and human pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
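
    The abstract above mentions Hamiltonian paths, a concept from graph theory: a Hamiltonian path visits every vertex of a graph exactly once. The following Python sketch only illustrates that general concept with a simple backtracking search on a small, hypothetical graph; the function name `hamiltonian_path` and the toy graph are assumptions made for illustration, and this is not the authors' procedure for locating packaging signals.

```python
def hamiltonian_path(graph):
    """Return one path that visits every vertex exactly once, or None.

    `graph` maps each vertex to a list of its neighbours.
    Plain backtracking: exponential in the worst case, fine for tiny graphs.
    """
    vertices = list(graph)

    def extend(path, visited):
        # A path covering all vertices is a Hamiltonian path.
        if len(path) == len(vertices):
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                found = extend(path + [nxt], visited | {nxt})
                if found is not None:
                    return found
        return None  # dead end: backtrack

    for start in vertices:
        found = extend([start], {start})
        if found is not None:
            return found
    return None


# Hypothetical undirected toy graph (not derived from any viral capsid).
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}
print(hamiltonian_path(toy_graph))  # prints ['A', 'C', 'B', 'D']
```

    Exhaustive search is used here only because it is easy to verify by hand; it does not scale to large graphs.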

  16. Next-generation transcriptome assembly

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Jeffrey A.; Wang, Zhong

    2011-09-01

    Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalog of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches (reference-based, de novo and combined strategies), along with some perspectives on transcriptome assembly in the near future.

  17. Partial Defect Verification of Spent Fuel Assemblies by PDET: Principle and Field Testing in Interim Spent Fuel Storage Facility (CLAB) in Sweden

    Energy Technology Data Exchange (ETDEWEB)

    Ham, Y.S.; Kerr, P.; Sitaraman, S.; Swan, R. [Global Security Directorate, Lawrence Livermore National Laboratory, Livermore, CA 94550 (United States)]; Rossa, R. [SCK-CEN, Mol (Belgium)]; Liljenfeldt, H. [SKB in Oskarshamn (Sweden)]

    2015-07-01

    The need for the development of a credible method and instrument for partial defect verification of spent fuel has been emphasized for several decades in the safeguards community, as diverted spent fuel pins can be a source for nuclear terrorism or devices. The need is increasingly important and even urgent as many countries have started to transfer spent fuel to so-called 'difficult-to-access' areas such as dry storage casks, reprocessing or geological repositories. Partial defect verification is required by the IAEA before spent fuel is placed into 'difficult-to-access' areas. Earlier, Lawrence Livermore National Laboratory (LLNL) reported the successful development of a new, credible partial defect verification method for pressurized water reactor (PWR) spent fuel assemblies without use of operator data, and further reported the validation experiments using commercial spent fuel assemblies with some missing fuel pins. The method was found to be robust, as it is relatively invariant to the characteristic variations of spent fuel assemblies such as initial fuel enrichment, cooling time, and burn-up. Since then, the PDET system has been designed and prototyped for 17x17 PWR spent fuel assemblies, complete with data acquisition software and acquisition electronics. This paper presents a summary description of the PDET development, followed by results of the first successful field testing using the integrated PDET system and actual spent fuel assemblies, performed at a commercial spent fuel storage site known as the Central Interim Spent Fuel Storage Facility (CLAB) in Sweden. In addition to partial defect detection, initial studies have determined that the tool can be used to verify the operator-declared average burnup of the assembly as well as intra-assembly burnup levels. (authors)

  18. Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome.

    Directory of Open Access Journals (Sweden)

    Loren A Honaas

    Full Text Available. Whereas de novo assemblies of RNA-Seq data are being published for a growing number of species across the tree of life, there are currently no broadly accepted methods for evaluating such assemblies. Here we present a detailed comparison of 99 transcriptome assemblies, generated with 6 de novo assemblers including CLC, Trinity, SOAP, Oases, ABySS and NextGENe. Controlled analyses of de novo assemblies for Arabidopsis thaliana and Oryza sativa transcriptomes provide new insights into the strengths and limitations of transcriptome assembly strategies. We find that the leading assemblers generate reassuringly accurate assemblies for the majority of transcripts. At the same time, we find a propensity for assemblers to fail to fully assemble highly expressed genes. Surprisingly, the instance of true chimeric assemblies is very low for all assemblers. Normalized libraries are reduced in highly abundant transcripts, but they also lack 1000s of low abundance transcripts. We conclude that the quality of de novo transcriptome assemblies is best assessed through consideration of a combination of metrics: (1) proportion of reads mapping to an assembly, (2) recovery of conserved, widely expressed genes, (3) N50 length statistics, and (4) the total number of unigenes. We provide benchmark Illumina transcriptome data and introduce SCERNA, a broadly applicable modular protocol for de novo assembly improvement. Finally, our de novo assembly of the Arabidopsis leaf transcriptome revealed ~20 putative Arabidopsis genes lacking in the current annotation.

  19. Genomic comparison of the endophyte Herbaspirillum seropedicae SmR1 and the phytopathogen Herbaspirillum rubrisubalbicans M1 by suppressive subtractive hybridization and partial genome sequencing.

    Science.gov (United States)

    Monteiro, Rose A; Balsanelli, Eduardo; Tuleski, Thalita; Faoro, Helison; Cruz, Leonardo M; Wassem, Roseli; de Baura, Valter A; Tadra-Sfeir, Michelle Z; Weiss, Vinícius; DaRocha, Wanderson D; Muller-Santos, Marcelo; Chubatsu, Leda S; Huergo, Luciano F; Pedrosa, Fábio O; de Souza, Emanuel M

    2012-05-01

    Herbaspirillum rubrisubalbicans M1 causes the mottled stripe disease in sugarcane cv. B-4362. Inoculation of this cultivar with Herbaspirillum seropedicae SmR1 does not produce disease symptoms. A comparison of the genomic sequences of these closely related species may permit a better understanding of contrasting phenotypes such as endophytic association and pathogenic lifestyle. To achieve this goal, we constructed suppressive subtractive hybridization (SSH) libraries to identify DNA fragments present in one species and absent in the other. In a parallel approach, partial genomic sequence from H. rubrisubalbicans M1 was directly compared in silico with the H. seropedicae SmR1 genome. The genomic differences between the two organisms revealed by SSH suggested that lipopolysaccharide and adhesins are potential molecular factors involved in the different phenotypic behavior. The wss cluster, probably involved in cellulose biosynthesis, was found in H. rubrisubalbicans M1. Expression of this gene cluster was increased in H. rubrisubalbicans M1 cells attached to the surface of maize root, and knockout of the wssD gene led to a decrease in maize root surface attachment and endophytic colonization. The production of cellulose could be responsible for the maize attachment pattern of H. rubrisubalbicans M1 that is capable of outcompeting H. seropedicae SmR1. © 2012 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  20. Partial ion yield and NEXAFS of 2-(perfluorooctyl)ethanethiol self-assembled monolayer: Comparison with PTFE results

    CERN Document Server

    Setoyama, H; Murase, T; Imamura, M; Mase, K; Okudaira, K K; Hara, M; Ueno, N

    2003-01-01

    Partial-ion-yield (PIY) spectra using ion time-of-flight (TOF) method and near-edge absorption fine structure (NEXAFS) spectra were measured for 2-(perfluorooctyl)ethanethiol [CF₃(CF₂)₇(CH₂)₂SH] self-assembled monolayer (F8-SAM) on Au(111) near carbon K-edge. The PIY spectra of the F8-SAM at the magic angle, where -CF₃ groups exist at the surface, were compared with those of the rubbed polytetrafluoroethylene (PTFE) thin film. The F⁺ intensity from the F8-SAM at the photon energy of the sharp peak of the NEXAFS, which originates from the excitation of C1s electron to σ*(C-F) states at -CF₂- chain, was extremely smaller than that from the rubbed PTFE film. This result clearly indicates that the ions observed by PIY do not originate from the film inside but from the surface. This was confirmed by changes in ion-TOF mass spectra during soft X-ray induced etching of the F8-SAM. The NEXAFS peaks of the F8-SAM were also assigned by considering PIY results.

  1. Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.

    Science.gov (United States)

    Jayakumar, Vasanthan; Sakakibara, Yasubumi

    2017-11-03

    Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.

  2. A Chromosome-Scale Assembly of the Bactrocera cucurbitae Genome Provides Insight to the Genetic Basis of white pupae

    Directory of Open Access Journals (Sweden)

    Sheina B. Sim

    2017-06-01

    Full Text Available. Genetic sexing strains (GSS) used in sterile insect technique (SIT) programs are textbook examples of how classical Mendelian genetics can be directly implemented in the management of agricultural insect pests. Although the foundations of traditionally developed GSS are single-locus, autosomal recessive traits, their genetic basis is largely unknown. With the advent of modern genomic techniques, the genetic basis of sexing traits in GSS can now be further investigated. This study is the first of its kind to integrate traditional genetic techniques with emerging genomics to characterize a GSS using the tephritid fruit fly pest Bactrocera cucurbitae as a model. These techniques include whole-genome sequencing, the development of a mapping population and linkage map, and quantitative trait analysis. The experiment designed to map the genetic sexing trait in B. cucurbitae, white pupae (wp), also enabled the generation of a chromosome-scale genome assembly by integrating the linkage map with the assembly. Quantitative trait loci analysis revealed SNP loci near position 42 MB on chromosome 3 to be tightly linked to wp. Gene annotation and synteny analysis show a near perfect relationship between chromosomes in B. cucurbitae and Muller elements A–E in Drosophila melanogaster. This chromosome-scale genome assembly is complete, has high contiguity, was generated using a minimal input DNA, and will be used to further characterize the genetic mechanisms underlying wp. Knowledge of the genetic basis of genetic sexing traits can be used to improve SIT in this species and expand it to other economically important Diptera.

  3. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Science.gov (United States)

    Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael

    2010-04-08

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for

  4. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    Directory of Open Access Journals (Sweden)

    Minou Nowrousian

    2010-04-01

    Full Text Available. Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data

  5. Assembly of the Lactuca sativa, L. cv. Tizian draft genome sequence reveals differences within major resistance complex 1 as compared to the cv. Salinas reference genome.

    Science.gov (United States)

    Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2018-02-10

    Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Metagenome Assembly at the DOE JGI (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Chain, Patrick

    2011-10-13

    Patrick Chain of DOE JGI at LANL, Co-Chair of the Metagenome-specific Assembly session, on Metagenome Assembly at the DOE JGI at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  7. Radiation hybrid maps of the D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes.

    Science.gov (United States)

    Kumar, Ajay; Seetan, Raed; Mergoum, Mohamed; Tiwari, Vijay K; Iqbal, Muhammad J; Wang, Yi; Al-Azzam, Omar; Šimková, Hana; Luo, Ming-Cheng; Dvorak, Jan; Gu, Yong Q; Denton, Anne; Kilian, Andrzej; Lazo, Gerard R; Kianian, Shahryar F

    2015-10-16

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high resolution genome maps with saturated marker scaffolds to anchor and orient BAC contigs/sequence scaffolds for whole genome assembly. Radiation hybrid (RH) mapping has proven to be an excellent tool for the development of such maps for it offers much higher and more uniform marker resolution across the length of the chromosome compared to genetic mapping and does not require marker polymorphism per se, as it is based on presence (retention) vs. absence (deletion) marker assay. In this study, a 178 line RH panel was genotyped with SSRs and DArT markers to develop the first high resolution RH maps of the entire D-genome of Ae. tauschii accession AL8/78. To confirm map order accuracy, the AL8/78-RH maps were compared with: 1) a DArT consensus genetic map constructed using more than 100 bi-parental populations, 2) a RH map of the D-genome of reference hexaploid wheat 'Chinese Spring', and 3) two SNP-based genetic maps, one with anchored D-genome BAC contigs and another with anchored D-genome sequence scaffolds. Using marker sequences, the RH maps were also anchored with a BAC contig based physical map and draft sequence of the D-genome of Ae. tauschii. A total of 609 markers were mapped to 503 unique positions on the seven D-genome chromosomes, with a total map length of 14,706.7 cR. The average distance between any two marker loci was 29.2 cR which corresponds to 2.1 cM or 9.8 Mb. The average mapping resolution across the D-genome was estimated to be 0.34 Mb (Mb/cR) or 0.07 cM (cM/cR). The RH maps showed almost perfect agreement with several published maps with regard to chromosome assignments of markers. The mean rank correlations between the position of markers on AL8/78 maps and the four published maps, ranged from 0.75 to 0.92, suggesting a good agreement in marker order. With 609 mapped markers, a total of 2481 deletions for the whole D-genome were detected with an average

  8. Construction of carrier state viruses with partial genomes of the segmented dsRNA bacteriophages

    International Nuclear Information System (INIS)

    Sun Yang; Qiao Xueying; Mindich, Leonard

    2004-01-01

    The cystoviridae are bacteriophages with genomes of three segments of dsRNA enclosed within a polyhedral capsid. Two members of this family, PHI6 and PHI8, have been shown to form carrier states in which the virus replicates as a stable episome in the host bacterium while expressing reporter genes such as kanamycin resistance or lacα. The carrier state does not require the activity of all the genes necessary for phage production. It is possible to generate carrier states by infecting cells with virus or by electroporating nonreplicating plasmids containing cDNA copies of the viral genomes into the host cells. We have found that carrier states in both PHI6 and PHI8 can be formed at high frequency with all three genomic segments or with only the large and small segments. The large genomic segment codes for the proteins that constitute the inner core of the virus, which is the structure responsible for the packaging and replication of the genome. In PHI6, a carrier state can be formed with the large and middle segment if mutations occur in the gene for the major structural protein of the inner core. In PHI8, carrier state formation requires the activity of genes 8 and 12 of segment S

  9. Draft sequencing and assembly of the genome of the world's largest fish, the whale shark: Rhincodon typus Smith 1828.

    Science.gov (United States)

    Read, Timothy D; Petit, Robert A; Joseph, Sandeep J; Alam, Md Tauqeer; Weil, M Ryan; Ahmad, Maida; Bhimani, Ravila; Vuong, Jocelyn S; Haase, Chad P; Webb, D Harry; Tan, Milton; Dove, Alistair D M

    2017-07-14

    The whale shark (Rhincodon typus) has by far the largest body size of any elasmobranch (shark or ray) species. Therefore, it is also the largest extant species of the paraphyletic assemblage commonly referred to as fishes. As both a phenotypic extreme and a member of the group Chondrichthyes - the sister group to the remaining gnathostomes, which includes all tetrapods and therefore also humans - its genome is of substantial comparative interest. Whale sharks are also listed as an endangered species on the International Union for Conservation of Nature's Red List of threatened species and are of growing popularity as both a target of ecotourism and as a charismatic conservation ambassador for the pelagic ecosystem. A genome map for this species would aid in defining effective conservation units and understanding global population structure. We characterised the nuclear genome of the whale shark using next generation sequencing (454, Illumina) and de novo assembly and annotation methods, based on material collected from the Georgia Aquarium. The data set consisted of 878,654,233 reads, which yielded a draft assembly of 1,213,200 contigs and 997,976 scaffolds. The estimated genome size was 3.44Gb. As expected, the proteome of the whale shark was most closely related to the only other complete genome of a cartilaginous fish, the holocephalan elephant shark. The whale shark contained a novel Toll-like-receptor (TLR) protein with sequence similarity to both the TLR4 and TLR13 proteins of mammals and TLR21 of teleosts. The data are publicly available on GenBank, FigShare, and from the NCBI Short Read Archive under accession number SRP044374. This represents the first shotgun elasmobranch genome and will aid studies of molecular systematics, biogeography, genetic differentiation, and conservation genetics in this and other shark species, as well as providing comparative data for studies of evolutionary biology and immunology across the jawed vertebrate lineages.

  10. Phyllanthus emblica Fruit Extract Activates Spindle Assembly Checkpoint, Prevents Mitotic Aberrations and Genomic Instability in Human Colon Epithelial NCM460 Cells

    Directory of Open Access Journals (Sweden)

    Xihan Guo

    2016-09-01

    Full Text Available. The fruit of Phyllanthus emblica Linn. (PE) has been widely consumed as a functional food and folk medicine in Southeast Asia due to its remarkable nutritional and pharmacological effects. Previous research showed PE delays mitotic progress and increases genomic instability (GIN) in human colorectal cancer cells. This study aimed to investigate the similar effects of PE by the biomarkers related to spindle assembly checkpoint (SAC), mitotic aberrations and GIN in human NCM460 normal colon epithelial cells. Cells were treated with PE and harvested differently according to the biomarkers observed. Frequencies of micronuclei (MN), nucleoplasmic bridge (NPB) and nuclear bud (NB) in cytokinesis-block micronucleus assay were used as indicators of GIN. Mitotic aberrations were assessed by the biomarkers of chromosome misalignment, multipolar division, chromosome lagging and chromatin bridge. SAC activity was determined by anaphase-to-metaphase ratio (AMR) and the expression of core SAC gene budding uninhibited by benzimidazoles related 1 (BubR1). Compared with the control, PE-treated cells showed (1) decreased incidences of MN, NPB and NB (p < 0.01); (2) decreased frequencies of all mitotic aberration biomarkers (p < 0.01); and (3) decreased AMR (p < 0.01) and increased BubR1 expression (p < 0.001). The results revealed PE has the potential to protect human normal colon epithelial cells from mitotic and genomic damages partially by enhancing the function of SAC.

  11. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.

    Science.gov (United States)

    Senol Cali, Damla; Kim, Jeremie S; Ghose, Saugata; Alkan, Can; Mutlu, Onur

    2018-04-02

    Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious

  12. Draft Genome Sequences of 12 Dry-Heat-Resistant Bacillus Strains Isolated from the Cleanrooms Where the Viking Spacecraft Were Assembled.

    Science.gov (United States)

    Seuylemezian, Arman; Cooper, Kerry; Schubert, Wayne; Vaishampayan, Parag

    2018-03-22

    Spore-forming microorganisms are of concern for forward contamination because they can survive harsh interplanetary travel. Here, we report the draft genome sequences of 12 spore-forming strains isolated from the Manned Spacecraft Operations Building (MSOB) and the Vehicle Assembly Building (VAB) in Cape Canaveral, FL, where the Viking spacecraft were assembled. Copyright © 2018 Seuylemezian et al.

  13. Assembled genomic and tissue-specific transcriptomic data resources for two genetically distinct lines of Cowpea (Vigna unguiculata (L.) Walp).

    Science.gov (United States)

    Spriggs, Andrew; Henderson, Steven T; Hand, Melanie L; Johnson, Susan D; Taylor, Jennifer M; Koltunow, Anna

    2018-02-09

    Cowpea (Vigna unguiculata (L.) Walp) is an important legume crop for food security in areas of low-input and smallholder farming throughout Africa and Asia. Genetic improvements are required to increase yield and resilience to biotic and abiotic stress and to enhance cowpea crop performance. An integrated cowpea genomic and gene expression data resource has the potential to greatly accelerate breeding and the delivery of novel genetic traits for cowpea. Extensive genomic resources for cowpea have been absent from the public domain; however, a recent early release reference genome for IT97K-499-35 (Vigna unguiculata v1.0, NSF, UCR, USAID, DOE-JGI, http://phytozome.jgi.doe.gov/) has now been established in a collaboration between the Joint Genome Institute (JGI) and University of California (UC) Riverside. Here we release supporting genomic and transcriptomic data for IT97K-499-35 and a second transformable cowpea variety, IT86D-1010. The transcriptome resource includes six tissue-specific datasets for each variety, with particular emphasis on reproductive tissues that extend and support the V. unguiculata v1.0 reference. Annotations have been included in our resource to allow direct mapping to the v1.0 cowpea reference. Access to the resource provided here is supported by raw and assembled data downloads.

  14. Comparison of carnivore, omnivore, and herbivore mammalian genomes with a new leopard assembly

    OpenAIRE

    Kim, Soonok; Cho, Yun Sung; Kim, Hak-Min; Chung, Oksung; Kim, Hyunho; Jho, Sungwoong; Seomun, Hong; Kim, Jeongho; Bang, Woo Young; Kim, Changmu; An, Junghwa; Bae, Chang Hwan; Bhak, Youngjune; Jeon, Sungwon; Yoon, Hyejun

    2016-01-01

    Background: There are three main dietary groups in mammals: carnivores, omnivores, and herbivores. Currently, there is limited comparative genomics insight into the evolution of dietary specializations in mammals. Due to recent advances in sequencing technologies, we were able to perform in-depth whole genome analyses of representatives of these three dietary groups. Results: We investigated the evolution of carnivory by comparing 18 representative genomes from across Mammalia with carnivorou...

  15. Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals

    DEFF Research Database (Denmark)

    Hellmann, Ines; Mang, Yuan; Gu, Zhiping

    2008-01-01

    We introduce a simple, broadly applicable method for obtaining estimates of nucleotide diversity from genomic shotgun sequencing data. The method takes into account the special nature of these data: random sampling of genomic segments from one or more individuals and a relatively high error rate for individual reads. Applying this method to data from the Celera human genome sequencing and SNP discovery project, we obtain estimates of nucleotide diversity in windows spanning the human genome and show that the diversity to divergence ratio is reduced in regions of low recombination. Furthermore, we show...

  16. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

    DEFF Research Database (Denmark)

    Liu, Siyang; Huang, Shujia; Rao, Junhua

    2015-01-01

    ...) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...

  17. The de novo assembly of mitochondrial genomes of the extinct passenger pigeon (Ectopistes migratorius) with next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Chih-Ming Hung

    The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species.

  18. The De Novo Assembly of Mitochondrial Genomes of the Extinct Passenger Pigeon (Ectopistes migratorius) with Next Generation Sequencing

    Science.gov (United States)

    Hung, Chih-Ming; Lin, Rong-Chien; Chu, Jui-Hua; Yeh, Chia-Fen; Yao, Chiou-Ju; Li, Shou-Hsien

    2013-01-01

    The information from ancient DNA (aDNA) provides an unparalleled opportunity to infer phylogenetic relationships and population history of extinct species and to investigate genetic evolution directly. However, the degraded and fragmented nature of aDNA has posed technical challenges for studies based on conventional PCR amplification. In this study, we present an approach based on next generation sequencing to efficiently sequence the complete mitochondrial genome (mitogenome) of two extinct passenger pigeons (Ectopistes migratorius) using de novo assembly of massive short (90 bp), paired-end or single-end reads. Although varying levels of human contamination and low levels of postmortem nucleotide lesion were observed, they did not impact sequencing accuracy. Our results demonstrated that the de novo assembly of shotgun sequence reads could be a potent approach to sequence mitogenomes, and offered an efficient way to infer evolutionary history of extinct species. PMID:23437111

  19. Toward allotetraploid cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA sequence information

    Science.gov (United States)

    2012-01-01

    Background: Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will greatly accelerate the process of whole-genome assembly in cotton. Results: In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 EST clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, demonstrating the important roles of these genes in fiber quality. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. Conclusion: This study will serve as a valuable genomic resource
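    The map statistics quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes the average inter-locus distance is taken over the intervals between adjacent loci within each linkage group (loci minus linkage groups), which the abstract does not state. Dividing by the number of loci instead gives about 1.07 cM.

    ```python
    def average_interlocus_distance(total_cm, n_loci, n_groups):
        """Mean spacing between adjacent loci, in cM.

        Assumption (not stated in the abstract): each linkage group with
        k loci contributes k - 1 intervals, so the total number of
        intervals is n_loci - n_groups.
        """
        intervals = n_loci - n_groups
        if intervals <= 0:
            raise ValueError("need more loci than linkage groups")
        return total_cm / intervals


    # Figures taken from the abstract.
    previous_loci = 2247
    added_loci = 1167
    total_loci = previous_loci + added_loci          # 3414 loci in the updated map

    avg = average_interlocus_distance(3667.62, total_loci, 26)
    print(total_loci, round(avg, 2))                 # 3414 1.08
    ```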

  20. LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes

    Directory of Open Access Journals (Sweden)

    Feuillet Catherine

    2010-11-01

    Background: Physical maps are the substrate of genome sequencing and map-based cloning and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires the application of very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contig lengths (of 3-5 clones) before merging as well as an unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs). Results: To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters robust to the exclusion of single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework called Linear Topological Contig (LTC) were applied on datasets used previously for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC vs. FPC was compared also on the simulated BAC libraries based on the known genome sequences for chromosome 1 of rice and chromosome 1 of maize. Conclusions: The results show that compared to other methods, LTC enables the construction of highly

  1. Survey of endosymbionts in the Diaphorina citri metagenome and assembly of a Wolbachia wDi draft genome.

    Directory of Open Access Journals (Sweden)

    Surya Saha

    Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China.

  2. Survey of endosymbionts in the Diaphorina citri metagenome and assembly of a Wolbachia wDi draft genome.

    Science.gov (United States)

    Saha, Surya; Hunter, Wayne B; Reese, Justin; Morgan, J Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen

    2012-01-01

    Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China.

  3. Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data.

    Directory of Open Access Journals (Sweden)

    Tsutomu Ikegami

    A hybrid de novo assembly pipeline was constructed to utilize both MiSeq and SOLiD short read data in combination in the assembly. The short read data were converted to a standard format of the pipeline, and were supplied to the pipeline components such as ABySS and SOAPdenovo. The assembly pipeline proceeded through several stages, and either MiSeq paired-end data, SOLiD mate-paired data, or both of them could be specified as input data at each stage separately. The pipeline was examined on the filamentous fungus Aspergillus oryzae RIB40, by aligning the assembly results against the reference sequences. Using both the MiSeq and the SOLiD data in the hybrid assembly, the alignment length was improved by a factor of 3 to 8, compared with the assemblies using either one of the data types. The number of the reproduced gene cluster regions encoding secondary metabolite biosyntheses (SMB) was also improved by the hybrid assemblies. These results imply that the MiSeq data with long read length are essential to construct accurate nucleotide sequences, while the SOLiD mate-paired reads with long insertion length enhance long-range arrangements of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have high GC content. Although the quality of the SOLiD reads was too low to perform any meaningful assemblies by themselves, the alignment length to the reference was improved by a factor of 2, compared with the assembly using only the MiSeq data.

  4. Genetic diversity and population structure inferred from the partially duplicated genome of domesticated carp, Cyprinus carpio L.

    Directory of Open Access Journals (Sweden)

    Feldman Marcus W

    2007-04-01

    Genetic relationships among eight populations of domesticated carp (Cyprinus carpio L.), a species with a partially duplicated genome, were studied using 12 microsatellites and 505 AFLP bands. The populations included three aquacultured carp strains and five ornamental carp (koi) variants. Grass carp (Ctenopharyngodon idella) was used as an outgroup. AFLP-based gene diversity varied from 5% (grass carp) to 32% (koi) and reflected the reasonably well understood histories and breeding practices of the populations. A large fraction of the molecular variance was due to differences between aquacultured and ornamental carps. Further analyses based on microsatellite data, including cluster analysis and neighbor-joining trees, supported the genetic distinctiveness of aquacultured and ornamental carps, despite the recent divergence of the two groups. In contrast to what was observed for AFLP-based diversity, the frequency of heterozygotes based on microsatellites was comparable among all populations. This discrepancy can potentially be explained by duplication of some loci in Cyprinus carpio L., and a model that shows how duplication can increase heterozygosity estimates for microsatellites but not for AFLP loci is discussed. Our analyses in carp can help in understanding the consequences of genotyping duplicated loci and in interpreting discrepancies between dominant and co-dominant markers in species with recent genome duplication.

  5. Updated genome assembly and annotation of Paenibacillus larvae, the agent of American foulbrood disease of honey bees

    Directory of Open Access Journals (Sweden)

    de Graaf Dirk C

    2011-09-01

    Background: As scientists continue to pursue various 'omics-based research, there is a need for high quality data for the most fundamental 'omics of all: genomics. The bacterium Paenibacillus larvae is the causative agent of the honey bee disease American foulbrood. If untreated, it can lead to the demise of an entire hive; the highly social nature of bees also leads to easy disease spread, between both individuals and colonies. Biologists have studied this organism since the early 1900s, and a century later, the molecular mechanism of infection remains elusive. Transcriptomics and proteomics, because of their ability to analyze multiple genes and proteins in a high-throughput manner, may be very helpful to its study. However, the power of these methodologies is severely limited without a complete genome; we undertake to address that deficiency here. Results: We used the Illumina GAIIx platform and conventional Sanger sequencing to generate a 182-fold sequence coverage of the P. larvae genome, and assembled the data using ABySS into a total of 388 contigs spanning 4.5 Mbp. Comparative genomics analysis against fully-sequenced soil bacteria P. JDR2 and P. vortex showed that regions of poor conservation may contain putative virulence factors. We used GLIMMER to predict 3568 gene models, and named them based on homology revealed by BLAST searches; proteases, hemolytic factors, toxins, and antibiotic resistance enzymes were identified in this way. Finally, mass spectrometry was used to provide experimental evidence that at least 35% of the genes are expressed at the protein level. Conclusions: This update on the genome of P. larvae and annotation represents an immense advancement from what we had previously known about this species. We provide here a reliable resource that can be used to elucidate the mechanism of infection, and by extension, more effective methods to control and cure this widespread honey bee disease.
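    To make the coverage figure in this abstract concrete: fold coverage is conventionally the total number of sequenced bases divided by the genome size. The sketch below is a rough illustration with hypothetical function names, and it assumes the 4.5 Mbp assembly span is a fair stand-in for the genome size, which the abstract does not claim.

    ```python
    def fold_coverage(total_sequenced_bases, genome_size_bp):
        """Average number of times each genome position is sequenced."""
        return total_sequenced_bases / genome_size_bp


    def bases_for_coverage(coverage, genome_size_bp):
        """Total read bases needed to reach a target fold coverage."""
        return coverage * genome_size_bp


    # Figures from the abstract; treating the assembly span as the
    # genome size is an assumption made only for this illustration.
    assembly_span_bp = 4_500_000
    n_contigs = 388

    needed = bases_for_coverage(182, assembly_span_bp)
    print(needed)                                   # 819000000 bases of read data
    print(fold_coverage(needed, assembly_span_bp))  # 182.0
    print(round(assembly_span_bp / n_contigs))      # 11598 bp mean contig length
    ```

    The mean contig length is only a crude contiguity indicator; statistics such as N50 need the individual contig lengths, which the abstract does not list.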

  6. A draft de novo genome assembly for the northern bobwhite (Colinus virginianus) reveals evidence for a rapid decline in effective population size beginning in the Late Pleistocene.

    Directory of Open Access Journals (Sweden)

    Yvette A Halley

    Wild populations of northern bobwhites (Colinus virginianus; hereafter bobwhite) have declined across nearly all of their U.S. range, and despite their importance as an experimental wildlife model for ecotoxicology studies, no bobwhite draft genome assembly currently exists. Herein, we present a bobwhite draft de novo genome assembly with annotation, comparative analyses including genome-wide analyses of divergence with the chicken (Gallus gallus) and zebra finch (Taeniopygia guttata) genomes, and coalescent modeling to reconstruct the demographic history of the bobwhite for comparison to other birds currently in decline (i.e., scarlet macaw; Ara macao). More than 90% of the assembled bobwhite genome was captured within 14,000 unique genes and proteins. Bobwhite analyses of divergence with the chicken and zebra finch genomes revealed many extremely conserved gene sequences, and evidence for lineage-specific divergence of noncoding regions. Coalescent models for reconstructing the demographic history of the bobwhite and the scarlet macaw provided evidence for population bottlenecks which were temporally coincident with human colonization of the New World, the late Pleistocene collapse of the megafauna, and the last glacial maximum. Demographic trends predicted for the bobwhite and the scarlet macaw also were concordant with how opposing natural selection strategies (i.e., skewness in the r-/K-selection continuum) would be expected to shape genome diversity and the effective population sizes in these species, which is directly relevant to future conservation efforts.

  7. A New Approach to Predict Microbial Community Assembly and Function Using a Stochastic, Genome-Enabled Modeling Framework

    Science.gov (United States)

    King, E.; Brodie, E.; Anantharaman, K.; Karaoz, U.; Bouskill, N.; Banfield, J. F.; Steefel, C. I.; Molins, S.

    2016-12-01

    Characterizing and predicting the microbial and chemical compositions of subsurface aquatic systems necessitates an understanding of the metabolism and physiology of organisms that are often uncultured or studied under conditions not relevant for one's environment of interest. Cultivation-independent approaches are therefore important and have greatly enhanced our ability to characterize functional microbial diversity. The capability to reconstruct genomes representing thousands of populations from microbial communities using metagenomic techniques provides a foundation for development of predictive models for community structure and function. Here, we discuss a genome-informed stochastic trait-based model incorporated into a reactive transport framework to represent the activities of coupled guilds of hypothetical microorganisms. Metabolic pathways for each microbe within a functional guild are parameterized from metagenomic data with a unique combination of traits governing organism fitness under dynamic environmental conditions. We simulate the thermodynamics of coupled electron donor and acceptor reactions to predict the energy available for cellular maintenance, respiration, biomass development, and enzyme production. While 'omics analyses can now characterize the metabolic potential of microbial communities, it is functionally redundant as well as computationally prohibitive to explicitly include the thousands of recovered organisms into biogeochemical models. However, one can derive potential metabolic pathways from genomes along with trait-linkages to build probability distributions of traits. These distributions are used to assemble groups of microbes that couple one or more of these pathways. From the initial ensemble of microbes, only a subset will persist based on the interaction of their physiological and metabolic traits with environmental conditions, competing organisms, etc. Here, we analyze the predicted niches of these hypothetical microbes and

  8. Assembled Plastid and Mitochondrial Genomes, as well as Nuclear Genes, Place the Parasite Family Cynomoriaceae in the Saxifragales.

    Science.gov (United States)

    Bellot, Sidonie; Cusimano, Natalie; Luo, Shixiao; Sun, Guiling; Zarre, Shahin; Gröger, Andreas; Temsch, Eva; Renner, Susanne S

    2016-08-03

    Cynomoriaceae, one of the last unplaced families of flowering plants, comprise one or two species or subspecies of root parasites that occur from the Mediterranean to the Gobi Desert. Using Illumina sequencing, we assembled the mitochondrial and plastid genomes as well as some nuclear genes of a Cynomorium specimen from Italy. Selected genes were also obtained by Sanger sequencing from individuals collected in China and Iran, resulting in matrices of 33 mitochondrial, 6 nuclear, and 14 plastid genes and rDNAs enlarged to include a representative angiosperm taxon sampling based on data available in GenBank. We also compiled a new geographic map to discern possible discontinuities in the parasites' occurrence. Cynomorium has large genomes of 13.70-13.61 (Italy) to 13.95-13.76 pg (China). Its mitochondrial genome consists of up to 49 circular subgenomes and has an overall gene content similar to that of photosynthetic angiosperms, while its plastome retains only 27 of the normally 116 genes. Nuclear, plastid and mitochondrial phylogenies place Cynomoriaceae in Saxifragales, and we found evidence for several horizontal gene transfers from different hosts, as well as intracellular gene transfers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  9. Calcium-Release Channels in Paramecium. Genomic Expansion, Differential Positioning and Partial Transcriptional Elimination

    Science.gov (United States)

    Ladenburger, Eva-Maria; Plattner, Helmut

    2011-01-01

    The release of Ca2+ from internal stores is a major source of signal Ca2+ in almost all cell types. The internal Ca2+ pools are activated via two main families of intracellular Ca2+-release channels, the ryanodine and the inositol 1,4,5-trisphosphate (InsP3) receptors. Among multicellular organisms these channel types are ubiquitous, whereas in most unicellular eukaryotes the identification of orthologs is impaired probably due to evolutionary sequence divergence. However, the ciliated protozoan Paramecium allowed us to prognosticate six groups, with a total of 34 genes, encoding proteins with characteristics typical of InsP3 and ryanodine receptors by BLAST search of the Paramecium database. We here report that these Ca2+-release channels may display all or only some of the characteristics of canonical InsP3 and ryanodine receptors. In all cases, prediction methods indicate the presence of six trans-membrane regions in the C-terminal domains, thus corresponding to canonical InsP3 receptors, while a sequence homologous to the InsP3-binding domain is present only in some types. Only two types have been analyzed in detail previously. We now show, by using antibodies and eventually by green fluorescent protein labeling, that the members of all six groups localize to distinct organelles known to participate in vesicle trafficking and, thus, may provide Ca2+ for local membrane-membrane interactions. Whole genome duplication can explain radiation within the six groups. Comparative and evolutionary evaluation suggests derivation from a common ancestor of canonical InsP3 and ryanodine receptors. With one group we could ascertain, to our knowledge for the first time, aberrant splicing in one thoroughly analyzed Paramecium gene. This yields truncated forms and, thus, may indicate a way to pseudogene formation. No comparable analysis is available for any other, free-living or parasitic/pathogenic protozoan. PMID:22102876

  10. Calcium-release channels in paramecium. Genomic expansion, differential positioning and partial transcriptional elimination.

    Directory of Open Access Journals (Sweden)

    Eva-Maria Ladenburger

    The release of Ca²⁺ from internal stores is a major source of signal Ca²⁺ in almost all cell types. The internal Ca²⁺ pools are activated via two main families of intracellular Ca²⁺-release channels, the ryanodine and the inositol 1,4,5-trisphosphate (InsP₃) receptors. Among multicellular organisms these channel types are ubiquitous, whereas in most unicellular eukaryotes the identification of orthologs is impaired probably due to evolutionary sequence divergence. However, the ciliated protozoan Paramecium allowed us to prognosticate six groups, with a total of 34 genes, encoding proteins with characteristics typical of InsP₃ and ryanodine receptors by BLAST search of the Paramecium database. We here report that these Ca²⁺-release channels may display all or only some of the characteristics of canonical InsP₃ and ryanodine receptors. In all cases, prediction methods indicate the presence of six trans-membrane regions in the C-terminal domains, thus corresponding to canonical InsP₃ receptors, while a sequence homologous to the InsP₃-binding domain is present only in some types. Only two types have been analyzed in detail previously. We now show, by using antibodies and eventually by green fluorescent protein labeling, that the members of all six groups localize to distinct organelles known to participate in vesicle trafficking and, thus, may provide Ca²⁺ for local membrane-membrane interactions. Whole genome duplication can explain radiation within the six groups. Comparative and evolutionary evaluation suggests derivation from a common ancestor of canonical InsP₃ and ryanodine receptors. With one group we could ascertain, to our knowledge for the first time, aberrant splicing in one thoroughly analyzed Paramecium gene. This yields truncated forms and, thus, may indicate a way to pseudogene formation. No comparable analysis is available for any other, free-living or parasitic/pathogenic protozoan.

  11. Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

    Science.gov (United States)

    Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

    2012-01-01

    Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantification of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST-, and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to

  12. Towards long-read metagenomics: complete assembly of three novel genomes from bacteria dependent on a diazotrophic cyanobacterium in a freshwater lake co-culture.

    Science.gov (United States)

    Driscoll, Connor B; Otten, Timothy G; Brown, Nathan M; Dreher, Theo W

    2017-01-01

    Here we report three complete bacterial genome assemblies from a PacBio shotgun metagenome of a co-culture from Upper Klamath Lake, OR. Genome annotations and culture conditions indicate these bacteria are dependent on carbon and nitrogen fixation from the cyanobacterium Aphanizomenon flos-aquae, whose genome was assembled to draft quality. Due to their taxonomic novelty relative to previously sequenced bacteria, we have temporarily designated these bacteria as incertae sedis Hyphomonadaceae strain UKL13-1 (3,501,508 bp and 56.12% GC), incertae sedis Betaproteobacterium strain UKL13-2 (3,387,087 bp and 54.98% GC), and incertae sedis Bacteroidetes strain UKL13-3 (3,236,529 bp and 37.33% GC). Each genome consists of a single circular chromosome with no identified plasmids. When compared with binned Illumina assemblies of the same three genomes, there was ~7% discrepancy in total genome length. Gaps where Illumina assemblies broke were often due to repetitive elements. Within these missing sequences were essential genes and genes associated with a variety of functional categories. Annotated gene content reveals that both Proteobacteria are aerobic anoxygenic phototrophs, with Betaproteobacterium UKL13-2 potentially capable of phototrophic oxidation of sulfur compounds. Both proteobacterial genomes contain transporters suggesting they are scavenging fixed nitrogen from A. flos-aquae in the form of ammonium. Bacteroidetes UKL13-3 has few completely annotated biosynthetic pathways, and has a comparatively higher proportion of unannotated genes. The genomes were detected in only a few other freshwater metagenomes, suggesting that these bacteria are not ubiquitous in freshwater systems. Our results indicate that long-read sequencing is a viable method for sequencing dominant members from low-diversity microbial communities, and should be considered for environmental metagenomics when conditions meet these requirements.
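    The two summary statistics this abstract reports per genome, GC percentage and a relative difference in total length between assemblies, are simple to compute. The following is a minimal sketch with hypothetical function names and toy inputs; it is not the authors' pipeline, and it assumes ambiguous bases such as N are excluded from the GC denominator.

    ```python
    def gc_percent(seq):
        """GC content as a percentage of unambiguous (A, C, G, T) bases."""
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        acgt = gc + seq.count("A") + seq.count("T")
        if acgt == 0:
            raise ValueError("sequence has no unambiguous bases")
        return 100.0 * gc / acgt


    def length_discrepancy_percent(reference_len, other_len):
        """Absolute length difference relative to the reference assembly."""
        return 100.0 * abs(reference_len - other_len) / reference_len


    # Toy inputs, not data from the study.
    print(gc_percent("GGCCAATT"))                 # 50.0
    print(length_discrepancy_percent(1000, 930))  # 7.0
    ```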

  13. Deciphering heterogeneity in pig genome assembly Sscrofa9 by isochore and isochore-like region analyses.

    Directory of Open Access Journals (Sweden)

    Wenqian Zhang

    BACKGROUND: The isochore, a large DNA sequence with relatively small GC variance, is one of the most important structures in eukaryotic genomes. Although the isochore has been widely studied in humans and other species, little is known about its distribution in pigs. PRINCIPAL FINDINGS: In this paper, we construct a map of long homogeneous genome regions (LHGRs, i.e., isochores and isochore-like regions) in pigs to provide an intuitive view of GC heterogeneity in each chromosome. The LHGR pattern study not only quantifies heterogeneities, but also reveals some primary characteristics of the chromatin organization, including the following: (1) the majority of LHGRs belong to GC-poor families and are long; (2) a high gene density tends to occur with the appearance of GC-rich LHGRs; and (3) the density of LINE repeats decreases with an increase in the GC content of LHGRs. Furthermore, a portion of LHGRs with particular GC ranges (50%-51% and 54%-55%) tend to have abnormally high gene densities, suggesting that biased gene conversion (BGC), as well as time- and energy-saving principles, could be of importance to the formation of genome organization. CONCLUSION: This study significantly improves our knowledge of chromatin organization in the pig genome. Correlations between the different biological features (e.g., gene density and repeat density) and GC content of LHGRs provide a unique glimpse of in silico gene and repeat prediction.
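
    The record above defines an isochore as a long stretch of DNA with relatively small GC variance. As a minimal sketch of that idea (not the authors' LHGR segmentation method), the GC fraction can be computed in fixed, non-overlapping windows and the spread across windows inspected. The function names, the window size and the toy sequence are all illustrative assumptions.

```python
from statistics import pvariance


def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty DNA string."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def window_gc(seq, window):
    """GC fraction of each full, non-overlapping window along seq."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence (hypothetical): an AT-only block, a GC-only block, a mixed block.
toy = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
profile = window_gc(toy, 10)

# A low variance across neighbouring windows would suggest a homogeneous
# region; a high variance, as in this toy case, suggests heterogeneity.
gc_variance = pvariance(profile)
print(profile, gc_variance)
```

    Real analyses work with windows of tens to hundreds of kilobases and need an explicit rule for merging adjacent windows into regions, which this sketch leaves out.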

  14. A common genomic framework for a diverse assembly of plasmids in the symbiotic nitrogen fixing bacteria.

    Directory of Open Access Journals (Sweden)

    Lisa C Crossman

    2008-07-01

    This work centres on the genomic comparisons of two closely related nitrogen-fixing symbiotic bacteria, Rhizobium leguminosarum biovar viciae 3841 and Rhizobium etli CFN42. These strains maintain a stable genomic core that is also common to other rhizobia species plus a very variable and significant accessory component. The chromosomes are highly syntenic, whereas plasmids are related by fewer syntenic blocks and have mosaic structures. The pairs of plasmids p42f-pRL12, p42e-pRL11 and p42b-pRL9, as well as large parts of p42c with pRL10, are shown to be similar, whereas the symbiotic plasmids (p42d and pRL10) are structurally unrelated and seem to follow distinct evolutionary paths. Even though purifying selection is acting on the whole genome, the accessory component is evolving more rapidly. This component consists largely of proteins for transport of diverse metabolites and elements of external origin. The present analysis allows us to conclude that a heterogeneous and quickly diversifying group of plasmids co-exists in a common genomic framework.

  15. Assembly of the Genome of the Disease Vector Aedes aegypti onto a Genetic Linkage Map Allows Mapping of Genes Affecting Disease Transmission

    KAUST Repository

    Juneja, Punita

    2014-01-30

    The mosquito Aedes aegypti transmits some of the most important human arboviruses, including dengue, yellow fever and chikungunya viruses. It has a large genome containing many repetitive sequences, which has resulted in the genome being poorly assembled - there are 4,758 scaffolds, few of which have been assigned to a chromosome. To allow the mapping of genes affecting disease transmission, we have improved the genome assembly by scoring a large number of SNPs in recombinant progeny from a cross between two strains of Ae. aegypti, and used these to generate a genetic map. This revealed a high rate of misassemblies in the current genome, where, for example, sequences from different chromosomes were found on the same scaffold. Once these were corrected, we were able to assign 60% of the genome sequence to chromosomes and approximately order the scaffolds along the chromosome. We found that there are very large regions of suppressed recombination around the centromeres, which can extend to as much as 47% of the chromosome. To illustrate the utility of this new genome assembly, we mapped a gene that makes Ae. aegypti resistant to the human parasite Brugia malayi, and generated a list of candidate genes that could be affecting the trait. © 2014 Juneja et al.

  16. Nucleic Acid Binding by Mason-Pfizer Monkey Virus CA Promotes Virus Assembly and Genome Packaging

    Czech Academy of Sciences Publication Activity Database

    Füzik, T.; Píchalová, R.; Schur, F. K. M.; Strohalmová, Karolína; Křížová, Ivana; Hadravová, Romana; Rumlová, Michaela; Briggs, J. A. G.; Ulbrich, P.; Ruml, T.

    2016-01-01

    Vol. 90, No. 9 (2016), pp. 4593-4603 ISSN 0022-538X R&D Projects: GA ČR(CZ) GA14-15326S; GA MŠk LO1302; GA MŠk(CZ) LO1304 Institutional support: RVO:61388963 Keywords: M-PMV * virus assembly * capsid protein Subject RIV: EE - Microbiology, Virology Impact factor: 4.663, year: 2016

  17. Quantitative RNA-Seq analysis in non-model species: assessing transcriptome assemblies as a scaffold and the utility of evolutionary divergent genomic reference species

    Directory of Open Access Journals (Sweden)

    Hornett Emily A

    2012-08-01

    Background: How well does RNA-Seq data perform for quantitative whole gene expression analysis in the absence of a genome? This is one unanswered question facing the rapidly growing number of researchers studying non-model species. Using Homo sapiens data and resources, we compared the direct mapping of sequencing reads to predicted genes from the genome with mapping to de novo transcriptomes assembled from RNA-Seq data. Gene coverage and expression analysis were further investigated in the non-model context by using increasingly divergent genomic reference species to group assembled contigs by unique genes. Results: Eight transcriptome sets, composed of varying amounts of Illumina and 454 data, were assembled and assessed. Hybrid 454/Illumina assemblies had the highest transcriptome and individual gene coverage. Quantitative whole gene expression levels were highly similar between using a de novo hybrid assembly and the predicted genes as a scaffold, although mapping to the de novo transcriptome assembly provided data on fewer genes. Using non-target species as reference scaffolds does result in some loss of sequence and expression data, and bias and error increase with evolutionary distance. However, within a 100 million year window these effect sizes are relatively small. Conclusions: Predicted gene sets from sequenced genomes of related species can provide a powerful method for grouping RNA-Seq reads and annotating contigs. Gene expression results can be produced that are similar to results obtained using gene models derived from a high quality genome, though biased towards conserved genes. Our results demonstrate the power and limitations of conducting RNA-Seq in non-model species.

  18. Deciphering the assembly of multi-segment genome complexes in influenza A virus

    OpenAIRE

    Prisner, Simon

    2017-01-01

    Influenza A has a segmented, eight-stranded genome in negative orientation. The individual segments are packaged in viral ribonucleoprotein complexes (vRNPs). Genomic segmentation allows influenza to reassort between different strains, which can lead to the emergence of highly virulent and potentially pandemic new strains. A packaging mechanism is thought to exist that ensures that exactly one segment of each type in newly budd...

  19. Phylogeography, salinity adaptations and metabolic potential of the Candidate Division KB1 Bacteria based on a partial single cell genome.

    Directory of Open Access Journals (Sweden)

    Lisa M Nigro

    2016-08-01

    Deep-sea hypersaline anoxic basins (DHABs) and other hypersaline environments contain abundant and diverse microbial life that has adapted to these extreme conditions. The bacterial Candidate Division KB1 represents one of several uncultured groups that have been consistently observed in hypersaline microbial diversity studies. Here we report the phylogeography of KB1, its phylogenetic relationships to Candidate Division OP1 Bacteria, and its potential metabolic and osmotic stress adaptations based on a partial single cell amplified genome (SAG) of KB1 from Orca Basin, the largest hypersaline seafloor brine basin in the Gulf of Mexico. Our results are consistent with the hypothesis (previously developed based on 14C incorporation experiments with mixed-species enrichments from Mediterranean seafloor brines) that KB1 has adapted its proteins to elevated intracellular salinity, but at the same time KB1 apparently imports glycine betaine; this compatible solute is potentially not limited to osmoregulation but could also serve as a carbon and energy source.

  20. Genome wide analysis of the evolution of Senecavirus A from swine clinical material and assembly yard environmental samples.

    Directory of Open Access Journals (Sweden)

    Wanhong Xu

    Senecavirus A (SVA), previously known as Seneca Valley virus, was first isolated in the United States in 2002. SVA was associated with porcine idiopathic vesicular disease in Canada and the USA in 2007 and 2012, respectively. A recent increase in SVA outbreaks resulting in neonatal mortality of piglets and/or vesicular lesions in sows in Brazil, the USA and Canada points to the necessity to study the pathogenicity and molecular epidemiology of the virus. Here, we report the analysis of the complete coding sequences of SVA from 2 clinical cases and 9 assembly yard environmental samples collected in 2015 in Canada, along with 22 previously released complete genomes in GenBank. With this combined data set, the evolution of SVA over a 12-month period in 2015/2016 was evaluated. These SVA isolates were characterized by a rapid accumulation of genetic variations driven mainly by a high nucleotide substitution rate and purifying selection. The SVA sequences clustered in clearly defined geographical areas with reported cases of SVA infection. No transmission links were identified between assembly yards, suggesting that point source introductions may have occurred. In addition, 25 fixed non-synonymous mutations were identified across all analyzed strains when compared to the prototype SVA strain (SVV-001). This study highlights the importance of monitoring SVA mutations for their role in increased virulence and impact on SVA diagnostics.

  1. Metagenome-assembled genomes of deep-branching magnetotactic bacteria in the Nitrospirae phylum

    Science.gov (United States)

    Zhang, W.; He, M.; Gu, L.; Tang, X.; Pan, Y.; Lin, W.

    2017-12-01

    Magnetotactic bacteria (MTB) are aquatic microorganisms that synthesize intracellular magnetic nanoparticles composed of magnetite and/or greigite. MTB have thus far been identified in the phyla of Proteobacteria, Nitrospirae, Omnitrophica, Latescibacteria and Planctomycetes (Lin et al., 2017b). Among these organisms, MTB belonging to the Nitrospirae phylum are of great interest because of the formation of hundreds of magnetite magnetosomes in a single cell and of the great potential for iron, sulfur, nitrogen, and carbon cycling in natural environments. However, due to the lack of genomic information, our current knowledge of magnetotactic Nitrospirae remains very limited. In the present study, we have identified and characterized two novel populations of uncultivated MTB from freshwater lakes in Shaanxi province, China. 16S rRNA gene-based analyses revealed that they belonged to two different clusters in the Nitrospirae. The draft population genomes of these two Nitrospirae MTB were successfully recovered through genome-resolved metagenomics, both of which contain nearly complete magnetosome gene clusters (MGCs) responsible for magnetosome biomineralization and organization. Consistent with our previous study (Lin et al., 2017a), we found that the gene content and gene organization of the MGCs in the Nitrospirae MTB were highly conserved, indicating that Nitrospirae gene clusters represent one of the ancestral types of MGCs. The population genome sequences suggest that magnetotactic Nitrospirae are capable of CO2 fixation through the Wood-Ljungdahl pathway. They may also reduce sulfate and nitrate/nitrite through the sulfate reduction pathway and the denitrification pathway, respectively. Our genomic analyses revealed the potential metabolic capability of the Nitrospirae MTB and shed light on their ecology, evolution and biomineralization mechanism. References: Lin W, Paterson GA, Zhu Q, Wang Y, Kopylova E, Li Y, Knight R, Bazylinski DA, Zhu R, Kirschvink JL, Pan Y

  2. De Novo Assembly of Candida sojae and Candida boidinii Genomes, Unexplored Xylose-Consuming Yeasts with Potential for Renewable Biochemical Production

    Science.gov (United States)

    Borelli, Guilherme; José, Juliana; Teixeira, Paulo José Pereira Lima; dos Santos, Leandro Vieira

    2016-01-01

    Candida boidinii and Candida sojae yeasts were isolated from energy cane bagasse and plague-insects. Both have a fast xylose uptake rate and produce large amounts of xylitol, which are interesting features for the food and 2G ethanol industries. Because they lack published genomes, we have sequenced and assembled them, offering new possibilities for gene prospection. PMID:26769937

  3. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences

    Directory of Open Access Journals (Sweden)

    Shairul Izan

    2017-08-01

    Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
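
    The k-mer frequency idea in the record above can be illustrated with a small sketch. Chloroplast DNA is present in many copies per cell, so its k-mers tend to occur far more often in WGS reads than nuclear k-mers. The selection rule used here (keep a read when the median count of its k-mers reaches a threshold), the function names and the toy reads are assumptions made for illustration; the record does not state the pipeline's actual criterion.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Build a k-mer frequency table over all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def select_abundant_reads(reads, k, min_median_count):
    """Keep reads whose median k-mer count is at least min_median_count."""
    counts = count_kmers(reads, k)
    selected = []
    for read in reads:
        read_counts = sorted(counts[kmer] for kmer in kmers(read, k))
        if not read_counts:  # read shorter than k
            continue
        median = read_counts[len(read_counts) // 2]
        if median >= min_median_count:
            selected.append(read)
    return selected


# Toy data (hypothetical): five copies of a high-copy read, one low-copy read.
toy_reads = ["ACGTACGT"] * 5 + ["TTGACCAG"]
abundant = select_abundant_reads(toy_reads, k=4, min_median_count=3)
print(len(abundant))
```

    In practice the threshold would have to be tuned to the sequencing depth, which matches the record's remark that the choice of k and the amount of data need to be optimized.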

  4. The Carcinogenic Liver Fluke, Clonorchis sinensis: New Assembly, Reannotation and Analysis of the Genome and Characterization of Tissue Transcriptomes

    Science.gov (United States)

    Wang, Xiaoyun; Liu, Hailiang; Chen, Yangyi; Guo, Lei; Luo, Fang; Sun, Jiufeng; Mao, Qiang; Liang, Pei; Xie, Zhizhi; Zhou, Chenhui; Tian, Yanli; Lv, Xiaoli; Huang, Lisi; Zhou, Juanjuan; Hu, Yue; Li, Ran; Zhang, Fan; Lei, Huali; Li, Wenfang; Hu, Xuchu; Liang, Chi; Xu, Jin; Li, Xuerong; Yu, Xinbing

    2013-01-01

    Clonorchis sinensis (C. sinensis), an important food-borne parasite that inhabits the intrahepatic bile duct and causes clonorchiasis, is of interest to both the public health field and the scientific research community. To learn more about the migration, parasitism and pathogenesis of C. sinensis at the molecular level, the present study developed an upgraded genomic assembly and annotation by sequencing paired-end and mate-paired libraries. We also performed transcriptome sequence analyses on multiple C. sinensis tissues (sucker, muscle, ovary and testis). Genes encoding molecules involved in responses to stimuli and muscle-related development were abundantly expressed in the oral sucker. Compared with other species, genes encoding molecules that facilitate the recognition and transport of cholesterol were observed in high copy numbers in the genome and were highly expressed in the oral sucker. Genes encoding transporters for fatty acids, glucose, amino acids and oxygen were also highly expressed, along with other molecules involved in metabolizing these substrates. All genes involved in energy metabolism pathways, including the β-oxidation of fatty acids, the citrate cycle, oxidative phosphorylation, and fumarate reduction, were expressed in the adults. Finally, we also provide valuable insights into the mechanism underlying the process of pathogenesis by characterizing the secretome of C. sinensis. The characterization and elaborate analysis of the upgraded genome and the tissue transcriptomes not only form a detailed and fundamental C. sinensis resource but also provide novel insights into the physiology and pathogenesis of C. sinensis. We anticipate that this work will aid the development of innovative strategies for the prevention and control of clonorchiasis. PMID:23382950

  5. The carcinogenic liver fluke, Clonorchis sinensis: new assembly, reannotation and analysis of the genome and characterization of tissue transcriptomes.

    Directory of Open Access Journals (Sweden)

    Yan Huang

    Clonorchis sinensis (C. sinensis), an important food-borne parasite that inhabits the intrahepatic bile duct and causes clonorchiasis, is of interest to both the public health field and the scientific research community. To learn more about the migration, parasitism and pathogenesis of C. sinensis at the molecular level, the present study developed an upgraded genomic assembly and annotation by sequencing paired-end and mate-paired libraries. We also performed transcriptome sequence analyses on multiple C. sinensis tissues (sucker, muscle, ovary and testis). Genes encoding molecules involved in responses to stimuli and muscle-related development were abundantly expressed in the oral sucker. Compared with other species, genes encoding molecules that facilitate the recognition and transport of cholesterol were observed in high copy numbers in the genome and were highly expressed in the oral sucker. Genes encoding transporters for fatty acids, glucose, amino acids and oxygen were also highly expressed, along with other molecules involved in metabolizing these substrates. All genes involved in energy metabolism pathways, including the β-oxidation of fatty acids, the citrate cycle, oxidative phosphorylation, and fumarate reduction, were expressed in the adults. Finally, we also provide valuable insights into the mechanism underlying the process of pathogenesis by characterizing the secretome of C. sinensis. The characterization and elaborate analysis of the upgraded genome and the tissue transcriptomes not only form a detailed and fundamental C. sinensis resource but also provide novel insights into the physiology and pathogenesis of C. sinensis. We anticipate that this work will aid the development of innovative strategies for the prevention and control of clonorchiasis.

  6. Exact algorithms for haplotype assembly from whole-genome sequence data.

    Science.gov (United States)

    Chen, Zhi-Zhong; Deng, Fei; Wang, Lusheng

    2013-08-15

    Haplotypes play a crucial role in genetic analysis and have many applications such as gene disease diagnoses, association studies, ancestry inference and so forth. The development of DNA sequencing technologies makes it possible to obtain haplotypes from a set of aligned reads originating from both copies of a chromosome of a single individual. This approach is often known as haplotype assembly. Exact algorithms that can give optimal solutions to the haplotype assembly problem are in high demand. Unfortunately, previous algorithms for this problem either fail to output optimal solutions or take too long even when executed on a PC cluster. We develop an approach to finding optimal solutions for the haplotype assembly problem under the minimum-error-correction (MEC) model. Most of the previous approaches assume that the columns in the input matrix correspond to (putative) heterozygous sites. This all-heterozygous assumption is correct for most columns, but it may be incorrect for a small number of columns. In this article, we consider the MEC model with or without the all-heterozygous assumption. In our approach, we first use new methods to decompose the input read matrix into small independent blocks and then model the problem for each block as an integer linear programming problem, which is then solved by an integer linear programming solver. We have tested our program on a single PC [a Linux (x64) desktop PC with i7-3960X CPU], using the filtered HuRef and the NA 12878 datasets (after applying some variant calling methods). With the all-heterozygous assumption, our approach can optimally solve the whole HuRef data set within a total time of 31 h (26 h for the most difficult block of the 15th chromosome and only 5 h for the other blocks). To our knowledge, this is the first time that MEC optimal solutions are completely obtained for the filtered HuRef dataset. Moreover, in the general case (without the all-heterozygous assumption), for the HuRef dataset our
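
    To make the minimum-error-correction (MEC) objective in this record concrete, the sketch below scores a candidate haplotype pair against a small read matrix and finds the optimum by brute-force enumeration. This is not the integer linear programming formulation the record describes; exhaustive search is only feasible for tiny blocks and is swapped in here purely for illustration. It assumes the all-heterozygous case, so the second haplotype is the complement of the first. The read encoding ('0', '1', and '-' for an uncovered site) and all names are assumptions.

```python
from itertools import product


def complement(hap):
    """Flip every allele of a 0/1 haplotype string."""
    return "".join("1" if allele == "0" else "0" for allele in hap)


def mismatches(read, hap):
    """Count covered sites where the read disagrees with the haplotype."""
    return sum(1 for r, h in zip(read, hap) if r != "-" and r != h)


def mec_cost(reads, hap):
    """MEC cost: each read is charged against the closer of the two haplotypes."""
    other = complement(hap)
    return sum(min(mismatches(r, hap), mismatches(r, other)) for r in reads)


def best_haplotype(reads):
    """Exhaustively search haplotypes for one block; returns (haplotype, cost)."""
    n_sites = len(reads[0])
    best = None
    # Fixing the first allele to '0' avoids scoring each complementary pair twice.
    for bits in product("01", repeat=n_sites - 1):
        hap = "0" + "".join(bits)
        cost = mec_cost(reads, hap)
        if best is None or cost < best[1]:
            best = (hap, cost)
    return best


# Toy read matrix (hypothetical): rows are reads, columns are heterozygous sites.
toy_reads = ["011-", "0110", "100-", "-001", "0100"]
result = best_haplotype(toy_reads)
print(result)
```

    The search space doubles with every added site, which is why the record's decomposition into small independent blocks matters before any exact solver is applied.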

  7. Assembly factors of F1FO-ATP synthase across genomes

    Czech Academy of Sciences Publication Activity Database

    Pícková, Andrea; Potocký, Martin; Houštěk, Josef

    2005-01-01

    Vol. 59, No. 3 (2005), pp. 393-402 ISSN 0887-3585 R&D Projects: GA MŠk(CZ) 1M0520; GA MZd(CZ) NR7790 Grant - others: GA UK(CZ) 12/2002; GA UK(CZ) 11/2004; EC Framework Programme(XE) LSHM-CT-2004-503116 Institutional research plan: CEZ:AV0Z50110509 Keywords: assembly * ATP synthase * phylogenetic and sequence analysis Subject RIV: FB - Endocrinology, Diabetology, Metabolism, Nutrition Impact factor: 4.684, year: 2005

  8. Copy-number and gene dependency analysis reveals partial copy loss of wild-type SF3B1 as a novel cancer vulnerability. | Office of Cancer Genomics

    Science.gov (United States)

    Genomic instability is a hallmark of human cancer, and results in widespread somatic copy number alterations. We used a genome-scale shRNA viability screen in human cancer cell lines to systematically identify genes that are essential in the context of particular copy-number alterations (copy-number associated gene dependencies). The most enriched class of copy-number associated gene dependencies was CYCLOPS (Copy-number alterations Yielding Cancer Liabilities Owing to Partial losS) genes, and spliceosome components were the most prevalent.

  9. A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules

    Science.gov (United States)

    Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried

    2015-01-01

    A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvents the notoriously troublesome long-term archiving and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessing the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acid synthetic chemistry, biochemistry and molecular genetics into a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961

  10. Single-molecule sequencing and Hi-C-based proximity-guided assembly of amaranth (Amaranthus hypochondriacus) chromosomes provide insights into genome evolution

    KAUST Repository

    Lightfoot, D. J.; Jarvis, David Erwin; Ramaraj, T.; Lee, R.; Jellen, E. N.; Maughan, P. J.

    2017-01-01

    Background: Amaranth (Amaranthus hypochondriacus) was a food staple among the ancient civilizations of Central and South America that has recently received increased attention due to the high nutritional value of the seeds, with the potential to help alleviate malnutrition and food security concerns, particularly in arid and semiarid regions of the developing world. Here, we present a reference-quality assembly of the amaranth genome which will assist the agronomic development of the species. Results: Utilizing single-molecule, real-time sequencing (Pacific Biosciences) and chromatin interaction mapping (Hi-C) to close assembly gaps and scaffold contigs, respectively, we improved our previously reported Illumina-based assembly to produce a chromosome-scale assembly with a scaffold N50 of 24.4 Mb. The 16 largest scaffolds contain 98% of the assembly and likely represent the haploid chromosomes (n = 16). To demonstrate the accuracy and utility of this approach, we produced physical and genetic maps and identified candidate genes for the betalain pigmentation pathway. The chromosome-scale assembly facilitated a genome-wide syntenic comparison of amaranth with other Amaranthaceae species, revealing chromosome loss and fusion events in amaranth that explain the reduction from the ancestral haploid chromosome number (n = 18) for a tetraploid member of the Amaranthaceae. [...] as major evolutionary events in the 2n = 32 amaranths and clearly establish the homoeologous relationship among most of the subgenome chromosomes, which will facilitate future investigations of intragenomic changes that occurred post polyploidization.
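
    The record above reports a scaffold N50 of 24.4 Mb. For readers unfamiliar with the statistic: N50 is the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. A minimal sketch follows; the function name and the toy lengths are illustrative and are not taken from the amaranth assembly.

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum covers half the assembly.
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical), total 150: 50 + 40 = 90 >= 75.
toy_scaffolds = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

    Because N50 depends only on the length distribution, a high value says nothing about correctness on its own, which is presumably why the record also reports validation with physical and genetic maps.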

  11. Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning using a framework genetic linkage map

    Directory of Open Access Journals (Sweden)

    Xu Xiangming

    2010-12-01

    Background: Determining the position and order of contigs and scaffolds from a genome assembly within an organism's genome remains a technical challenge in a majority of sequencing projects. In order to exploit contemporary technologies for DNA sequencing, we developed a strategy for whole genome single nucleotide polymorphism sequencing allowing the positioning of sequence contigs onto a linkage map using the bin mapping method. Results: The strategy was tested on a draft genome of the fungal pathogen Venturia inaequalis, the causal agent of apple scab, and further validated using sequence contigs derived from the diploid plant genome Fragaria vesca. Using our novel method we were able to anchor 70% and 92% of sequence assemblies for V. inaequalis and F. vesca, respectively, to genetic linkage maps. Conclusions: We demonstrated the utility of this approach by accurately determining the bin map positions of the majority of the large sequence contigs from each genome sequence and validated our method by mapping single sequence repeat markers derived from sequence contigs on a full mapping population.

  12. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from total DNA Sequences.

    NARCIS (Netherlands)

    Izan, Shairul; Esselink, G.; Visser, R.G.F.; Smulders, M.J.M.; Borm, T.J.A.

    2017-01-01

    Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This

  13. Deconstruction of archaeal genome depict strategic consensus in core pathways coding sequence assembly.

    Directory of Open Access Journals (Sweden)

    Ayon Pal

    A comprehensive in silico analysis of 71 species representing the different taxonomic classes and physiological genre of the domain Archaea was performed. These organisms differed in their physiological attributes, particularly oxygen tolerance and energy metabolism. We explored the diversity and similarity in the codon usage pattern in the genes and genomes of these organisms, emphasizing their core cellular pathways. Our aim was to determine whether there is any underlying similarity in the design of core pathways within these organisms. Analyses of codon utilization pattern, construction of hierarchical linear models of codon usage, expression pattern and codon pair preference indicated that in the archaea there is a trend towards biased use of synonymous codons in the core cellular pathways, and the Nc-plots appeared to display the physiological variations present within the different species. Our analyses revealed that aerobic species of archaea possessed a larger degree of freedom in regulating expression levels than could be accounted for by codon usage bias alone. This feature might be a consequence of their enhanced metabolic activities as a result of their adaptation to the relatively O2-rich environment. Species of archaea that are taxonomically related were found to have striking similarities in their ORF structuring pattern. In the anaerobic species of archaea, codon bias was found to be a major determinant of gene expression. We have also detected a significant difference in the codon pair usage pattern between the whole genome and the genes related to vital cellular pathways, and it was not only species-specific but pathway-specific too. This hints at the structuring of ORFs with better decoding accuracy during translation. Finally, a codon-pathway interaction in shaping the codon design of pathways was observed where the transcription pathway exhibited a significantly different coding

  14. Deconstruction of archaeal genome depict strategic consensus in core pathways coding sequence assembly.

    Science.gov (United States)

    Pal, Ayon; Banerjee, Rachana; Mondal, Uttam K; Mukhopadhyay, Subhasis; Bothra, Asim K

    2015-01-01

    A comprehensive in silico analysis of 71 species representing the different taxonomic classes and physiological genre of the domain Archaea was performed. These organisms differed in their physiological attributes, particularly oxygen tolerance and energy metabolism. We explored the diversity and similarity in the codon usage pattern in the genes and genomes of these organisms, emphasizing their core cellular pathways. Our aim was to determine whether there is any underlying similarity in the design of core pathways within these organisms. Analyses of codon utilization pattern, construction of hierarchical linear models of codon usage, expression pattern and codon pair preference indicated that in the archaea there is a trend towards biased use of synonymous codons in the core cellular pathways, and the Nc-plots appeared to display the physiological variations present within the different species. Our analyses revealed that aerobic species of archaea possessed a larger degree of freedom in regulating expression levels than could be accounted for by codon usage bias alone. This feature might be a consequence of their enhanced metabolic activities as a result of their adaptation to the relatively O2-rich environment. Species of archaea that are taxonomically related were found to have striking similarities in their ORF structuring pattern. In the anaerobic species of archaea, codon bias was found to be a major determinant of gene expression. We have also detected a significant difference in the codon pair usage pattern between the whole genome and the genes related to vital cellular pathways, and it was not only species-specific but pathway-specific too. This hints at the structuring of ORFs with better decoding accuracy during translation. Finally, a codon-pathway interaction in shaping the codon design of pathways was observed where the transcription pathway exhibited a significantly different coding frequency signature.

  15. Partial structure of the phylloxin gene from the giant monkey frog, Phyllomedusa bicolor: parallel cloning of precursor cDNA and genomic DNA from lyophilized skin secretion.

    Science.gov (United States)

    Chen, Tianbao; Gagliardo, Ron; Walker, Brian; Zhou, Mei; Shaw, Chris

    2005-12-01

    Phylloxin is a novel prototype antimicrobial peptide from the skin of Phyllomedusa bicolor. Here, we describe parallel identification and sequencing of phylloxin precursor transcript (mRNA) and partial gene structure (genomic DNA) from the same sample of lyophilized skin secretion using our recently-described cloning technique. The open-reading frame of the phylloxin precursor was identical in nucleotide sequence to that previously reported and alignment with the nucleotide sequence derived from genomic DNA indicated the presence of a 175 bp intron located in a near identical position to that found in the dermaseptins. The highly-conserved structural organization of skin secretion peptide genes in P. bicolor can thus be extended to include that encoding phylloxin (plx). These data further reinforce our assertion that application of the described methodology can provide robust genomic/transcriptomic/peptidomic data without the need for specimen sacrifice.

  16. Experimental validation of 3D reconstructed pin-power distributions in full-scale BWR fuel assemblies with partial length rods

    Energy Technology Data Exchange (ETDEWEB)

    Giust, F. D. [Axpo Kernenergie, Parkstrasse 23, CH-5401 Baden (Switzerland); Swiss Federal Inst. of Technology EPFL, CH-1015 Lausanne (Switzerland); Grimm, P. [Paul Scherrer Inst., CH-5232 Villigen (Switzerland); Chawla, R. [Paul Scherrer Inst., CH-5232 Villigen (Switzerland); Swiss Federal Inst. of Technology (EPFL), CH-1015 Lausanne (Switzerland)

    2012-07-01

    Total fission rate measurements have been performed on full-size BWR fuel assemblies of type SVEA-96 Optima2 in the framework of Phase III of the LWR-PROTEUS experimental program at the Paul Scherrer Inst. This paper presents comparisons of calculated, nodal reconstructed, pin-wise total-fission rate distributions with experimental results. Radial comparisons have been performed for the three sections of the assembly (96, 92 and 84 fuel pins), while three-dimensional effects have been investigated at pellet-level for the two transition regions, i.e. the tips of the short (1/3) and long (2/3) partial length rods. The test zone has been modeled using two different code systems: HELIOS/PRESTO-2 and CASMO-5/SIMULATE-5. The first is presently used for core monitoring and design at the Leibstadt Nuclear Power Plant (KKL). The second represents the most recent generation of the widely applied CASMO/SIMULATE system. For representing the PROTEUS test-zone boundaries, Partial Current Ratios (PCRs) - derived from a 3D MCNPX model of the entire reactor - have been applied to the PRESTO-2 and SIMULATE-5 models in the form of 2- and 5-group diagonal albedo matrices, respectively. The MCNPX results have also served as a reference, high-order transport solution in the calculation/experiment comparisons. It is shown that the performance of the nodal methodologies in predicting the global distribution of the total-fission rate is very satisfactory. Considering the various radial comparisons, the standard deviations of the calculated/experimental (C/E) distributions do not exceed 1.9% for any of the three methodologies - PRESTO-2, SIMULATE-5 and MCNPX. For the three-dimensional comparisons at pellet-level, the corresponding standard deviations are 2.7%, 2.0% and 2.1%, respectively. (authors)

  17. An att site-based recombination reporter system for genome engineering and synthetic DNA assembly.

    Science.gov (United States)

    Bland, Michael J; Ducos-Galand, Magaly; Val, Marie-Eve; Mazel, Didier

    2017-07-14

    Direct manipulation of the genome is a widespread technique for genetic studies and synthetic biology applications. The tyrosine and serine site-specific recombination systems of bacteriophages HK022 and ΦC31 are widely used for stable directional exchange and relocation of DNA sequences, making them valuable tools in these contexts. We have developed site-specific recombination tools that allow the direct selection of recombination events by embedding the attB site from each system within the β-lactamase resistance coding sequence (bla). The HK and ΦC31 tools were developed by placing the attB sites from each system into the signal peptide cleavage site coding sequence of bla. All possible open reading frames (ORFs) were inserted and tested for recombination efficiency and bla activity. Efficient recombination was observed for all tested ORFs (3 for HK, 6 for ΦC31) as shown through a cointegrate formation assay. The bla gene with the embedded attB site was functional for eight of the nine constructs tested. The HK/ΦC31 att-bla system offers a simple way to directly select recombination events, thus enhancing the use of site-specific recombination systems for carrying out precise, large-scale DNA manipulation, and adding useful tools to the genetics toolbox. We further show the power and flexibility of bla to be used as a reporter for recombination.

  18. Insights on novel particulate self-assembled drug delivery beads based on partial inclusion complexes between triglycerides and cyclodextrins.

    Science.gov (United States)

    Aburahma, Mona Hassan

    2016-09-01

    Most of the newly designed drug molecules are lipophilic in nature and often encounter erratic absorption and low bioavailability after oral administration. Finding ways to enhance the absorption and bioavailability of these lipophilic drugs is one of the major challenges facing the pharmaceutical industry today. In view of that, the purpose of this review is to shed some light on a novel particulate self-assembling system named "beads" that can act as a safe carrier for delivering lipophilic drugs. The beads are prepared simply by mixing oils with cyclodextrin (CD) aqueous solution in mild conditions. A unique interaction between oil components and CD molecules occurs to form in situ surface-active complexes, which are prerequisites for bead formation. This review mainly focuses on the fundamentals of bead preparation by reviewing the present, yet scarce, literature. The key methods used for bead characterization are discussed in detail. Also, the potential mechanisms by which beads increase the bioavailability of lipophilic drugs are illustrated. Finally, the related research areas that need to be addressed in future to optimize this promising delivery system are briefly outlined.

  19. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Sakakibara, Yasumbumi

    2011-10-13

    Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  20. Partial loss of heterozygosity events at the mutated gene in tumors from MLH1/MSH2 large genomic rearrangement carriers

    Energy Technology Data Exchange (ETDEWEB)

    Zavodna, Katarina; Krivulcik, Tomas; Bujalkova, Maria Gerykova [Laboratory of Cancer Genetics, Cancer Research Institute of Slovak Academy of Sciences, Vlarska 7, 833 91 Bratislava (Slovakia); Slamka, Tomas; Martinicky, David; Ilencikova, Denisa [National Cancer Institute, Department of Oncologic Genetics, Klenova 1, 833 01 Bratislava (Slovakia); Bartosova, Zdena [Laboratory of Cancer Genetics, Cancer Research Institute of Slovak Academy of Sciences, Vlarska 7, 833 91 Bratislava (Slovakia)

    2009-11-20

    Depending on the population studied, large genomic rearrangements (LGRs) of the mismatch repair (MMR) genes constitute various proportions of the germline mutations that predispose to hereditary non-polyposis colorectal cancer (HNPCC). It has been reported that loss of heterozygosity (LOH) at the LGR region occurs through a gene conversion mechanism in tumors from MLH1/MSH2 deletion carriers; however, the converted tracts were delineated only by extragenic microsatellite markers. We sought to determine the frequency of LGRs in Slovak HNPCC patients and to study LOH in tumors from LGR carriers at the LGR region, as well as at other heterozygous markers within the gene to more precisely define conversion tracts. The main MMR genes responsible for HNPCC, MLH1, MSH2, MSH6, and PMS2, were analyzed by MLPA (multiplex ligation-dependent probe amplification) in a total of 37 unrelated HNPCC-suspected patients whose MLH1/MSH2 genes gave negative results in previous sequencing experiments. An LOH study was performed on six tumors from LGR carriers by combining MLPA to assess LOH at LGR regions and sequencing to examine LOH at 28 SNP markers from the MLH1 and MSH2 genes. We found six rearrangements in the MSH2 gene (five deletions and dup5-6), and one aberration in the MLH1 gene (del5-6). The MSH2 deletions were of three types (del1, del1-3, del1-7). We detected LOH at the LGR region in the single MLH1 case, which was determined in a previous study to be LOH-negative in the intragenic D3S1611 marker. Three tumors displayed LOH of at least one SNP marker, including two cases that were LOH-negative at the LGR region. LGRs accounted for 25% of germline MMR mutations identified in 28 Slovakian HNPCC families. A high frequency of LGRs among the MSH2 mutations provides a rationale for an MLPA screening of the Slovakian HNPCC families prior to scanning by DNA sequencing. LOH at part of the informative loci confined to the MLH1 or MSH2 gene (heterozygous LGR region, SNP, or

  1. Partial loss of heterozygosity events at the mutated gene in tumors from MLH1/MSH2 large genomic rearrangement carriers

    Directory of Open Access Journals (Sweden)

    Ilencikova Denisa

    2009-11-01

    Full Text Available Abstract Background: Depending on the population studied, large genomic rearrangements (LGRs) of the mismatch repair (MMR) genes constitute various proportions of the germline mutations that predispose to hereditary non-polyposis colorectal cancer (HNPCC). It has been reported that loss of heterozygosity (LOH) at the LGR region occurs through a gene conversion mechanism in tumors from MLH1/MSH2 deletion carriers; however, the converted tracts were delineated only by extragenic microsatellite markers. We sought to determine the frequency of LGRs in Slovak HNPCC patients and to study LOH in tumors from LGR carriers at the LGR region, as well as at other heterozygous markers within the gene to more precisely define conversion tracts. Methods: The main MMR genes responsible for HNPCC, MLH1, MSH2, MSH6, and PMS2, were analyzed by MLPA (multiplex ligation-dependent probe amplification) in a total of 37 unrelated HNPCC-suspected patients whose MLH1/MSH2 genes gave negative results in previous sequencing experiments. An LOH study was performed on six tumors from LGR carriers by combining MLPA to assess LOH at LGR regions and sequencing to examine LOH at 28 SNP markers from the MLH1 and MSH2 genes. Results: We found six rearrangements in the MSH2 gene (five deletions and dup5-6), and one aberration in the MLH1 gene (del5-6). The MSH2 deletions were of three types (del1, del1-3, del1-7). We detected LOH at the LGR region in the single MLH1 case, which was determined in a previous study to be LOH-negative in the intragenic D3S1611 marker. Three tumors displayed LOH of at least one SNP marker, including two cases that were LOH-negative at the LGR region. Conclusion: LGRs accounted for 25% of germline MMR mutations identified in 28 Slovakian HNPCC families. A high frequency of LGRs among the MSH2 mutations provides a rationale for an MLPA screening of the Slovakian HNPCC families prior to scanning by DNA sequencing. LOH at part of the informative loci confined to the MLH1

  2. Partial loss of heterozygosity events at the mutated gene in tumors from MLH1/MSH2 large genomic rearrangement carriers

    International Nuclear Information System (INIS)

    Zavodna, Katarina; Krivulcik, Tomas; Bujalkova, Maria Gerykova; Slamka, Tomas; Martinicky, David; Ilencikova, Denisa; Bartosova, Zdena

    2009-01-01

    Depending on the population studied, large genomic rearrangements (LGRs) of the mismatch repair (MMR) genes constitute various proportions of the germline mutations that predispose to hereditary non-polyposis colorectal cancer (HNPCC). It has been reported that loss of heterozygosity (LOH) at the LGR region occurs through a gene conversion mechanism in tumors from MLH1/MSH2 deletion carriers; however, the converted tracts were delineated only by extragenic microsatellite markers. We sought to determine the frequency of LGRs in Slovak HNPCC patients and to study LOH in tumors from LGR carriers at the LGR region, as well as at other heterozygous markers within the gene to more precisely define conversion tracts. The main MMR genes responsible for HNPCC, MLH1, MSH2, MSH6, and PMS2, were analyzed by MLPA (multiplex ligation-dependent probe amplification) in a total of 37 unrelated HNPCC-suspected patients whose MLH1/MSH2 genes gave negative results in previous sequencing experiments. An LOH study was performed on six tumors from LGR carriers by combining MLPA to assess LOH at LGR regions and sequencing to examine LOH at 28 SNP markers from the MLH1 and MSH2 genes. We found six rearrangements in the MSH2 gene (five deletions and dup5-6), and one aberration in the MLH1 gene (del5-6). The MSH2 deletions were of three types (del1, del1-3, del1-7). We detected LOH at the LGR region in the single MLH1 case, which was determined in a previous study to be LOH-negative in the intragenic D3S1611 marker. Three tumors displayed LOH of at least one SNP marker, including two cases that were LOH-negative at the LGR region. LGRs accounted for 25% of germline MMR mutations identified in 28 Slovakian HNPCC families. A high frequency of LGRs among the MSH2 mutations provides a rationale for an MLPA screening of the Slovakian HNPCC families prior to scanning by DNA sequencing. LOH at part of the informative loci confined to the MLH1 or MSH2 gene (heterozygous LGR region, SNP, or

  3. Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut.

    Directory of Open Access Journals (Sweden)

    Alix Armero

    Full Text Available The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often allowing recovery of 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).

  4. Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut.

    Science.gov (United States)

    Armero, Alix; Baudouin, Luc; Bocs, Stéphanie; This, Dominique

    2017-01-01

    The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often allowing recovery of 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).

  5. De Novo Assembly and Phasing of Dikaryotic Genomes from Two Isolates of Puccinia coronata f. sp. avenae, the Causal Agent of Oat Crown Rust.

    Science.gov (United States)

    Miller, Marisa E; Zhang, Ying; Omidvar, Vahid; Sperschneider, Jana; Schwessinger, Benjamin; Raley, Castle; Palmer, Jonathan M; Garnica, Diana; Upadhyaya, Narayana; Rathjen, John; Taylor, Jennifer M; Park, Robert F; Dodds, Peter N; Hirsch, Cory D; Kianian, Shahryar F; Figueroa, Melania

    2018-02-20

    Oat crown rust, caused by the fungus Puccinia coronata f. sp. avenae, is a devastating disease that impacts worldwide oat production. For much of its life cycle, P. coronata f. sp. avenae is dikaryotic, with two separate haploid nuclei that may vary in virulence genotype, highlighting the importance of understanding haplotype diversity in this species. We generated highly contiguous de novo genome assemblies of two P. coronata f. sp. avenae isolates, 12SD80 and 12NC29, from long-read sequences. In total, we assembled 603 primary contigs for 12SD80, for a total assembly length of 99.16 Mbp, and 777 primary contigs for 12NC29, for a total length of 105.25 Mbp; approximately 52% of each genome was assembled into alternate haplotypes. This revealed structural variation between haplotypes in each isolate equivalent to more than 2% of the genome size, in addition to about 260,000 and 380,000 heterozygous single-nucleotide polymorphisms in 12SD80 and 12NC29, respectively. Transcript-based annotation identified 26,796 and 28,801 coding sequences for isolates 12SD80 and 12NC29, respectively, including about 7,000 allele pairs in haplotype-phased regions. Furthermore, expression profiling revealed clusters of coexpressed secreted effector candidates, and the majority of orthologous effectors between isolates showed conservation of expression patterns. However, a small subset of orthologs showed divergence in expression, which may contribute to differences in virulence between 12SD80 and 12NC29. This study provides the first haplotype-phased reference genome for a dikaryotic rust fungus as a foundation for future studies into virulence mechanisms in P. coronata f. sp. avenae. IMPORTANCE Disease management strategies for oat crown rust are challenged by the rapid evolution of Puccinia coronata f. sp. avenae, which renders resistance genes in oat varieties ineffective. Despite the economic importance of understanding P. coronata f. sp. avenae, resources to study the

  6. Single-molecule sequencing and Hi-C-based proximity-guided assembly of amaranth (Amaranthus hypochondriacus) chromosomes provide insights into genome evolution

    KAUST Repository

    Lightfoot, D. J.

    2017-08-29

    Background: Amaranth (Amaranthus hypochondriacus) was a food staple among the ancient civilizations of Central and South America that has recently received increased attention due to the high nutritional value of the seeds, with the potential to help alleviate malnutrition and food security concerns, particularly in arid and semiarid regions of the developing world. Here, we present a reference-quality assembly of the amaranth genome which will assist the agronomic development of the species.

  7. De Novo Assembly and Phasing of Dikaryotic Genomes from Two Isolates of Puccinia coronata f. sp. avenae, the Causal Agent of Oat Crown Rust

    Directory of Open Access Journals (Sweden)

    Marisa E. Miller

    2018-02-01

    Full Text Available Oat crown rust, caused by the fungus Puccinia coronata f. sp. avenae, is a devastating disease that impacts worldwide oat production. For much of its life cycle, P. coronata f. sp. avenae is dikaryotic, with two separate haploid nuclei that may vary in virulence genotype, highlighting the importance of understanding haplotype diversity in this species. We generated highly contiguous de novo genome assemblies of two P. coronata f. sp. avenae isolates, 12SD80 and 12NC29, from long-read sequences. In total, we assembled 603 primary contigs for 12SD80, for a total assembly length of 99.16 Mbp, and 777 primary contigs for 12NC29, for a total length of 105.25 Mbp; approximately 52% of each genome was assembled into alternate haplotypes. This revealed structural variation between haplotypes in each isolate equivalent to more than 2% of the genome size, in addition to about 260,000 and 380,000 heterozygous single-nucleotide polymorphisms in 12SD80 and 12NC29, respectively. Transcript-based annotation identified 26,796 and 28,801 coding sequences for isolates 12SD80 and 12NC29, respectively, including about 7,000 allele pairs in haplotype-phased regions. Furthermore, expression profiling revealed clusters of coexpressed secreted effector candidates, and the majority of orthologous effectors between isolates showed conservation of expression patterns. However, a small subset of orthologs showed divergence in expression, which may contribute to differences in virulence between 12SD80 and 12NC29. This study provides the first haplotype-phased reference genome for a dikaryotic rust fungus as a foundation for future studies into virulence mechanisms in P. coronata f. sp. avenae.

  8. Microdiversification of a Pelagic Polynucleobacter Species Is Mainly Driven by Acquisition of Genomic Islands from a Partially Interspecific Gene Pool

    Science.gov (United States)

    Schmidt, Johanna; Jezberová, Jitka; Koll, Ulrike; Hahn, Martin W.

    2016-01-01

    ABSTRACT Microdiversification of a planktonic freshwater bacterium was studied by comparing 37 Polynucleobacter asymbioticus strains obtained from three geographically separated sites in the Austrian Alps. Genome comparison of nine strains revealed a core genome of 1.8 Mb, representing 81% of the average genome size. Seventy-five percent of the remaining flexible genome is clustered in genomic islands (GIs). Twenty-four genomic positions could be identified where GIs are potentially located. These positions are occupied strain specifically from a set of 28 GI variants, classified according to similarities in their gene content. One variant, present in 62% of the isolates, encodes a pathway for the degradation of aromatic compounds, and another, found in 78% of the strains, contains an operon for nitrate assimilation. Both variants were shown in ecophysiological tests to be functional, thus providing the potential for microniche partitioning. In addition, detected interspecific horizontal exchange of GIs indicates a large gene pool accessible to Polynucleobacter species. In contrast to core genes, GIs are spread more successfully across spatially separated freshwater habitats. The mobility and functional diversity of GIs allow for rapid evolution, which may be a key aspect for the ubiquitous occurrence of Polynucleobacter bacteria. IMPORTANCE Assessing the ecological relevance of bacterial diversity is a key challenge for current microbial ecology. The polyphasic approach which was applied in this study, including targeted isolation of strains, genome analysis, and ecophysiological tests, is crucial for the linkage of genetic and ecological knowledge. Particularly great importance is attached to the high number of closely related strains which were investigated, represented by genome-wide average nucleotide identities (ANI) larger than 97%. The extent of functional diversification found on this narrow phylogenetic scale is compelling. Moreover, the transfer of

  9. De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies.

    Science.gov (United States)

    Karamitros, Timokratis; Harrison, Ian; Piorkowska, Renata; Katzourakis, Aris; Magiorkinis, Gkikas; Mbisa, Jean Lutamyo

    2016-01-01

    Human herpesvirus type 1 (HHV-1) has a large double-stranded DNA genome of approximately 152 kbp that is structurally complex and GC-rich. This makes the assembly of HHV-1 whole genomes from short-read sequencing data technically challenging. To improve the assembly of HHV-1 genomes we have employed a hybrid genome assembly protocol using data from two sequencing technologies: the short-read Roche 454 and the long-read Oxford Nanopore MinION sequencers. We sequenced 18 HHV-1 cell culture-isolated clinical specimens collected from immunocompromised patients undergoing antiviral therapy. The susceptibility of the samples to several antivirals was determined by plaque reduction assay. Hybrid genome assembly resulted in a decrease in the number of contigs in 6 out of 7 samples and an increase in N(G)50 and N(G)75 of all 7 samples sequenced by both technologies. The approach also enhanced the detection of non-canonical contigs including a rearrangement between the unique (UL) and repeat (T/IRL) sequence regions of one sample that was not detectable by assembly of 454 reads alone. We detected several known and novel resistance-associated mutations in UL23 and UL30 genes. Genome-wide genetic variability ranged from genomes will be useful in determining genetic determinants of drug resistance, virulence, pathogenesis and viral evolution. The numerous, complex repeat regions of the HHV-1 genome currently remain a barrier towards this goal.

  10. Optimizing Hybrid de Novo Transcriptome Assembly and Extending Genomic Resources for Giant Freshwater Prawns (Macrobrachium rosenbergii): The Identification of Genes and Markers Associated with Reproduction

    Directory of Open Access Journals (Sweden)

    Hyungtaek Jung

    2016-05-01

    Full Text Available The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean, is currently the world’s most economically important cultured freshwater crustacean species. Despite its economic importance, there is currently a lack of genomic resources available for this species, and this has limited exploration of the molecular mechanisms that control the M. rosenbergii sex-differentiation system more widely in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii applying RNA-Seq technologies directed at identifying genes that have potential functional roles in reproductive-related traits. A total of 13,733,210 combined raw reads (1720 Mbp) were obtained from Ion-Torrent PGM and 454 FLX. Bioinformatic analyses based on three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, that use single and multiple k-mer methods respectively, were used to analyse the data. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were further functionally annotated to detect single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns from ovary and testis tissue libraries to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources ever developed for reproduction traits in M. rosenbergii, and the large number of genetic markers predicted should constitute an invaluable resource for future genetic research studies on M. rosenbergii and can be applied more widely on other freshwater prawn species in the genus Macrobrachium.

  11. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  12. Control of Partial Coalescence of Self-Assembled Metal Nano-Particles across Lyotropic Liquid Crystals Templates towards Long Range Meso-Porous Metal Frameworks Design

    Directory of Open Access Journals (Sweden)

    Ludovic F. Dumée

    2015-10-01

    Full Text Available The formation of purely metallic meso-porous metal thin films by partial interface coalescence of self-assembled metal nano-particles across aqueous solutions of Pluronics triblock lyotropic liquid crystals is demonstrated for the first time. Small angle X-ray scattering was used to study the influence of the thin film composition and processing conditions on the ordered structures. The structural characteristics of the meso-structures formed were shown to rely primarily on the lyotropic liquid crystal properties, while the nature of the metal nano-particles used as well as their diameters were found to affect the ordered structure formation. The impact of the annealing temperature on the nano-particle coalescence and efficiency at removing the templating lyotropic liquid crystals was also analysed. It is demonstrated that the lyotropic liquid crystal is rendered slightly less thermally stable upon mixing with metal nano-particles and that low annealing temperatures are sufficient to form purely metallic frameworks with average pore size distributions smaller than 500 nm and porosity around 45%, with potential application in sensing, catalysis, nanoscale heat exchange, and molecular separation.

  13. Harnessing NGS and Big Data Optimally: Comparison of miRNA Prediction from Assembled versus Non-assembled Sequencing Data--The Case of the Grass Aegilops tauschii Complex Genome.

    Science.gov (United States)

    Budak, Hikmet; Kantar, Melda

    2015-07-01

    MicroRNAs (miRNAs) are small, endogenous, non-coding RNA molecules that regulate gene expression at the post-transcriptional level. As high-throughput next generation sequencing (NGS) and Big Data rapidly accumulate for various species, efforts for in silico identification of miRNAs intensify. Surprisingly, the effect of the input genomics sequence on the robustness of miRNA prediction was not evaluated in detail to date. In the present study, we performed a homology-based miRNA and isomiRNA prediction of the 5D chromosome of bread wheat progenitor, Aegilops tauschii, using two distinct sequence data sets as input: (1) raw sequence reads obtained from 454-GS FLX Titanium sequencing platform and (2) an assembly constructed from these reads. We also compared this method with a number of available plant sequence datasets. We report here the identification of 62 and 22 miRNAs from raw reads and the assembly, respectively, of which 16 were predicted with high confidence from both datasets. While raw reads promoted sensitivity with the high number of miRNAs predicted, 55% (12 out of 22) of the assembly-based predictions were supported by previous observations, bringing specificity forward compared to the read-based predictions, of which only 37% were supported. Importantly, raw reads could identify several repeat-related miRNAs that could not be detected with the assembly. However, raw reads could not capture 6 miRNAs, for which the stem-loops could only be covered by the relatively longer sequences from the assembly. In summary, the comparison of miRNA datasets obtained by these two strategies revealed that utilization of raw reads, as well as assemblies for in silico prediction, have distinct advantages and disadvantages. Consideration of these important nuances can benefit future miRNA identification efforts in the current age of NGS and Big Data driven life sciences innovation.

  14. Comparing genome guided assembly and phased variants based assembly approach to separate the homoeolog transcripts in tetraploid peanut (Arachis hypogaea L.)

    Science.gov (United States)

    Homoeologous copies of transcripts are abundant in many self-pollinating species including tetraploid peanut, and can impose a challenge to build a transcriptome reference without the merging of homoeologs. De novo transcriptome assembly of tetraploid OLin with single kmer and multiple kmer approach...

  15. Microdiversification of a Pelagic Polynucleobacter Species Is Mainly Driven by Acquisition of Genomic Islands from a Partially Interspecific Gene Pool

    Czech Academy of Sciences Publication Activity Database

    Hoetzinger, M.; Schmidt, J.; Jezberová, Jitka; Koll, U.; Hahn, M.W.

    2017-01-01

    Vol. 83, No. 3 (2017), article No. e02266-16. ISSN 0099-2240 Institutional support: RVO:60077344 Keywords: Polynucleobacter * ecophysiology * environmental genomics * functional diversity Subject RIV: EE - Microbiology, Virology OECD field: Microbiology Impact factor: 3.807, year: 2016

  16. CasEMBLR: Cas9-Facilitated Multiloci Genomic Integration of in Vivo Assembled DNA Parts in Saccharomyces cerevisiae

    DEFF Research Database (Denmark)

    Jakociunas, Tadas; Rajkumar, Arun Stephen; Zhang, Jie

    2015-01-01

    , we present a method for marker-free multiloci integration of in vivo assembled DNA parts. By the use of CRISPR/Cas9-mediated one-step double-strand breaks at single, double and triple integration sites we report the successful in vivo assembly and chromosomal integration of DNA parts. We call our...

  17. Genomic resources and draft assemblies of the human and porcine varieties of scabies mites, Sarcoptes scabiei var. hominis and var. suis.

    Science.gov (United States)

    Mofiz, Ehtesham; Holt, Deborah C; Seemann, Torsten; Currie, Bart J; Fischer, Katja; Papenfuss, Anthony T

    2016-06-02

    The scabies mite, Sarcoptes scabiei, is a parasitic arachnid and cause of the infectious skin disease scabies in humans and mange in other animal species. Scabies infections are a major health problem, particularly in remote Indigenous communities in Australia, where secondary group A streptococcal and Staphylococcus aureus infections of scabies sores are thought to drive the high rate of rheumatic heart disease and chronic kidney disease. We sequenced the genome of two samples of Sarcoptes scabiei var. hominis obtained from unrelated patients with crusted scabies located in different parts of northern Australia using the Illumina HiSeq. We also sequenced samples of Sarcoptes scabiei var. suis from a pig model. Because of the small size of the scabies mite, these data are derived from pools of thousands of mites and are metagenomic, including host and microbiome DNA. We performed cleaning and de novo assembly and present Sarcoptes scabiei var. hominis and var. suis draft reference genomes. We have constructed a preliminary annotation of this reference comprising 13,226 putative coding sequences based on sequence similarity to known proteins. We have developed extensive genomic resources for the scabies mite, including reference genomes and a preliminary annotation.

  18. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    Energy Technology Data Exchange (ETDEWEB)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric; Abernathy, Jason; Waldbieser, Geoff; Lindquist, Erika; Richardson, Paul; Lucas, Susan; Wang, Mei; Li, Ping; Thimmapuram, Jyothi; Liu, Lei; Vullaganti, Deepika; Kucuktas, Huseyin; Murdock, Christopher; Small, Brian C; Wilson, Melanie; Liu, Hong; Jiang, Yanliang; Lee, Yoona; Chen, Fei; Lu, Jianguo; Wang, Wenqi; Xu, Peng; Somridhivej, Benjaporn; Baoprasertkul, Puttharat; Quilang, Jonas; Sha, Zhenxia; Bao, Baolong; Wang, Yaping; Wang, Qun; Takano, Tomokazu; Nandi, Samiran; Liu, Shikai; Wong, Lilian; Kaltenboeck, Ludmilla; Quiniou, Sylvie; Bengten, Eva; Miller, Norman; Trant, John; Rokhsar, Daniel; Liu, Zhanjiang

    2010-03-23

    Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the assembly of the full catfish EST dataset will greatly benefit the catfish introgression breeding program and whole genome association studies.

  19. De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read.

    Science.gov (United States)

    Austin, Christopher M; Tan, Mun Hua; Harrisson, Katherine A; Lee, Yin Peng; Croft, Laurence J; Sunnucks, Paul; Pavlova, Alexandra; Gan, Han Ming

    2017-08-01

    One of the most iconic Australian fish is the Murray cod, Maccullochella peelii (Mitchell 1838), a freshwater species that can grow to ∼1.8 metres in length and live to age ≥48 years. The Murray cod is of conservation concern as a result of strong population contractions, but it is also popular for recreational fishing and is of growing aquaculture interest. In this study, we report the whole genome sequence of the Murray cod to support ongoing population genetics, conservation, and management research, as well as to better understand the evolutionary ecology and history of the species. A draft Murray cod genome of 633 Mbp (N50 = 109 974 bp; BUSCO and CEGMA completeness of 94.2% and 91.9%, respectively) with an estimated 148 Mbp of putative repetitive sequences was assembled from the combined sequencing data of 2 fish individuals with an identical maternal lineage; 47.2 Gb of Illumina HiSeq data and 804 Mb of Nanopore data were generated from the first individual while 23.2 Gb of Illumina MiSeq data were generated from the second individual. The inclusion of Nanopore reads for scaffolding followed by subsequent gap-closing using Illumina data led to a 29% reduction in the number of scaffolds and a 55% and 54% increase in the scaffold and contig N50, respectively. We also report the first transcriptome of Murray cod that was subsequently used to annotate the Murray cod genome, leading to the identification of 26 539 protein-coding genes. We present the whole genome of the Murray cod and anticipate this will be a catalyst for a range of genetic, genomic, and phylogenetic studies of the Murray cod and more generally other fish species of the Percichthyidae family. © The Authors 2017. Published by Oxford University Press.
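    Several records in this listing report assembly contiguity as an N50 value. As a minimal sketch of the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly), the following computes N50 for a list of lengths. The function name `n50` and the toy lengths are illustrative assumptions, not data from any of the studies listed here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: total length 300, so half is 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 reaches 150, so this prints 70
```

    Because N50 depends only on the length distribution, merging scaffolds (for example with long reads) raises it even when no new sequence is added, which is consistent with the scaffold-count reduction and N50 increase reported above.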

  20. Comparative genomic analysis of the arthropod muscle myosin heavy chain genes allows ancestral gene reconstruction and reveals a new type of 'partially' processed pseudogene

    Directory of Open Access Journals (Sweden)

    Kollmar Martin

    2008-02-01

    Background: Alternative splicing of mutually exclusive exons is an important mechanism for increasing protein diversity in eukaryotes. The insect Mhc (myosin heavy chain) gene produces all different muscle myosins as a result of alternative splicing, in contrast to most other organisms of the Metazoa lineage, which have a family of muscle genes with each gene coding for a protein specialized for a functional niche. Results: The muscle myosin heavy chain genes of 22 species of the Arthropoda, ranging from the waterflea to wasp and Drosophila, have been annotated. The analysis of the gene structures allowed the reconstruction of an ancient muscle myosin heavy chain gene and showed that during evolution of the arthropods introns have mainly been lost in these genes, although intron gain might have happened in a few cases. Surprisingly, the genome of Aedes aegypti contains another and that of Culex pipiens quinquefasciatus two further muscle myosin heavy chain genes, called Mhc3 and Mhc4, that contain only one variant of the corresponding alternative exons of the Mhc1 gene. Mhc3 transcription in Aedes aegypti is documented by EST data. Mhc3 and Mhc4 inserted in the Aedes and Culex genomes either by gene duplication followed by the loss of all but one variant of the alternative exons, or by incorporation of a transcript of which all other variants have been spliced out, retaining the exon-intron structure. The second and more likely possibility represents a new type of 'partially' processed pseudogene. Conclusion: Based on the comparative genomic analysis of the alternatively spliced arthropod muscle myosin heavy chain genes, we propose that the splicing process operates sequentially on the transcript. The process consists of the splicing of the mutually exclusive exons until one exon out of the cluster remains, while retaining surrounding intronic sequence. In a second step, splicing of introns takes place.
    A related mechanism could be responsible for ...

  1. Biologic Constraints on Modelling Virus Assembly

    Directory of Open Access Journals (Sweden)

    Robert L. Garcea

    2008-01-01

    The mathematical modelling of icosahedral virus assembly has drawn increasing interest because of the symmetric geometry of the outer shell structures. Many models involve equilibrium expressions of subunit binding, with reversible subunit additions forming various intermediate structures. The underlying assumption is that a final lowest energy state drives the equilibrium toward assembly. In their simplest forms, these models have explained why high subunit protein concentrations and strong subunit association constants can result in kinetic traps forming off-pathway partial and aberrant structures. However, the cell biology of virus assembly is exceedingly complex. The biochemistry and biology of polyoma and papillomavirus assembly described here illustrate many of these specific issues. Variables include the use of cellular ‘chaperone’ proteins as mediators of assembly fidelity, the coupling of assembly to encapsidation of a specific nucleic acid genome, the use of cellular structures as ‘workbenches’ upon which assembly occurs, and the underlying problem of making a capsid structure that is metastable and capable of rapid disassembly upon infection. Although formidable to model, incorporating these considerations could advance the relevance of mathematical models of virus assembly to the real world.

  2. Ninety-nine de novo assembled genomes from the moose (Alces alces) rumen microbiome provide new insights into microbial plant biomass degradation

    Science.gov (United States)

    Svartström, Olov; Alneberg, Johannes; Terrapon, Nicolas; Lombard, Vincent; de Bruijn, Ino; Malmsten, Jonas; Dalin, Ann-Marie; Muller, Emilie E.L.; Shah, Pranjul; Wilmes, Paul; Henrissat, Bernard; Aspeborg, Henrik; Andersson, Anders F.

    2017-01-01

    The moose (Alces alces) is a ruminant that harvests energy from fiber-rich lignocellulose material through carbohydrate-active enzymes (CAZymes) produced by its rumen microbes. We applied shotgun metagenomics to rumen contents from six moose to obtain insights into this microbiome. Following binning, 99 metagenome-assembled genomes (MAGs) belonging to eleven prokaryotic phyla were reconstructed and characterized based on phylogeny and CAZyme profile. The taxonomy of these MAGs reflected the overall composition of the metagenome, with dominance of the phyla Bacteroidetes and Firmicutes. Unlike in other ruminants, Spirochaetes constituted a significant proportion of the community and our analyses indicate that the corresponding strains are primarily pectin digesters. Pectin-degrading genes were also common in MAGs of Ruminococcus, Fibrobacteres and Bacteroidetes, and were overall overrepresented in the moose microbiome compared to other ruminants. Phylogenomic analyses revealed several clades within the Bacteroidetes without previously characterized genomes. Several of these MAGs encoded large numbers of dockerins, a module usually associated with cellulosomes. The Bacteroidetes dockerins were often linked to CAZymes and sometimes encoded inside polysaccharide utilization loci (PULs), which has never been reported before. The almost one hundred CAZyme-annotated genomes reconstructed in this study provide an in-depth view of an efficient lignocellulose-degrading microbiome and prospects for developing enzyme technology for biorefineries. PMID:28731473

  3. Characterization of partial and near full-length genomes of HIV-1 strains sampled from recently infected individuals in São Paulo, Brazil.

    Directory of Open Access Journals (Sweden)

    Sabri Saeed Sanabani

    BACKGROUND: Genetic variability is a major feature of human immunodeficiency virus type 1 (HIV-1) and is considered the key factor frustrating efforts to halt the HIV epidemic. A proper understanding of HIV-1 genomic diversity is a fundamental prerequisite for proper epidemiology, genetic diagnosis, and successful drug and vaccine design. Here, we report on the partial and near full-length genomic (NFLG) variability of HIV-1 isolates from a well-characterized cohort of recently infected patients in São Paulo, Brazil. METHODOLOGY: HIV-1 proviral DNA was extracted from the peripheral blood mononuclear cells of 113 participants. The NFLG and partial fragments were determined by overlapping nested PCR and direct sequencing. The data were phylogenetically analyzed. RESULTS: Of the 113 samples (90.3% male; median age 31 years; 79.6% homosexual men) studied, 77 (68.1%) NFLGs and 32 (29.3%) partial fragments were successfully subtyped. Of the successfully subtyped sequences, 88 (80.7%) were subtype B sequences, 12 (11%) BF1 recombinants, 3 (2.8%) subtype C sequences, 2 (1.8%) each BC recombinants and subclade F1, 1 (0.9%) CRF02_AG, and 1 (0.9%) CRF31_BC. Primary drug resistance mutations were observed in 14/101 (13.9%) of samples, with 5.9% being resistant to protease inhibitors and nucleoside reverse transcriptase inhibitors (NRTIs) and 4.9% resistant to non-NRTIs. Predictions of viral tropism were determined for 86 individuals. X4 or X4 dual or mixed-tropic viruses (X4/DM) were seen in 26 (30.2%) of subjects. Among homosexual men, X4 viruses were detected in 19/69 (27.5%). CONCLUSIONS: Our results confirm the existence of various HIV-1 subtypes circulating in São Paulo, and indicate that subtype B accounts for the majority of infections. Antiretroviral (ARV) drug resistance is relatively common among recently infected patients. The proportion of X4 viruses in homosexuals was significantly higher than the proportion seen in other study populations.

  4. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

    Directory of Open Access Journals (Sweden)

    Ritland Carol

    2009-08-01

    Background: Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results: We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion: We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene ...

  5. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

    Science.gov (United States)

    Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

    2009-08-06

    Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
    The ...
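    The white spruce BAC records above quote sequencing depth as fold coverage. As a minimal sketch of the arithmetic, mean fold coverage is the total number of sequenced bases divided by the length of the target. The example below back-calculates the approximate number of bases implied by the reported depths, under the simplifying assumption (not stated in the abstract) that the scaffold lengths of 172 kbp and 94 kbp can stand in for the target lengths. The function names are hypothetical.

```python
def bases_needed(target_length_bp, fold_coverage):
    """Total sequenced bases required for a given mean fold coverage."""
    if target_length_bp <= 0 or fold_coverage < 0:
        raise ValueError("length must be positive and coverage non-negative")
    return target_length_bp * fold_coverage


def mean_coverage(total_bases, target_length_bp):
    """Mean fold coverage obtained from a given number of sequenced bases."""
    if target_length_bp <= 0:
        raise ValueError("length must be positive")
    return total_bases / target_length_bp


# Assumption: scaffold lengths from the abstract are used as target lengths.
for name, length_bp, depth in (("3CAR", 172_000, 15.6),
                               ("CYP720B4", 94_000, 16.0)):
    print(name, round(bases_needed(length_bp, depth)), "bases")
```

    An in silico depth simulation of the kind mentioned in the abstract would typically subsample reads to lower values of this ratio and re-assemble; that step is not sketched here.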

  6. Packaging of a unit-length viral genome: the role of nucleotides and the gpD decoration protein in stable nucleocapsid assembly in bacteriophage lambda.

    Science.gov (United States)

    Yang, Qin; Maluf, Nasib Karl; Catalano, Carlos Enrique

    2008-11-28

    The developmental pathways for a variety of eukaryotic and prokaryotic double-stranded DNA viruses include packaging of viral DNA into a preformed procapsid structure, catalyzed by terminase enzymes and fueled by ATP hydrolysis. In most instances, a capsid expansion process accompanies DNA packaging, which significantly increases the volume of the capsid to accommodate the full-length viral genome. "Decoration" proteins add to the surface of the expanded capsid lattice, and the terminase motors tightly package DNA, generating up to approximately 20 atm of internal capsid pressure. Herein we describe biochemical studies on genome packaging using bacteriophage lambda as a model system. Kinetic analysis suggests that the packaging motor possesses at least four ATPase catalytic sites that act cooperatively to effect DNA translocation, and that the motor is highly processive. While not required for DNA translocation into the capsid, the phage lambda capsid decoration protein gpD is essential for the packaging of the penultimate 8-10 kb (15-20%) of the viral genome; virtually no DNA is packaged in the absence of gpD when large DNA substrates are used, most likely due to a loss of capsid structural integrity. Finally, we show that ATP hydrolysis is required to retain the genome in a packaged state subsequent to condensation within the capsid. Presumably, the packaging motor continues to "idle" at the genome end and to maintain a positive pressure towards the packaged state. Surprisingly, ADP, guanosine triphosphate, and the nonhydrolyzable ATP analog 5'-adenylyl-beta,gamma-imidodiphosphate (AMP-PNP) similarly stabilize the packaged viral genome despite the fact that they fail to support genome packaging. In contrast, the poorly hydrolyzed ATP analog ATP-gammaS only partially stabilizes the nucleocapsid, and a DNA is released in "quantized" steps. 
    We interpret the ensemble of data to indicate that (i) the viral procapsid possesses a degree of plasticity that is required to ...

  7. De novo assembly and annotation of the Asian tiger mosquito (Aedes albopictus) repeatome with dnaPipeTE from raw genomic reads and comparative analysis with the yellow fever mosquito (Aedes aegypti).

    Science.gov (United States)

    Goubert, Clément; Modolo, Laurent; Vieira, Cristina; ValienteMoro, Claire; Mavingui, Patrick; Boulesteix, Matthieu

    2015-03-11

    Repetitive DNA, including transposable elements (TEs), is found throughout eukaryotic genomes. Annotating and assembling the "repeatome" during genome-wide analysis often poses a challenge. To address this problem, we present dnaPipeTE, a new bioinformatics pipeline that uses a sample of raw genomic reads. It produces precise estimates of repeated DNA content and TE consensus sequences, as well as the relative ages of TE families. We show that dnaPipeTE performs well using very low coverage sequencing in different genomes, losing accuracy only with old TE families. We applied this pipeline to the genome of the Asian tiger mosquito Aedes albopictus, an invasive species of human health interest, for which the genome size is estimated to be over 1 Gbp. Using dnaPipeTE, we showed that this species harbors a large (50% of the genome) and potentially active repeatome with an overall TE class and order composition similar to that of Aedes aegypti, the yellow fever mosquito. However, intraorder dynamics show clear distinctions between the two species, with differences at the TE family level. Our pipeline's ability to manage the repeatome annotation problem will make it helpful for new or ongoing assembly projects, and our results will benefit future genomic studies of A. albopictus. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  8. Genome Assembly of the Fungus Cochliobolus miyabeanus, and Transcriptome Analysis during Early Stages of Infection on American Wildrice (Zizania palustris L.).

    Directory of Open Access Journals (Sweden)

    Claudia V Castell-Miller

    The fungus Cochliobolus miyabeanus causes severe leaf spot disease on rice (Oryza sativa) and two North American specialty crops, American wildrice (Zizania palustris) and switchgrass (Panicum virgatum). Despite the importance of C. miyabeanus as a disease-causing agent in wildrice, little is known about either the mechanisms of pathogenicity or host defense responses. To start bridging these gaps, the genome of C. miyabeanus strain TG12bL2 was shotgun sequenced using Illumina technology. The genome assembly consists of 31.79 Mbp in 2,378 scaffolds with an N50 = 74,921. It contains 11,000 predicted genes, of which 94.5% were annotated. Approximately 10% of the predicted proteins are expected to be secreted. The C. miyabeanus genome is rich in carbohydrate active enzymes, and harbors 187 small secreted peptides (SSPs) and some fungal effector homologs. Detoxification systems were represented by a variety of enzymes that could offer protection against plant defense compounds. The non-ribosomal peptide synthetases and polyketide synthases (PKS) present were common to other Cochliobolus species. Additionally, the fungal transcriptome was analyzed at 48 hours after inoculation in planta. A total of 10,674 genes were found to be expressed, some of which are known to be involved in pathogenicity or response to host defenses, including hydrophobins, cutinase, cell wall degrading enzymes, enzymes related to reactive oxygen species scavenging, PKS, detoxification systems, SSPs, and a known fungal effector. This work will facilitate future research on C. miyabeanus pathogen-associated molecular patterns and effectors, and the identification of their corresponding wildrice defense mechanisms.

  9. Random walk in genome space: A key ingredient of intermittent dynamics of community assembly on evolutionary time scales

    KAUST Repository

    Murase, Yohsuke

    2010-06-01

    Community assembly is studied using individual-based multispecies models. The models have stochastic population dynamics with mutation, migration, and extinction of species. Mutants appear as a result of mutation of the resident species, while migrants have no correlation with the resident species. It is found that the dynamics of community assembly with mutations are quite different from the case with migrations. In contrast to mutation models, which show intermittent dynamics of quasi-steady states interrupted by sudden reorganizations of the community, migration models show smooth and gradual renewal of the community. As a consequence, instead of the 1/f diversity fluctuations found for the mutation models, 1/f², random-walk-like fluctuations are observed for the migration models. In addition, a characteristic species-lifetime distribution is found: a power law that is cut off by a "skewed" distribution in the long-lifetime regime. The latter has a longer tail than a simple exponential function, which indicates an age-dependent species-mortality function. Since this characteristic profile has been observed, both in fossil data and in several other mathematical models, we conclude that it is a universal feature of macroevolution. © 2010 Elsevier Ltd.

  10. De novo assembly of mitochondrial genomes provides insights into genetic diversity and molecular evolution in wild boars and domestic pigs.

    Science.gov (United States)

    Ni, Pan; Bhuiyan, Ali Akbar; Chen, Jian-Hai; Li, Jingjin; Zhang, Cheng; Zhao, Shuhong; Du, Xiaoyong; Li, Hua; Yu, Hui; Liu, Xiangdong; Li, Kui

    2018-05-10

    To date, the scarcity of publicly available complete mitochondrial sequences for European wild pigs has hampered deeper understanding of the genetic changes following domestication. Here, we have assembled 26 de novo mtDNA sequences of European wild boars from next generation sequencing (NGS) data and downloaded 174 complete mtDNA sequences to assess the genetic relationship, nucleotide diversity, and selection. The Bayesian consensus tree reveals the clear divergence between the European and Asian clades and a very small portion (10 out of 200 samples) of maternal introgression. The overall nucleotide diversity of the mtDNA sequences has been reduced following domestication. Interestingly, the selection efficiencies in both European and Asian domestic pigs are reduced, probably caused by changes in both selection constraints and maternal population size following domestication. This study suggests that de novo assembled mitogenomes can be a great boon to uncover the genetic turnover following domestication. Further investigation is warranted to include more samples from the ever-increasing amounts of NGS data to help us to better understand the process of domestication.

  11. Random walk in genome space: A key ingredient of intermittent dynamics of community assembly on evolutionary time scales

    KAUST Repository

    Murase, Yohsuke; Shimada, Takashi; Ito, Nobuyasu; Rikvold, Per Arne

    2010-01-01

    Community assembly is studied using individual-based multispecies models. The models have stochastic population dynamics with mutation, migration, and extinction of species. Mutants appear as a result of mutation of the resident species, while migrants have no correlation with the resident species. It is found that the dynamics of community assembly with mutations are quite different from the case with migrations. In contrast to mutation models, which show intermittent dynamics of quasi-steady states interrupted by sudden reorganizations of the community, migration models show smooth and gradual renewal of the community. As a consequence, instead of the 1/f diversity fluctuations found for the mutation models, 1/f², random-walk-like fluctuations are observed for the migration models. In addition, a characteristic species-lifetime distribution is found: a power law that is cut off by a "skewed" distribution in the long-lifetime regime. The latter has a longer tail than a simple exponential function, which indicates an age-dependent species-mortality function. Since this characteristic profile has been observed, both in fossil data and in several other mathematical models, we conclude that it is a universal feature of macroevolution. © 2010 Elsevier Ltd.

  12. Detection of a Usp-like gene in Calotropis procera plant from the de novo assembled genome contigs of the high-throughput sequencing dataset

    KAUST Repository

    Shokry, Ahmed M.

    2014-02-01

    The wild plant species Calotropis procera (C. procera) has many potential applications and beneficial uses in medicine, industry and ornamental field. It also represents an excellent source of genes for drought and salt tolerance. Genes encoding proteins that contain the conserved universal stress protein (USP) domain are known to provide organisms like bacteria, archaea, fungi, protozoa and plants with the ability to respond to a plethora of environmental stresses. However, information on the possible occurrence of Usp in C. procera is not available. In this study, we uncovered and characterized a one-class A Usp-like (UspA-like, NCBI accession No. KC954274) gene in this medicinal plant from the de novo assembled genome contigs of the high-throughput sequencing dataset. A number of GenBank accessions for Usp sequences were blasted with the recovered de novo assembled contigs. Homology modelling of the deduced amino acids (NCBI accession No. AGT02387) was further carried out using Swiss-Model, accessible via the EXPASY. Superimposition of C. procera USPA-like full sequence model on Thermus thermophilus USP UniProt protein (PDB accession No. Q5SJV7) was constructed using RasMol and Deep-View programs. The functional domains of the novel USPA-like amino acids sequence were identified from the NCBI conserved domain database (CDD) that provide insights into sequence structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, PRK, TIGRFAM). © 2014 Académie des sciences.

  13. Comprehensive profiling of retroviral integration sites using target enrichment methods from historical koala samples without an assembled reference genome

    Directory of Open Access Journals (Sweden)

    Pin Cui

    2016-03-01

    Background. Retroviral integration into the host germline results in permanent viral colonization of vertebrate genomes. The koala retrovirus (KoRV) is currently invading the germline of the koala (Phascolarctos cinereus) and provides a unique opportunity for studying retroviral endogenization. Previous analysis of KoRV integration patterns in modern koalas demonstrates that they share integration sites primarily if they are related, indicating that the process is currently driven by vertical transmission rather than infection. However, due to methodological challenges, KoRV integrations have not been comprehensively characterized. Results. To overcome these challenges, we applied and compared three target enrichment techniques coupled with next generation sequencing (NGS) and a newly customized sequence-clustering based computational pipeline to determine the integration sites for 10 museum Queensland and New South Wales (NSW) koala samples collected between the 1870s and late 1980s. A secondary aim of this study was to identify common integration sites across modern and historical specimens by comparing our dataset to previously published studies. Several million sequences were processed, and the KoRV integration sites in each koala were characterized. Conclusions. Although the three enrichment methods each exhibited bias in integration site retrieval, a combination of two methods, Primer Extension Capture and hybridization capture, is recommended for future studies on historical samples. Moreover, identification of integration sites shows that the proportion of integration sites shared between any two koalas is quite small.

  14. One bacterial cell, one complete genome.

    Directory of Open Access Journals (Sweden)

    Tanja Woyke

    2010-04-01

    While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid with genome copies ranging from approximately 200-900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.

  15. One Bacterial Cell, One Complete Genome

    Energy Technology Data Exchange (ETDEWEB)

    Woyke, Tanja; Tighe, Damon; Mavrommatis, Konstantinos; Clum, Alicia; Copeland, Alex; Schackwitz, Wendy; Lapidus, Alla; Wu, Dongying; McCutcheon, John P.; McDonald, Bradon R.; Moran, Nancy A.; Bristow, James; Cheng, Jan-Fang

    2010-04-26

    While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid with genome copies ranging from approximately 200-900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.

  16. De Novo Assembly and Genome Analyses of the Marine-Derived Scopulariopsis brevicaulis Strain LF580 Unravels Life-Style Traits and Anticancerous Scopularide Biosynthetic Gene Cluster.

    Science.gov (United States)

    Kumar, Abhishek; Henrissat, Bernard; Arvas, Mikko; Syed, Muhammad Fahad; Thieme, Nils; Benz, J Philipp; Sørensen, Jens Laurids; Record, Eric; Pöggeler, Stefanie; Kempken, Frank

    2015-01-01

    The marine-derived Scopulariopsis brevicaulis strain LF580 produces scopularides A and B, which have anticancerous properties. We carried out genome sequencing using three next-generation DNA sequencing methods. De novo hybrid assembly yielded 621 scaffolds with a total size of 32.2 Mb and 16298 putative gene models. We identified a large non-ribosomal peptide synthetase gene (nrps1) and a supporting pks2 gene in the same biosynthetic gene cluster. This cluster and the genes within it are functionally active, as confirmed by RNA-Seq. Characterization of carbohydrate-active enzymes and major facilitator superfamily (MFS)-type transporters leads us to postulate that S. brevicaulis originated from a soil fungus, which came into contact with the marine sponge Tethya aurantium. This marine sponge seems to provide shelter to this fungus and a micro-environment suitable for its survival in the ocean. This study also builds the platform for further investigations of the role of life-style and secondary metabolites from S. brevicaulis.

  17. Detection of an inversion in the Ty-2 region between S. lycopersicum and S. habrochaites by a combination of de novo genome assembly and BAC cloning.

    Science.gov (United States)

    Wolters, Anne-Marie A; Caro, Myluska; Dong, Shufang; Finkers, Richard; Gao, Jianchang; Visser, Richard G F; Wang, Xiaoxuan; Du, Yongchen; Bai, Yuling

    2015-10-01

    A chromosomal inversion associated with the tomato Ty-2 gene for TYLCV resistance is the cause of severe suppression of recombination in a tomato Ty-2 introgression line. Among tomato and its wild relatives inversions are often observed, which result in suppression of recombination. Such inversions hamper the transfer of important traits from a related species to the crop by introgression breeding. Suppression of recombination was reported for the TYLCV resistance gene, Ty-2, which has been introgressed in cultivated tomato (Solanum lycopersicum) from the wild relative S. habrochaites accession B6013. Ty-2 was mapped to a 300-kb region on the long arm of chromosome 11. The suppression of recombination in the Ty-2 region could be caused by chromosomal rearrangements in S. habrochaites compared with S. lycopersicum. With the aim of visualizing the genome structure of the Ty-2 region, we compared the draft de novo assembly of S. habrochaites accession LYC4 with the sequence of cultivated tomato ('Heinz'). Furthermore, using populations derived from intraspecific crosses of S. habrochaites accessions, the order of markers in the Ty-2 region was studied. Results showed the presence of an inversion of approximately 200 kb in the Ty-2 region when comparing S. lycopersicum and S. habrochaites. By sequencing a BAC clone from the Ty-2 introgression line, one inversion breakpoint was identified. Finally, the obtained results are discussed with respect to introgression breeding and the importance of a priori de novo sequencing of the species involved.

  18. Partial vs. integer electron transfer in molecular assemblies: On the importance of multideterminant theoretical description and the necessity to find a solution within DFT

    Energy Technology Data Exchange (ETDEWEB)

    Geskin, Victor; Cornil, Jérôme [Laboratory for Chemistry of Novel Materials, University of Mons, Place du Parc 20, B-7000 Mons (Belgium)]; Stadler, Robert [Department of Physical Chemistry, University of Vienna, Sensengasse 8/7, A-1090 Vienna (Austria)]

    2015-01-22

    Nonequilibrium Green's function techniques (NEGF) combined with density functional theory (DFT) calculations have become a standard tool for the description of electron transport through single molecule nanojunctions in the coherent tunneling (CT) regime. However, the applicability of these methods for transport in the Coulomb blockade (CB) regime is questionable. For a molecular assembly model, with multideterminant calculations as a benchmark, we show how a closed-shell ansatz, the usual ingredient of mean-field methods, fails to properly describe the step like electron-transfer characteristic in weakly coupled systems. Detailed analysis of this misbehavior allows us to propose a practical scheme to extract the addition energies in the CB regime for single-molecule junctions from NEGF DFT within the local-density approximation (closed shell). We show also that electrostatic screening effects are taken into account within this simple approach.

  19. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based on transcr...

  20. Partial Trisomy 16p (16p12.2→pter) and Partial Monosomy 22q (22q13.31→qter) Presenting With Fetal Ascites and Ventriculomegaly: Prenatal Diagnosis and Array Comparative Genomic Hybridization Characterization

    Directory of Open Access Journals (Sweden)

    Chih-Ping Chen

    2010-12-01

    Conclusion: Partial trisomy 16p can be associated with fetal ascites and ventriculomegaly in the second trimester. Prenatal sonographic detection of fetal ascites in association with ventriculomegaly should alert chromosomal abnormalities and prompt cytogenetic investigation, which may lead to the identification of an unexpected parental translocation involving chromosomal segments associated with cerebral and vascular abnormalities.

  1. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

    Directory of Open Access Journals (Sweden)

    Weitemier Kevin

    2011-05-01

    Background: Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results: A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions: The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species

  2. Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

    Science.gov (United States)

    Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

    2011-05-04

    Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first

  3. The mitochondrial genome of the Arizona Snowfly Mesocapnia arizonensis (Plecoptera, Capniidae).

    Science.gov (United States)

    Elbrecht, Vasco; Leese, Florian

    2016-09-01

    We assembled the mitochondrial genome of the capniid stonefly Mesocapnia arizonensis (Baumann & Gaufin, 1969) using Illumina HiSeq sequence data. The recovered mitogenome is 14,921 bp in length and includes 13 protein-coding genes, 2 ribosomal RNA genes and 22 transfer RNA genes. The control region could only be assembled partially. Gene order resembles that of basal arthropods. This is the first partial mitogenome sequence for the stonefly superfamily group Euholognatha and will be useful in future phylogenetic analyses.

  4. Hyb-Seq: Combining Target Enrichment and Genome Skimming for Plant Phylogenomics

    Directory of Open Access Journals (Sweden)

    Kevin Weitemier

    2014-08-01

    Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics.

  5. Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics

    Science.gov (United States)

    Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron

    2014-01-01

    • Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629

  6. Fuel assembly

    International Nuclear Information System (INIS)

    Ueda, Makoto; Ogiya, Shunsuke.

    1989-01-01

    To improve the economy of a BWR-type reactor by lengthening the operation cycle, the fuel enrichment has to be increased further. However, this makes the subcriticality shallower in the upper portion of the reactor core, raising the possibility that reactor shutdown becomes impossible. In the present invention, a portion of the fuel rods are constituted as partial length fuel rods (P-fuel rods), in which the entire stack length in the effective portion is made shorter by reducing the concentration of fissionable materials in the axial portion. A plurality of moderator rods are disposed on at least one diagonal line of a fuel assembly, and P-fuel rods are arranged at positions between the moderator rods. This makes reactor shutdown possible and keeps the axial power distribution satisfactory even if the fuel enrichment is increased. (T.M.)

  7. From plant genomes to phenotypes

    OpenAIRE

    Bolger, Marie; Gundlach, Heidrun; Scholz, Uwe; Mayer, Klaus; Usadel, Björn; Schwacke, Rainer; Schmutzer, Thomas; Chen, Jinbo; Arend, Daniel; Oppermann, Markus; Weise, Stephan; Lange, Matthias; Fiorani, Fabio; Spannagl, Manuel

    2017-01-01

    Recent advances in sequencing technologies have greatly accelerated the rate of plant genome and applied breeding research. Despite this advancing trend, plant genomes continue to present numerous difficulties to the standard tools and pipelines not only for genome assembly but also gene annotation and downstream analysis. Here we give a perspective on tools, resources and services necessary to assemble and analyze plant genomes and link them to plant phenotypes.

  8. Genome sequence of the olive tree, Olea europaea.

    Science.gov (United States)

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
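
    For readers unfamiliar with the assembly statistics quoted in this record, the following is a minimal illustrative sketch of how a scaffold N50 and an assembled fraction are commonly computed. It is not taken from the paper: the function name `n50` and the toy scaffold lengths are hypothetical, and only the 1.31 Gb and 1.38 Gb figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths (not data from the olive assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)

# Figures quoted in the abstract: 1.31 Gb assembled, 1.38 Gb estimated.
assembled_fraction = 1.31 / 1.38

print(f"toy N50 = {toy_n50}")
print(f"assembled fraction = {assembled_fraction:.1%}")
```

    With the abstract's figures the ratio comes out just under 95 %, which agrees with the rounded value reported in the record.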

  9. Metagenome-Assembled Genome Sequences of Acetobacterium sp. Strain MES1 and Desulfovibrio sp. Strain MES5 from a Cathode-Associated Acetogenic Microbial Community.

    Science.gov (United States)

    Ross, Daniel E; Marshall, Christopher W; May, Harold D; Norman, R Sean

    2017-09-07

    Draft genome sequences of Acetobacterium sp. strain MES1 and Desulfovibrio sp. strain MES5 were obtained from the metagenome of a cathode-associated community enriched within a microbial electrosynthesis system (MES). The draft genome sequences provide insight into the functional potential of these microorganisms within an MES and a foundation for future comparative analyses. Copyright © 2017 Ross et al.

  10. Partial genomic structure, mutation analysis and mapping of the porcine inhibitor of DNA binding genes ID1, ID2, ID3 and ID4

    Czech Academy of Sciences Publication Activity Database

    Stratil, Antonín; Horák, Pavel; Filkuková, Jitka; Van Poucke, M.; Bartenschlager, H.; Peelman, L. J.; Geldermann, H.

    2010-01-01

    Vol. 41 (2010), pp. 558-559. ISSN 0268-9146. R&D Projects: GA ČR(CZ) GA523/06/1302; GA ČR GA523/09/0844. Institutional research plan: CEZ:AV0Z50450515. Keywords: genomic structure; muscle-specific genes; porcine. Subject RIV: GI - Animal Husbandry; Breeding. Impact factor: 2.203, year: 2010

  11. Pure partial monosomy 3p (3p25.3 → pter): Prenatal diagnosis and array comparative genomic hybridization characterization

    Directory of Open Access Journals (Sweden)

    Chih-Ping Chen

    2012-09-01

    Conclusion: In this case, aCGH has characterized a 3p deleted region with haploinsufficiency of the neurodevelopmental genes associated with cognitive deficit and mental retardation but without involvement of the congenital heart disease susceptibility locus, and QF-PCR has determined a paternal origin of the deletion. aCGH and QF-PCR help to delineate the genomic imbalance in prenatally detected de novo chromosome aberration, and the information acquired is useful for genetic counseling.

  12. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.

    Science.gov (United States)

    Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C

    2005-01-01

    The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.

  13. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey

    Directory of Open Access Journals (Sweden)

    Varala Kranthi

    2007-05-01

    Background: Extensive computational and database tools are available to mine genomic and genetic databases for model organisms, but little genomic data is available for many species of ecological or agricultural significance, especially those with large genomes. Genome surveys using conventional sequencing techniques are powerful, particularly for detecting sequences present in many copies per genome. However, these methods are time-consuming and have potential drawbacks. High throughput 454 sequencing provides an alternative method by which much information can be gained quickly and cheaply from high-coverage surveys of genomic DNA. Results: We sequenced 78 million base-pairs of randomly sheared soybean DNA which passed our quality criteria. Computational analysis of the survey sequences provided global information on the abundant repetitive sequences in soybean. The sequence was used to determine the copy number across regions of large genomic clones or contigs and discover higher-order structures within satellite repeats. We have created an annotated, online database of sequences present in multiple copies in the soybean genome. The low bias of pyrosequencing against repeat sequences is demonstrated by the overall composition of the survey data, which matches well with past estimates of repetitive DNA content obtained by DNA re-association kinetics (Cot analysis). Conclusion: This approach provides a potential aid to conventional or shotgun genome assembly, by allowing rapid assessment of copy number in any clone or clone-end sequence. In addition, we show that partial sequencing can provide access to partial protein-coding sequences.

  14. Comparison of Normalized and Unnormalized Single Cell and Population Assemblies (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Hugenholtz, Phil

    2011-10-12

    University of Queensland's Phil Hugenholtz on "Comparison of Normalized and Unnormalized Single Cell and Population Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  15. Whole genome sequencing and assembly of Eukaryotic microbes isolated from ISS environmental surface Kirovograd region soil Chernobyl Nuclear Power Plant and Chernobyl Exclusion Zone

    Data.gov (United States)

    National Aeronautics and Space Administration — The whole-genome sequences of eight fungal strains that were selected for exposure to microgravity at the International Space Station are presented here. These...

  16. CRISPR Detection From Short Reads Using Partial Overlap Graphs.

    Science.gov (United States)

    Ben-Bassat, Ilan; Chor, Benny

    2016-06-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes, which are part of an adaptive immune system against phages. CRISPRs are important for many microbial studies and are playing an essential role in current gene editing techniques. As such, they attract substantial research interest. The exponential growth in the amount of bacterial sequence data in recent years enables the exploration of CRISPR loci in more and more species. Most of the automated tools that detect CRISPR loci rely on fully assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads that are long enough to contain two copies of the same repeat. We present a method to identify CRISPR repeats from raw sequence data of short reads. The algorithm is based on an observation differentiating CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. This enables us to avoid many of the difficulties that assemblers face, as we merely aim to identify the repeats that belong to CRISPR loci. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other existing tools fail to do so.

  17. A high-quality genome assembly of quinoa provides insights into the molecular basis of salt bladder-based salinity tolerance and the exceptional nutritional value

    Science.gov (United States)

    Zou, Changsong; Chen, Aojun; Xiao, Lihong; Muller, Heike M; Ache, Peter; Haberer, Georg; Zhang, Meiling; Jia, Wei; Deng, Ping; Huang, Ru; Lang, Daniel; Li, Feng; Zhan, Dongliang; Wu, Xiangyun; Zhang, Hui; Bohm, Jennifer; Liu, Renyi; Shabala, Sergey; Hedrich, Rainer; Zhu, Jian-Kang; Zhang, Heng

    2017-01-01

    Chenopodium quinoa is a halophytic pseudocereal crop that is being cultivated in an ever-growing number of countries. Because quinoa is highly resistant to multiple abiotic stresses and its seed has a better nutritional value than any other major cereal, it is regarded as a future crop to ensure global food security. We generated a high-quality genome draft using an inbred line of the quinoa cultivar Real. The quinoa genome experienced one recent genome duplication about 4.3 million years ago, likely reflecting the genome fusion of two Chenopodium parents, in addition to the γ paleohexaploidization reported for most eudicots. The genome is highly repetitive (64.5% repeat content) and contains 54 438 protein-coding genes and 192 microRNA genes, with more than 99.3% having orthologous genes from glycophytic species. Stress tolerance in quinoa is associated with the expansion of genes involved in ion and nutrient transport, ABA homeostasis and signaling, and enhanced basal-level ABA responses. Epidermal salt bladder cells exhibit characteristics similar to those of trichomes, with a significantly higher expression of genes related to energy import and ABA biosynthesis compared with the leaf lamina. The quinoa genome sequence provides insights into its exceptional nutritional value and the evolution of halophytes, enabling the identification of genes involved in salinity tolerance, and providing the basis for molecular breeding in quinoa. PMID:28994416

  18. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    Directory of Open Access Journals (Sweden)

    Feltus Frank A

    2011-07-01

    Full Text Available. Abstract. Background: Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for the development of genomic tools. Establishment of integrated physical and genetic maps for switchgrass will accelerate the mapping of value-added traits useful to breeding programs and the isolation of important target genes using map-based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results: A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times, based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of the switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions: The construction of the first switchgrass BAC library and comparative analysis of
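
    The "approximately 10 times" coverage quoted in this record can be checked with a short calculation. The sketch below is only an illustration: it assumes that coverage is the number of insert-containing clones times the mean insert size divided by the effective genome size, which the abstract does not state explicitly, and the function name library_coverage is hypothetical.

```python
def library_coverage(n_clones, insert_fraction, mean_insert_bp, genome_bp):
    """Estimate fold coverage of a clone library.

    Assumption (not stated in the abstract): coverage is the number of
    insert-containing clones times the mean insert size, divided by the
    genome size being covered.
    """
    return n_clones * insert_fraction * mean_insert_bp / genome_bp


# Figures quoted in the abstract: 147,456 clones, 95% with inserts,
# 120 kb mean insert, ~1.6 Gb effective genome size.
coverage = library_coverage(147_456, 0.95, 120_000, 1.6e9)
print(f"{coverage:.1f}x")  # prints 10.5x
```

    Under these assumptions the estimate comes to about 10.5-fold, which is consistent with the rounded figure in the abstract.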

  19. RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing.

    Science.gov (United States)

    Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E

    2015-01-01

    Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.

  20. A gene-based high-resolution comparative radiation hybrid map as a framework for genome sequence assembly of a bovine chromosome 6 region associated with QTL for growth, body composition, and milk performance traits

    Directory of Open Access Journals (Sweden)

    Laurent Pascal

    2006-03-01

    Full Text Available. Abstract. Background: A number of different quantitative trait loci (QTL) for various phenotypic traits, including milk production, functional, and conformation traits in dairy cattle as well as growth and body composition traits in meat cattle, have been mapped consistently in the middle region of bovine chromosome 6 (BTA6). Dense genetic and physical maps and, ultimately, a fully annotated genome sequence as well as their mutual connections are required to efficiently identify genes and gene variants responsible for genetic variation of phenotypic traits. A comprehensive high-resolution gene-rich map linking densely spaced bovine markers and genes to the annotated human genome sequence is required as a framework to facilitate this approach for the region on BTA6 carrying the QTL. Results: We therefore constructed a high-resolution radiation hybrid (RH) map for the QTL-containing chromosomal region of BTA6. This new RH map, with a total of 234 loci including 115 genes and ESTs, displays a substantial increase in loci density compared to existing physical BTA6 maps. Screening the available bovine genome sequence resources, a total of 73 loci could be assigned to sequence contigs, which were already identified as specific for BTA6. For 43 loci, corresponding sequence contigs, which were not yet placed on the bovine genome assembly, were identified. In addition, the improved potential of this high-resolution RH map for BTA6 with respect to comparative mapping was demonstrated. Mapping a large number of genes on BTA6 and cross-referencing them with map locations in corresponding syntenic multi-species chromosome segments (human, mouse, rat, dog, chicken) achieved a refined, accurate alignment of conserved segments and evolutionary breakpoints across the species included. Conclusion: The gene-anchored high-resolution RH map (1 locus/300 kb) for the targeted region of BTA6 presented here will provide a valuable platform to guide high-quality assembling and

  1. Cephalopod genomics

    DEFF Research Database (Denmark)

    Albertin, Caroline B.; Bonnaud, Laure; Brown, C. Titus

    2012-01-01

    The Cephalopod Sequencing Consortium (CephSeq Consortium) was established at a NESCent Catalysis Group Meeting, "Paths to Cephalopod Genomics-Strategies, Choices, Organization," held in Durham, North Carolina, USA on May 24-27, 2012. Twenty-eight participants representing nine countries (Austria, Australia, China, Denmark, France, Italy, Japan, Spain and the USA) met to address the pressing need for genome sequencing of cephalopod mollusks. This group, drawn from cephalopod biologists, neuroscientists, developmental and evolutionary biologists, materials scientists, bioinformaticians and researchers active in sequencing, assembling and annotating genomes, agreed on a set of cephalopod species of particular importance for initial sequencing and developed strategies and an organization (CephSeq Consortium) to promote this sequencing. The conclusions and recommendations of this meeting are described...

  2. Re-annotation of the physical map of Glycine max for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly

    Directory of Open Access Journals (Sweden)

    Shultz Jeffry

    2008-07-01

    Full Text Available. Abstract. Background: Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (Glycine max) genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, microsatellite anchors to BACs and by contigs formed from BAC fingerprints. Despite these similar regions, the genome has been sequenced by whole genome shotgun sequence (WGS). Here the aim was to use BAC end sequences (BES) derived from three minimum tile paths (MTP) to examine the extent and homogeneity of polyploid-like regions within contigs and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches. Results: When sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs) were single nucleotide polymorphisms (SNPs; 89%) and single nucleotide indels (SNIs; 10%). Larger indels were rare but present (1%). Simulations that had predicted that fingerprints of homeologous regions could be separated when divergence exceeded 2% were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. BES compared to WGS traces showed that polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed. Conclusion: The use of HSVs such as SNPs and SNIs to characterize BACs will improve contig building methods. The implications for bioinformatic and functional annotation of polyploid and paleopolyploid genomes show that a combined approach of BAC fingerprint based physical maps, WGS sequence and HSV-based partitioning of BAC clones from homeologous regions to separate contigs will allow reliable de

  3. Bioinformatics decoding the genome

    CERN Multimedia

    CERN. Geneva; Deutsch, Sam; Michielin, Olivier; Thomas, Arthur; Descombes, Patrick

    2006-01-01

    Extracting the fundamental genomic sequence from the DNA. From Genome to Sequence: Biology in the early 21st century has been radically transformed by the availability of the full genome sequences of an ever increasing number of life forms, from bacteria to major crop plants and to humans. The lecture will concentrate on the computational challenges associated with the production, storage and analysis of genome sequence data, with an emphasis on mammalian genomes. The quality and usability of genome sequences is increasingly conditioned by the careful integration of strategies for data collection and computational analysis, from the construction of maps and libraries to the assembly of raw data into sequence contigs and chromosome-sized scaffolds. Once the sequence is assembled, a major challenge is the mapping of biologically relevant information onto this sequence: promoters, introns and exons of protein-encoding genes, regulatory elements, functional RNAs, pseudogenes, transposons, etc. The methodological ...

  4. Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Stepanauskas, Ramunas

    2011-10-13

    DOE JGI's Tanja Woyke, chair of the Single Cells and Metagenomes session, delivers an introduction, followed by Bigelow Laboratory's Ramunas Stepanauskas on "Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  5. DOE JGI Quality Metrics; Approaches to Scaling and Improving Metagenome Assembly (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    Energy Technology Data Exchange (ETDEWEB)

    Copeland, Alex; Brown, C. Titus

    2011-10-13

    DOE JGI's Alex Copeland on "DOE JGI Quality Metrics" and Michigan State University's C. Titus Brown on "Approaches to Scaling and Improving Metagenome Assembly" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  6. A specific pattern of splicing for the horse αS1-Casein mRNA and partial genomic characterization of the relevant locus

    Directory of Open Access Journals (Sweden)

    Guérin Gérard

    2002-07-01

    Full Text Available. Abstract. Mares' milk has a composition very different from that of cows' milk. It is much more similar to human milk, in particular in its casein fraction. This study reports on the sequence of a 994 bp amplified fragment corresponding to a horse αS1-Casein (αS1-Cn) cDNA and its comparison with its caprine, pig, rabbit and human counterparts. The alignment of these sequences revealed a specific pattern of splicing for this horse primary transcript. As in humans, exons 3', 6' and 13' are present whereas exons 5, 13 and 14 are absent in this equine mRNA sequence. BAC clones, screened from a horse BAC library, containing the αS1-Cn gene allowed the mapping of its locus by FISH on equine chromosome 3q22.2-q22.3, which is in agreement with the Zoo-FISH results. Genomic analysis of the αS1-Cn gene showed that the region from the second exon to the last exon is scattered within a nucleotide stretch nearly 15 kb in length, which is quite similar in size to its ruminant and rabbit counterparts. The region between the αS1- and β-Cn genes, suspected to contain cis-acting elements involved in the expression of all clustered casein genes, is similar in size (ca. 15 kb) to the caprine and mouse intergenic region.

  7. Genome packaging in viruses

    OpenAIRE

    Sun, Siyang; Rao, Venigalla B.; Rossmann, Michael G.

    2010-01-01

    Genome packaging is a fundamental process in a viral life cycle. Many viruses assemble preformed capsids into which the genomic material is subsequently packaged. These viruses use a packaging motor protein that is driven by the hydrolysis of ATP to condense the nucleic acids into a confined space. How these motor proteins package viral genomes had been poorly understood until recently, when a few X-ray crystal structures and cryo-electron microscopy structures became available. Here we discu...

  8. Genome assembly of Chryseobacterium sp. strain IHBB 10212 from glacier top-surface soil in the Indian trans-Himalayas with potential for hydrolytic enzymes

    Directory of Open Access Journals (Sweden)

    Mohinder Pal

    2017-09-01

    Full Text Available. Cold-active esterases are gaining importance because their catalytic activities find applications in the chemical industry, in food processes and the detergent industry as additives, and in the organic synthesis of unstable compounds as catalysts. In the present study, the complete genome sequence of 4,843,645 bp, with an average G + C content of 34.08% and 4260 protein-coding genes, is reported for the low-temperature-active, esterase-producing novel strain of Chryseobacterium isolated from the top-surface soil of a glacier in the cold deserts of the Indian trans-Himalayas. The genome contained two plasmids of 16,553 and 11,450 bp with G + C contents of 40.54 and 40.37%, respectively. Several genes encoding the hydrolysis of ester linkages of triglycerides into fatty acids and glycerol were predicted in the genome. The annotation also predicted genes encoding proteases, lipases, amylases, β-glucosidases, endoglucanases and xylanases involved in biotechnological processes. The complete genome sequence of Chryseobacterium sp. strain IHBB 10212 and the two plasmids have been deposited under accession numbers CP015199, CP015200 and CP015201 at DDBJ/EMBL/GenBank.
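
    The G + C percentages reported in this record are a simple base-composition statistic. As a minimal sketch (not the authors' pipeline, which the abstract does not describe), G + C content can be computed from a sequence string as below; the function name gc_percent and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_percent(seq):
    """Return G + C content as a percentage of unambiguous bases.

    Assumption for this sketch: only A, C, G and T are counted, so
    ambiguity codes such as N do not affect the result.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


# Toy sequence, not data from the record: 2 of 4 counted bases are G or C.
print(gc_percent("ATGCNN"))  # prints 50.0
```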

  9. Whole-genome profiling and shotgun sequencing delivers an anchored, gene-decorated, physical map assembly of bread wheat chromosome 6A

    Czech Academy of Sciences Publication Activity Database

    Poursarebani, N.; Nussbaumer, T.; Šimková, Hana; Šafář, Jan; Witsenboer, H.; van Oeveren, J.; Doležel, Jaroslav; Mayer, K. F. X.; Stein, N.; Schnurbusch, T.

    2014-01-01

    Vol. 79, No. 2 (2014), pp. 334-347 ISSN 0960-7412 Institutional support: RVO:61389030 Keywords : bread wheat chromosome 6A * whole-genome profiling * LINEAR TOPOLOGICAL CONTIGS Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 5.972, year: 2014

  10. Assembly of the Genome of the Disease Vector Aedes aegypti onto a Genetic Linkage Map Allows Mapping of Genes Affecting Disease Transmission

    KAUST Repository

    Juneja, Punita; Osei-Poku, Jewelna; Ho, Yung S.; Ariani, Cristina V.; Palmer, William J.; Pain, Arnab; Jiggins, Francis M.

    2014-01-01

    between two strains of Ae. aegypti, and used these to generate a genetic map. This revealed a high rate of misassemblies in the current genome, where, for example, sequences from different chromosomes were found on the same scaffold. Once these were

  11. Accurate Dna Assembly And Direct Genome Integration With Optimized Uracil Excision Cloning To Facilitate Engineering Of Escherichia Coli As A Cell Factory

    DEFF Research Database (Denmark)

    Cavaleiro, Mafalda; Kim, Se Hyeuk; Nørholm, Morten

    2015-01-01

    Plants produce a vast diversity of valuable compounds with medical properties, but these are often difficult to purify from the natural source or produce by organic synthesis. An alternative is to transfer the biosynthetic pathways to an efficient production host like the bacterium Escherichia co......-excision-based cloning and combining it with a genome-engineering approach to allow direct integration of whole metabolic pathways into the genome of E. coli, to facilitate the advanced engineering of cell factories........ Cloning and heterologous gene expression are major bottlenecks in the metabolic engineering field. We are working on standardizing DNA vector design processes to promote automation and collaborations in early phase metabolic engineering projects. Here, we focus on optimizing the already established uracil...

  12. Draft genome of neurotropic nematode parasite Angiostrongylus cantonensis, causative agent of human eosinophilic meningitis.

    Science.gov (United States)

    Yong, Hoi-Sen; Eamsobhana, Praphathip; Lim, Phaik-Eem; Razali, Rozaimi; Aziz, Farhanah Abdul; Rosli, Nurul Shielawati Mohamed; Poole-Johnson, Johan; Anwar, Arif

    2015-08-01

    Angiostrongylus cantonensis is a bursate nematode parasite that causes eosinophilic meningitis (or meningoencephalitis) in humans in many parts of the world. The genomic data from A. cantonensis will form a useful resource for comparative genomic and chemogenomic studies to aid the development of diagnostics and therapeutics. We have sequenced, assembled and annotated the genome of A. cantonensis. The genome size is estimated to be ∼260 Mb, with 17,280 genomic scaffolds, 91X coverage, 81.45% for complete and 93.95% for partial score based on CEGMA analysis of genome completeness. The number of predicted genes of ≥300 bp was 17,482. A total of 7737 predicted protein-coding genes of ≥50 amino acids were identified in the assembled genome. Among the proteins of known function, kinases are the most abundant followed by transferases. The draft genome contains 34 excretory-secretory proteins (ES), a minimum of 44 Nematode Astacin (NAS) metalloproteases, 12 Homeobox (HOX) genes, and 30 neurotransmitters. The assembled genome size (260 Mb) is larger than those of Pristionchus pacificus, Caenorhabditis elegans, Necator americanus, Caenorhabditis briggsae, Trichinella spiralis, Brugia malayi and Loa loa, but smaller than Haemonchus contortus and Ascaris suum. The repeat content (25%) is similar to H. contortus. The GC content (41.17%) is lower compared to P. pacificus (42.7%) and H. contortus (43.1%) but higher compared to C. briggsae (37.69%), A. suum (37.9%) and N. americanus (40.2%) while the scaffold N50 is 42,191. This draft genome will facilitate the understanding of many unresolved issues on the parasite and the disorder it causes. Copyright © 2015 Elsevier B.V. All rights reserved.
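
    The scaffold N50 quoted in this record is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The sketch below shows that definition on toy lengths; the function name n50 and the example values are illustrative assumptions, not data from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the summed length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")


# Toy example: total length is 30, and 10 + 8 = 18 is the first running
# total that reaches at least 15.
print(n50([10, 8, 6, 4, 2]))  # prints 8
```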

  13. Partial Cancellation

    Indian Academy of Sciences (India)

    Partial Cancellation. Full Cancellation is desirable. But complexity requirements are enormous. 4000 tones, 100 Users billions of flops !!! Main Idea: Challenge: To determine which cross-talker to cancel on what “tone” for a given victim. Constraint: Total complexity is ...

  14. Bacteriophage Assembly

    Directory of Open Access Journals (Sweden)

    Anastasia A. Aksyuk

    2011-02-01

    Full Text Available Bacteriophages have been a model system to study assembly processes for over half a century. Formation of infectious phage particles involves specific protein-protein and protein-nucleic acid interactions, as well as large conformational changes of assembly precursors. The sequence and molecular mechanisms of phage assembly have been elucidated by a variety of methods. Differences and similarities of assembly processes in several different groups of bacteriophages are discussed in this review. The general principles of phage assembly are applicable to many macromolecular complexes.

  15. Fuel assemblies

    International Nuclear Information System (INIS)

    Nakatsuka, Masafumi.

    1979-01-01

    Purpose: To prevent scattering of gaseous fission products released from fuel assemblies stored in an FBR-type reactor. Constitution: A cap provided with means capable of storing gas is adapted to mount to the assembly handling head, for example by threading, in a storage rack of spent fuel assemblies consisting of a bottom plate, a top plate and an assembly support mechanism. By previously eliminating the gas inside the assembly and the cap in the storage rack, gaseous fission products upon loading, if released from fuel rods during storage, are stored in the cap and do not scatter in the storage rack. (Horiuchi, T.)

  16. Polymer Directed Protein Assemblies

    Directory of Open Access Journals (Sweden)

    Patrick van Rijn

    2013-05-01

    Full Text Available. Protein aggregation and protein self-assembly are important occurrences in natural systems and are, in some form or other, dictated by biopolymers. Virus particles are a very obvious example of the influence of biopolymers on protein assemblies. Viruses are multi-protein assemblies whose morphology is dictated by polynucleotides, namely RNA or DNA. This “biopolymer” directs the proteins and imposes limitations on the structure, such as the length or diameter of the particle. Not only do these bionanoparticles use polymer-directed self-assembly; processes like amyloid formation are also, in a way, a result of directed protein assembly by partially unfolded or misfolded biopolymers, namely polypeptides. The combination of proteins and synthetic polymers, inspired by these natural processes, is therefore regarded as a highly promising area of research. Directed protein assembly is versatile with respect to the possible interactions that bring together the protein and polymer, e.g., electrostatic forces, van der Waals forces or covalent conjugation, and the possible combinations are numerous because of the large number of different polymers and proteins available. The protein-polymer interaction behavior and overall morphology are envisioned to aid in clarifying protein-protein interactions and are thought to entail some interesting new functions and properties that will ultimately lead to novel bio-hybrid materials.

  17. Partial processing

    International Nuclear Information System (INIS)

    1978-11-01

    This discussion paper considers the possibility of applying to the recycle of plutonium in thermal reactors a particular method of partial processing based on the PUREX process but named CIVEX to emphasise the differences. The CIVEX process is based primarily on the retention of short-lived fission products. The paper suggests: (1) the recycle of fission products with uranium and plutonium in thermal reactor fuel would be technically feasible; (2) it would, however, take ten years or more to develop the CIVEX process to the point where it could be launched on a commercial scale; (3) since the majority of spent fuel to be reprocessed this century will have been in storage for ten years or more, the recycling of short-lived fission products with the U-Pu would not provide an effective means of making refabrication fuel "inaccessible" because the radioactivity associated with the fission products would have decayed. There would therefore be no advantage in partial processing.

  18. Partial gigantism

    Directory of Open Access Journals (Sweden)

    М.М. Karimova

    2017-05-01

    Full Text Available. A girl with partial gigantism (enlarged first and second toes of the left foot) is examined. This condition is a rare and unresolved problem, as the definite cause of its development has not been determined. A wait-and-see strategy is recommended, as well as corrective operations after closure of the growth zones, and the formation of a data pool for generalization and for the development of schemes of drug and radiation therapy.

  19. Drive piston assembly for a valve actuator assembly

    Science.gov (United States)

    Sun, Zongxuan

    2010-02-23

    A drive piston assembly is provided that is operable to selectively open a poppet valve. The drive piston assembly includes a cartridge defining a generally stepped bore. A drive piston is movable within the generally stepped bore and a boost sleeve is coaxially disposed with respect to the drive piston. A main fluid chamber is at least partially defined by the generally stepped bore, drive piston, and boost sleeve. First and second feedback chambers are at least partially defined by the drive piston and each are disposed at opposite ends of the drive piston. At least one of the drive piston and the boost sleeve is sufficiently configured to move within the generally stepped bore in response to fluid pressure within the main fluid chamber to selectively open the poppet valve. A valve actuator assembly and engine are also provided incorporating the disclosed drive piston assembly.

  20. When the genome plays dice: circumvention of the spindle assembly checkpoint and near-random chromosome segregation in multipolar cancer cell mitoses.

    Science.gov (United States)

    Gisselsson, David; Håkanson, Ulf; Stoller, Patrick; Marti, Dominik; Jin, Yuesheng; Rosengren, Anders H; Stewénius, Ylva; Kahl, Fredrik; Panagopoulos, Ioannis

    2008-04-02

    Normal cell division is coordinated by a bipolar mitotic spindle, ensuring symmetrical segregation of chromosomes. Cancer cells, however, occasionally divide into three or more directions. Such multipolar mitoses have been proposed to generate genetic diversity and thereby contribute to clonal evolution. However, this notion has been little validated experimentally. Chromosome segregation and DNA content in daughter cells from multipolar mitoses were assessed by multiphoton cross sectioning and fluorescence in situ hybridization in cancer cells and non-neoplastic transformed cells. The DNA distribution resulting from multipolar cell division was found to be highly variable, with frequent nullisomies in the daughter cells. Time-lapse imaging of H2B/GFP-labelled multipolar mitoses revealed that the time from the initiation of metaphase to the beginning of anaphase was prolonged and that the metaphase plates often switched polarity several times before metaphase-anaphase transition. The multipolar metaphase-anaphase transition was accompanied by a normal reduction of cellular cyclin B levels, but typically occurred before completion of the normal separase activity cycle. Centromeric AURKB and MAD2 foci were observed frequently to remain on the centromeres of multipolar ana-telophase chromosomes, indicating that multipolar mitoses were able to circumvent the spindle assembly checkpoint with some sister chromatids remaining unseparated after anaphase. Accordingly, scoring the distribution of individual chromosomes in multipolar daughter nuclei revealed a high frequency of nondisjunction events, resulting in a near-binomial allotment of sister chromatids to the daughter cells. The capability of multipolar mitoses to circumvent the spindle assembly checkpoint system typically results in a near-random distribution of chromosomes to daughter cells. Spindle multipolarity could thus be a highly efficient generator of genetically diverse minority clones in transformed cell

  1. When the genome plays dice: circumvention of the spindle assembly checkpoint and near-random chromosome segregation in multipolar cancer cell mitoses.

    Directory of Open Access Journals (Sweden)

    David Gisselsson

    BACKGROUND: Normal cell division is coordinated by a bipolar mitotic spindle, ensuring symmetrical segregation of chromosomes. Cancer cells, however, occasionally divide into three or more directions. Such multipolar mitoses have been proposed to generate genetic diversity and thereby contribute to clonal evolution. However, this notion has been little validated experimentally. PRINCIPAL FINDINGS: Chromosome segregation and DNA content in daughter cells from multipolar mitoses were assessed by multiphoton cross sectioning and fluorescence in situ hybridization in cancer cells and non-neoplastic transformed cells. The DNA distribution resulting from multipolar cell division was found to be highly variable, with frequent nullisomies in the daughter cells. Time-lapse imaging of H2B/GFP-labelled multipolar mitoses revealed that the time from the initiation of metaphase to the beginning of anaphase was prolonged and that the metaphase plates often switched polarity several times before metaphase-anaphase transition. The multipolar metaphase-anaphase transition was accompanied by a normal reduction of cellular cyclin B levels, but typically occurred before completion of the normal separase activity cycle. Centromeric AURKB and MAD2 foci were observed frequently to remain on the centromeres of multipolar ana-telophase chromosomes, indicating that multipolar mitoses were able to circumvent the spindle assembly checkpoint with some sister chromatids remaining unseparated after anaphase. Accordingly, scoring the distribution of individual chromosomes in multipolar daughter nuclei revealed a high frequency of nondisjunction events, resulting in a near-binomial allotment of sister chromatids to the daughter cells. CONCLUSION: The capability of multipolar mitoses to circumvent the spindle assembly checkpoint system typically results in a near-random distribution of chromosomes to daughter cells. Spindle multipolarity could thus be a highly efficient

  2. Short and long-term genome stability analysis of prokaryotic genomes.

    Science.gov (United States)

    Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

    2013-05-08

    able to explore genome organization stability at different time-scales and to find significant differences for pathogen and non-pathogen species. The output of our framework also allows identification of the conserved gene clusters and/or partial occurrences thereof, making it possible to explore how gene clusters assembled during evolution.

  3. Draft genome sequence of Micrococcus luteus strain O'Kane implicates metabolic versatility and the potential to degrade polyhydroxybutyrates.

    Science.gov (United States)

    Hanafy, Radwa A; Couger, M B; Baker, Kristina; Murphy, Chelsea; O'Kane, Shannon D; Budd, Connie; French, Donald P; Hoff, Wouter D; Youssef, Noha

    2016-09-01

    Micrococcus luteus is a predominant member of the skin microbiome. We here report on the genomic analysis of Micrococcus luteus strain O'Kane that was isolated from an elevator. The partial genome assembly of Micrococcus luteus strain O'Kane is 2.5 Mb with 2256 protein-coding genes and 62 RNA genes. Genomic analysis revealed metabolic versatility with genes involved in the metabolism and transport of glucose, galactose, fructose, mannose, alanine, aspartate, asparagine, glutamate, glutamine, glycine, serine, cysteine, methionine, arginine, proline, histidine, phenylalanine, and fatty acids. Genomic comparison to other M. luteus representatives identified the potential to degrade polyhydroxybutyrates, as well as several antibiotic resistance genes absent from other genomes.

  4. Draft genome sequence of Micrococcus luteus strain O'Kane implicates metabolic versatility and the potential to degrade polyhydroxybutyrates

    Directory of Open Access Journals (Sweden)

    Radwa A. Hanafy

    2016-09-01

    Micrococcus luteus is a predominant member of the skin microbiome. We here report on the genomic analysis of Micrococcus luteus strain O'Kane that was isolated from an elevator. The partial genome assembly of Micrococcus luteus strain O'Kane is 2.5 Mb with 2256 protein-coding genes and 62 RNA genes. Genomic analysis revealed metabolic versatility with genes involved in the metabolism and transport of glucose, galactose, fructose, mannose, alanine, aspartate, asparagine, glutamate, glutamine, glycine, serine, cysteine, methionine, arginine, proline, histidine, phenylalanine, and fatty acids. Genomic comparison to other M. luteus representatives identified the potential to degrade polyhydroxybutyrates, as well as several antibiotic resistance genes absent from other genomes.

  5. Fuel assembly inspection device

    International Nuclear Information System (INIS)

    Yaginuma, Yoshitaka

    1998-01-01

    The present invention provides a device suitable to inspect appearance of fuel assemblies by photographing the appearance of fuel assemblies. Namely, the inspection device of the present invention measures bowing of fuel assembly or each of fuel rods or both of them based on the partially photographed images of fuel assembly. In this case, there is disposed a means which flashily projects images in the form of horizontal line from a direction intersecting obliquely relative to a horizontal cross section of the fuel assembly. A first image processing means separates the projected image pictures including projected images and calculates bowing. A second image processing means replaces the projected image pictures of the projected images based on projected images just before and after the photographing. Then, images for the measurement of bowing and images for inspection can be obtained simultaneously. As a result, the time required for the photographing can be shortened, the time for inspection can be shortened and an effect of preventing deterioration of photographing means by radiation rays can be provided. (I.S.)

  6. SCHEMA computational design of virus capsid chimeras: calibrating how genome packaging, protection, and transduction correlate with calculated structural disruption.

    Science.gov (United States)

    Ho, Michelle L; Adler, Benjamin A; Torre, Michael L; Silberg, Jonathan J; Suh, Junghae

    2013-12-20

    Adeno-associated virus (AAV) recombination can result in chimeric capsid protein subunits whose ability to assemble into an oligomeric capsid, package a genome, and transduce cells depends on the inheritance of sequence from different AAV parents. To develop quantitative design principles for guiding site-directed recombination of AAV capsids, we have examined how capsid structural perturbations predicted by the SCHEMA algorithm correlate with experimental measurements of disruption in seventeen chimeric capsid proteins. In our small chimera population, created by recombining AAV serotypes 2 and 4, we found that protection of viral genomes and cellular transduction were inversely related to calculated disruption of the capsid structure. Interestingly, however, we did not observe a correlation between genome packaging and calculated structural disruption; a majority of the chimeric capsid proteins formed at least partially assembled capsids and more than half packaged genomes, including those with the highest SCHEMA disruption. These results suggest that the sequence space accessed by recombination of divergent AAV serotypes is rich in capsid chimeras that assemble into 60-mer capsids and package viral genomes. Overall, the SCHEMA algorithm may be useful for delineating quantitative design principles to guide the creation of libraries enriched in genome-protecting virus nanoparticles that can effectively transduce cells. Such improvements to the virus design process may help advance not only gene therapy applications but also other bionanotechnologies dependent upon the development of viruses with new sequences and functions.

  7. VirSorter: mining viral signal from microbial genomic data

    Science.gov (United States)

    Roux, Simon; Enault, Francois; Hurwitz, Bonnie L.

    2015-01-01

    Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the i
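
    The recall and precision figures quoted above follow the usual classification definitions. A minimal sketch of how they are computed is given below; the contig counts are hypothetical illustrations, not VirSorter benchmark data, and the function name is an assumption made for this example.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a binary classifier.

    precision = TP / (TP + FP): fraction of predicted viral contigs that are viral.
    recall    = TP / (TP + FN): fraction of truly viral contigs that were found.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 96 viral contigs detected, none falsely called, 4 missed.
precision, recall = precision_recall(true_pos=96, false_pos=0, false_neg=4)
```

    With these made-up counts the classifier would meet the ">95% Recall and 100% Precision" level described in the abstract.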

  8. VirSorter: mining viral signal from microbial genomic data

    Directory of Open Access Journals (Sweden)

    Simon Roux

    2015-05-01

    Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made

  9. Genome Writing: Current Progress and Related Applications

    Directory of Open Access Journals (Sweden)

    Yueqiang Wang

    2018-02-01

    The ultimate goal of synthetic biology is to build customized cells or organisms to meet specific industrial or medical needs. The most important part of the customized cell is a synthetic genome. Advanced genomic writing technologies are required to build such an artificial genome. The recent, partially completed synthetic yeast genome project represents a milestone in this field. In this mini review, we briefly introduce the techniques for de novo genome synthesis and genome editing. Furthermore, we summarize recent research progress and highlight several applications in the synthetic genome field. Finally, we discuss current challenges and future prospects. Keywords: Synthetic biology, Genome writing, Genome editing, Bioethics, Biosafety

  10. Subtype-independent near full-length HIV-1 genome sequencing and assembly to be used in large molecular epidemiological studies and clinical management.

    Science.gov (United States)

    Grossmann, Sebastian; Nowak, Piotr; Neogi, Ujjwal

    2015-01-01

    HIV-1 near full-length genome (HIV-NFLG) sequencing from plasma is an attractive multidimensional tool to apply in large-scale population-based molecular epidemiological studies. It also enables genotypic resistance testing (GRT) for all drug target sites allowing effective intervention strategies for control and prevention in high-risk population groups. Thus, the main objective of this study was to develop a simplified subtype-independent, cost- and labour-efficient HIV-NFLG protocol that can be used in clinical management as well as in molecular epidemiological studies. Plasma samples (n=30) were obtained from HIV-1B (n=10), HIV-1C (n=10), CRF01_AE (n=5) and CRF01_AG (n=5) infected individuals with minimum viral load >1120 copies/ml. The amplification was performed with two large amplicons of 5.5 kb and 3.7 kb, sequenced with 17 primers to obtain HIV-NFLG. GRT was validated against ViroSeq™ HIV-1 Genotyping System. After excluding four plasma samples with low-quality RNA, a total of 26 samples were attempted. Among them, NFLG was obtained from 24 (92%) samples with the lowest viral load being 3000 copies/ml. High (>99%) concordance was observed between HIV-NFLG and ViroSeq™ when determining the drug resistance mutations (DRMs). The N384I connection mutation was additionally detected by NFLG in two samples. Our high efficiency subtype-independent HIV-NFLG is a simple and promising approach to be used in large-scale molecular epidemiological studies. It will facilitate the understanding of the HIV-1 pandemic population dynamics and outline effective intervention strategies. Furthermore, it can potentially be applicable in clinical management of drug resistance by evaluating DRMs against all available antiretrovirals in a single assay.

  11. Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life.

    Science.gov (United States)

    Brown, Christopher T; Sharon, Itai; Thomas, Brian C; Castelle, Cindy J; Morowitz, Michael J; Banfield, Jillian F

    2013-12-17

    The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community. We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation. During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. Genome comparisons within the family Actinomycetaceae reveal important differences

  12. Cocoa/Cotton Comparative Genomics

    Science.gov (United States)

    With genome sequence from two members of the Malvaceae family recently made available, we are exploring syntenic relationships, gene content, and evolutionary trajectories between the cacao and cotton genomes. An assembly of cacao (Theobroma cacao) using Illumina and 454 sequence technology yielded ...

  13. The wolf reference genome sequence (Canis lupus lupus) and its implications for Canis spp. population genomics

    DEFF Research Database (Denmark)

    Gopalakrishnan, Shyam; Samaniego Castruita, Jose Alfredo; Sinding, Mikkel Holger Strander

    2017-01-01

    Background: An increasing number of studies are addressing the evolutionary genomics of dog domestication, principally through resequencing dog, wolf and related canid genomes. There is, however, only one de novo assembled canid genome currently available against which to map such data - that of a...

  14. Fuel assembly

    International Nuclear Information System (INIS)

    Abe, Hideaki; Sakai, Takao; Ishida, Tomio; Yokota, Norikatsu.

    1992-01-01

    The lower ends of a plurality of plate-like shape memory alloys are secured at the periphery of the upper inside of the handling head of a fuel assembly. As the shape memory alloy, a Cu-Zn alloy, a Ti-Pd alloy or a Fe-Ni alloy is used. When high temperature coolants flow out to the handling head, the shape memory alloy deforms by warping to the outer side more greatly toward the upper portion thereof with the temperature increase of the coolants. As a result, the shape of the flow channel of the coolants is changed so as to enlarge at the exit of the upper end of the fuel assembly. Then, the pressure loss of the coolants in the fuel assembly is decreased by the enlargement. Accordingly, the flow rate of the coolants in the fuel assembly is increased to lower the temperature of the coolants. Further, high temperature coolants and low temperature coolants are mixed sufficiently just above the fuel assembly. This can suppress the temperature fluctuation of the mixed coolants in the upper portion of the reactor core, thereby reducing fatigue and failures of the structural components in the upper portion of the reactor core. (I.N.)

  15. Exploration of Metagenome Assemblies with an Interactive Visualization Tool

    Energy Technology Data Exchange (ETDEWEB)

    Cantor, Michael; Nordberg, Henrik; Smirnova, Tatyana; Andersen, Evan; Tringe, Susannah; Hess, Matthias; Dubchak, Inna

    2014-07-09

    Metagenomics, one of the fastest growing areas of modern genomic science, is the genetic profiling of the entire community of microbial organisms present in an environmental sample. Elviz is a web-based tool for the interactive exploration of metagenome assemblies. Elviz can be used with publicly available data sets from the Joint Genome Institute or with custom user-loaded assemblies. Elviz is available at genome.jgi.doe.gov/viz

  16. Fuel assembly

    International Nuclear Information System (INIS)

    Nakatsuka, Masafumi; Matsuzuka, Ryuji.

    1976-01-01

    Object: To provide a fuel assembly which can decrease the pressure loss of the coolant and make the temperature uniform. Structure: A sectional area of a flow passage in the vicinity of an inner peripheral surface of a wrapper tube is limited over the entire length to prevent the temperature of a fuel element in the outermost peripheral portion from being excessively decreased to thereby flatten temperature distribution. To this end, a plurality of picture-frame-like sheet metals constituting a spacer for supporting a fuel assembly, which has a plurality of fuel elements planted lengthwise and in given spaced relation within the wrapper tube, is disposed in longitudinal grooves and in stacked fashion to form a substantially honeycomb-like space in cross section. The fuel elements are inserted and supported in the space to form a fuel assembly. (Kamimura, M.)

  17. Fuel assemblies

    International Nuclear Information System (INIS)

    Nagano, Mamoru; Yoshioka, Ritsuo

    1983-01-01

    Purpose: To effectively utilize nuclear fuels by increasing the reactivity of a fuel assembly and reduce the concentration at the central region thereof upon completion of the burning. Constitution: A fuel assembly is bisected into a central region and a peripheral region by disposing an inner channel box within a channel box. The flow rate of coolants passing through the central region is made greater than that in the peripheral region. The concentration of uranium 235 of the fuel rods in the central region is made higher. In such a structure, since the moderating effect in the central region is improved, the reactivity of the fuel assembly is increased and the uranium concentration in the central region upon completion of the burning can be reduced, fuel economy and effective utilization of uranium can be attained. (Kamimura, M.)

  18. Meta-IDBA: a de Novo assembler for metagenomic data.

    Science.gov (United States)

    Peng, Yu; Leung, Henry C M; Yiu, S M; Chin, Francis Y L

    2011-07-01

    Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling of a set of mixed reads from different species to form contigs is a bottleneck of metagenomic research. Although there are many assemblers for assembling reads from a single genome, there are no assemblers for assembling reads in metagenomic data without reference genome sequences. Moreover, the performances of these assemblers on metagenomic data are far from satisfactory, because of the existence of common regions in the genomes of subspecies and species, which make the assembly problem much more complicated. We introduce the Meta-IDBA algorithm for assembling reads in metagenomic data, which contain multiple genomes from different species. There are two core steps in Meta-IDBA. It first tries to partition the de Bruijn graph into isolated components of different species based on an important observation. Then, for each component, it captures the slight variants of the genomes of subspecies from the same species by multiple alignments and represents the genome of one species, using a consensus sequence. Comparison of the performances of Meta-IDBA and existing assemblers, such as Velvet and Abyss for different metagenomic datasets shows that Meta-IDBA can reconstruct longer contigs with similar accuracy. Meta-IDBA toolkit is available at our website http://www.cs.hku.hk/~alse/metaidba. chin@cs.hku.hk.
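
    The first core step described above, partitioning a de Bruijn graph into isolated components, can be illustrated with a toy example. The sketch below is not the Meta-IDBA implementation: it only builds a plain de Bruijn graph over (k-1)-mers and reports its weakly connected components, and the reads, the value of k and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Yield (prefix, suffix) edges between (k-1)-mers for every k-mer in the reads."""
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            yield kmer[:-1], kmer[1:]


def connected_components(reads, k):
    """Group the (k-1)-mer nodes of the graph into weakly connected components."""
    neighbours = defaultdict(set)
    for a, b in de_bruijn_edges(reads, k):
        neighbours[a].add(b)
        neighbours[b].add(a)  # ignore edge direction when partitioning
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


# Hypothetical reads drawn from two unrelated toy "genomes".
reads = ["ACGTAC", "GTACGG", "TTTCCC", "TCCCAA"]
components = connected_components(reads, k=4)
```

    In this toy input the reads from the two sources share no 3-mers, so the graph falls into two components; real metagenomes share sequence between species, which is the complication the abstract refers to.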

  19. The UCSC Genome Browser Database: update 2006

    DEFF Research Database (Denmark)

    Hinrichs, A S; Karolchik, D; Baertsch, R

    2006-01-01

    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, ...

  20. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
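
    For readers unfamiliar with the assembly statistics quoted above, the sketch below shows how an N50 value is conventionally computed and how the 91% coverage figure follows from the reported totals. The scaffold lengths in `toy` are hypothetical, not carnation data; only the two totals are taken from the abstract, with 622 Mb read as 622,000,000 bp.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical scaffold lengths (total 300; 80 + 70 reaches half of it).
toy = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy)

# Coverage from the abstract's totals: assembled bases / estimated genome size.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000
coverage = assembled_bp / estimated_genome_bp  # about 0.915
```

    The ratio is about 0.915, in line with the 91% coverage stated in the abstract.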

  1. Combining de novo and reference-guided assembly with scaffold_builder

    NARCIS (Netherlands)

    Silva, G.G.; Dutilh, B.E.; Matthews, T.D.; Elkins, K.; Schmieder, R.; Dinsdale, E.A.; Edwards, R.A.

    2013-01-01

    Genome sequencing has become routine, however genome assembly still remains a challenge despite the computational advances in the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter

  2. Components of Adenovirus Genome Packaging

    Science.gov (United States)

    Ahi, Yadvinder S.; Mittal, Suresh K.

    2016-01-01

    Adenoviruses (AdVs) are icosahedral viruses with double-stranded DNA (dsDNA) genomes. Genome packaging in AdV is thought to be similar to that seen in dsDNA containing icosahedral bacteriophages and herpesviruses. Specific recognition of the AdV genome is mediated by a packaging domain located close to the left end of the viral genome and is mediated by the viral packaging machinery. Our understanding of the role of various components of the viral packaging machinery in AdV genome packaging has greatly advanced in recent years. Characterization of empty capsids assembled in the absence of one or more components involved in packaging, identification of the unique vertex, and demonstration of the role of IVa2, the putative packaging ATPase, in genome packaging have provided compelling evidence that AdVs follow a sequential assembly pathway. This review provides a detailed discussion on the functions of the various viral and cellular factors involved in AdV genome packaging. We conclude by briefly discussing the roles of the empty capsids, assembly intermediates, scaffolding proteins, portal vertex and DNA encapsidating enzymes in AdV assembly and packaging. PMID:27721809

  3. Valve assembly

    International Nuclear Information System (INIS)

    Sandling, M.

    1981-01-01

    An improved valve assembly, used for controlling the flow of radioactive slurry, is described. Radioactive contamination of the air during removal or replacement of the valve is prevented by sucking air from the atmosphere through a portion of the structure above the valve housing. (U.K.)

  4. Fuel assembly

    International Nuclear Information System (INIS)

    Gjertsen, R.K.; Bassler, E.A.; Huckestein, E.A.; Salton, R.B.; Tower, S.N.

    1988-01-01

    A fuel assembly adapted for use with a pressurized water nuclear reactor having capabilities for fluid moderator spectral shift control is described comprising: parallel arranged elongated nuclear fuel elements; means for providing for axial support of the fuel elements and for arranging the fuel elements in a spaced array; thimbles interspersed among the fuel elements adapted for insertion of a rod control cluster therewithin; means for structurally joining the fuel elements and the guide thimbles; fluid moderator control means for providing a volume of low neutron absorbing fluid within the fuel assembly and for removing a substantially equivalent volume of reactor coolant water therefrom, a first flow manifold at one end of the fuel assembly sealingly connected to a first end of the moderator control tubes whereby the first ends are commonly flow connected; and a second flow manifold, having an inlet passage and an outlet passage therein, sealingly connected to a second end of the moderator control tubes at a second end of the fuel assembly

  5. Genome Improvement at JGI-HAGSC

    Energy Technology Data Exchange (ETDEWEB)

    Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.

    2012-03-03

    Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence. For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.

  6. Assembly of Repeat Content Using Next Generation Sequencing Data

    Energy Technology Data Exchange (ETDEWEB)

    LaButti, Kurt; Kuo, Alan; Grigoriev, Igor; Copeland, Alex

    2014-03-17

    Repetitive organisms pose a challenge for short read assembly, and typically only unique regions and repeat regions shorter than the read length can be accurately assembled. Recently, we have been investigating the use of Pacific Biosciences reads for de novo fungal assembly. We will present an assessment of the quality and degree of repeat reconstruction possible in a fungal genome using long read technology. We will also compare differences in assembly of repeat content using short read and long read technology.

  7. Fuel assembly

    International Nuclear Information System (INIS)

    Yokota, Tokunobu.

    1990-01-01

    A fuel assembly used in an FBR type nuclear reactor comprises a plurality of fuel rods and a moderator guide member (water rod). A moderator exit opening/closing mechanism is formed at the upper portion of the moderator guide member for opening and closing a moderator exit. In the initial fuel charging operation cycle of the reactor, the moderator exit is closed by the moderator exit opening/closing mechanism. Voids then accumulate at the inner upper portion of the moderator guide member to harden the spectrum, and a great amount of plutonium is generated and accumulated in the fuel assembly. In the fuel re-charging operation cycle, the moderator guide member is used with the moderator exit opened. In this case, voids are discharged from the moderator guide member to decrease the ratio, and the plutonium accumulated in the initial charging operation cycle is burnt. In this way, the fuel economy can be improved. (I.N.)

  8. Fuel assemblies

    International Nuclear Information System (INIS)

    Echigoya, Hironori; Nomata, Terumitsu.

    1983-01-01

    Purpose: To render the axial power distribution relatively flat. Constitution: The first fuel element comprises a fuel can made of Zircaloy, i.e., a metal with low neutron absorption, which is filled with a plurality of UO2 pellets and sealed by welding with a lower end plug, a plenum spring and an upper end plug. The second fuel element is formed by substituting a part of the UO2 pellets with a water tube which is sealed with water and has a space for allowing thermal expansion. The nuclear fuel assembly is constituted by using the first and second fuel elements together. In such a structure, since water reflects neutrons and decreases their leakage to increase the temperature, reactivity is added at the upper portion of the fuel assembly to thereby flatten the axial power distribution. Accordingly, stable operation is possible by means of deep control rods alone, with no need for shallow control rods. (Sekiya, K.)

  9. Fuel assembly

    International Nuclear Information System (INIS)

    Kawai, Mitsuo.

    1988-01-01

    Purpose: To reduce the corrosion rate and suppress the increase of radioactive corrosion products in reactor water of nuclear fuel assemblies for use in BWR type reactors having spacer springs made of nickel based deposition reinforced type alloys. Constitution: Spacer rings made of nickel based deposition reinforced type alloy are incorporated and used in fuel assemblies after a treatment of dipping and holding in high temperature water followed by heating in steam. Since this can remove the nickel leaching into reactor water at the initial stage, Co-58 as the radioactive corrosion product in the reactor water can be reduced, and work during in-service inspection or repair can be facilitated to improve the working efficiency of the nuclear power plant. The dipping time is desirably more than 10 hours and more desirably more than 30 hours. (Horiuchi, T.)

  10. Fuel assembly

    International Nuclear Information System (INIS)

    Watanabe, Shoichi; Hirano, Yasushi.

    1998-01-01

    One-half or more of all the fuel rods in a fuel assembly are MOX fuel rods containing less than 1wt% of burnable poisons, and at least a portion of the burnable poisons comprises gadolinium. Surplus reactivity at the initial stage of the operation cycle is thereby controlled so as to eliminate burnable poisons remaining unburnt at the final stage, as well as to increase thermal reactivity. In addition, the content of fissile plutonium is set greater than the content of uranium 235, and fuel rods at corner portions do not incorporate burnable poisons. Fuel rods not containing burnable poisons are disposed at positions adjacent to fuel rods facing a water rod in one or two directions. Local power at the radial center of the fuel assembly is increased to flatten the distortion of the radial power distribution. (N.H.)

  11. General Assembly

    CERN Multimedia

    Staff Association

    2016-01-01

    5th April, 2016 – Ordinary General Assembly of the Staff Association! In the first semester of each year, the Staff Association (SA) invites its members to attend and participate in the Ordinary General Assembly (OGA). This year the OGA will be held on Tuesday, April 5th 2016 from 11:00 to 12:00 in BE Auditorium, Meyrin (6-2-024). During the Ordinary General Assembly, the activity and financial reports of the SA are presented and submitted for approval to the members. This is the occasion to get a global view on the activities of the SA, its financial management, and an opportunity to express one’s opinion, including taking part in the votes. Other points are listed on the agenda, as proposed by the Staff Council. Who can vote? Only “ordinary” members (MPE) of the SA can vote. Associated members (MPA) of the SA and/or affiliated pensioners have a right to vote on those topics that are of direct interest to them. Who can give his/her opinion? The Ordinary General Asse...

  12. Fuel assembly

    International Nuclear Information System (INIS)

    Ueda, Sei; Ando, Ryohei; Mitsutake, Toru.

    1995-01-01

    The present invention concerns a fuel assembly suitable for a BWR-type reactor and improved especially in nuclear characteristics, heat performance, hydraulic performance, dismantling or assembling performance and economy. A part of the poison rods are formed as large-diameter/multi-region poison rods having a larger diameter than a fuel rod. A large number of fuel rods are disposed surrounding a large diameter water rod and a group of the large-diameter/multi-region poison rods adjacent to the water rod. The large-diameter water rod has a burnable poison at the tube wall portion. At least a portion of the large-diameter poison rods has a coolant circulation portion allowing coolants to circulate therethrough. Since the large-diameter poison rods are disposed at a position of high neutron fluxes, a large neutron multiplication factor suppression effect can be provided, making it possible to reduce the number of burnable poison rods relative to fuels. As a result, power peaking in the fuel assembly is moderated and a greater amount of plutonium can be loaded. In addition, the flow of cooling water which tends to gather around the large diameter water rod can be controlled to improve cooling performance of fuels. (N.H.)

  13. The wolf reference genome sequence (Canis lupus lupus) and its implications for Canis spp. population genomics

    DEFF Research Database (Denmark)

    Gopalakrishnan, Shyam; Samaniego Castruita, Jose Alfredo; Sinding, Mikkel Holger Strander

    2017-01-01

    Background: An increasing number of studies are addressing the evolutionary genomics of dog domestication, principally through resequencing dog, wolf and related canid genomes. There is, however, only one de novo assembled canid genome currently available against which to map such data - that of a...... that regardless of the reference genome choice, most evolutionary genomic analyses yield qualitatively similar results, including those exploring the structure between the wolves and dogs using admixture and principal component analysis. However, we do observe differences in the genomic coverage of re-mapped...

  14. Partial Automated Alignment and Integration System

    Science.gov (United States)

    Kelley, Gary Wayne (Inventor)

    2014-01-01

    The present invention is a Partial Automated Alignment and Integration System (PAAIS) used to automate the alignment and integration of space vehicle components. A PAAIS includes ground support apparatuses, a track assembly with a plurality of energy-emitting components and an energy-receiving component containing a plurality of energy-receiving surfaces. Communication components and processors allow communication and feedback through PAAIS.

  15. Removable partial dentures: clinical concepts.

    Science.gov (United States)

    Bohnenkamp, David M

    2014-01-01

    This article provides a review of the traditional clinical concepts for the design and fabrication of removable partial dentures (RPDs). Although classic theories and rules for RPD designs have been presented and should be followed, excellent clinical care for partially edentulous patients may also be achieved with computer-aided design/computer-aided manufacturing technology and unique blended designs. These nontraditional RPD designs and fabrication methods provide for improved fit, function, and esthetics by using computer-aided design software, composite resin for contours and morphology of abutment teeth, metal support structures for long edentulous spans and collapsed occlusal vertical dimensions, and flexible, nylon thermoplastic material for metal-supported clasp assemblies. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Plural beam electron gun assembly

    International Nuclear Information System (INIS)

    Stratton, M.G.

    1977-01-01

    The invention relates to a cathode ray tube plural-beam in-line bi-potential electron gun assembly which, having applied beam currents of differing levels, manifests structurally modified gun structures to effect focused beam landings at the screen that are evidenced as substantially equi-sized spots, thereby providing improved resolution and brightness of the screen imagery. The structural changes embody modifications of the related focusing and accelerator electrodes of the respective guns to provide a partial telescoping arrangement for effecting the discrete placement, forming and shielding of the final focusing lenses. The three lenses so formed are in different planes in partial overlapping axial relationship.

  17. Preliminary High-Throughput Metagenome Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Dusheyko, Serge; Furman, Craig; Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank

    2007-03-26

    Metagenome data sets present a qualitatively different assembly problem than traditional single-organism whole-genome shotgun (WGS) assembly. The unique aspects of such projects include the presence of a potentially large number of distinct organisms and their representation in the data set at widely different fractions. In addition, multiple closely related strains could be present, which would be difficult to assemble separately. Failure to take these issues into account can result in poor assemblies that either jumble together different strains or which fail to yield useful results. The DOE Joint Genome Institute has sequenced a number of metagenomic projects and plans to considerably increase this number in the coming year. As a result, the JGI has a need for high-throughput tools and techniques for handling metagenome projects. We present the techniques developed to handle metagenome assemblies in a high-throughput environment. This includes a streamlined assembly wrapper, based on the JGI's in-house WGS assembler, Jazz. It also includes the selection of sensible defaults targeted for metagenome data sets, as well as quality control automation for cleaning up the raw results. While analysis is ongoing, we will discuss preliminary assessments of the quality of the assembly results (http://fames.jgi-psf.org).

  18. Optimizing Transcriptome Assemblies for Eleusine indica Leaf and Seedling by Combining Multiple Assemblies from Three De Novo Assemblers

    Directory of Open Access Journals (Sweden)

    Shu Chen

    2015-03-01

    Due to rapid advances in sequencing technology, increasing amounts of genomic and transcriptomic data are available for plant species, presenting enormous challenges for biocomputing analysis. A crucial first step for a successful transcriptomics-based study is the building of a high-quality assembly. Here, we utilized three different de novo assemblers (Trinity, Velvet, and CLC) and the EvidentialGene pipeline tr2aacds to assemble two optimized transcript sets for the notorious weed species, Eleusine indica. Two RNA sequencing (RNA-seq) datasets from leaf and aboveground seedlings were processed using three assemblers, which resulted in 20 assemblies for each dataset. The contig numbers and N50 values of each assembly were compared to study the effect of read number, k-mer size, and in silico normalization on assembly output. The 20 assemblies were then processed through the tr2aacds pipeline to remove redundant transcripts and to select the transcript set with the best coding potential. Each assembly contributed a considerable proportion to the final transcript combination with the exception of the CLC-k14. Thus each assembler and parameter set did assemble better contigs for certain transcripts. The redundancy, total contig number, N50, fully assembled contig number, and transcripts related to target-site herbicide resistance were evaluated for the EvidentialGene and Trinity assemblies. Comparing the EvidentialGene set with the Trinity assembly revealed improved quality and reduced redundancy in both leaf and seedling EvidentialGene sets. The optimized transcriptome references will be useful for studying herbicide resistance in E. indica and the evolutionary process in the three allotetraploid offspring.
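
    The abstract above compares assemblies by contig number and N50 but does not say how N50 was computed. The following is a minimal sketch of the usual definition (the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length). The function name `n50` and the example lengths are illustrative assumptions, not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumes the common definition: sort contigs from longest to shortest
    and return the length of the contig at which the cumulative length
    first reaches at least half of the total assembly length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(example))
```

    Because N50 depends only on the length distribution, it says nothing about correctness or redundancy, which is consistent with the abstract evaluating redundancy and fully assembled contig number alongside it.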

  19. Fuel assembly

    International Nuclear Information System (INIS)

    Fujibayashi, Toru.

    1970-01-01

    Herein disclosed is a fuel assembly in which a fuel rod bundle is easily detachable by rotating a fuel rod fastener rotatably mounted to the upper surface of an upper tie-plate supporting a fuel bundle therebelow. A locking portion at the leading end of each fuel rod protrudes through the upper tie-plate and is engaged with or separated from the tie-plate by the rotation of the fastener. The removal of a desired fuel rod can therefore be remotely accomplished without the necessity of handling pawls, locking washers and nuts. (Owens, K.J.)

  20. Assembling consumption

    DEFF Research Database (Denmark)

    Assembling Consumption marks a definitive step in the institutionalisation of qualitative business research. By gathering leading scholars and educators who study markets, marketing and consumption through the lenses of philosophy, sociology and anthropology, this book clarifies and applies...... the investigative tools offered by assemblage theory, actor-network theory and non-representational theory. Clear theoretical explanation and methodological innovation, alongside empirical applications of these emerging frameworks will offer readers new and refreshing perspectives on consumer culture and market...... societies. This is essential reading for both seasoned scholars and advanced students of markets, economies and social forms of consumption....

  1. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

    Science.gov (United States)

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-02-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.

  2. Genome-derived vaccines.

    Science.gov (United States)

    De Groot, Anne S; Rappuoli, Rino

    2004-02-01

    Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed of novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed of genome-predicted, assembled and engineered T- and B-cell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.

  3. System and method for controlling a combustor assembly

    Science.gov (United States)

    York, William David; Ziminsky, Willy Steve; Johnson, Thomas Edward; Stevenson, Christian Xavier

    2013-03-05

    A system and method for controlling a combustor assembly are disclosed. The system includes a combustor assembly. The combustor assembly includes a combustor and a fuel nozzle assembly. The combustor includes a casing. The fuel nozzle assembly is positioned at least partially within the casing and includes a fuel nozzle. The fuel nozzle assembly further defines a head end. The system further includes a viewing device configured for capturing an image of at least a portion of the head end, and a processor communicatively coupled to the viewing device, the processor configured to compare the image to a standard image for the head end.

  4. Fuel assembly

    International Nuclear Information System (INIS)

    Kurihara, Kunitoshi; Azekura, Kazuo.

    1992-01-01

    In a reactor core of a heavy water moderated light water cooled pressure tube type reactor, no sufficient effects have been obtained for the transfer width to a negative side of void reactivity change in a region of a great void coefficient. Then, a moderation region divided into upper and lower two regions is disposed at the central portion of a fuel assembly. Coolant flowing into the lower region can be discharged to the cooling region from an opening disposed at the upper end portion of the lower region. Light water flows from the lower region of the moderator region to the cooling region of the reactor core upper portion, to lower the void coefficient. As a result, the reactivity performance at low void coefficient, i.e., a void reaction rate is transferred to the negative side. Thus, this flattens the power distribution in the fuel assembly, increases the thermal margin and enables rapid operation and control of the reactor core, as well as contributing to the increase of fuel burnup ratio and reduction of the fuel cycle cost. (N.H.)

  5. Fuel assembly

    International Nuclear Information System (INIS)

    Chaki, Masao; Nishida, Koji; Karasawa, Hidetoshi; Kanazawa, Toru; Orii, Akihito; Nagayoshi, Takuji; Kashiwai, Shin-ichi; Masuhara, Yasuhiro

    1998-01-01

    The present invention concerns a fuel assembly, for a BWR type nuclear reactor, comprising fuel rods in 9 x 9 matrix. The inner width of the channel box is about 132mm and the length of the fuel rods which are not short fuel rods is about 4m. Two water rods having a circular cross section are arranged on a diagonal line in a portion of 3 x 3 matrix at the center of the fuel assembly, and two fuel rods are disposed at vacant spaces, and the number of fuel rods is 74. Eight fuel rods are determined as short fuel rods among 74 fuel rods. Assuming the fuel inventory in the short fuel rod as X(kg), and the fuel inventory in the fuel rods other than the short fuel rods as Y(kg), X and Y satisfy the relation: X + Y ≥ 173m, Y ≤ - 9.7X + 292, Y ≤ - 0.3X + 203 and X > 0. Then, even when the short fuel rods are used, the fuel inventory is increased and fuel economy can be improved. (I.N.)

  6. Fuel assembly

    International Nuclear Information System (INIS)

    Fushimi, Atsushi; Shimada, Hidemitsu; Aoyama, Motoo; Nakajima, Junjiro

    1998-01-01

    In a fuel assembly for an n x n lattice-like BWR type reactor, n is set to 9 or greater, and the enrichment degree of plutonium is set to 4.4% by weight or less. Alternatively, n is set to 10 or greater, and the enrichment degree of plutonium is set to 5.2% by weight or less. An average take-out burnup degree is set to 39GWd/t or less, and the matrix is set to 9 x 9 or more, or the average take-out burnup degree is set to 51GWd/t, and the matrix is set to 10 x 10 or more, and the increase of the margin of the maximum power density obtained thereby is utilized for the compensation of the increase of distortion of power distribution due to decrease of the kinds of plutonium enrichment degree, making it possible to reduce the kinds of enrichment degree of MOX fuel rods to one. As a result, the manufacturing step for fuel pellets can be simplified to reduce the manufacturing cost for MOX fuel assemblies. (N.H.)

  7. General Assembly

    CERN Multimedia

    Staff Association

    2015-01-01

    Tuesday 5 May at 11:00, Room 13-2-005. In accordance with the Statutes of the Staff Association, an Ordinary General Assembly is held once a year (article IV.2.1). Draft agenda: 1- Adoption of the agenda. 2- Approval of the minutes of the Ordinary General Assembly of 22 May 2014. 3- Presentation and approval of the 2014 activity report. 4- Presentation and approval of the 2014 financial report. 5- Presentation and approval of the auditors' report for 2014. 6- Programme for 2015. 7- Presentation and approval of the draft budget for 2015 and the subscription rate for 2015. 8- No amendments to the Statutes of the Staff Association proposed. 9- Election of the members of the Electoral Commission...

  8. General Assembly

    CERN Multimedia

    Staff Association

    2017-01-01

    In accordance with the Statutes of the Staff Association, an Ordinary General Assembly is held once a year (article IV.2.1). Draft agenda: Adoption of the agenda. Approval of the minutes of the Ordinary General Assembly of 5 April 2016. Presentation and approval of the 2016 activity report. Presentation and approval of the 2016 financial report. Presentation and approval of the auditors' report for 2016. Work programme for 2017. Presentation and approval of the draft budget for 2017. Approval of the subscription rate for 2018. Proposed amendments to the Statutes of the Staff Association. Election of the members of the Electoral Commission. Election of the auditors...

  9. General Assembly

    CERN Multimedia

    Staff Association

    2016-01-01

    Tuesday 5 April at 11:00, BE Auditorium, Meyrin (6-2-024). In accordance with the Statutes of the Staff Association, an Ordinary General Assembly is held once a year (article IV.2.1). Draft agenda: Adoption of the agenda. Approval of the minutes of the Ordinary General Assembly of 5 May 2015. Presentation and approval of the 2015 activity report. Presentation and approval of the 2015 financial report. Presentation and approval of the auditors' report for 2015. Work programme for 2016. Presentation and approval of the draft budget for 2016. Approval of the subscription rate for 2017. Proposed amendments to the Statutes of the Staff Association. Election of the members of the Commission...

  10. General assembly

    CERN Multimedia

    Staff Association

    2015-01-01

    Tuesday 5 May at 11:00, Room 13-2-005. In accordance with the Statutes of the Staff Association, an Ordinary General Assembly is held once a year (article IV.2.1). Draft agenda: Adoption of the agenda. Approval of the minutes of the Ordinary General Assembly of 22 May 2014. Presentation and approval of the 2014 activity report. Presentation and approval of the 2014 financial report. Presentation and approval of the auditors' report for 2014. Programme for 2015. Presentation and approval of the draft budget for 2015 and the subscription rate for 2015. No amendments to the Statutes of the Staff Association proposed. Election of the members of the Electoral Commission...

  11. Fuel assembly

    International Nuclear Information System (INIS)

    Nomata, Terumitsu.

    1993-01-01

    Among fuel pellets to be loaded into fuel cans of a fuel assembly, fuel pellets having a small thermal power are charged in a region from the end of each of the spacers up to about 50mm upstream with respect to the coolant that flows vertically at the periphery of the fuel rods. Coolant at the periphery of the fuel rods is heated by the heat generation, resulting in voids. However, the cooling effect upstream of the spacers is low owing to the influence of the spacers. Since the fuel pellets disposed in the upstream region have small thermal power, the void coefficient is not increased. Even if a thermal power exceeding cooling performance should be generated, there is no worry of causing burnout in the upstream region. Even if burnout should be caused, safety margin and reliability relative to burnout are improved, increasing the allowable thermal power and making it possible to improve the integrity and reliability of fuel rods and fuel assemblies. (N.H.)

  12. Fuel assemblies for nuclear reactors

    International Nuclear Information System (INIS)

    Jabsen, F.S.

    1979-01-01

    In a nuclear fuel assembly, hollow guide posts protrude into a fuel assembly end fitting grill from a biased spring pad with a plunger that moves with the spring pad plugging one end of each of the guide posts. A plate on the end fitting grill that has a hole for fluid discharge partially plugs the other end of the guide post. Pressurized water coolant that fills the guide post volume acts as a shock absorber and should the reactor core receive a major seismic or other shock, the fuel assembly is compelled to move towards a pad depending from a transversely disposed support grid. The pad bears against the spring pad and the plunger progressively blocks the orifices provided by slots in the guide posts thus gradually absorbing the applied shock. After the orifice has been completely blocked, controlled fluid discharge continues through a hole coil spring cooperating in the attenuation of the shock. (author)

  13. Fuel assembly

    International Nuclear Information System (INIS)

    Bando, Masaru.

    1993-01-01

    As neutron irradiation progresses on a fuel assembly of an FBR type reactor, a strong force is exerted to cause ruptures if the arrangement of fuel elements is not displaced, whereas the fuel elements may be brought into direct contact with each other not by way of spacers to cause burning damages if the arrangement is displaced. In the present invention, the circumference of fuel elements arranged in a normal triangle lattice is surrounded by a wrapper tube having a hexagonal cross section, wire spacers are wound therearound, and deformable spacers are distributed to optional positions for fuel elements in the wrapper tube. Interaction between the fuel elements caused by irradiation is effectively absorbed, thereby enabling to delay the occurrence of the rupture and burning damages of the elements. (N.H.)

  14. Fuel assembly

    International Nuclear Information System (INIS)

    Ueda, Makoto.

    1991-01-01

    In a fuel assembly in which spectral shift type moderator guide members are arranged, the moderator guide member has a flow channel resistance member, that provides flow resistance against the moderators, in the upstream of a moderator flowing channel, by which the ratio of removing coolants is set greater at the upstream than downstream. With such a constitution, the void distribution increasing upward in the channel box except for the portion of the moderator guide member is moderated by the increase of the area of the void region that expands downward in the guide member. Accordingly, the axial power distribution is flattened throughout the operation cycle and excess distortion is eliminated to improve the fuel integrity. (T.M.)

  15. Fuel assembly

    International Nuclear Information System (INIS)

    Wataumi, Kazutoshi; Tajiri, Hiroshi.

    1992-01-01

    In a fuel assembly of a BWR type reactor, a pellet to be loaded comprises an external layer of fissile materials containing burnable poisons and an internal layer of fissile materials not containing burnable poison. For example, there is provided a dual type pellet comprising an external layer made of UO2 incorporated with Gd2O3 at a predetermined concentration as the burnable poisons and an internal layer made of UO2 not containing Gd2O3. The amount of the burnable poisons required for predetermined places is controlled by the thickness of the ring of the external layer. This can dissipate an unnecessary poisoning effect at the final stage of the combustion cycle. Further, since only one or a few kinds of powder mixture of the burnable poisons and the fissile materials is necessary, production and product control can be facilitated. (I.N.)

  16. Fuel assembly

    International Nuclear Information System (INIS)

    Ishibashi, Yoko; Aoyama, Motoo; Oyama, Jun-ichi.

    1995-01-01

    Burnable poison-incorporating fuel rods of a first group are disposed in a region adjacent to a large-diameter water rod (neutron moderator rod) disposed at the central portion of a fuel assembly. Burnable poison-incorporating fuel rods of a second group are disposed in a region other than the peripheral zone adjacent to a channel box and the corners positioned at an inner zone adjacent to the channel box. The average concentration of burnable poisons of the burnable poison-incorporating fuel rods of the first group is made greater than that of the second group. With such a constitution, when the burnable poisons of the first group are burnt out, the burnable poisons of the second group are also burnt out at the same time. Accordingly, the amount of burnable poisons left unburnt at the final stage of the operation cycle is reduced, improving the reactivity. This can improve the economy. (I.N.)

  17. Fuel assemblies

    International Nuclear Information System (INIS)

    Yoshioka, Ritsuo.

    1983-01-01

    Purpose: To improve the operation performance of a BWR type reactor by improving the distribution of the uranium enrichment and the incorporation amount of burnable poisons in fuel assemblies. Constitution: The average enrichment of uranium 235 is increased in the upper portion as compared with that in the lower portion, while the incorporation amount of burnable poisons is increased in an upper portion as compared with that in the lower portion. The difference in the incorporation amount of the burnable poisons between the upper and lower portions is attained by charging two kinds of fuel rods; the ones incorporated with the burnable poisons over the entire length and the others incorporated with the burnable poisons only in the upper portions. (Seki, T.)

  18. GENOMIC FEATURES OF COTESIA PLUTELLAE POLYDNAVIRUS

    Institute of Scientific and Technical Information of China (English)

    LIU Cai-ling; ZHU Xiang-xiong; FU Wen-jun; ZHAO Mu-jun

    2003-01-01

    Polydnavirus was purified from the calyx fluid of Cotesia plutellae ovary. The genomic features of C. plutellae polydnavirus (CpPDV) were investigated. The viral genome consists of at least 12 different segments, and the aggregate genome size is estimated to be at least 80 kbp. By partial digestion of CpPDV DNA with BamHI and subsequent ligation with BamHI-cut plasmid Bluescript, a representative library of the CpPDV genome was obtained.

  19. Draft genome of the fungus-growing termite pathogenic fungus Ophiocordyceps bispora (Ophiocordycipitaceae, Hypocreales, Ascomycota)

    Directory of Open Access Journals (Sweden)

    Benjamin H. Conlon

    2017-04-01

    This article documents the public availability of genome sequence data and assembled contigs representing the partial draft genome of Ophiocordyceps bispora. As one of the few known pathogens of fungus-farming termites, a draft genome of O. bispora represents the opportunity to further the understanding of disease and resistance in these complex termite societies. With the ongoing attempts to resolve the taxonomy of the Hypocrealean family, more genetic data will also help to shed light on the phylogenetic relationship between sexual and asexual life stages. Next generation sequence data is available from the European Nucleotide Archive (ENA) under accession PRJEB13655; run numbers: ERR1368522, ERR1368523, and ERR1368524. Genome assembly is available from ENA under accession numbers FKNF01000001–FKNF01000302. Gene prediction is available as protein fasta, nucleotide fasta and GFF file from Mendeley Data with accession doi:10.17632/r99fd6g3s4.2 (http://dx.doi.org/10.17632/r99fd6g3s4.2).

  20. Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics.

    Science.gov (United States)

    Straub, Shannon C K; Parks, Matthew; Weitemier, Kevin; Fishbein, Mark; Cronn, Richard C; Liston, Aaron

    2012-02-01

    Just as Sanger sequencing did more than 20 years ago, next-generation sequencing (NGS) is poised to revolutionize plant systematics. By combining multiplexing approaches with NGS throughput, systematists may no longer need to choose between more taxa or more characters. Here we describe a genome skimming (shallow sequencing) approach for plant systematics. Through simulations, we evaluated optimal sequencing depth and performance of single-end and paired-end short read sequences for assembly of nuclear ribosomal DNA (rDNA) and plastomes and addressed the effect of divergence on reference-guided plastome assembly. We also used simulations to identify potential phylogenetic markers from low-copy nuclear loci at different sequencing depths. We demonstrated the utility of genome skimming through phylogenetic analysis of the Sonoran Desert clade (SDC) of Asclepias (Apocynaceae). Paired-end reads performed better than single-end reads. Minimum sequencing depths for high quality rDNA and plastome assemblies were 40× and 30×, respectively. Divergence from the reference significantly affected plastome assembly, but relatively similar references are available for most seed plants. Deeper rDNA sequencing is necessary to characterize intragenomic polymorphism. The low-copy fraction of the nuclear genome was readily surveyed, even at low sequencing depths. Nearly 160000 bp of sequence from three organelles provided evidence of phylogenetic incongruence in the SDC. Adoption of NGS will facilitate progress in plant systematics, as whole plastome and rDNA cistrons, partial mitochondrial genomes, and low-copy nuclear markers can now be efficiently obtained for molecular phylogenetics studies.
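
    The depth thresholds quoted above (40× for rDNA, 30× for plastomes) can be related to read counts with the usual depth = reads × read length / target length relation. The sketch below is illustrative only: the 150 kb plastome size and 100 bp read length are assumed round numbers, not values from the study, and in a genome-skimming run only the fraction of reads that actually derive from the plastome counts toward its depth.

```python
import math


def sequencing_depth(num_reads, read_length, target_length):
    """Average depth of coverage over a target of target_length bases."""
    return num_reads * read_length / target_length


def reads_needed(depth, read_length, target_length):
    """Smallest number of on-target reads that reaches the requested depth."""
    return math.ceil(depth * target_length / read_length)


# Assumed example values: a 150 kb plastome and 100 bp reads.
print(reads_needed(30, 100, 150_000))
print(sequencing_depth(60_000, 100, 150_000))
```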

  1. Targeted assembly of short sequence reads.

    Directory of Open Access Journals (Sweden)

    René L Warren

    As next-generation sequence (NGS) production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphisms, fusions and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy to use tool for targeted assembly.
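
    The read-recruitment step described above (keeping only reads that share an exact short word with a target) might be sketched as follows. This is not TASR's implementation: the function names, the word length k, and the handling of reverse complements are assumptions made for illustration, and the stringent assembly of the recruited reads is not shown.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k words occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(target, reads, k=15):
    """Keep reads sharing at least one exact k-mer with the target.

    Words from both strands of the target are indexed, so reads from
    either strand are recruited (an assumption of this sketch).
    """
    target_words = kmers(target, k) | kmers(reverse_complement(target), k)
    return [read for read in reads if kmers(read, k) & target_words]


target = "ACGTACGTTAGCCGATAGCTAGGCTA"
reads = [
    "GATAGCTAGGCTATTTTT",             # overlaps the target end, adds flank
    "TTTTTTTTTTTTTTTT",               # unrelated read
    reverse_complement(target[:16]),  # read from the opposite strand
]
print(recruit_reads(target, reads, k=8))
```

    Only the recruited reads would then be passed on to assembly, which is what keeps the search space small compared with a whole-genome assembly.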

  2. Faucet: streaming de novo assembly graph construction.

    Science.gov (United States)

    Rozov, Roye; Goldshlager, Gil; Halperin, Eran; Shamir, Ron

    2018-01-01

    We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased. Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show that these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain the most salient information needed for assembly, and demonstrate that they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched the performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet's outputs had 14-110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available. Faucet is available at https://github.com/Shamir-Lab/Faucet. Contact: rshamir@tau.ac.il or eranhalperin@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
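
    To make the notion of junction k-mers with coverage counts concrete, the toy sketch below counts k-mers from a handful of reads and reports those with more than one successor or predecessor in the de Bruijn graph. It is an in-memory illustration under simplifying assumptions (no reverse complements, no error handling, hypothetical function names) and does not reproduce Faucet's streaming design.

```python
from collections import Counter


def count_kmers(reads, k):
    """Coverage count of every k-mer seen in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def junction_kmers(counts):
    """k-mers that branch in the de Bruijn graph, with their coverage.

    A k-mer is treated as a junction when more than one observed k-mer
    follows it or precedes it with a (k-1)-base overlap.
    """
    junctions = {}
    for kmer in counts:
        successors = [b for b in "ACGT" if kmer[1:] + b in counts]
        predecessors = [b for b in "ACGT" if b + kmer[:-1] in counts]
        if len(successors) > 1 or len(predecessors) > 1:
            junctions[kmer] = counts[kmer]
    return junctions


reads = ["ACGTA", "ACGTC"]  # two reads that diverge after "CGT"
print(junction_kmers(count_kmers(reads, 3)))
```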

  3. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  4. Draft Genome Sequence of Lactobacillus casei Lbs2.

    Science.gov (United States)

    Bhowmick, Swati; Malar, Mathu; Das, Abhishek; Kumar Thakur, Bhupesh; Saha, Piu; Das, Santasabuj; Rashmi, H M; Batish, Virender K; Grover, Sunita; Tripathy, Sucheta

    2014-12-24

    We report here a 3.2-Mb draft assembled genome of Lactobacillus casei Lbs2. The bacterium shows probiotic and immunomodulatory activities. The genome assembly and annotation will help to identify molecules and pathways responsible for interaction between the host immune system and the microbe. Copyright © 2014 Bhowmick et al.

  5. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.; Michell, Craig; Apprill, Amy; Voolstra, Christian R.

    2014-01-01

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  6. The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.

    Science.gov (United States)

    Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H

    2018-01-05

    Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.

  7. Fuel assembly

    International Nuclear Information System (INIS)

    Hirukawa, Koji; Sakurada, Koichi.

    1992-01-01

    In a fuel assembly for a BWR type reactor, water rods or water crosses are disposed between fuel rods, and a valve with a spring is disposed at the top of the coolant flow channel thereof, which opens a discharge port when the pressure rises above a predetermined value. Further, a control element for the coolant flow rate is retractably inserted into a control element guide tube formed at the lower portion of the water rod or the water cross. When the amount of control elements inserted into the control element guide tube is small and the inflowing coolant flow rate is great, the void coefficient inside the water rod is less than 5%. On the other hand, when the control elements are inserted, the flow resistance is increased, so that the void coefficient in the water rod is greater than 80%. When the pressure in the water rod is increased, the valve with the spring is raised to release water or steam. Since the range of variation of the void coefficient can thus be controlled reliably by the amount of the control elements inserted, nuclear fuel materials can be utilized effectively. (N.H.)

  8. Fuel assembly

    International Nuclear Information System (INIS)

    Hiraiwa, Koji; Ueda, Makoto

    1989-01-01

    In a fuel assembly used for a light water cooled reactor such as a BWR type reactor, a water rod is divided axially into an upper outer tube and a lower outer tube by means of a plug disposed at a position 1/4 - 1/2 of the entire length of the water rod from its lower end. Inlet apertures and exit apertures for moderators are respectively perforated in the upper and lower portions of each divided outer tube. Further, an upper inner tube with less neutron irradiation growing amount than the outer tube is erected on the plug in the outer tube, while a lower inner tube with greater neutron irradiation growing amount than the outer tube is suspended from the lower surface of the plug in the outer tube. Then, the opening area for the exit apertures disposed in the upper outer tube and the lower outer tube is controlled depending on the difference of the neutron irradiation growing amount between the upper inner tube and the upper outer tube, and the difference of the neutron irradiation growing amount between the lower inner tube and the lower outer tube. This enables effective spectral shift operation and improves the fuel economy. (T.M.)

  9. Fuel assembly

    International Nuclear Information System (INIS)

    Yamazaki, Hajime.

    1995-01-01

    In a fuel assembly having fuel rods of different length, fuel pellets of mixed oxides of uranium and plutonium are loaded to a short fuel rod. The volume ratio of a pellet-loaded portion to a plenum portion of the short fuel rod is made greater than the volume ratio of a fuel rod to which uranium fuel pellets are loaded. In addition, the volume of the plenum portion of the short fuel rod is set greater depending on the plutonium content in the loaded fuel pellets. MOX fuel pellets are loaded on the short fuel rods having a greater degree of freedom relevant to the setting for the volume of the plenum portion compared with that of a long rod fuel, and the volume of the plenum portion is ensured greater depending on the plutonium content. Even if a large amount of FP gas and He gas are discharged from the MOX fuels compared with that from the uranium fuels, the internal pressure of the MOX fuel rod during operation is maintained substantially identical with that of the uranium fuel rod, so that a risk of generating excess stresses applied to the fuel cladding tubes and rupture of fuels are greatly reduced. (N.H.)

  10. Fuel assembly

    International Nuclear Information System (INIS)

    Nakajima, Akiyoshi; Bessho, Yasunori; Aoyama, Motoo; Koyama, Jun-ichi; Hirakawa, Hiromasa; Yamashita, Jun-ichi; Hayashi, Tatsuo

    1998-01-01

    In a fuel assembly of a BWR type reactor in which a water rod of a large diameter is disposed at the central portion, the cross sectional area perpendicular to the axial direction comprises a region a of a fuel rod group facing to a wide gap water region to which a control rod is inserted, a region b of a fuel rod group disposed on the side of the wide gap water region other than the region a, a region d of a fuel rod group facing to a narrow gap water region and a region c of a fuel rod group disposed on the side of the narrow gap water region other than the region d. When comparing an amount of fission products contained in the four regions relative to that in the entire regions and average enrichment degrees of fuel rods for the four regions, the relative amount and the average enrichment degree of the fuel rod group of the region a is minimized, and the relative amount and the average enrichment degree of the fuel rod group in the region b is maximized. Then, reactor shut down margin during cold operation can be improved while flattening the power in the cross section perpendicular to the axial direction. (N.H.)

  11. Partial tooth gear bearings

    Science.gov (United States)

    Vranish, John M. (Inventor)

    2010-01-01

    A partial gear bearing including an upper half, comprising peak partial teeth, and a lower, or bottom, half, comprising valley partial teeth. The upper half also has an integrated roller section between each of the peak partial teeth with a radius equal to the gear pitch radius of the radially outwardly extending peak partial teeth. Conversely, the lower half has an integrated roller section between each of the valley half teeth with a radius also equal to the gear pitch radius of the peak partial teeth. The valley partial teeth extend radially inwardly from its roller section. The peak and valley partial teeth are exactly out of phase with each other, as are the roller sections of the upper and lower halves. Essentially, the end roller bearing of the typical gear bearing has been integrated into the normal gear tooth pattern.

  12. Electrostatics and the assembly of an RNA virus

    NARCIS (Netherlands)

    Schoot, van der P.P.A.M.; Bruinsma, R.

    2005-01-01

    Electrostatic interactions play a central role in the assembly of single-stranded RNA viruses. Under physiological conditions of salinity and acidity, virus capsid assembly requires the presence of genomic material that is oppositely charged to the core proteins. In this paper we apply basic polymer

  13. Draft Genome Sequence of "Terrisporobacter othiniensis" Isolated from a Blood Culture from a Human Patient

    DEFF Research Database (Denmark)

    Lund, Lars Christian; Sydenham, Thomas Vognbjerg; Høgh, Silje Vermedal

    2015-01-01

    "Terrisporobacter othiniensis" (proposed species) was isolated from a blood culture. Genomic DNA was sequenced using a MiSeq benchtop sequencer (Illumina) and assembled using the SPAdes genome assembler. This resulted in a draft genome sequence comprising 3,980,019 bp in 167 contigs containing 3...

  14. Prospects for Genomic Research in Forestry

    Directory of Open Access Journals (Sweden)

    K. V. Krutovsky

    2014-08-01

    Conifers are keystone species of boreal forests. Their whole genome sequencing, assembly and annotation will allow us to understand the evolution of the complex, ancient, giant conifer genomes, which are 4 times larger in larch and 7–9 times larger in pines than the human genome. Genomic studies will also yield important whole genome sequence data and support the development of highly polymorphic and informative genetic markers, such as microsatellites and single nucleotide polymorphisms (SNPs), that can be efficiently used in timber origin identification, genetic variation monitoring, studies of local and climate change adaptation, and tree improvement and conservation programs.

  15. The ecoresponsive genome of Daphnia pulex

    Energy Technology Data Exchange (ETDEWEB)

    Colbourne, John K.; Pfrender, Michael E.; Gilbert, Donald; Thomas, W. Kelley; Tucker, Abraham; Oakley, Todd H.; Tokishita, Shinichi; Aerts, Andrea; Arnold, Georg J.; Basu, Malay Kumar; Bauer, Darren J.; Caceres, Carla E.; Carmel, Liran; Casola, Claudio; Choi, Jeong-Hyeon; Detter, John C.; Dong, Qunfeng; Dusheyko, Serge; Eads, Brian D.; Frohlich, Thomas; Geiler-Samerotte, Kerry A.; Gerlach, Daniel; Hatcher, Phil; Jogdeo, Sanjuro; Krijgsveld, Jeroen; Kriventseva, Evgenia V; Kültz, Dietmar; Laforsch, Christian; Lindquist, Erika; Lopez, Jacqueline; Manak, Robert; Muller, Jean; Pangilinan, Jasmyn; Patwardhan, Rupali P.; Pitluck, Samuel; Pritham, Ellen J.; Rechtsteiner, Andreas; Rho, Mina; Rogozin, Igor B.; Sakarya, Onur; Salamov, Asaf; Schaack, Sarah; Shapiro, Harris; Shiga, Yasuhiro; Skalitzky, Courtney; Smith, Zachary; Souvorov, Alexander; Sung, Way; Tang, Zuojian; Tsuchiya, Dai; Tu, Hank; Vos, Harmjan; Wang, Mei; Wolf, Yuri I.; Yamagata, Hideo; Yamada, Takuji; Ye, Yuzhen; Shaw, Joseph R.; Andrews, Justen; Crease, Teresa J.; Tang, Haixu; Lucas, Susan M.; Robertson, Hugh M.; Bork, Peer; Koonin, Eugene V.; Zdobnov, Evgeny M.; Grigoriev, Igor V.; Lynch, Michael; Boore, Jeffrey L.

    2011-02-04

    This document provides supporting material related to the sequencing of the ecoresponsive genome of Daphnia pulex. This material includes information on materials and methods and supporting text, as well as supplemental figures, tables, and references. The coverage of materials and methods addresses genome sequence, assembly, and mapping to chromosomes, gene inventory, attributes of a compact genome, the origin and preservation of Daphnia pulex genes, implications of Daphnia's genome structure, evolutionary diversification of duplicated genes, functional significance of expanded gene families, and ecoresponsive genes. Supporting text covers chromosome studies, gene homology among Daphnia genomes, micro-RNA and transposable elements and the 46 Daphnia pulex opsins. 36 figures, 50 tables, 183 references.

  16. The UCSC Genome Browser Database: 2008 update

    DEFF Research Database (Denmark)

    Karolchik, D; Kuhn, R M; Baertsch, R

    2007-01-01

    The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrat...

  17. Essays on partial retirement

    NARCIS (Netherlands)

    Kantarci, T.

    2012-01-01

    The five essays in this dissertation address a range of topics in the micro-economic literature on partial retirement. The focus is on the labor market behavior of older age groups. The essays examine the economic and non-economic determinants of partial retirement behavior, the effect of partial

  18. A Taste of Algal Genomes from the Joint Genome Institute

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2012-06-17

    Algae play profound roles in aquatic food chains and the carbon cycle, can impose health and economic costs through toxic blooms, provide models for the study of symbiosis, photosynthesis, and eukaryotic evolution, and are candidate sources for bio-fuels; all of these research areas are part of the mission of DOE's Joint Genome Institute (JGI). To date JGI has sequenced, assembled, annotated, and released to the public the genomes of 18 species and strains of algae, sampling almost all of the major clades of photosynthetic eukaryotes. With more algal genomes currently undergoing analysis, JGI continues its commitment to driving forward basic and applied algal science. Among these ongoing projects are the pan-genome of the dominant coccolithophore Emiliania huxleyi, the interrelationships between the 4 genomes in the nucleomorph-containing Bigelowiella natans and Guillardia theta, and the search for symbiosis genes of lichens.

  19. Comparison of de novo assembly statistics of Cucumis sativus L.

    Science.gov (United States)

    Wojcieszek, Michał; Kuśmirek, Wiktor; Pawełkowicz, Magdalena; Pląder, Wojciech; Nowak, Robert M.

    2017-08-01

    Genome sequencing is the core of genomic research. With the development of NGS and the falling cost of the procedure, another bottleneck has emerged: genome assembly. Developing a proper tool for this task is essential, as the quality of the assembled genome has an important impact on further research. Here we present a comparison of several de Bruijn graph assemblers tested on C. sativus genomic reads. The assessment shows that the newly developed software dnaasm provides better results in terms of both quantity and quality. The number of generated sequences is lower by 5-33%, with an N50 up to twofold higher. Quality checks showed that dnaasm generated reliable results. This provides a very strong base for future genomic analysis.
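
    For readers unfamiliar with the N50 statistic used in this comparison, a minimal sketch of its standard definition follows: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The contig lengths in the example are made up for illustration and are unrelated to the cucumber assemblies.

```python
def n50(lengths):
    """N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the cumulative length first reaches half of
    the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one contig length")


# Made-up contig lengths, for illustration only.
print(n50([80, 70, 50, 40, 30, 20]))
```

    A higher N50 combined with fewer output sequences, as reported above, indicates a more contiguous assembly, although N50 by itself says nothing about correctness.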

  20. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. ~12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  1. The genome of Chenopodium quinoa

    NARCIS (Netherlands)

    Jarvis, D.E.; Shwen Ho, Yung; Lightfoot, Damien J.; Schmöckel, Sandra M.; Li, Bo; Borm, T.J.A.; Ohyanagi, Hajime; Mineta, Katsuhiko; Mitchell, Craig T.; Saber, Noha; Kharbatia, Najeh M.; Rupper, Ryan R.; Sharp, Aaron R.; Dally, Nadine; Boughton, Berin A.; Woo, Yong H.; Gao, Ge; Schijlen, E.G.W.M.; Guo, Xiujie; Momin, Afaque A.; Negräo, Sónia; Al-Babili, Salim; Gehring, Christoph; Roessner, Ute; Jung, Christian; Murphy, Kevin; Arold, Stefan T.; Gojobori, Takashi; Linden, van der C.G.; Loo, van E.N.; Jellen, Eric N.; Maughan, Peter J.; Tester, Mark

    2017-01-01

    Chenopodium quinoa (quinoa) is a highly nutritious grain identified as an important crop to improve world food security. Unfortunately, few resources are available to facilitate its genetic improvement. Here we report the assembly of a high-quality, chromosome-scale reference genome sequence for

  2. Recurrent Partial Words

    Directory of Open Access Journals (Sweden)

    Francine Blanchet-Sadri

    2011-08-01

    Partial words are sequences over a finite alphabet that may contain wildcard symbols, called holes, which match or are compatible with all letters; partial words without holes are said to be full words (or simply words). Given an infinite partial word w, the number of distinct full words over the alphabet that are compatible with factors of w of length n, called subwords of w, gives a measure of complexity of infinite partial words, the so-called subword complexity. This measure is of particular interest because we can construct partial words with subword complexities not achievable by full words. In this paper, we consider the notion of recurrence over infinite partial words, that is, we study whether all of the finite subwords of a given infinite partial word appear infinitely often, and we establish connections between subword complexity and recurrence in this more general framework.
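
    The definitions above can be tried out on a finite partial word. The sketch below counts the distinct full words of length n that are compatible with some factor of a given word. It only illustrates the definition on a finite string, whereas the paper studies infinite partial words; writing the hole as "?" and the function names are assumptions of this example.

```python
from itertools import product

HOLE = "?"  # assumed symbol for a hole in this sketch


def compatible(u, v):
    """True if u and v have equal length and agree wherever neither has a hole."""
    return len(u) == len(v) and all(
        a == b or HOLE in (a, b) for a, b in zip(u, v)
    )


def completions(factor, alphabet):
    """All full words over alphabet that are compatible with factor."""
    choices = [alphabet if c == HOLE else c for c in factor]
    return {"".join(letters) for letters in product(*choices)}


def subword_count(w, n, alphabet):
    """Number of distinct full words of length n compatible with a factor of w."""
    words = set()
    for i in range(len(w) - n + 1):
        words |= completions(w[i:i + n], alphabet)
    return len(words)


print(subword_count("a?b", 2, "ab"))  # partial word with one hole
print(subword_count("aab", 2, "ab"))  # full word of the same length
```

    In this small case the word with a hole has more length-2 subwords than the full word of the same length, which hints at why holes can change the attainable complexities.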

  3. Swine transcriptome characterization by combined Iso-Seq and RNA-seq for annotating the emerging long read-based reference genome

    Science.gov (United States)

    PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...

  4. Comparing de novo assemblers for 454 transcriptome data.

    Science.gov (United States)

    Kumar, Sujai; Blaxter, Mark L

    2010-10-16

    Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs however gave a more credible

  5. Comparing de novo assemblers for 454 transcriptome data

    Directory of Open Access Journals (Sweden)

    Blaxter Mark L

    2010-10-01

    Background: Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Results: Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Conclusions: Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies

  6. The Chlamydomonas genome project: a decade on

    Science.gov (United States)

    Blaby, Ian K.; Blaby-Haas, Crysten; Tourasse, Nicolas; Hom, Erik F. Y.; Lopez, David; Aksoy, Munevver; Grossman, Arthur; Umen, James; Dutcher, Susan; Porter, Mary; King, Stephen; Witman, George; Stanke, Mario; Harris, Elizabeth H.; Goodstein, David; Grimwood, Jane; Schmutz, Jeremy; Vallon, Olivier; Merchant, Sabeeha S.; Prochnik, Simon

    2014-01-01

    The green alga Chlamydomonas reinhardtii is a popular unicellular organism for studying photosynthesis, cilia biogenesis and micronutrient homeostasis. Ten years since its genome project was initiated, an iterative process of improvements to the genome and gene predictions has propelled this organism to the forefront of the “omics” era. Housed at Phytozome, the Joint Genome Institute’s (JGI) plant genomics portal, the most up-to-date genomic data include a genome arranged on chromosomes and high-quality gene models with alternative splice forms supported by an abundance of RNA-Seq data. Here, we present the past, present and future of Chlamydomonas genomics. Specifically, we detail progress on genome assembly and gene model refinement, discuss resources for gene annotations, functional predictions and locus ID mapping between versions and, importantly, outline a standardized framework for naming genes. PMID:24950814

  7. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    Science.gov (United States)

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  8. Partially closed fuel cycle of WWER-440

    International Nuclear Information System (INIS)

    Darilek, P.; Sebian, V.; Necas, V.

    2002-01-01

    The position of nuclear energy in the competition among energy sources is briefly characterised. A multi-tier transmutation system is outlined as an effective back-end solution and, consequently, as a factor that can increase the competitiveness of nuclear energy. LWRs and the equivalent WWERs are suggested as first-tier reactors. A partially closed fuel cycle with combined fuel assemblies is described briefly. The main back-end effects are characterised (Authors)

  9. Enhancing faba bean (Vicia faba L.) genome resources

    NARCIS (Netherlands)

    Cooper, James W.; Wilson, Michael H.; Derks, M.F.L.; Smit, Sandra; Kunert, Karl J.; Cullis, Christopher; Foyer, C.H.

    2017-01-01

    Grain legume improvement is currently impeded by a lack of genomic resources. The paucity of genome information for faba bean can be attributed to the intrinsic difficulties of assembling/annotating its giant (~13 Gb) genome. In order to address this challenge, RNA-sequencing analysis was performed

  10. Bat biology, genomes, and the Bat1K project

    DEFF Research Database (Denmark)

    Teeling, Emma C; Vernes, Sonja C; Dávalos, Liliana M

    2018-01-01

    and endangered. Here we announce Bat1K, an initiative to sequence the genomes of all living bat species (n∼1,300) to chromosome-level assembly. The Bat1K genome consortium unites bat biologists (>148 members as of writing), computational scientists, conservation organizations, genome technologists, and any...

  11. Why size really matters when sequencing plant genomes

    Czech Academy of Sciences Publication Activity Database

    Kelly, L.J.; Leitch, A.R.; Fay, M. F.; Renny-Byfield, S.; Pellicer, J.; Macas, Jiří; Leitch, I.J.

    2012-01-01

    Vol. 5, No. 4 (2012), pp. 415-425 ISSN 1755-0874 Institutional research plan: CEZ:AV0Z50510513 Institutional support: RVO:60077344 Keywords : C-value * genome assembly * genome size evolution * genome sequencing Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 0.924, year: 2012

  12. First generation annotations for the fathead minnow (Pimephales promelas) genome

    Science.gov (United States)

    Ab initio gene prediction and evidence alignment were used to produce the first annotations for the fathead minnow SOAPdenovo genome assembly. Additionally, a genome browser hosted at genome.setac.org provides simplified access to the annotation data in context with fathead minno...

  13. Insights into Conifer Giga-Genomes

    Science.gov (United States)

    De La Torre, Amanda R.; Birol, Inanc; Bousquet, Jean; Ingvarsson, Pär K.; Jansson, Stefan; Jones, Steven J.M.; Keeling, Christopher I.; MacKay, John; Nilsson, Ove; Ritland, Kermit; Street, Nathaniel; Yanchuk, Alvin; Zerbe, Philipp; Bohlmann, Jörg

    2014-01-01

    Insights from sequenced genomes of major land plant lineages have advanced research in almost every aspect of plant biology. Until recently, however, assembled genome sequences of gymnosperms have been missing from this picture. Conifers of the pine family (Pinaceae) are a group of gymnosperms that dominate large parts of the world’s forests. Despite their ecological and economic importance, conifers seemed long out of reach for complete genome sequencing, due in part to their enormous genome size (20–30 Gb) and the highly repetitive nature of their genomes. Technological advances in genome sequencing and assembly enabled the recent publication of three conifer genomes: white spruce (Picea glauca), Norway spruce (Picea abies), and loblolly pine (Pinus taeda). These genome sequences revealed distinctive features compared with other plant genomes and may represent a window into the past of seed plant genomes. This Update highlights recent advances, remaining challenges, and opportunities in light of the publication of the first conifer and gymnosperm genomes. PMID:25349325

  14. Genomic Characterization for Parasitic Weeds of the Genus Striga by Sample Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Matt C. Estep

    2012-03-01

    Generation of ∼2200 Sanger sequence reads or ∼10,000 454 reads for seven Striga Lour. DNA samples (five species) allowed identification of the highly repetitive DNA content in these genomes. The 14 most abundant repeats in these species were identified and partially assembled. Annotation indicated that they represent nine long terminal repeat (LTR) retrotransposon families, three tandem satellite repeats, one long interspersed element (LINE) retroelement, and one DNA transposon. All of these repeats are most closely related to repetitive elements in other closely related plants and are not products of horizontal transfer from their host species. These repeats were differentially abundant in each species, with the LTR retrotransposons and satellite repeats most responsible for variation in genome size. Each species had some repetitive elements that were more abundant and some less abundant than the other species examined, indicating that no single element or any unilateral growth or decrease trend in genome behavior was responsible for variation in genome size and composition. Genome sizes were determined by flow sorting, and the values of 615 Mb [(L.) Kuntze], 1330 Mb [(Willd.) Vatke], 1425 Mb [(Delile) Benth.] and 2460 Mb (Benth.) suggest a ploidy series, a prediction supported by repetitive DNA sequence analysis. Phylogenetic analysis using six chloroplast loci indicated the ancestral relationships of the five most agriculturally important species, with the unexpected result that the one parasite of dicotyledonous plants was found to be more closely related to some of the grass parasites than many of the grass parasites are to each other.

  15. Spaced Seed Data Structures for De Novo Assembly

    Directory of Open Access Journals (Sweden)

    Inanç Birol

    2015-01-01

    De novo assembly of the genome of a species is essential in the absence of a reference genome sequence. Many scalable assembly algorithms use the de Bruijn graph (DBG) paradigm to reconstruct genomes, where a table of subsequences of a certain length is derived from the reads, and their overlaps are analyzed to assemble sequences. Despite longer subsequences unlocking longer genomic features for assembly, associated increase in compute resources limits the practicability of DBG over other assembly archetypes already designed for longer reads. Here, we revisit the DBG paradigm to adapt it to the changing sequencing technology landscape and introduce three data structure designs for spaced seeds in the form of paired subsequences. These data structures address memory and run time constraints imposed by longer reads. We observe that when a fixed distance separates seed pairs, it provides increased sequence specificity with increased gap length. Further, we note that Bloom filters would be suitable to implicitly store spaced seeds and be tolerant to sequencing errors. Building on this concept, we describe a data structure for tracking the frequencies of observed spaced seeds. These data structure designs will have applications in genome, transcriptome and metagenome assemblies, and read error correction.
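    The idea of storing spaced seeds (paired subsequences separated by a fixed gap) implicitly in a Bloom filter can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `spaced_seeds` and `BloomFilter`, the filter size, the number of hash functions and the use of salted BLAKE2b hashes are all assumptions chosen for illustration.

```python
import hashlib


def spaced_seeds(read, k, gap):
    """Yield (left, right) k-mer pairs separated by a fixed gap of `gap` bases."""
    span = 2 * k + gap  # total footprint of one spaced seed on the read
    for i in range(len(read) - span + 1):
        yield read[i:i + k], read[i + k + gap:i + span]


class BloomFilter:
    """Tiny Bloom filter: membership tests may give false positives,
    but never false negatives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions from salted BLAKE2b digests.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter()
    for left, right in spaced_seeds("ACGTACGTTGCA", k=3, gap=2):
        seen.add(left + ":" + right)  # only the seed pair is stored, not the gap
    print("ACG:CGT" in seen)
```

    Because only the two flanking subsequences are hashed, a sequencing error that falls inside the gap does not change the stored key, which is one way to read the error tolerance mentioned in the abstract.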

  16. Extreme genomes

    OpenAIRE

    DeLong, Edward F

    2000-01-01

    The complete genome sequence of Thermoplasma acidophilum, an acid- and heat-loving archaeon, has recently been reported. Comparative genomic analysis of this 'extremophile' is providing new insights into the metabolic machinery, ecology and evolution of thermophilic archaea.

  17. Grass genomes

    OpenAIRE

    Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya

    1998-01-01

    For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...

  18. The value of new genome references.

    Science.gov (United States)

    Worley, Kim C; Richards, Stephen; Rogers, Jeffrey

    2017-09-15

    Genomic information has become a ubiquitous and almost essential aspect of biological research. Over the last 10-15 years, the cost of generating sequence data from DNA or RNA samples has dramatically declined and our ability to interpret those data increased just as remarkably. Although it is still possible for biologists to conduct interesting and valuable research on species for which genomic data are not available, the impact of having access to a high quality whole genome reference assembly for a given species is nothing short of transformational. Research on a species for which we have no DNA or RNA sequence data is restricted in fundamental ways. In contrast, even access to an initial draft quality genome (see below for definitions) opens a wide range of opportunities that are simply not available without that reference genome assembly. Although a complete discussion of the impact of genome sequencing and assembly is beyond the scope of this short paper, the goal of this review is to summarize the most common and highest impact contributions that whole genome sequencing and assembly have made to comparative and evolutionary biology.

  19. Cancer genomics

    DEFF Research Database (Denmark)

    Norrild, Bodil; Guldberg, Per; Ralfkiær, Elisabeth Methner

    2007-01-01

    Almost all cells in the human body contain a complete copy of the genome with an estimated number of 25,000 genes. The sequences of these genes make up about three percent of the genome and comprise the inherited set of genetic information. The genome also contains information that determines whe...

  20. High molecular weight DNA assembly in vivo for synthetic biology applications.

    Science.gov (United States)

    Juhas, Mario; Ajioka, James W

    2017-05-01

    DNA assembly is the key technology of the emerging interdisciplinary field of synthetic biology. While the assembly of smaller DNA fragments is usually performed in vitro, high molecular weight DNA molecules are assembled in vivo via homologous recombination in the host cell. Escherichia coli, Bacillus subtilis and Saccharomyces cerevisiae are the main hosts used for DNA assembly in vivo. Progress in DNA assembly over the last few years has paved the way for the construction of whole genomes. This review provides an update on recent synthetic biology advances with particular emphasis on high molecular weight DNA assembly in vivo in E. coli, B. subtilis and S. cerevisiae. Special attention is paid to the assembly of whole genomes, such as those of the first synthetic cell, synthetic yeast and minimal genomes.

  1. Newnes electronics assembly handbook

    CERN Document Server

    Brindley, Keith

    2013-01-01

    Newnes Electronics Assembly Handbook: Techniques, Standards and Quality Assurance focuses on the aspects of electronic assembling. The handbook first looks at the printed circuit board (PCB). Base materials, basic mechanical properties, cleaning of assemblies, design, and PCB manufacturing processes are then explained. The text also discusses surface mounted assemblies and packaging of electromechanical assemblies, as well as the soldering process. Requirements for the soldering process; solderability and protective coatings; cleaning of PCBs; and mass solder/component reflow soldering are des

  2. Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes

    Science.gov (United States)

    Doerr, Daniel; Chauve, Cedric

    2017-01-01

    Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing rendered possible the sequencing of whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method AGapEs (ancestral gap estimation) to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseilles outbreaks, for which we obtained highly improved genome assemblies for both genomes, comprised of, respectively, five and six scaffolds with 95 % of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains. PMID:29114402

  3. MetaQUAST: evaluation of metagenome assemblies.

    Science.gov (United States)

    Mikheenko, Alla; Saveliev, Vladislav; Gurevich, Alexey

    2016-04-01

    In recent years we have witnessed the rapid development of new metagenome assembly methods. Although there are many benchmark utilities designed for single-genome assemblies, there is no well-recognized evaluation and comparison tool for their metagenome-specific analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such features of metagenome datasets as (i) unknown species content, by detecting and downloading reference sequences, (ii) huge diversity, by giving comprehensive reports for multiple genomes, and (iii) the presence of closely related species, by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets. Availability: http://bioinf.spbau.ru/metaquast. Contact: aleksey.gurevich@spbu.ru. Supplementary data are available at Bioinformatics online.

  4. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    Science.gov (United States)

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing of microbial genomes is important because microbes carry genes for antibiotic and pathogenic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes whose sequencing is still ongoing. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects, as contigs or scaffolds, can be submitted to our Web server, which provides functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.

  5. The W22 genome: a foundation for maize functional genomics and transposon biology

    Science.gov (United States)

    The maize W22 inbred has served as a platform for maize genetics since the mid twentieth century. To streamline maize genome analyses, we have sequenced and de novo assembled a W22 reference genome using small-read sequencing technologies. We show that significant structural heterogeneity exists in ...

  6. Genome-wide comparative analysis of four Indian Drosophila species.

    Science.gov (United States)

    Mohanty, Sujata; Khanna, Radhika

    2017-12-01

    Comparative analysis of multiple genomes of closely or distantly related Drosophila species allows evolutionary biologists to explore genomic changes from an ecological and evolutionary perspective. We present here the de novo assembled whole genome sequences of four Drosophila species of Indian origin, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta, generated using Next Generation Sequencing technology on an Illumina platform, along with their detailed assembly statistics. Comparative genomic analyses, including gene prediction and annotation, functional and orthogroup analysis of coding sequences, and genome-wide SNP distribution, were performed. The whole genome of Zaprionus indianus of Indian origin, published earlier by us, and the genome sequences of 12 previously sequenced Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project on Indian Drosophila species.

  7. Snake Genome Sequencing: Results and Future Prospects.

    Science.gov (United States)

    Kerkkamp, Harald M I; Kini, R Manjunatha; Pospelov, Alexey S; Vonk, Freek J; Henkel, Christiaan V; Richardson, Michael K

    2016-12-01

    Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  8. Snake Genome Sequencing: Results and Future Prospects

    Directory of Open Access Journals (Sweden)

    Harald M. I. Kerkkamp

    2016-12-01

    Snake genome sequencing is in its infancy, very much behind the progress made in sequencing the genomes of humans, model organisms and pathogens relevant to biomedical research, and agricultural species. We provide here an overview of some of the snake genome projects in progress, and discuss the biological findings, with special emphasis on toxinology, from the small number of draft snake genomes already published. We discuss the future of snake genomics, pointing out that new sequencing technologies will help overcome the problem of repetitive sequences in assembling snake genomes. Genome sequences are also likely to be valuable in examining the clustering of toxin genes on the chromosomes, in designing recombinant antivenoms and in studying the epigenetic regulation of toxin gene expression.

  9. The bonobo genome compared with the chimpanzee and human genomes

    Science.gov (United States)

    Prüfer, Kay; Munch, Kasper; Hellmann, Ines; Akagi, Keiko; Miller, Jason R.; Walenz, Brian; Koren, Sergey; Sutton, Granger; Kodira, Chinnappa; Winer, Roger; Knight, James R.; Mullikin, James C.; Meader, Stephen J.; Ponting, Chris P.; Lunter, Gerton; Higashino, Saneyuki; Hobolth, Asger; Dutheil, Julien; Karakoç, Emre; Alkan, Can; Sajjadian, Saba; Catacchio, Claudia Rita; Ventura, Mario; Marques-Bonet, Tomas; Eichler, Evan E.; André, Claudine; Atencia, Rebeca; Mugisha, Lawrence; Junhold, Jörg; Patterson, Nick; Siebauer, Michael; Good, Jeffrey M.; Fischer, Anne; Ptak, Susan E.; Lachmann, Michael; Symer, David E.; Mailund, Thomas; Schierup, Mikkel H.; Andrés, Aida M.; Kelso, Janet; Pääbo, Svante

    2012-01-01

    Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other. PMID:22722832

  10. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    Science.gov (United States)

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  11. Partial characterization of the lettuce infectious yellows virus genomic RNAs, identification of the coat protein gene and comparison of its amino acid sequence with those of other filamentous RNA plant viruses.

    Science.gov (United States)

    Klaassen, V A; Boeshore, M; Dolja, V V; Falk, B W

    1994-07-01

    Purified virions of lettuce infectious yellows virus (LIYV), a tentative member of the closterovirus group, contained two RNAs of approximately 8500 and 7300 nucleotides (RNAs 1 and 2 respectively) and a single coat protein species with M(r) of approximately 28,000. LIYV-infected plants contained multiple dsRNAs. The two largest were the correct size for the replicative forms of LIYV virion RNAs 1 and 2. To assess the relationships between LIYV RNAs 1 and 2, cDNAs corresponding to the virion RNAs were cloned. Northern blot hybridization analysis showed no detectable sequence homology between these RNAs. A partial amino acid sequence obtained from purified LIYV coat protein was found to align in the most upstream of four complete open reading frames (ORFs) identified in a LIYV RNA 2 cDNA clone. The identity of this ORF was confirmed as the LIYV coat protein gene by immunological analysis of the gene product expressed in vitro and in Escherichia coli. Computer analysis of the LIYV coat protein amino acid sequence indicated that it belongs to a large family of proteins forming filamentous capsids of RNA plant viruses. The LIYV coat protein appears to be most closely related to the coat proteins of two closteroviruses, beet yellows virus and citrus tristeza virus.

  12. Comparative genomic data of the Avian Phylogenomics Project.

    Science.gov (United States)

    Zhang, Guojie; Li, Bo; Li, Cai; Gilbert, M Thomas P; Jarvis, Erich D; Wang, Jun

    2014-01-01

    The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at a low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000-17,000 protein coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses. Here we release full genome assemblies of 38 newly sequenced avian species, link genome assembly downloads for 7 of the remaining 10 species, and provide a guideline of
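    The N50 scaffold size used above to group the assemblies is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` and the example lengths are illustrative assumptions and are not taken from the released datasets.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    summed length of all scaffolds.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


if __name__ == "__main__":
    # Hypothetical scaffold lengths in base pairs, for illustration only.
    example = [80, 70, 50, 40, 30, 20]
    print(n50(example))
```

    With such a function, an assembly could be assigned to the high or low depth group by comparing its N50 against a chosen threshold, such as the 1 Mb figure quoted in the abstract.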

  13. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates

    Energy Technology Data Exchange (ETDEWEB)

    Nordberg, Henrik [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Cantor, Michael [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Dusheyko, Serge [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Hua, Susan [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Poliakov, Alexander [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Shabalov, Igor [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Smirnova, Tatyana [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Grigoriev, Igor V. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Dubchak, Inna [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States)

    2013-11-12

    The U.S. Department of Energy (DOE) Joint Genome Institute (JGI), a national user facility, serves the diverse scientific community by providing integrated high-throughput sequencing and computational analysis to enable system-based scientific approaches in support of DOE missions related to clean energy generation and environmental characterization. The JGI Genome Portal (http://genome.jgi.doe.gov) provides unified access to all JGI genomic databases and analytical tools. The JGI maintains extensive data management systems and specialized analytical capabilities to manage and interpret complex genomic data. A user can search, download and explore multiple data sets available for all DOE JGI sequencing projects including their status, assemblies and annotations of sequenced genomes. In this paper, we describe major updates of the Genome Portal in the past 2 years with a specific emphasis on efficient handling of the rapidly growing amount of diverse genomic data accumulated in JGI.

  14. Icarus: visualizer for de novo assembly evaluation.

    Science.gov (United States)

    Mikheenko, Alla; Valin, Gleb; Prjibelski, Andrey; Saveliev, Vladislav; Gurevich, Alexey

    2016-11-01

    Data visualization plays an increasingly important role in NGS data analysis. With advances in both sequencing and computational technologies, it has become a new bottleneck in genomics studies. Indeed, evaluation of de novo genome assemblies is one of the areas that can benefit from visualization. However, even though multiple quality assessment methods are now available, existing visualization tools are hardly suitable for this purpose. Here, we present Icarus, a novel genome visualizer for accurate assessment and analysis of genomic draft assemblies, which is based on the tool QUAST. Icarus can be used in studies where a related reference genome is available, as well as for non-model organisms. The tool is available online and as a standalone application. http://cab.spbu.ru/software/icarus Contact: aleksey.gurevich@spbu.ru Supplementary information: Supplementary data are available at Bioinformatics online.

  15. Comparative Genome Analysis of Lolium-Festuca Complex Species

    DEFF Research Database (Denmark)

    Czaban, Adrian; Byrne, Stephen; Sharma, Sapna

    2015-01-01

    , winter hardiness, drought tolerance and resistance to grazing. In this study we have sequenced and assembled the low copy fraction of the genomes of Lolium westerwoldicum, Lolium multiflorum, Festuca pratensis and Lolium temulentum. We have also generated de-novo transcriptome assemblies for each species......, and these have aided in the annotation of the genomic sequence. Using this data we were able to generate annotated assemblies of the gene rich regions of the four species to complement the already sequenced Lolium perenne genome. Using these gene models we have identified orthologous genes between the species...

  16. The rearranged mitochondrial genome of Leptopilina boulardi (Hymenoptera: Figitidae), a parasitoid wasp of Drosophila

    Directory of Open Access Journals (Sweden)

    Daniel S. Oliveira

    The partial mitochondrial genome sequence of Leptopilina boulardi (Hymenoptera: Figitidae) was characterized. Illumina sequencing yielded 35,999,679 reads, of which 102,482 were used in the assembly. The length of the sequenced region of this partial mitochondrial genome is 15,417 bp, consisting of 13 protein-coding, two rRNA, and 21 tRNA genes (the trnaM failed to be sequenced) and a partial A+T-rich region. All protein-coding genes start with ATN codons. Eleven protein-coding genes presented TAA stop codons, whereas ND6 and COII presented TA and T nucleotides, respectively. The gene pattern revealed extensive rearrangements compared to the typical pattern generally observed in insects. These rearrangements involve two protein-coding and two ribosomal genes, along with 16 tRNA genes. This gene order is different from the pattern described for Ibalia leucospoides (Ibaliidae, Cynipoidea), suggesting that gene order can vary among members of the Cynipoidea superfamily. A maximum likelihood phylogenetic analysis of the main groups of Apocrita was performed using the amino acid sequences of the 13 protein-coding genes, showing monophyly for the Cynipoidea superfamily within the Hymenoptera phylogeny.

  17. The Perennial Ryegrass GenomeZipper: Targeted Use of Genome Resources for Comparative Grass Genomics

    Science.gov (United States)

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F.X.; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-01-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species. PMID:23184232

  18. Genome Sequences of Marine Shrimp Exopalaemon carinicauda Holthuis Provide Insights into Genome Size Evolution of Caridea.

    Science.gov (United States)

    Yuan, Jianbo; Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2017-07-05

    Crustacea, particularly Decapoda, contains many economically important species, such as shrimps and crabs. Crustaceans exhibit enormous (nearly 500-fold) variability in genome size. However, limited genome resources are available for investigating these species. Exopalaemon carinicauda Holthuis, an economically important caridean shrimp, is a potentially ideal experimental animal for research on crustaceans. In this study, we performed low-coverage sequencing and de novo assembly of the E. carinicauda genome. The assembly covers more than 95% of coding regions. E. carinicauda possesses a large, complex genome (5.73 Gb), twice as large as those of many decapod shrimps. Comparative genomic analyses were therefore applied to investigate factors affecting genome size evolution in decapods. No evidence of genome duplication was identified, and few horizontally transferred sequences were detected. Ultimately, a burst of transposable elements, especially retrotransposons, was determined to be the major factor influencing genome expansion. A total of 2 Gb of repeats were identified, and RTE-BovB, Jockey, Gypsy, and DIRS were the four major retrotransposons that expanded significantly. Retrotransposons of both recent (Jockey and Gypsy) and ancestral (DIRS) origin were responsible for the genome evolution. The E. carinicauda genome also showed potential for genomic and experimental research on shrimps.

  19. Draft genome of the gayal, Bos frontalis

    Science.gov (United States)

    Wang, Ming-Shan; Zeng, Yan; Wang, Xiao; Nie, Wen-Hui; Wang, Jin-Huan; Su, Wei-Ting; Xiong, Zi-Jun; Wang, Sheng; Qu, Kai-Xing; Yan, Shou-Qing; Yang, Min-Min; Wang, Wen; Dong, Yang; Zhang, Ya-Ping

    2017-01-01

    Gayal (Bos frontalis), also known as mithan or mithun, is a large endangered semi-domesticated bovine that has a limited geographical distribution in the hill-forests of China, Northeast India, Bangladesh, Myanmar, and Bhutan. Many questions about the gayal such as its origin, population history, and genetic basis of local adaptation remain largely unresolved. De novo sequencing and assembly of the whole gayal genome provides an opportunity to address these issues. We report a high-depth sequencing, de novo assembly, and annotation of a female Chinese gayal genome. Based on the Illumina genomic sequencing platform, we have generated 350.38 Gb of raw data from 16 different insert-size libraries. A total of 276.86 Gb of clean data is retained after quality control. The assembled genome is about 2.85 Gb with scaffold and contig N50 sizes of 2.74 Mb and 14.41 kb, respectively. Repetitive elements account for 48.13% of the genome. Gene annotation has yielded 26 667 protein-coding genes, of which 97.18% have been functionally annotated. BUSCO assessment shows that our assembly captures 93% (3183 of 4104) of the core eukaryotic genes and 83.1% of vertebrate universal single-copy orthologs. We provide the first comprehensive de novo genome of the gayal. This genetic resource is integral for investigating the origin of the gayal and performing comparative genomic studies to improve understanding of the speciation and divergence of bovine species. The assembled genome could be used as reference in future population genetic studies of gayal. PMID:29048483

  20. Hyperbolic partial differential equations

    CERN Document Server

    Witten, Matthew

    1986-01-01

    Hyperbolic Partial Differential Equations III is a refereed journal issue that explores the applications, theory, and/or applied methods related to hyperbolic partial differential equations, or problems arising out of hyperbolic partial differential equations, in any area of research. This journal issue is interested in all types of articles in terms of review, mini-monograph, standard study, or short communication. Some studies presented in this journal include discretization of ideal fluid dynamics in the Eulerian representation; a Riemann problem in gas dynamics with bifurcation; periodic M

  1. Successful removable partial dentures.

    Science.gov (United States)

    Lynch, Christopher D

    2012-03-01

    Removable partial dentures (RPDs) remain a mainstay of prosthodontic care for partially dentate patients. Appropriately designed, they can restore masticatory efficiency, improve aesthetics and speech, and help secure overall oral health. However, challenges remain in providing such treatments, including maintaining adequate plaque control, achieving adequate retention, and facilitating patient tolerance. The aim of this paper is to review the successful provision of RPDs. Removable partial dentures are a successful form of treatment for replacing missing teeth, and can be successfully provided with appropriate design and fabrication concepts in mind.

  2. Beginning partial differential equations

    CERN Document Server

    O'Neil, Peter V

    2011-01-01

    A rigorous, yet accessible, introduction to partial differential equations, updated in a valuable new edition. Beginning Partial Differential Equations, Second Edition provides a comprehensive introduction to partial differential equations (PDEs) with a special focus on the significance of characteristics, solutions by Fourier series, integrals and transforms, properties and physical interpretations of solutions, and a transition to the modern function space approach to PDEs. With its breadth of coverage, this new edition continues to present a broad introduction to the field, while also addres

  3. Fuel assembly guide tube

    International Nuclear Information System (INIS)

    Jabsen, F.S.

    1979-01-01

    This invention is directed toward a nuclear fuel assembly guide tube arrangement which restrains spacer grid movement due to coolant flow and which offers secondary means for supporting a fuel assembly during handling and transfer operations

  4. Partial knee replacement - slideshow

    Science.gov (United States)

    ... page: //medlineplus.gov/ency/presentations/100225.htm Partial knee replacement - series - Normal anatomy

  5. De novo assembly of highly diverse viral populations

    Directory of Open Access Journals (Sweden)

    Yang Xiao

    2012-09-01

    Background: Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data. A de novo population consensus assembly is valuable both as a single linear representation of the population and as a backbone on which intra-host variants can be accurately mapped. The availability of consensus assemblies and robustly mapped variants is crucial to the genetic study of viral disease progression, transmission dynamics, and viral evolution. Existing de novo assembly techniques fail to robustly assemble ultra-deep sequence data from genetically heterogeneous populations such as viruses into full-length genomes due to the presence of extensive genetic variability, contaminants, and variable sequence coverage. Results: We present VICUNA, a de novo assembly algorithm suitable for generating consensus assemblies from genetically heterogeneous populations. We demonstrate its effectiveness on Dengue, Human Immunodeficiency and West Nile viral populations, representing a range of intra-host diversity. Compared to state-of-the-art assemblers designed for haploid or diploid systems, VICUNA recovers full-length consensus and captures insertion/deletion polymorphisms in diverse samples. Final assemblies maintain a high base calling accuracy. The VICUNA program is publicly available at: http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/viral-genomics-analysis-software. Conclusions: We developed VICUNA, a publicly available software tool that enables consensus assembly of ultra-deep sequence data derived from diverse viral populations. While VICUNA was developed for the analysis of viral populations, its application to other heterogeneous sequence data sets such as metagenomic or tumor cell population samples may prove beneficial in these fields of research.

  6. Polymer Directed Protein Assemblies

    NARCIS (Netherlands)

    van Rijn, Patrick

    2013-01-01

    Protein aggregation and protein self-assembly are important occurrences in natural systems, and are in some form or other dictated by biopolymers. Very obvious influences of biopolymers on protein assemblies are, e.g., virus particles. Viruses are a multi-protein assembly of which the morphology is

  7. Nuclear reactor fuel assembly

    International Nuclear Information System (INIS)

    Sasaki, Y.; Tashima, J.

    1975-01-01

    A description is given of nuclear reactor fuel assemblies arranged in the form of a lattice wherein there is attached to the interface of one of two adjacent fuel assemblies a plate spring having a concave portion curved toward said interface and to the interface of the other fuel assembly a plate spring having a convex portion curved away from said interface

  8. Beginning partial differential equations

    CERN Document Server

    O'Neil, Peter V

    2014-01-01

    A broad introduction to PDEs with an emphasis on specialized topics and applications occurring in a variety of fields. Featuring a thoroughly revised presentation of topics, Beginning Partial Differential Equations, Third Edition provides a challenging, yet accessible, combination of techniques, applications, and introductory theory on the subject of partial differential equations. The new edition offers nonstandard coverage on material including Burgers' equation, the telegraph equation, damped wave motion, and the use of characteristics to solve nonhomogeneous problems. The Third Edition is or

  9. Sensor mount assemblies and sensor assemblies

    Science.gov (United States)

    Miller, David H [Redondo Beach, CA

    2012-04-10

    Sensor mount assemblies and sensor assemblies are provided. In an embodiment, by way of example only, a sensor mount assembly includes a busbar, a main body, a backing surface, and a first finger. The busbar has a first end and a second end. The main body is overmolded onto the busbar. The backing surface extends radially outwardly relative to the main body. The first finger extends axially from the backing surface, and the first finger has a first end, a second end, and a tooth. The first end of the first finger is disposed on the backing surface, and the tooth is formed on the second end of the first finger.

  10. The genome of Eucalyptus grandis

    Energy Technology Data Exchange (ETDEWEB)

    Myburg, Alexander A.; Grattapaglia, Dario; Tuskan, Gerald A.; Hellsten, Uffe; Hayes, Richard D.; Grimwood, Jane; Jenkins, Jerry; Lindquist, Erika; Tice, Hope; Bauer, Diane; Goodstein, David M.; Dubchak, Inna; Poliakov, Alexandre; Mizrachi, Eshchar; Kullan, Anand R. K.; Hussey, Steven G.; Pinard, Desre; van der Merwe, Karen; Singh, Pooja; van Jaarsveld, Ida; Silva-Junior, Orzenil B.; Togawa, Roberto C.; Pappas, Marilia R.; Faria, Danielle A.; Sansaloni, Carolina P.; Petroli, Cesar D.; Yang, Xiaohan; Ranjan, Priya; Tschaplinski, Timothy J.; Ye, Chu-Yu; Li, Ting; Sterck, Lieven; Vanneste, Kevin; Murat, Florent; Soler, Marçal; Clemente, Hélène San; Saidi, Naijib; Cassan-Wang, Hua; Dunand, Christophe; Hefer, Charles A.; Bornberg-Bauer, Erich; Kersting, Anna R.; Vining, Kelly; Amarasinghe, Vindhya; Ranik, Martin; Naithani, Sushma; Elser, Justin; Boyd, Alexander E.; Liston, Aaron; Spatafora, Joseph W.; Dharmwardhana, Palitha; Raja, Rajani; Sullivan, Christopher; Romanel, Elisson; Alves-Ferreira, Marcio; Külheim, Carsten; Foley, William; Carocha, Victor; Paiva, Jorge; Kudrna, David; Brommonschenkel, Sergio H.; Pasquali, Giancarlo; Byrne, Margaret; Rigault, Philippe; Tibbits, Josquin; Spokevicius, Antanas; Jones, Rebecca C.; Steane, Dorothy A.; Vaillancourt, René E.; Potts, Brad M.; Joubert, Fourie; Barry, Kerrie; Pappas, Georgios J.; Strauss, Steven H.; Jaiswal, Pankaj; Grima-Pettenati, Jacqueline; Salse, Jérôme; Van de Peer, Yves; Rokhsar, Daniel S.; Schmutz, Jeremy

    2014-06-11

    Eucalypts are the world's most widely planted hardwood trees. Their broad adaptability, rich species diversity, fast growth and superior multipurpose wood have made them a global renewable resource of fiber and energy that mitigates human pressures on natural forests. We sequenced and assembled >94% of the 640 Mbp genome of Eucalyptus grandis into its 11 chromosomes. A set of 36,376 protein-coding genes was predicted, revealing that 34% occur in tandem duplications, the largest proportion found thus far in any plant genome. Eucalypts also show the highest diversity of genes for plant specialized metabolism that act as chemical defence against biotic agents and provide unique pharmaceutical oils. Resequencing of a set of inbred tree genomes revealed regions of strongly conserved heterozygosity, likely hotspots of inbreeding depression. The resequenced genome of the sister species E. globulus underscored the high inter-specific genome colinearity despite substantial genome size variation in the genus. The genome of E. grandis is the first reference for the early diverging Rosid order Myrtales and is placed here basal to the Eurosids. This resource expands knowledge on the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.

  11. Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

    Science.gov (United States)

    Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, bioinformatic analysis of the genome sequence of E. aerogenes was carried out to identify genes encoding gelatinase. Enterobacter aerogenes was isolated from hot spring water and is a gelatinase species-specific bacterium with respect to porcine and fish gelatin. This bacterium offers the possibility of producing enzymes specific to the gelatin of each species. Enterobacter aerogenes was partially genome sequenced, yielding a total sequence size of 5.0 megabase pairs (Mbp). The pre-processing pipeline reported 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, with 67.5% of contigs over 1 kilobase pair (kbp), an N50 contig length of 124,856 bp and a GC content of 55.17%. About 4,705 protein genes were identified by protein prediction analysis. The two candidate genes selected had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1,029 base pair (bp) sequence with a 342 amino acid sequence, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence with a 238 amino acid sequence. Two pairs of primers (forward and reverse) were therefore designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
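
    The assembly statistics quoted in this record (N50 contig length, share of contigs over 1 kbp, GC content) follow standard definitions. The sketch below is illustrative only and is not the authors' pipeline: the function names and the toy contigs are assumptions, and real tools may differ in details such as how ambiguous bases are counted.

```python
from typing import Iterable, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together hold at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def percent_longer_than(lengths: Sequence[int], threshold: int) -> float:
    """Percentage of contigs strictly longer than `threshold` bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    return 100.0 * sum(1 for n in lengths if n > threshold) / len(lengths)


def gc_percent(contigs: Iterable[str]) -> float:
    """Percentage of G and C bases over all bases in the contigs."""
    gc = 0
    total = 0
    for seq in contigs:
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        total += len(upper)
    if total == 0:
        raise ValueError("no bases to count")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Hypothetical toy contigs, not data from the study.
    toy = ["GGCC", "ATATAT", "GC"]
    toy_lengths = [len(s) for s in toy]
    print(n50(toy_lengths), percent_longer_than(toy_lengths, 3), gc_percent(toy))
```

    For a real assembly the lengths and sequences would be read from the contig FASTA file; with 120 contigs, as here, all three statistics are cheap to compute.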

  12. Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

    Science.gov (United States)

    Jeong, Young-Min; Kim, Namshin; Ahn, Byung Ohg; Oh, Mijin; Chung, Won-Hyong; Chung, Hee; Jeong, Seongmun; Lim, Ki-Byung; Hwang, Yoon-Jung; Kim, Goon-Bo; Baek, Seunghoon; Choi, Sang-Bong; Hyung, Dae-Jin; Lee, Seung-Won; Sohn, Seong-Han; Kwon, Soo-Jin; Jin, Mina; Seol, Young-Joo; Chae, Won Byoung; Choi, Keun Jin; Park, Beom-Seok; Yu, Hee-Ju; Mun, Jeong-Hwan

    2016-07-01

    This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidence on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop, and its origin and phylogenetic position in the tribe Brassiceae are controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98% of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36% of the genome was repetitive sequence, and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidence that the current form of nine chromosomes in radish was rearranged from the chromosomes of a hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into the evolution of the mesohexaploid genomes in the tribe Brassiceae.

  13. Genome Imprinting

    Indian Academy of Sciences (India)

    the cell nucleus (mitochondrial and chloroplast genomes), and. (3) traits governed ... tively good embryonic development but very poor development of membranes and ... Human homologies for the type of situation described above are naturally ..... imprint; (b) New modifications of the paternal genome in germ cells of each ...

  14. Baculovirus Genomics

    NARCIS (Netherlands)

    Oers, van M.M.; Vlak, J.M.

    2007-01-01

    Baculovirus genomes are covalently closed circles of double-stranded DNA varying in size between 80 and 180 kilobase pairs. The genomes of more than forty-one baculoviruses have been sequenced to date. The majority of these (37) are pathogenic to lepidopteran hosts; three infect sawflies

  15. Genomic Testing

    Science.gov (United States)

    ... this database. Top of Page Evaluation of Genomic Applications in Practice and Prevention (EGAPP™) In 2004, the Centers for Disease Control and Prevention launched the EGAPP initiative to establish and test a ... and other applications of genomic technology that are in transition from ...

  16. Ancient genomes

    OpenAIRE

    Hoelzel, A Rus

    2005-01-01

    Ever since its invention, the polymerase chain reaction has been the method of choice for work with ancient DNA. In an application of modern genomic methods to material from the Pleistocene, a recent study has instead undertaken to clone and sequence a portion of the ancient genome of the cave bear.

  17. Soldering in electronics assembly

    CERN Document Server

    Judd, Mike

    2013-01-01

    Soldering in Electronics Assembly discusses several concerns in soldering of electronic assemblies. The book is comprised of nine chapters that tackle different areas in electronic assembly soldering. Chapter 1 discusses the soldering process itself, while Chapter 2 covers the electronic assemblies. Chapter 3 talks about solders and Chapter 4 deals with flux. The text also tackles the CS and SC soldering process. The cleaning of soldered assemblies, solder quality, and standards and specifications are also discussed. The book will be of great use to professionals who deal with electronic assem

  18. Nuclear fuel string assembly

    International Nuclear Information System (INIS)

    Ip, A.K.; Koyanagi, K.; Tarasuk, W.R.

    1976-01-01

    A method of fabricating rodded fuels suitable for use in pressure tube type reactors and in pressure vessel type reactors is described. Fuel rods are secured as an inner and an outer sub-assembly, each rod attached between mounting rings secured to the rod ends. The two sub-assemblies are telescoped together and positioned by spaced thimbles located between them to provide precise positioning while permitting differential axial movement between the sub-assemblies. Such sub-assemblies are particularly suited for mounting as bundle strings. The method provides particular advantages in the assembly of annular-section fuel pins, which include booster fuel containing enriched fuel material. (LL)

  19. Nuclear reactor fuel assembly

    International Nuclear Information System (INIS)

    Marmonier, Pierre; Mesnage, Bernard; Nervi, J.C.

    1975-01-01

    This invention refers to fuel assemblies for a liquid metal cooled fast neutron reactor. Each assembly is composed of a hollow vertical casing, of regular polygonal section, containing a bundle of clad pins filled with a fissile or fertile substance. The casing is open at its upper end and has a cylindrical foot at its lower end for positioning the assembly in a housing provided in the horizontal diagrid, on which the core assembly rests. A set of flat bars located on the external surface of the casing enables it to be correctly orientated in its housing among the other core assemblies [fr

  20. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads matching a set of conserved genes are extracted, assembled and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.