WorldWideScience

Sample records for whole-genome amplified serum

  1. Assessing the utility of whole-genome amplified serum DNA for array-based high throughput genotyping.

    Science.gov (United States)

    Bucasas, Kristine L; Pandya, Gagan A; Pradhan, Sonal; Fleischmann, Robert D; Peterson, Scott N; Belmont, John W

    2009-12-18

    Whole genome amplification (WGA) offers new possibilities for genome-wide association studies where limited DNA samples have been collected. This study provides a realistic and high-precision assessment of WGA DNA genotyping performance from 20-year-old archived serum samples using the Affymetrix Genome-Wide Human SNP Array 6.0 (SNP6.0) platform. Whole-genome amplified (WGA) DNA samples from 45 archived serum replicates and 5 fresh sera paired with non-amplified genomic DNA were genotyped in duplicate. All genotyped samples passed the imposed QC thresholds for quantity and quality. In general, WGA serum DNA samples produced low call rates (45.00 ± 2.69%), although reproducibility for successfully called markers was favorable (concordance = 95.61 ± 4.39%). Heterozygote dropouts explained the majority (>85% in technical replicates, 50% in paired genomic/serum samples) of discordant results. Genotyping performance on WGA serum DNA samples was improved by implementing the Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) algorithm, but at the cost of many samples that failed to pass its quality threshold. Poor genotype clustering was evident in the samples that failed the CRLMM confidence threshold. We conclude that while it is possible to extract genomic DNA from archived serum samples and subsequently perform whole-genome amplification, WGA serum DNA did not perform well and appeared unsuitable for high-resolution genotyping on these arrays.
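    The call-rate, concordance and heterozygote-dropout figures above can be expressed as a short Python sketch. It assumes genotypes coded as "AA", "AB", "BB" with None for a no-call, computes concordance only over markers called in both replicates, and counts a discordant pair as a heterozygote dropout when one of the two calls is "AB". These conventions and the function names are illustrative assumptions, not the authors' pipeline.

```python
from typing import Optional, Sequence

# Assumed coding: "AA", "AB", "BB" for called genotypes, None for a no-call.
Genotype = Optional[str]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of markers that received a genotype call."""
    return sum(g is not None for g in calls) / len(calls)


def concordance(rep1: Sequence[Genotype], rep2: Sequence[Genotype]) -> float:
    """Agreement between two replicates, counting only markers called in both."""
    pairs = [(a, b) for a, b in zip(rep1, rep2) if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


def het_dropout_share(rep1: Sequence[Genotype], rep2: Sequence[Genotype]) -> float:
    """Share of discordant calls where one replicate is heterozygous ("AB") and
    the other is homozygous, a simple proxy for heterozygote dropout."""
    discordant = [(a, b) for a, b in zip(rep1, rep2)
                  if a is not None and b is not None and a != b]
    if not discordant:
        return 0.0
    return sum("AB" in (a, b) for a, b in discordant) / len(discordant)


# Toy example: a genomic DNA call set versus a WGA call set for six markers.
genomic = ["AA", "AB", "AB", "BB", None, "AB"]
wga = ["AA", "AA", "AB", "BB", "AB", None]
print(call_rate(wga), concordance(genomic, wga), het_dropout_share(genomic, wga))
```

    In the toy data the single discordant marker is an "AB" call that became "AA" after amplification, which is the dropout pattern the abstract describes.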

  2. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob

    2016-01-01

    ... be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject ... from variant calls. No differences were observed when replacing 2 × 3.2 mm discs with 2 × 1.6 mm discs, allowing for a further reduction of sample material in future projects. ...

  3. Evaluation of whole genome amplified DNA to decrease material expenditure and increase quality

    Directory of Open Access Journals (Sweden)

    Marie Bækvad-Hansen

    2017-06-01

    Discussion: Whole genome amplified DNA samples from dried blood spots are well suited for array genotyping and produce robust and reliable genotype data. However, the amplification process introduces additional noise into the data, making detection of structural variants such as copy number variants difficult. With this study, we explore ways of optimizing the amplification protocol in order to reduce noise and increase data quality. We found that the amplification process was very robust, and that changes in amplification time or temperature did not alter the genotyping calls or quality of the array data. Adding replicates of each sample also led to insignificant changes in the array data. Thus, the amount of noise introduced by the amplification process was consistent regardless of changes made to the amplification protocol. We also explored ways of decreasing material expenditure by reducing the spot size or the amplification reaction volume. The reduction did not affect the quality of the genotyping data.

  4. Multiplex SNP analysis on whole genome amplified DNA from archived dried bloodspots, a validation study

    DEFF Research Database (Denmark)

    Tvedegaard, Kristine C.; Parner, Erik; Hooper, Craig W.

    ... on whole genome amplified (WGA) DNA from archived dried bloodspots. METHODS AND MATERIAL: The chemically synthesized new base pair (isoC and isoG) allows the creation of new DNA strands that provide superior specificity and allow development of assays with greater sensitivity than conventional methods ... is a further development of allele-specific primer extension (ASPE) for multiplex SNP analysis based on the Luminex 100 IS platform. It uses isobases (isoC and isoG) and the MultiCode-PLx software platform for data analysis and data handling. We validate the EraGen multicode system in two 6-plex assays used ... To validate the method, 900 WGA DNA samples were genotyped in duplicate. 10-20% of all samples were sequenced for use as a reference. Accuracy and repeatability were estimated for each SNP. Robustness was estimated as the rate of conclusive outcomes for each SNP. RESULTS: The accuracy ranged from 98...
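    The three per-SNP metrics can be sketched as follows. The abstract does not give formulas, so the definitions here are assumptions: robustness as the share of conclusive calls among all duplicate calls, repeatability as agreement between duplicates that were both conclusive, and accuracy as agreement with the sequencing genotype where one exists. `SampleResult` and `snp_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SampleResult:
    dup1: Optional[str]              # first duplicate call; None = inconclusive
    dup2: Optional[str]              # second duplicate call; None = inconclusive
    reference: Optional[str] = None  # sequencing genotype, if the sample was sequenced


def snp_metrics(results: List[SampleResult]) -> dict:
    """Robustness, repeatability and accuracy for one SNP (assumed definitions)."""
    calls = [c for r in results for c in (r.dup1, r.dup2)]
    robustness = sum(c is not None for c in calls) / len(calls)

    both = [r for r in results if r.dup1 is not None and r.dup2 is not None]
    repeatability = sum(r.dup1 == r.dup2 for r in both) / len(both) if both else float("nan")

    # Only conclusive calls from sequenced samples are checked against the reference.
    checked = [(c, r.reference) for r in results if r.reference is not None
               for c in (r.dup1, r.dup2) if c is not None]
    accuracy = sum(c == ref for c, ref in checked) / len(checked) if checked else float("nan")

    return {"robustness": robustness, "repeatability": repeatability, "accuracy": accuracy}


results = [
    SampleResult("CC", "CC", reference="CC"),
    SampleResult("CT", "CT"),
    SampleResult("CT", None, reference="CC"),
    SampleResult("TT", "CT"),
]
print(snp_metrics(results))
```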

  5. Whole-genome amplified DNA from stored dried blood spots is reliable in high resolution melting curve and sequencing analysis

    DEFF Research Database (Denmark)

    Winkel, Bo G; Hollegaard, Mads Vilhelm; Olesen, Morten S

    2011-01-01

    The use of dried blood spot (DBS) samples in genomic workup has been limited by the relatively low amounts of genomic DNA (gDNA) they contain. It remains to be proven that whole genome amplified DNA (wgaDNA) from stored DBS samples constitutes a reliable alternative to gDNA. We wanted to compare...

  6. Sensitive and specific KRAS somatic mutation analysis on whole-genome amplified DNA from archival tissues.

    Science.gov (United States)

    van Eijk, Ronald; van Puijenbroek, Marjo; Chhatta, Amiet R; Gupta, Nisha; Vossen, Rolf H A M; Lips, Esther H; Cleton-Jansen, Anne-Marie; Morreau, Hans; van Wezel, Tom

    2010-01-01

    Kirsten RAS (KRAS) is a small GTPase that plays a key role in Ras/mitogen-activated protein kinase signaling; somatic mutations in KRAS are frequently found in many cancers. The most common KRAS mutations result in a constitutively active protein. Accurate detection of KRAS mutations is pivotal to the molecular diagnosis of cancer and may guide proper treatment selection. Here, we describe a two-step KRAS mutation screening protocol that combines whole-genome amplification (WGA), high-resolution melting analysis (HRM) as a prescreen method for mutation-carrying samples, and direct Sanger sequencing of DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, from which limited amounts of DNA are available. We developed target-specific primers, thereby avoiding amplification of homologous KRAS sequences. The addition of herring sperm DNA facilitated WGA in DNA samples isolated from as few as 100 cells. KRAS mutation screening using high-resolution melting analysis on wgaDNA from formalin-fixed, paraffin-embedded tissue is highly sensitive and specific; additionally, this method is feasible for screening of clinical specimens, as illustrated by our analysis of pancreatic cancers. Furthermore, PCR on wgaDNA does not introduce genotypic changes, as opposed to unamplified genomic DNA. This method can, after validation, be applied to virtually any potentially mutated region in the genome.
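    HRM prescreening flags samples whose melting curve departs from that of wild-type controls, so that only flagged samples need Sanger sequencing. The sketch below shows one generic way to do this, using min-max normalization and a difference-curve threshold. The normalization is simplified, the toy curves are idealized, and the 0.05 threshold and function names are hypothetical, not taken from the study.

```python
import numpy as np


def normalize_melt(fluor: np.ndarray) -> np.ndarray:
    """Scale a melt curve to run from 1 (double-stranded) to 0 (melted).
    Plain min-max scaling; HRM software usually fits pre- and post-melt baselines."""
    lo, hi = fluor.min(), fluor.max()
    return (fluor - lo) / (hi - lo)


def flag_for_sequencing(samples: np.ndarray, wild_type: np.ndarray,
                        threshold: float = 0.05) -> np.ndarray:
    """True for each sample whose normalized curve differs from the mean
    wild-type curve by more than `threshold` at any temperature.

    samples:   (n_samples, n_temps) raw fluorescence
    wild_type: (n_controls, n_temps) raw fluorescence of wild-type controls
    """
    reference = np.mean([normalize_melt(c) for c in wild_type], axis=0)
    diffs = np.array([normalize_melt(c) - reference for c in samples])
    return np.abs(diffs).max(axis=1) > threshold


# Toy curves: a wild-type amplicon melting at 80 C and one shifted to 79 C.
temps = np.linspace(70.0, 90.0, 201)
wt = 1.0 / (1.0 + np.exp((temps - 80.0) / 0.5))
shifted = 1.0 / (1.0 + np.exp((temps - 79.0) / 0.5))
flags = flag_for_sequencing(np.vstack([wt, shifted]), np.vstack([wt, wt]))
print(flags)
```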

  7. Assessing the utility of whole genome amplified DNA for next-generation molecular ecology.

    Science.gov (United States)

    Blair, Christopher; Campbell, C Ryan; Yoder, Anne D

    2015-09-01

    DNA quantity can be a hindrance in ecological and evolutionary research programmes due to a range of factors, including the endangered status of target organisms, available tissue type, and the impact of field conditions on preservation methods. A potential solution to low-quantity DNA lies in whole genome amplification (WGA) techniques that can substantially increase DNA yield. To date, few studies have rigorously examined sequence bias that might result from WGA and next-generation sequencing of nonmodel taxa. To address this knowledge deficit, we use multiple displacement amplification (MDA) and double-digest RAD sequencing on the grey mouse lemur (Microcebus murinus) to quantify bias in genome coverage and SNP calls when compared to raw genomic DNA (gDNA). We focus our efforts on providing baseline estimates of potential bias by following the manufacturer's recommendations for starting DNA quantities (>100 ng). Our results strongly suggest that MDA enrichment does not introduce systematic bias to genome characterization. SNP calls between samples are highly congruent (>98%), both when genotyping de novo and when genotyping with a reference genome, when a minimum threshold of 20X stack depth is specified to call genotypes. Relative genome coverage is also similar between MDA and gDNA, and allelic dropout is not observed. SNP concordance varies with the coverage threshold, with 95% concordance reached at ~12X coverage when genotyping de novo and ~7X coverage when genotyping with the reference genome. These results suggest that MDA may be a suitable solution for next-generation molecular ecological studies when DNA quantity would otherwise be a limiting factor. © 2015 John Wiley & Sons Ltd.
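    The dependence of SNP concordance on a depth threshold can be illustrated with a small Python sketch. The site layout (genotype and depth for each of two libraries) and the function name are assumptions for illustration; raising `min_depth` keeps fewer sites, which is the trade-off behind the concordance-versus-coverage figures above.

```python
from typing import List, Optional, Tuple

# Each site: (MDA genotype, MDA depth, gDNA genotype, gDNA depth); None = no call.
Site = Tuple[Optional[str], int, Optional[str], int]


def concordance_at_depth(sites: List[Site], min_depth: int) -> Tuple[float, int]:
    """Concordance between two libraries over sites where both have a call and
    both meet min_depth. Returns (concordance, number of sites compared)."""
    kept = [(g1, g2) for g1, d1, g2, d2 in sites
            if d1 >= min_depth and d2 >= min_depth and g1 is not None and g2 is not None]
    if not kept:
        return float("nan"), 0
    return sum(g1 == g2 for g1, g2 in kept) / len(kept), len(kept)


# Toy sites (hypothetical): the low-depth site is the discordant one.
sites = [("AA", 25, "AA", 30), ("AG", 6, "AA", 8), ("GG", 15, "GG", 22), ("AG", 21, "AG", 40)]
for threshold in (5, 7, 12, 20):
    print(threshold, concordance_at_depth(sites, threshold))
```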

  8. Capacitive DNA sensor for rapid and sensitive detection of whole genome human herpesvirus-1 dsDNA in serum.

    Science.gov (United States)

    Cheng, Cheng; Oueslati, Rania; Wu, Jayne; Chen, Jiangang; Eda, Shigetoshi

    2017-06-01

    This work presents a rapid, highly sensitive, low-cost, and specific capacitive DNA sensor for detection of whole genome human herpesvirus-1 DNA. This sensor is capable of direct DNA detection with a response time of 30 s, and it can be used to test standard buffer or serum samples. The sensing approach for DNA detection is based on alternating current (AC) electrokinetics. By applying an inhomogeneous AC electric field on sensor electrodes, positive dielectrophoresis is induced to accelerate DNA hybridization. The same applied AC signal also directly measures the hybridization of target with the probe on the sensor surface. Experiments are conducted to optimize the AC signal, as well as the buffers for probe immobilization and target DNA hybridization. The assay is highly sensitive and specific, with no response to human herpesvirus-2 DNA at 5 ng/mL and a LOD of 1.0 pg/mL (6.5 copies/μL or 10.7 aM) in standard buffer. When testing double-stranded (ds) DNA spiked into human serum samples, the sensor yields a LOD of 20.0 pg/mL (129.5 copies/μL or 0.21 femtomolar (fM)) in neat serum. In this work, the target is whole genome dsDNA; consequently, the test can be performed without the use of enzymes or amplification, which considerably simplifies sensor operation and makes it highly suitable for point-of-care disease diagnosis. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
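    The LOD values above can be cross-checked with a unit conversion. This sketch assumes an HSV-1 genome length of about 150 kb (not stated in the abstract) and the widely used estimate of 617.96 g/mol per base pair plus 36.04 g/mol for dsDNA. Under these assumptions it gives roughly 6.5 copies/μL and about 11 aM for 1 pg/mL, and about 130 copies/μL and 0.22 fM for 20 pg/mL, close to the reported figures; small differences reflect the assumed genome length and mass per base pair.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
GENOME_BP = 150_000       # assumed HSV-1 genome length (~150 kb); not from the abstract


def dsdna_molar_mass(bp: int) -> float:
    """Approximate molar mass (g/mol) of dsDNA: 617.96 g/mol per bp + 36.04 g/mol."""
    return bp * 617.96 + 36.04


def pg_per_ml_to_copies_per_ul(conc_pg_ml: float, bp: int) -> float:
    """Convert a mass concentration (pg/mL) into genome copies per microlitre."""
    grams_per_ml = conc_pg_ml * 1e-12
    copies_per_ml = grams_per_ml / dsdna_molar_mass(bp) * AVOGADRO
    return copies_per_ml / 1000.0  # 1 mL = 1000 uL


def pg_per_ml_to_molar(conc_pg_ml: float, bp: int) -> float:
    """Convert a mass concentration (pg/mL) into molarity (mol/L)."""
    grams_per_litre = conc_pg_ml * 1e-12 * 1000.0
    return grams_per_litre / dsdna_molar_mass(bp)


for lod in (1.0, 20.0):
    copies = pg_per_ml_to_copies_per_ul(lod, GENOME_BP)
    attomolar = pg_per_ml_to_molar(lod, GENOME_BP) * 1e18
    print(f"{lod:5.1f} pg/mL ~ {copies:6.1f} copies/uL ~ {attomolar:6.1f} aM")
```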

  9. Whole genome protein microarrays for serum profiling of immunodominant antigens of Bacillus anthracis

    Directory of Open Access Journals (Sweden)

    Karen Elizabeth Kempsell

    2015-08-01

    A commercial Bacillus anthracis (Anthrax) whole genome protein microarray has been used to identify immunogenic Anthrax proteins (IAP) using sera from groups of donors with (a) confirmed B. anthracis naturally acquired cutaneous infection, (b) confirmed B. anthracis intravenous drug use-acquired infection, (c) occupational exposure in a wool-sorters factory, (d) humans and rabbits vaccinated with the UK Anthrax protein vaccine and compared to naïve unexposed controls. Anti-IAP responses were observed for both IgG and IgA in the challenged groups; however the anti-IAP IgG response was more evident in the vaccinated group and the anti-IAP IgA response more evident in the B. anthracis-infected groups. Infected individuals appeared somewhat suppressed for their general IgG response, compared with other challenged groups. Immunogenic protein antigens were identified in all groups, some of which were shared between groups whilst others were specific for individual groups. The toxin proteins were immunodominant in all vaccinated, infected or other challenged groups. However, a number of other chromosomally-located and plasmid encoded open reading frames were also recognised by infected or exposed groups in comparison to controls. Some of these antigens, e.g. BA4182, are not recognised by vaccinated individuals, suggesting that there are proteins more specifically expressed by live Anthrax spores in vivo that are not currently found in the UK licensed Anthrax Vaccine (AVP). These may perhaps be preferentially expressed during infection and represent expression of alternative pathways in the B. anthracis ‘infectome’. These may make highly attractive candidates for diagnostic and vaccine biomarker development as they may be more specifically associated with the infectious phase of the pathogen. A number of B. anthracis small hypothetical protein targets have been synthesised, tested in mouse immunogenicity studies and validated in parallel using human sera from the

  10. Whole genome protein microarrays for serum profiling of immunodominant antigens of Bacillus anthracis

    Science.gov (United States)

    Kempsell, Karen E.; Kidd, Stephen P.; Lewandowski, Kuiama; Elmore, Michael J.; Charlton, Sue; Yeates, Annemarie; Cuthbertson, Hannah; Hallis, Bassam; Altmann, Daniel M.; Rogers, Mitch; Wattiau, Pierre; Ingram, Rebecca J.; Brooks, Tim; Vipond, Richard

    2015-01-01

    A commercial Bacillus anthracis (Anthrax) whole genome protein microarray has been used to identify immunogenic Anthrax proteins (IAP) using sera from groups of donors with (a) confirmed B. anthracis naturally acquired cutaneous infection, (b) confirmed B. anthracis intravenous drug use-acquired infection, (c) occupational exposure in a wool-sorters factory, (d) humans and rabbits vaccinated with the UK Anthrax protein vaccine and compared to naïve unexposed controls. Anti-IAP responses were observed for both IgG and IgA in the challenged groups; however the anti-IAP IgG response was more evident in the vaccinated group and the anti-IAP IgA response more evident in the B. anthracis-infected groups. Infected individuals appeared somewhat suppressed for their general IgG response, compared with other challenged groups. Immunogenic protein antigens were identified in all groups, some of which were shared between groups whilst others were specific for individual groups. The toxin proteins were immunodominant in all vaccinated, infected or other challenged groups. However, a number of other chromosomally-located and plasmid encoded open reading frame proteins were also recognized by infected or exposed groups in comparison to controls. Some of these antigens, e.g., BA4182, are not recognized by vaccinated individuals, suggesting that there are proteins more specifically expressed by live Anthrax spores in vivo that are not currently found in the UK licensed Anthrax Vaccine (AVP). These may perhaps be preferentially expressed during infection and represent expression of alternative pathways in the B. anthracis “infectome.” These may make highly attractive candidates for diagnostic and vaccine biomarker development as they may be more specifically associated with the infectious phase of the pathogen. A number of B. anthracis small hypothetical protein targets have been synthesized, tested in mouse immunogenicity studies and validated in parallel using human sera from
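    Identifying immunogenic antigens amounts to asking which array features react more strongly in a donor group than in naïve controls. A minimal sketch, assuming background-corrected intensities and a cutoff of the control mean plus three control standard deviations (a common convention, not necessarily the one used in this study); the toy intensities and the function name are hypothetical.

```python
import numpy as np


def reactive_antigens(group: np.ndarray, controls: np.ndarray, n_sd: float = 3.0) -> np.ndarray:
    """Indices of antigens whose mean signal in a donor group exceeds the
    naive-control mean by more than n_sd control standard deviations.

    group, controls: (n_donors, n_antigens) background-corrected intensities.
    """
    cutoff = controls.mean(axis=0) + n_sd * controls.std(axis=0, ddof=1)
    return np.flatnonzero(group.mean(axis=0) > cutoff)


# Toy data: three antigens; antigen 0 is recognized by both groups, antigen 2 only after infection.
controls = np.array([[1.0, 1.0, 1.0], [1.2, 0.8, 1.1], [0.8, 1.2, 0.9]])
infected = np.array([[5.0, 1.0, 4.0], [6.0, 1.1, 5.0]])
vaccinated = np.array([[5.5, 1.0, 1.0], [6.5, 0.9, 1.1]])

infected_hits = reactive_antigens(infected, controls)
vaccinated_hits = reactive_antigens(vaccinated, controls)
# Antigens recognized after infection but not after vaccination.
infection_specific = np.setdiff1d(infected_hits, vaccinated_hits)
print(infected_hits, vaccinated_hits, infection_specific)
```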

  11. Whole Genome Sequencing

    Science.gov (United States)

    ... What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  12. Whole Genome Selection

    Science.gov (United States)

    Whole genome selection (WGS) is an approach to using DNA markers that are distributed throughout the entire genome. Genes affecting most economically important traits are distributed throughout the genome, and relatively few have large effects, while many more genes have progressively sm...
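    Using markers across the whole genome is commonly implemented as genomic prediction, in which all marker effects are estimated jointly and summed into a genomic estimated breeding value. The sketch below uses ridge regression (RR-BLUP-style shrinkage), one widely used choice that the abstract does not itself name, on simulated genotypes; the simulation settings, penalty and function names are assumptions.

```python
import numpy as np


def ridge_marker_effects(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Estimate all marker effects jointly, shrinking them toward zero with one
    common penalty. X: (n_individuals, n_markers) genotypes coded 0/1/2."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def genomic_values(X_new: np.ndarray, X_train: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Genomic estimated breeding values: marker effects summed across the genome."""
    return (X_new - X_train.mean(axis=0)) @ beta


rng = np.random.default_rng(0)
n_train, n_test, p = 300, 100, 200
X = rng.binomial(2, 0.5, size=(n_train + n_test, p)).astype(float)
true_effects = rng.normal(0.0, 0.1, size=p)   # many small effects spread over the genome
g = X @ true_effects                          # genetic variance is about 200 * 0.5 * 0.01 = 1
y = g + rng.normal(0.0, 0.5, size=len(g))     # residual variance 0.25, heritability about 0.8

# Penalty set to residual variance / effect variance = 0.25 / 0.01.
beta = ridge_marker_effects(X[:n_train], y[:n_train], lam=25.0)
gebv = genomic_values(X[n_train:], X[:n_train], beta)
accuracy = np.corrcoef(gebv, g[n_train:])[0, 1]
print(f"correlation with true genetic values in the test set: {accuracy:.2f}")
```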

  13. Inspecting Targeted Deep Sequencing of Whole Genome Amplified DNA Versus Fresh DNA for Somatic Mutation Detection: A Genetic Study in Myelodysplastic Syndrome Patients.

    Science.gov (United States)

    Palomo, Laura; Fuster-Tormo, Francisco; Alvira, Daniel; Ademà, Vera; Armengol, María Pilar; Gómez-Marzo, Paula; de Haro, Nuri; Mallo, Mar; Xicoy, Blanca; Zamora, Lurdes; Solé, Francesc

    2017-08-01

    Whole genome amplification (WGA) has become an invaluable method for preserving limited samples of precious stock material and has been used in recent years as an alternative tool to increase the amount of DNA before library preparation for next-generation sequencing. Myelodysplastic syndromes (MDS) are a group of clonal hematopoietic stem cell disorders characterized by somatic mutations in several myeloid-related genes. In this work, targeted deep sequencing was performed on four paired fresh DNA and WGA DNA samples from the bone marrow of MDS patients to assess the feasibility of using WGA DNA for detecting somatic mutations. The results of this study showed that, in general, the sequencing and alignment statistics of fresh DNA and WGA DNA samples were similar. However, after variant calling, and when considering variants detected at all frequencies, there was a high level of discordance between fresh DNA and WGA DNA (overall, more variants were detected in WGA DNA). After proper filtering, a total of three somatic mutations were detected in the cohort. All somatic mutations detected in fresh DNA were also identified in WGA DNA and validated by whole exome sequencing.
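    How filtering changes fresh-versus-WGA discordance can be sketched as below. The abstract does not state the filters used; the variant-allele-frequency and depth thresholds, the data layout, the toy coordinates and the function names here are hypothetical.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def filter_variants(calls: Dict[Variant, Tuple[float, int]],
                    min_vaf: float = 0.05, min_depth: int = 100) -> Set[Variant]:
    """Keep variants whose allele frequency and read depth pass the thresholds.
    `calls` maps each variant to (variant allele frequency, read depth)."""
    return {v for v, (vaf, depth) in calls.items() if vaf >= min_vaf and depth >= min_depth}


def compare(fresh: Set[Variant], wga: Set[Variant]) -> Dict[str, int]:
    """Counts of variants shared by, or private to, each DNA source."""
    return {"shared": len(fresh & wga),
            "fresh_only": len(fresh - wga),
            "wga_only": len(wga - fresh)}


# Toy call sets: low-frequency calls create discordance until they are filtered out.
fresh = {("chr1", 1000, "C", "T"): (0.32, 850), ("chr2", 2000, "G", "A"): (0.01, 900)}
wga = {("chr1", 1000, "C", "T"): (0.29, 780), ("chr3", 3000, "C", "T"): (0.015, 640),
       ("chr4", 4000, "A", "G"): (0.02, 500)}

print("unfiltered:", compare(filter_variants(fresh, 0.0, 0), filter_variants(wga, 0.0, 0)))
print("filtered:  ", compare(filter_variants(fresh), filter_variants(wga)))
```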

  14. Small Sample Whole-Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Hara, C A; Nguyen, C P; Wheeler, E K; Sorensen, K J; Arroyo, E S; Vrankovich, G P; Christian, A T

    2005-09-20

    Many challenges arise when trying to amplify and analyze human samples collected in the field due to limitations in sample quantity, and contamination of the starting material. Tests such as DNA fingerprinting and mitochondrial typing require a certain sample size and are carried out in large volume reactions; in cases where insufficient sample is present whole genome amplification (WGA) can be used. WGA allows very small quantities of DNA to be amplified in a way that enables subsequent DNA-based tests to be performed. A limiting step to WGA is sample preparation. To minimize the necessary sample size, we have developed two modifications of WGA: the first allows for an increase in amplified product from small, nanoscale, purified samples with the use of carrier DNA while the second is a single-step method for cleaning and amplifying samples all in one column. Conventional DNA cleanup involves binding the DNA to silica, washing away impurities, and then releasing the DNA for subsequent testing. We have eliminated losses associated with incomplete sample release, thereby decreasing the required amount of starting template for DNA testing. Both techniques address the limitations of sample size by providing ample copies of genomic samples. Carrier DNA, included in our WGA reactions, can be used when amplifying samples with the standard purification method, or can be used in conjunction with our single-step DNA purification technique to potentially further decrease the amount of starting sample necessary for future forensic DNA-based assays.

  15. Whole genome amplification: Use of advanced isothermal method ...

    African Journals Online (AJOL)

    Whole genome amplification (WGA) is a laboratory method for amplifying genomic deoxyribonucleic acid (DNA) samples to generate sufficient quantities of DNA for subsequent specific analyses. This method is the only way to increase input material from a few cells or from samples with limited DNA content.
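    A rough sense of the amplification required comes from simple arithmetic. The sketch assumes about 6.6 pg of DNA per diploid human cell (a commonly cited approximation) and a hypothetical 1 μg target; "doublings" is only an equivalent measure, since isothermal methods do not run in discrete cycles.

```python
import math

PG_PER_DIPLOID_HUMAN_CELL = 6.6  # approximation; a haploid human genome is about 3.3 pg


def fold_needed(n_cells: int, target_ng: float) -> float:
    """Fold amplification needed to reach target_ng of DNA from n_cells diploid human cells."""
    start_pg = n_cells * PG_PER_DIPLOID_HUMAN_CELL
    return target_ng * 1000.0 / start_pg  # 1 ng = 1000 pg


for cells in (1, 10, 100):
    fold = fold_needed(cells, target_ng=1000.0)  # hypothetical 1 microgram target
    print(f"{cells:4d} cells -> {fold:10.0f}-fold (~{math.log2(fold):.1f} doublings)")
```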

  16. Analysis of phage Mu DNA transposition by whole-genome ...

    Indian Academy of Sciences (India)

    (Trilink Biotechnologies) were employed. Sample DNA (ChIP or processed Mu DNA) was amplified with Cy5-9mer primer, and reference DNA (Input or whole genome DNA) with Cy3-9mer primer. The samples were loaded on microarray slides and subjected to standard hybridization procedures (NimbleGen Arrays User's ...
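    Two-colour arrays like this are typically read out as a per-probe log2 ratio of the sample channel (Cy5) to the reference channel (Cy3). A minimal sketch, assuming background-subtracted intensities, global median centring and a hypothetical twofold enrichment cutoff; NimbleGen's own analysis software may normalize differently.

```python
import numpy as np


def log2_ratios(cy5: np.ndarray, cy3: np.ndarray) -> np.ndarray:
    """Per-probe log2(sample / reference), median-centred so that a typical
    probe has a ratio of 0 (no enrichment)."""
    ratios = np.log2(cy5 / cy3)
    return ratios - np.median(ratios)


def enriched_probes(ratios: np.ndarray, min_log2: float = 1.0) -> np.ndarray:
    """Indices of probes at least 2**min_log2-fold enriched in the sample channel."""
    return np.flatnonzero(ratios >= min_log2)


# Toy intensities for five probes; probes 1 and 3 are enriched in the sample (Cy5) channel.
cy5 = np.array([100.0, 200.0, 100.0, 800.0, 100.0])
cy3 = np.array([100.0, 100.0, 100.0, 100.0, 100.0])
print(enriched_probes(log2_ratios(cy5, cy3)))
```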

  17. Principles of Whole-Genome Amplification.

    Science.gov (United States)

    Czyz, Zbigniew Tadeusz; Kirsch, Stefan; Polzer, Bernhard

    2015-01-01

    Modern molecular biology relies on large amounts of high-quality genomic DNA. However, in a number of clinical or biological applications this requirement cannot be met, as starting material is either limited (e.g., preimplantation genetic diagnosis (PGD) or analysis of minimal residual cancer) or of insufficient quality (e.g., formalin-fixed paraffin-embedded tissue samples or forensics). As a consequence, in order to obtain sufficient amounts of material to analyze these demanding samples by state-of-the-art molecular assays, genomic DNA has to be amplified. This chapter summarizes available technologies for whole-genome amplification (WGA), spanning the last 25 years from the first developments to currently applied methods. We will especially elaborate on research applications, as well as the inherent advantages and limitations of various WGA technologies.

  18. Whole genome amplification and its impact on CGH array profiles

    Directory of Open Access Journals (Sweden)

    Meldrum Cliff

    2008-07-01

    Background: Some array comparative genomic hybridisation (array CGH) platforms require microgram quantities of DNA for the generation of reliable and reproducible data. For studies where there are limited amounts of genetic material, whole genome amplification (WGA) is an attractive method for generating sufficient quantities of genomic material from minuscule amounts of starting material. A range of WGA methods are available, and the multiple displacement amplification (MDA) approach has been shown to be highly accurate, although amplification bias has been reported. In the current study, WGA was used to amplify DNA extracted from whole blood. In total, six array CGH experiments were performed to investigate whether the use of whole genome amplified DNA (wgaDNA) produces reliable and reproducible results. Four experiments were conducted on amplified DNA compared to unamplified DNA and two experiments on unamplified DNA compared to unamplified DNA. Findings: All the experiments involving wgaDNA resulted in a high proportion of losses and gains of genomic material. Previously, amplification bias has been overcome by using amplified DNA as both the test and reference DNA. Our data suggest that this approach may not be effective, as the gains and losses introduced by WGA appear to be random and are not reproducible between different experiments using the same DNA. Conclusion: In light of these findings, the use of both amplified test and reference DNA on CGH arrays may not provide an accurate representation of copy number variation in the DNA.
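    The reproducibility question raised here can be framed as comparing gain/loss calls between two experiments on the same DNA. The sketch below calls each probe as gain, loss or no change using a hypothetical ±0.3 log2-ratio threshold and reports the Jaccard overlap of aberrant probes; non-reproducible, random WGA artefacts of the kind described would show up as a low overlap. Thresholds, toy data and names are assumptions, not the study's method.

```python
import numpy as np


def call_cnv(log2_ratios: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Per-probe call: +1 for gain, -1 for loss, 0 for no change."""
    calls = np.zeros(len(log2_ratios), dtype=int)
    calls[log2_ratios >= threshold] = 1
    calls[log2_ratios <= -threshold] = -1
    return calls


def aberration_overlap(calls_a: np.ndarray, calls_b: np.ndarray) -> float:
    """Jaccard overlap of aberrant probes called in the same direction in both
    experiments. Returns 1.0 when neither experiment has any aberrant probe."""
    aberrant_a = calls_a != 0
    aberrant_b = calls_b != 0
    union = np.sum(aberrant_a | aberrant_b)
    if union == 0:
        return 1.0
    shared = np.sum(aberrant_a & aberrant_b & (calls_a == calls_b))
    return float(shared / union)


# Toy log2 ratios from two hybridizations of the same amplified DNA.
exp1 = np.array([0.0, 0.5, -0.4, 0.1, 0.6])
exp2 = np.array([0.0, 0.45, 0.0, -0.5, 0.05])
print(call_cnv(exp1), call_cnv(exp2), aberration_overlap(call_cnv(exp1), call_cnv(exp2)))
```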

  19. Whole-Genome Sequencing: Automated, Indexed Library Preparation.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing an indexed Illumina DNA library. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Double-stranded DNA (dsDNA) will fragment when exposed to the energy of adaptive focused acoustics (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing. © 2017 Cold Spring Harbor Laboratory Press.
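    After indexed libraries are pooled and sequenced, each read is assigned back to its sample by its index sequence; vendor demultiplexing software normally performs this step. The sketch below is a simplified stand-in that assumes a single index per sample, tolerates at most one mismatch, and refuses ambiguous matches; the index sequences and function names are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Mismatch count between two sequences (length difference counts as mismatches)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(read_index: str, sample_indexes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose index is closest to read_index, or None if no
    index is within max_mismatches or if two samples tie for the best match."""
    scored = sorted((hamming(read_index, idx), name) for name, idx in sample_indexes.items())
    best_dist, best_name = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous: cannot assign safely
    return best_name


samples = {"s1": "ACGTACGT", "s2": "TGCATGCA"}
print(assign_sample("ACGTACGA", samples))  # one mismatch from s1
print(assign_sample("AAAAAAAA", samples))  # too far from every index
```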

  20. Whole genome microarray analysis, from neonatal blood cards

    Directory of Open Access Journals (Sweden)

    Hogan Michael E

    2009-07-01

    Background: Neonatal blood, obtained from a heel stick and stored dry on paper cards, has been the standard for birth defects screening for 50 years. Such dried blood samples are used primarily for analysis of small-molecule analytes. More recently, the DNA complement of such dried blood cards has been used for targeted genetic testing, such as for single nucleotide polymorphisms in cystic fibrosis. Expansion of such testing to include polygenic traits, and perhaps whole genome scanning, has been discussed as a formal possibility. However, until now the amount of DNA that might be obtained from such dried blood cards has been limiting, due to inefficient DNA recovery technology. Results: A new technology is employed for efficient DNA release from a standard neonatal blood card. Using standard Guthrie cards, stored an average of ten years post-collection, about 1/40th of the air-dried neonatal blood specimen (two 3 mm punches) was processed to obtain DNA that was sufficient in mass and quality for direct use in microarray-based whole genome scanning. Using that same DNA release technology, it is also shown that approximately 1/250th of the original purified DNA (about 1 ng) could be subjected to whole genome amplification, thus yielding an additional microgram of amplified DNA product. That amplified DNA product was then used in microarray analysis and yielded statistical concordance of 99% or greater with the primary, unamplified DNA sample. Conclusion: Together, these data suggest that DNA obtained from less than 10% of a standard neonatal blood specimen, stored dry for several years on a Guthrie card, can support a program of genome-wide neonatal genetic testing.

  1. Whole genome amplification and sequencing of a Daphnia resting egg.

    Science.gov (United States)

    Lack, Justin B; Weider, Lawrence J; Jeyasingh, Punidan D

    2017-09-19

    Resting egg banks are unique windows that allow us to directly observe shifts in population genetics and phenotypes over time as natural populations evolve. Though a variety of planktonic organisms also produce resting stages, the keystone freshwater consumer, Daphnia, is a well-known model for paleogenetics and resurrection ecology. Nevertheless, paleogenomic investigations are limited largely because resting eggs do not contain enough DNA for genomic sequencing. In fact, genomic studies even on extant populations include a laborious preparatory phase of batch culturing dozens of individuals to generate sufficient genomic DNA. Here, we furnish a protocol to generate whole genomes of single ephippial (resting) eggs and single daphniids. Whole genomes of single ephippial eggs and single adults were amplified using the Qiagen REPLI-g Single Cell kit, followed by the NEBNext Ultra DNA Library Prep Kit for library construction and Illumina sequencing. We compared the quality of the single-egg and single-individual amplified genomes to the standard batch genomic DNA extraction in the absence of genome amplification. At a mean depth of 20×, coverage was essentially identical for the amplified single individual relative to the unamplified batch-extracted genome (>90% of the genome was covered and callable). Finally, while amplification resulted in a slight loss of heterozygosity in the amplified genomes, estimates were largely comparable and illustrate the utility and limitations of this approach in estimating population genetic parameters over long periods of time in natural populations of Daphnia and other small species known to produce resting stages. © 2017 John Wiley & Sons Ltd.
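    The coverage and heterozygosity comparisons in this abstract can be sketched with per-position depths and per-site genotypes. "Callable" is simplified here to a depth cutoff (the 10× default is hypothetical), and genotypes are coded as alternate-allele counts with -1 for missing. Allelic dropout during amplification would appear as lower observed heterozygosity in the amplified sample; the toy arrays are illustrative only.

```python
import numpy as np


def callable_fraction(depth: np.ndarray, min_depth: int = 10) -> float:
    """Fraction of positions covered by at least min_depth reads."""
    return float(np.mean(depth >= min_depth))


def observed_heterozygosity(genotypes: np.ndarray) -> float:
    """Fraction of called sites that are heterozygous.
    Genotypes: 0/1/2 = alternate-allele count, -1 = missing."""
    called = genotypes[genotypes >= 0]
    return float(np.mean(called == 1))


depth = np.array([0, 5, 12, 20, 30])
unamplified = np.array([0, 1, 1, 2, -1])
amplified = np.array([0, 1, 0, 2, -1])  # one heterozygous site lost to dropout
print(callable_fraction(depth),
      observed_heterozygosity(unamplified),
      observed_heterozygosity(amplified))
```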

  2. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    Here we present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average ... Wellcome Trust Center for Human Genetics, Oxford University, Oxford, UK; High Performance Computing Center, Hanoi University of Science and Technology, ...

  3. Interpreting Whole-Genome Marker Data

    Science.gov (United States)

    Weir, Bruce S.

    2013-01-01

    The challenges of whole-genome data, when genotypes are available from hundreds of thousands of genetic markers, are explored for four topics in statistical genetics: Hardy-Weinberg testing, estimating linkage disequilibrium from unphased genotypic data, association mapping and characterizing population structure. PMID:24273615
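    As an example of the first topic, Hardy-Weinberg testing at one biallelic marker can be done with Pearson's chi-square test on the three genotype counts. This sketch is a textbook version, not code from the article; with small counts or rare alleles an exact test is usually preferred.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple:
    """Pearson chi-square test for Hardy-Weinberg equilibrium at a biallelic marker.
    Returns (chi-square statistic, p-value with 1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_chi_square(25, 50, 25))  # genotype counts exactly at equilibrium
print(hwe_chi_square(50, 0, 50))   # no heterozygotes: strong departure
```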

  4. Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Plant Ramona N

    2006-08-01

    Background: Whole genome amplification is an increasingly common technique through which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis. Questions of amplification-induced error and template bias generated by these methods have previously been addressed through either small-scale (SNPs) or large-scale (CGH array, FISH) methodologies. Here we utilized whole genome sequencing to assess amplification-induced bias in both coding and non-coding regions of two bacterial genomes. Halobacterium species NRC-1 DNA and Campylobacter jejuni were amplified by several common, commercially available protocols: multiple displacement amplification, primer extension pre-amplification and degenerate oligonucleotide primed PCR. The amplification-induced bias of each method was assessed by sequencing both genomes in their entirety using the 454 Sequencing System technology and comparing the results with those obtained from unamplified controls. Results: All amplification methodologies induced statistically significant bias relative to the unamplified control. For the Halobacterium species NRC-1 genome, assessed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 119 times greater than those from unamplified material, 164.0 times greater for Repli-G, 165.0 times greater for PEP-PCR and 252.0 times greater than the unamplified controls for DOP-PCR. For Campylobacter jejuni, also analyzed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 15 times greater than those from unamplified material, 19.8 times greater for Repli-G, 61.8 times greater for PEP-PCR and 220.5 times greater than the unamplified controls for DOP-PCR. Conclusion: Of the amplification methodologies examined in this paper, the multiple displacement amplification products generated the least bias and produced significantly higher yields of amplified DNA.
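    The abstract does not define its D-statistic. Assuming it is a Kolmogorov-Smirnov-type D (the maximum gap between two empirical cumulative distributions, for example of read coverage in 100-base windows), it can be computed as below; `scipy.stats.ks_2samp` provides the same two-sample statistic together with a p-value. The function name and toy inputs are illustrative.

```python
from bisect import bisect_right
from typing import Sequence


def ks_d(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest vertical gap between the
    empirical cumulative distribution functions of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy per-window coverage: an even unamplified library versus a skewed amplified one.
unamplified = [20, 21, 19, 20, 22, 18, 20, 21]
amplified = [5, 40, 12, 35, 8, 30, 25, 3]
print(ks_d(unamplified, amplified))
```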

  5. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification.

    Science.gov (United States)

    Oyola, Samuel O; Ariani, Cristina V; Hamilton, William L; Kekre, Mihir; Amenga-Etego, Lucas N; Ghansah, Anita; Rutledge, Gavin G; Redmond, Seth; Manske, Magnus; Jyothi, Dushyanth; Jacob, Chris G; Otto, Thomas D; Rockett, Kirk; Newbold, Chris I; Berriman, Matthew; Kwiatkowski, Dominic P

    2016-12-20

    Translating genomic technologies into healthcare applications for the malaria parasite Plasmodium falciparum has been limited by the technical and logistical difficulties of obtaining high quality clinical samples from the field. Sampling by dried blood spot (DBS) finger-pricks can be performed safely and efficiently with minimal resource and storage requirements compared with venous blood (VB). Here, the use of selective whole genome amplification (sWGA) to sequence the P. falciparum genome from clinical DBS samples was evaluated, and the results compared with current methods that use leucodepleted VB. Parasite DNA with high (>95%) human DNA contamination was selectively amplified by Phi29 polymerase using short oligonucleotides (8-12-mers) as primers. These primers were selected on the basis of their differential frequency of binding the desired (P. falciparum DNA) and contaminating (human) genomes. Using the sWGA method, clinical samples from 156 malaria patients, including 120 paired samples for head-to-head comparison of DBS and leucodepleted VB, were sequenced. Greater than 18-fold enrichment of P. falciparum DNA was achieved from DBS extracts. The parasitaemia threshold to achieve >5× coverage for 50% of the genome was 0.03% (40 parasites per 200 white blood cells). Over 99% SNP concordance between VB and DBS samples was achieved after excluding missing calls. The sWGA methods described here provide a reliable and scalable way of generating P. falciparum genome sequence data from DBS samples. The current data indicate that it will be possible to get good quality sequence on most if not all drug resistance loci from the majority of symptomatic malaria patients. This technique overcomes a major limiting factor in P. falciparum genome sequencing from field samples and paves the way for large-scale epidemiological applications.
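    The primer-selection idea, favouring short sequences that occur often in the parasite genome but rarely in the human genome, can be sketched by ranking k-mers on their relative frequency. This is a simplification under stated assumptions: it scores one strand only, uses a pseudocount of 1 for k-mers absent from the background, and ignores melting temperature, primer dimers and binding-site spacing, which practical primer design typically also takes into account. Function names and toy sequences are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def kmer_counts(seq: str, k: int) -> Counter:
    """Occurrences of every k-mer on one strand of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rank_primers(target: str, background: str, k: int = 8,
                 top: int = 5) -> List[Tuple[str, float]]:
    """Rank target k-mers by per-base frequency in the target genome relative to
    the background genome (pseudocount of 1 for k-mers absent from the background)."""
    t_counts = kmer_counts(target, k)
    b_counts = kmer_counts(background, k)
    scores = []
    for kmer, n_target in t_counts.items():
        t_freq = n_target / len(target)
        b_freq = (b_counts.get(kmer, 0) + 1) / len(background)
        scores.append((kmer, t_freq / b_freq))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]


# Toy genomes: an AT-rich "parasite" and a GC-rich "host".
target = "AATTAATT" * 10
background = "GCGCGCGC" * 10
print(rank_primers(target, background, k=4, top=3))
```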

  6. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions, except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions: These results indicate that, using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between...
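    The SVD steps summarized in this abstract (peptide frequency vectors, matrix decomposition, per-species sums, angles as distances) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: peptides are taken to be dipeptides (k = 2), each protein is projected onto the first `dims` left singular vectors, the three toy proteomes are hypothetical, and the final tree-building step is omitted.

```python
import itertools

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_index(k: int) -> dict:
    """Map each of the 20**k possible peptides of length k to a row index."""
    return {"".join(p): i for i, p in enumerate(itertools.product(AMINO_ACIDS, repeat=k))}


def peptide_vector(protein: str, index: dict, k: int) -> np.ndarray:
    """Peptide frequency vector for one protein; non-standard residues are skipped."""
    vec = np.zeros(len(index))
    for i in range(len(protein) - k + 1):
        row = index.get(protein[i:i + k])
        if row is not None:
            vec[row] += 1
    return vec


def species_vectors(proteomes: dict, k: int = 2, dims: int = 3) -> dict:
    """Sum reduced-dimension protein vectors into one vector per species."""
    index = peptide_index(k)
    owners, columns = [], []
    for species, proteins in proteomes.items():
        for protein in proteins:
            owners.append(species)
            columns.append(peptide_vector(protein, index, k))
    matrix = np.column_stack(columns)  # rows: peptides, columns: proteins
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    r = min(dims, len(s))
    # Projection of each protein column onto the first r left singular vectors.
    reduced = (s[:r, None] * vt[:r]).T  # shape: (n_proteins, r)
    owner_array = np.array(owners)
    return {sp: reduced[owner_array == sp].sum(axis=0) for sp in proteomes}


def angle(u: np.ndarray, v: np.ndarray) -> float:
    """Angle in radians between two species vectors, used as a distance."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


if __name__ == "__main__":
    toy = {
        "sp1": ["MKTAYIAKQR", "GLLVVAAKKT"],
        "sp2": ["MKTAYIAKQK", "GLLVVAAKRT"],
        "sp3": ["WWPPHHCCWW", "PHCWPHCWNN"],
    }
    v = species_vectors(toy, k=2, dims=3)
    print(angle(v["sp1"], v["sp2"]), angle(v["sp1"], v["sp3"]))
```

    The pairwise angles would then feed a distance-based tree method such as neighbor joining; the abstract's observation that fewer dimensions lower resolution corresponds to shrinking `dims` here.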

  7. Post-Fragmentation Whole Genome Amplification-Based Method

    Science.gov (United States)

    Benardini, James; LaDuc, Myron T.; Langmore, John

    2011-01-01

    This innovation is derived from a proprietary amplification scheme that is based upon random fragmentation of the genome into a series of short, overlapping templates. The resulting shorter DNA strands (…) … fragmentation whole genome amplification-based technology provides a robust and accurate method of amplifying femtogram levels of starting material into microgram yields with no detectable allele bias. The amplified DNA also facilitates the preservation of spacecraft samples by amplifying scarce amounts of template DNA into microgram concentrations in just a few hours. With further optimization, this could become a feasible technology for sample preservation in potential future sample return missions. The research and technology development described here can be pivotal in dealing with backward/forward biological contamination from planetary missions. Such efforts rely heavily on an increasing understanding of the burden and diversity of microorganisms present on spacecraft surfaces throughout assembly and testing. The development and implementation of these technologies could significantly improve the comprehensiveness and resolving power of spacecraft-associated microbial population censuses, and are important to the continued evolution and advancement of planetary protection capabilities. Current molecular procedures for assaying spacecraft-associated microbial burden and diversity have inherent sample loss issues at practically every step, particularly nucleic acid extraction. By amplifying nucleic acids directly from single cells in their native state within the sample matrix, this innovation entirely circumvents the need for DNA extraction in the sample processing scheme.

  8. Whole-Genome Sequencing: Automated, Nonindexed Library Preparation.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing a nonindexed Illumina DNA library and relies on the use of a CyBi-SELMA automated pipetting machine, the Covaris E210 shearing instrument, and the epMotion 5075. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Here, double-stranded DNA is fragmented when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  9. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  10. Comparison of whole genome amplification techniques for human single cell exome sequencing.

    Directory of Open Access Journals (Sweden)

    Erik Borgström

    Whole genome amplification (WGA) is currently a prerequisite for single cell whole genome or exome sequencing. Depending on the method used, the rate of artifact formation, allelic dropout and sequence coverage over the genome may differ significantly. In this study four commercial kits for WGA (AMPLI1, MALBAC, Repli-G and PicoPlex) were used to amplify human single cells. The WGA products were exome sequenced together with non-amplified bulk samples from the same source. The resulting data were evaluated in terms of genomic coverage, allelic dropout and SNP calling. The largest difference between the evaluated protocols was observed when analyzing the target coverage and read depth distribution. These differences also had an impact on the downstream variant calling. In conclusion, the products from the AMPLI1 and MALBAC kits were shown to be most similar to the bulk samples and are therefore recommended for WGA of single cells.
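    Allelic dropout, one of the metrics used in this study, is commonly computed by comparing single-cell calls against the bulk sample at sites where the bulk sample is heterozygous. The Python sketch below assumes diploid VCF-style genotype strings ("0/1", "1|1", "./."), biallelic sites, and that no-calls in the WGA sample are excluded from the denominator; the site names and calls are hypothetical, and the study's exact definition may differ.

```python
def is_het(genotype: str) -> bool:
    """True for a heterozygous diploid call such as '0/1' or '1|0'."""
    a, b = genotype.replace("|", "/").split("/")
    return a != b and "." not in (a, b)


def allelic_dropout_rate(bulk: dict, wga: dict) -> float:
    """Fraction of bulk-heterozygous sites that look homozygous after WGA.

    Sites absent from the WGA call set, or containing a '.' allele, are
    excluded. Returns NaN when no informative sites remain.
    """
    informative = [
        site for site, gt in bulk.items()
        if is_het(gt) and site in wga and "." not in wga[site]
    ]
    if not informative:
        return float("nan")
    dropped = sum(1 for site in informative if not is_het(wga[site]))
    return dropped / len(informative)


bulk_calls = {"chr1:100": "0/1", "chr1:200": "0/1", "chr1:300": "1/1",
              "chr2:50": "0/1", "chr2:80": "0/1"}
single_cell_calls = {"chr1:100": "0/1", "chr1:200": "1/1", "chr1:300": "1/1",
                     "chr2:50": "./.", "chr2:80": "0/0"}
# chr2:50 is a no-call, so 2 of the 3 informative sites dropped an allele.
print(allelic_dropout_rate(bulk_calls, single_cell_calls))
```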

  11. Benchmark Dataset for Whole Genome Sequence Compression.

    Science.gov (United States)

    C L, Biji; S Nair, Achuthsankar

    2017-01-01

    Research in DNA data compression lacks a standard dataset for testing compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of such a scientifically compiled whole genome sequence dataset, and proposes a benchmark dataset built with a multistage sampling procedure. Taking the genome sequences of organisms available at the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. This paper reports the results of running three established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only in a comparison based on the scientifically compiled benchmark dataset. The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.

  12. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  13. Strategies and tools for whole genome alignments

    Energy Technology Data Exchange (ETDEWEB)

    Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas; Ishkhanov, Tigran; Ryaboy, Dmitriy; Rubin, Edward; Pachter, Lior; Dubchak, Inna

    2002-11-25

    The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.

  14. Whole Genome Sequencing and Newborn Screening.

    Science.gov (United States)

    Botkin, Jeffrey R; Rothwell, Erin

    2016-03-01

    Clinical applications of next generation sequencing are growing at a tremendous pace. Currently the largest application of genetic testing in medicine occurs with newborn screening through state-mandated public health programs, and there are suggestions that sequencing could become a standard component of newborn care within the next decade. As such, newborn screening may appear to be a logical starting point to explore whole genome and whole exome sequencing on a population level. Yet the use of a mandatory public health screening program raises a number of ethical, social and legal issues that create challenges for the use of sequencing technologies in this context. Additionally, our understanding of genomic data, and our strategies for managing it, remain limited, which supports our conclusion that genome sequencing is not justified within population-based public health programs for newborn screening.

  15. Spiked GBS: a unified, open platform for single marker genotyping and whole-genome profiling.

    Science.gov (United States)

    Rife, Trevor W; Wu, Shuangye; Bowden, Robert L; Poland, Jesse A

    2015-03-28

    In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of these objectives. We have developed spiked genotyping-by-sequencing (sGBS), which combines targeted amplicon sequencing with reduced representation genotyping-by-sequencing. To minimize the cost of targeted assays, we use a small percentage of the sequencing capacity available in runs of GBS libraries to "spike" amplified targets of a priori alleles tagged with a different set of unique barcodes. This open platform allows multiple, single-target loci to be assayed while simultaneously generating a whole-genome profile. This dual-genotyping approach allows different sets of samples to be evaluated for single markers or whole-genome profiling. Here, we report the application of sGBS on a winter wheat panel that was screened for converted KASP markers and newly designed markers targeting known polymorphisms in the leaf rust resistance gene Lr34. The flexibility and low cost of sGBS will enable a range of applications across genetics research. Specifically in breeding applications, the sGBS approach will allow breeders to obtain a whole-genome profile of important individuals while simultaneously targeting specific genes for a range of selection strategies across the breeding program.

  16. Whole genome sequencing for lung cancer.

    Science.gov (United States)

    Daniels, Marissa; Goh, Felicia; Wright, Casey M; Sriram, Krishna B; Relan, Vandana; Clarke, Belinda E; Duhig, Edwina E; Bowman, Rayleen V; Yang, Ian A; Fong, Kwun M

    2012-04-01

    Lung cancer is a leading cause of cancer related morbidity and mortality globally, and carries a dismal prognosis. Improved understanding of the biology of cancer is required to improve patient outcomes. Next-generation sequencing (NGS) is a powerful tool for whole genome characterisation, enabling comprehensive examination of somatic mutations that drive oncogenesis. Most NGS methods are based on polymerase chain reaction (PCR) amplification of platform-specific DNA fragment libraries, which are then sequenced. These techniques are well suited to high-throughput sequencing and are able to detect the full spectrum of genomic changes present in cancer. However, they require considerable investments in time, laboratory infrastructure, computational analysis and bioinformatic support. Next-generation sequencing has been applied to studies of the whole genome, exome, transcriptome and epigenome, and is changing the paradigm of lung cancer research and patient care. The results of this new technology will transform current knowledge of oncogenic pathways and provide molecular targets of use in the diagnosis and treatment of cancer. Somatic mutations in lung cancer have already been identified by NGS, and large scale genomic studies are underway. Personalised treatment strategies will improve care for those likely to benefit from available therapies, while sparing others the expense and morbidity of futile intervention. Organisational, computational and bioinformatic challenges of NGS are driving technological advances as well as raising ethical issues relating to informed consent and data release. Differentiation between driver and passenger mutations requires careful interpretation of sequencing data. Challenges in the interpretation of results arise from the types of specimens used for DNA extraction, sample processing techniques and tumour content. Tumour heterogeneity can reduce power to detect mutations implicated in oncogenesis. Next-generation sequencing will

  17. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  18. Whole genome sequence analysis of Mycobacterium suricattae.

    Science.gov (United States)

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Whole genome sequence of a Turkish individual.

    Directory of Open Access Journals (Sweden)

    Haluk Dogan

    Although whole human genome sequencing can be done with readily available technical and financial resources, the need for detailed analyses of genomes of certain populations still exists. Here we present, for the first time, sequencing and analysis of a Turkish human genome. We sequenced the genome to 35x coverage using paired-end sequencing; over 95% of sequencing reads mapped to the reference genome, covering more than 99% of the bases. The assembly of unmapped reads rendered 11,654 contigs, 2,168 of which did not reveal any homology to known sequences, resulting in ∼1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls with 29,184 SNPs identified in coding regions, where 106 were nonsense and 259 were categorized as having a high-impact effect. The homozygous/heterozygous (1,415,123∶2,122,671 or 1∶1.5) and transition/transversion (2,383,204∶1,154,590 or 2.06∶1) ratios were within expected limits. Of the identified SNPs, 480,396 were potentially novel, with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1∶1.09 insertion/deletion ratio), ranging from -52 bp to 34 bp in length and causing about 180 codon insertions/deletions and 246 frame shifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variations in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefit research in genetics and medicine and should be conducted on a larger scale.
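    The transition/transversion ratio quoted in this abstract classifies each SNV as a transition when it swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), and as a transversion otherwise. A minimal Python sketch, assuming single-base biallelic SNVs given as (ref, alt) pairs; the toy list is hypothetical, and the last two lines only re-check the ratios from the counts reported in the abstract.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions; raises on anything else."""
    ref, alt = ref.upper(), alt.upper()
    pair = {ref, alt}
    if ref == alt or len(ref) != 1 or len(alt) != 1 or not pair <= set("ACGT"):
        raise ValueError(f"not a single-base ACGT substitution: {ref}>{alt}")
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs) -> float:
    """Transitions divided by transversions for an iterable of (ref, alt) pairs."""
    snvs = list(snvs)
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    return transitions / transversions


toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(toy_snvs))            # 3 transitions / 2 transversions -> 1.5

# Re-checking the ratios reported in the abstract from its own counts.
print(round(2_383_204 / 1_154_590, 2))  # transition/transversion -> 2.06
print(round(2_122_671 / 1_415_123, 2))  # heterozygous/homozygous -> 1.5
```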

  20. [Progress on whole genome sequencing in woody plants].

    Science.gov (United States)

    Shi, Ji-Sen; Wang, Zhan-Jun; Chen, Jin-Hui

    2012-02-01

    In recent years, the amount of plant whole genome sequence data has been increasing rapidly, and whole genome sequencing has also been performed widely in woody plants. However, there are a number of obstacles to whole genome sequencing in woody plants, including larger genomes, complex genome structure, limitations of assembly, annotation and functional analysis, and restricted research funding. Therefore, to improve the efficiency of whole genome sequencing in woody plants, the development and shortcomings of this field should be analyzed. The three generations of sequencing technologies (i.e., Sanger sequencing, sequencing by synthesis, and single molecule sequencing) were compared in our studies. The discussion of progress focused mainly on whole genome sequencing of four woody plants (Populus, grapevine, papaya, and apple), and the application of sequencing results was also analyzed. The future of whole genome sequencing research in woody plants, consisting of material selection, establishment of genetic and physical maps, selection of sequencing technology, bioinformatic analysis, and application of sequencing results, was discussed.

  1. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  2. Whole genome sequencing analysis of lung adenocarcinoma in Xuanwei, China.

    Science.gov (United States)

    Wang, Xiao; Li, Jing; Duan, Yong; Wu, Huifei; Xu, Qiuyue; Zhang, Yanliang

    2017-03-01

    The lung cancer mortality rate in Xuanwei city is among the highest in China and adenocarcinoma is the major histological type. Lung cancer has been associated with exposure to indoor smoky coal emissions that contain high levels of polycyclic aromatic hydrocarbons; however, the pathogenesis of lung cancer has not yet been fully elucidated. We performed whole genome sequencing of lung adenocarcinoma and corresponding non-tumor tissue to explore the genomic features of Xuanwei lung cancer. We used the Molecule Annotation System to determine and plot alterations in genes and signaling pathways. A total of 3 428 060 and 3 416 989 single nucleotide variants were detected in tumor and normal genomes, respectively. After comparison of these two genomes, 977 high-confidence somatic single nucleotide variants were identified. We observed a remarkably high proportion of C·G-A·T transversions. HECTD4, RCBTB2, KLF15, and CACNA1C may be cancer-related genes. Nine copy number variations increased in chromosome 5 and one in chromosome 7. Novel junctions were detected via clustered discordant paired ends, and 1955 structural variants were discovered. Among these, we found 44 novel chromosome structural variations. In addition, EGFR and CACNA1C in the mitogen-activated protein kinase signaling pathway were mutated or amplified in lung adenocarcinoma tumor tissue. We obtained a comprehensive view of somatic alterations of Xuanwei lung adenocarcinoma. These findings provide insight into the genomic landscape that can further our understanding of the progression and development of Xuanwei lung adenocarcinoma. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.

  3. Isolation and enrichment of Cryptosporidium DNA and verification of DNA purity for whole-genome sequencing.

    Science.gov (United States)

    Guo, Yaqiong; Li, Na; Lysén, Colleen; Frace, Michael; Tang, Kevin; Sammons, Scott; Roellig, Dawn M; Feng, Yaoyu; Xiao, Lihua

    2015-02-01

    Whole-genome sequencing of Cryptosporidium spp. is hampered by difficulties in obtaining sufficient, highly pure genomic DNA from clinical specimens. In this study, we developed procedures for the isolation and enrichment of Cryptosporidium genomic DNA from fecal specimens and verification of DNA purity for whole-genome sequencing. The isolation and enrichment of genomic DNA were achieved by a combination of three oocyst purification steps and whole-genome amplification (WGA) of DNA from purified oocysts. Quantitative PCR (qPCR) analysis of WGA products was used as an initial quality assessment of amplified genomic DNA. The purity of WGA products was assessed by Sanger sequencing of cloned products. Next-generation sequencing tools were used in final evaluations of genome coverage and of the extent of contamination. Altogether, 24 fecal specimens of Cryptosporidium parvum, C. hominis, C. andersoni, C. ubiquitum, C. tyzzeri, and Cryptosporidium chipmunk genotype I were processed with the procedures. As expected, WGA products with low (sequences in Sanger sequencing. The cloning-sequencing analysis, however, showed significant contamination in 5 WGA products (proportion of positive colonies derived from Cryptosporidium genomic DNA, ≤25%). Following this strategy, 20 WGA products from six Cryptosporidium species or genotypes with low (mostly sequencing, generating sequence data covering 94.5% to 99.7% of Cryptosporidium genomes, with mostly minor contamination from bacterial, fungal, and host DNA. These results suggest that the described strategy can be used effectively for the isolation and enrichment of Cryptosporidium DNA from fecal specimens for whole-genome sequencing. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  4. Selective Whole-Genome Amplification Is a Robust Method That Enables Scalable Whole-Genome Sequencing of Plasmodium vivax from Unprocessed Clinical Samples.

    Science.gov (United States)

    Cowell, Annie N; Loy, Dorothy E; Sundararaman, Sesh A; Valdivia, Hugo; Fisch, Kathleen; Lescano, Andres G; Baldeviano, G Christian; Durand, Salomon; Gerbasi, Vince; Sutherland, Colin J; Nolder, Debbie; Vinetz, Joseph M; Hahn, Beatrice H; Winzeler, Elizabeth A

    2017-02-07

    ... However, WGS of P. vivax is challenging due to low parasite levels in humans and the lack of a routine system to culture the parasites. Selective whole-genome amplification (SWGA) preferentially amplifies the genomes of pathogens from mixtures of target and host gDNA. Here, we demonstrate that SWGA is a simple, robust method that can be used to enrich P. vivax genomic DNA (gDNA) from unprocessed human blood samples and dried blood spots for cost-effective, high-quality WGS. Copyright © 2017 Cowell et al.

  5. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie, E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and "clusters of orthologous groups" (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.
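    The 99.2 percent figure in this abstract is a genome coverage measure, but the abstract does not define it; the Python sketch below assumes the common breadth-of-coverage reading, the fraction of reference bases overlapped by at least `min_depth` aligned reads. Alignments are hypothetical 0-based, half-open (start, end) intervals, and per-base depth is built with a difference array so each read costs O(1) before a single linear pass.

```python
def depth_from_alignments(genome_length: int, alignments) -> list:
    """Per-base read depth from 0-based, half-open (start, end) alignments."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"alignment out of range: ({start}, {end})")
        diff[start] += 1  # read starts covering at this base
        diff[end] -= 1    # and stops covering at this base
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def breadth_of_coverage(depth, min_depth: int = 1) -> float:
    """Fraction of positions with depth >= min_depth."""
    if not depth:
        raise ValueError("empty depth profile")
    return sum(1 for d in depth if d >= min_depth) / len(depth)


toy_reads = [(0, 30), (20, 50), (70, 100)]
depth = depth_from_alignments(100, toy_reads)
print(breadth_of_coverage(depth))               # bases 0-49 and 70-99 -> 0.8
print(breadth_of_coverage(depth, min_depth=2))  # only bases 20-29 -> 0.1
```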

  6. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification.

    Science.gov (United States)

    Direito, Susana O L; Zaura, Egija; Little, Miranda; Ehrenfreund, Pascale; Röling, Wilfred F M

    2014-03-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement amplification (MDA)] and one new primer-free method [primase-based whole genome amplification (pWGA)] were compared using a polymerase chain reaction (PCR)-based method as control. Pyrosequencing of an environmental sample and principal component analysis revealed that MDA impacted community profiles more strongly than pWGA and indicated that this related to species GC content, although an influence of DNA integrity could not be excluded. Subsequently, biases by species GC content, DNA integrity and fragment size were separately analysed using defined mixtures of DNA from various species. We found significantly less amplification of species with the highest GC content for MDA-based templates and, to a lesser extent, for pWGA. DNA fragmentation also interfered severely: species with more fragmented DNA were less amplified with MDA and pWGA. pWGA was unable to amplify low molecular weight DNA (< 1.5 kb), whereas MDA was inefficient. We conclude that pWGA is the most promising method for characterization of microbial communities in low-biomass environments and for currently planned astrobiological missions to Mars. © 2013 Society for Applied Microbiology and John Wiley & Sons Ltd.

  7. Whole genome amplification: Use of advanced isothermal method

    African Journals Online (AJOL)

    Yomi

    2010-12-29

    Sima Moghaddaszadeh Ahrabi, Safar Farajnia, Ghodratollah Rahimi-Mianji, Soheila Montazer Saheb ... Moreover, application of high-fidelity and highly processive DNA ... between I-PEP and MDA by using serial dilutions of ...

  8. Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny

    NARCIS (Netherlands)

    Herniou, E.A.; Luque, T.; Chen, X.; Vlak, J.M.; Winstanley, D.; Cory, J.S.; O'Reilly, D.R.

    2001-01-01

    Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of

  9. Comparative Copy Number Variation From Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2011-01-01

    Whole genome sequencing enables a high-resolution view of the human genome and unique insights into copy number variations on an unprecedented scale. Numerous tools and studies have already been introduced that provide confirmatory and new genomic variability data in individuals and across

  10. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  11. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    ... and might play some important roles in drought tolerance in sesame. Our results provided genomic resources for further functional analysis and genetic engineering towards drought tolerance improvement in sesame. Keywords: Sesamum indicum, candidate genes, drought tolerance, orthologous gene, whole genome ...

  12. Optimized design and assessment of whole genome tiling arrays.

    NARCIS (Netherlands)

    Graf, S.; Nielsen, F.G.G.; Kurtz, S.; Huynen, M.A.; Birney, E.; Stunnenberg, H.G.; Flicek, P.

    2007-01-01

    MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling

  13. Whole genome amplification - Review of applications and advances

    Energy Technology Data Exchange (ETDEWEB)

    Hawkins, Trevor L.; Detter, J.C.; Richardson, Paul

    2001-11-15

    The concept of whole genome amplification has arisen in the past few years as modifications to the polymerase chain reaction (PCR) have been adapted to replicate regions of genomes that are of biological interest. The applications are many: forensics, embryonic disease diagnosis, bioterrorism genome detection, "immortalization" of clinical samples, microbial diversity, and genotyping. The key question is whether DNA can be replicated a genome at a time without bias or non-random distribution of the target. Several papers published in the last year, and others currently in preparation, suggest that whole genome amplification may indeed be possible and could therefore open up a new avenue in molecular biology.

  14. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  15. Archived neonatal dried blood spot samples can be used for accurate whole genome and exome-targeted next-generation sequencing

    DEFF Research Database (Denmark)

    Hollegaard, Mads Vilhelm; Grauholm, Jonas; Nielsen, Ronni

    2013-01-01

    ... for example, to examine the genetics of various disorders. We have previously demonstrated that DNA extracted from a fraction (2 × 3.2 mm discs) of an archived DBSS can be whole genome amplified (wgaDNA) and used for accurate array genotyping. However, until now, it has been uncertain whether wgaDNA from DBSS ... can be used for accurate whole genome sequencing (WGS) and exome sequencing (WES). This study examined two individuals represented by three different types of samples each: whole-blood (reference samples), 3-year-old DBSS spotted with reference material (refDBSS), and 27- to 29-year-old archived...

  16. Cgaln: fast and space-efficient whole-genome alignment

    Directory of Open Access Journals (Sweden)

    Nakato Ryuichiro

    2010-04-01

    Background: Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can handle only sequences a few Mb long at a time, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. Results: We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements the algorithm. We compared the performance of Cgaln with those of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB of memory. Conclusions: Cgaln is not only fast and memory efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and

  17. Cgaln: fast and space-efficient whole-genome alignment.

    Science.gov (United States)

    Nakato, Ryuichiro; Gotoh, Osamu

    2010-04-30

    Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can handle only sequences a few Mb long at a time, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements the algorithm. We compared the performance of Cgaln with those of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB of memory. Cgaln is not only fast and memory efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and will contribute to various fields of genome science.

  18. Alignathon: a competitive assessment of whole-genome alignment methods

    OpenAIRE

    Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Harris, Robert S.; Fitzgerald, Stephen; Beal, Kathryn; Seledtsov, Igor; Molodtsov, Vladimir; Raney, Brian J.; Clawson, Hiram; Kim, Jaebum; Kemena, Carsten; Chang, Jia-Ming; Erb, Ionas; Poliakov, Alexander

    2014-01-01

    Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assess...

  19. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina) and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X) was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  20. Priors in whole-genome regression: the bayesian alphabet returns.

    Science.gov (United States)

    Gianola, Daniel

    2013-07-01

    Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term "Bayesian alphabet" denotes a growing number of letters of the alphabet used to denote various Bayesian linear regressions that differ in the priors adopted, while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various manners, but it is shown here that the prior is always influential, unless n ≫ p. This happens because parameters are not likelihood identified, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters ("tuning knobs") are assessed via a properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
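    The claim that the prior stays influential when p > n can be seen in the simplest member of the family. The sketch below assumes a Gaussian prior on every marker effect, which makes the posterior mean a ridge-regression (BLUP-like) estimate; the other letters of the alphabet use different priors and are not shown. The function names, the toy genotype matrix and the variance ratio `lam` (residual variance divided by prior effect variance) are hypothetical. With two records and four markers, the data cannot identify the effects, so the estimates move with `lam`:

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is n x n)."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_dual(X, y, lam):
    """Posterior mean of effects under an i.i.d. Gaussian prior, in dual form.

    beta = X^T (X X^T + lam I)^{-1} y, which needs only an n x n solve when p > n.
    """
    n, p = len(X), len(X[0])
    K = [[sum(a * b for a, b in zip(X[i], X[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]
```

    Calling `ridge_dual` on the same two records with `lam=0.01` and `lam=10.0` gives clearly different marker effects, even though both are legitimate posterior means; only the prior changed.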

  1. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Directory of Open Access Journals (Sweden)

    David Koslicki

    With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.

  2. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Science.gov (United States)

    Koslicki, David; Foucart, Simon; Rosen, Gail

    2014-01-01

    With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.
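    As its name indicates, WGSQuikr is k-mer based: instead of classifying reads one by one, it works with k-mer frequency vectors of the whole sample and of reference genomes. The reconstruction step itself (the quadratic, iterative solve) is not reproduced here. A minimal sketch of only the k-mer profiling step, with k = 4 chosen as a hypothetical default and non-ACGT k-mers skipped by assumption:

```python
from itertools import product


def kmer_profile(seq, k=4):
    """Normalized counts of all 4**k DNA k-mers, in lexicographic (ACGT) order.

    k-mers containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

    Summing profiles over all reads in a sample yields one vector per sample, which is what makes a whole-sample reconstruction faster than read-by-read classification.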

  3. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, S. [Univ. Wisc.-Madison]; Zhou, S. [Univ. Wisc.-Madison]; Place, M. [Univ. Wisc.-Madison]; Zhang, Y. [Univ. Wisc.-Madison]; Briska, A. [Univ. Wisc.-Madison]; Goldstein, S. [Univ. Wisc.-Madison]; Churas, C. [Univ. Wisc.-Madison]; Runnheim, R. [Univ. Wisc.-Madison]; Forrest, D. [Univ. Wisc.-Madison]; Lim, A. [Univ. Wisc.-Madison]; Lapidus, A. [Univ. Wisc.-Madison]; Han, C. S. [Univ. Wisc.-Madison]; Roberts, G. P. [Univ. Wisc.-Madison]; Schwartz, D. C. [Univ. Wisc.-Madison]

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  4. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest, Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz, David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source of hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.
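    An optical map is an ordered list of restriction fragment sizes, so checking an assembly against it requires an in silico map of the assembled sequence. A minimal sketch, assuming the standard recognition sites and top-strand cut positions of the three enzymes named above (XbaI T^CTAGA, NheI G^CTAGC, HindIII A^AGCTT); this is not the authors' alignment software, and sites spanning the origin of a circular molecule are not detected:

```python
import re

# Recognition site and top-strand cut offset within the site, for each enzyme.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),     # T^CTAGA
    "NheI": ("GCTAGC", 1),     # G^CTAGC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def restriction_map(seq, enzyme, circular=False):
    """Ordered fragment lengths obtained by cutting seq with one enzyme."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    if not cuts:
        return [len(seq)]
    if circular:
        # n cuts in a circle give n fragments; the last one wraps past the origin.
        frags = [b - a for a, b in zip(cuts, cuts[1:])]
        frags.append(len(seq) - cuts[-1] + cuts[0])
        return frags
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

    Because all three sites are palindromic, scanning one strand is enough; comparing the resulting fragment lists with the optical map is what exposes misassemblies such as the one reported here.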

  5. Identification of hallmarks of lung adenocarcinoma prognosis using whole genome sequencing

    Science.gov (United States)

    Liu, Li; Huang, Jiao; Wang, Ke; Li, Li; Li, Yangkai; Yuan, Jingsong; Wei, Sheng

    2015-01-01

    In conjunction with clinical characteristics, prognostic biomarkers are essential for choosing optimal therapies to lower the mortality of lung adenocarcinoma. Whole genome sequencing (WGS) of 7 cancerous-noncancerous tissue pairs was performed to explore the comparative copy number variations (CNVs) associated with lung adenocarcinoma. The frequencies of top-ranked CNVs were verified in an independent set of 114 patients and then the roles of target CNVs in disease prognosis were assessed in 313 patients. The WGS yielded 2604 CNVs. After frequency validation and biological function screening of the top 10 CNVs, 9 mutant driver genes from 7 CNVs were further analyzed for an association with survival. Compared with carriers of an amplified PBXIP1 copy number, unamplified carriers had 0.62 times the risk of death (95% CI = 0.43–0.91). Compared with an amplified TERT, those with an unamplified TERT had a 35% reduction (95% CI = 3%–56%) in risk of lung adenocarcinoma progression. Cases with both unamplified PBXIP1 and TERT had a median 34.32-month extension of overall survival and 34.55-month delay in disease progression when compared with both amplified CNVs. This study demonstrates that CNVs of TERT and PBXIP1 have the potential to translate into the clinic and be used to improve outcomes for patients with this fatal disease. PMID:26497366

  6. A binary search approach to whole-genome data analysis

    Science.gov (United States)

    Brodsky, Leonid; Kogan, Simon; BenJacob, Eshel; Nevo, Eviatar

    2010-01-01

    A sequence analysis-oriented binary search-like algorithm was transformed into a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments, enriched by up-regulated signals, at equal accuracy. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The “divide-and-conquer”-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity/specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets comprising Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation ChIP-on-chip datasets; Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The analyses’ results demonstrate the power of the algorithm in identifying both the short up-regulated fragments (such as exons and transcription factor binding sites) and the long, even moderately up-regulated, zones at their precise genome margins. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies. PMID:20833816
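    The abstract scores a fragment by how far its signal exceeds the chromosome baseline and extracts non-intersecting fragments with locally optimal scores by divide and conquer. The sketch below is not the authors' binary-search algorithm; it swaps in a simpler, well-known technique, Kadane's maximum-sum subarray, applied recursively to the regions left and right of each hit, to illustrate the same baseline-excess scoring. The `baseline` and `min_score` inputs are hypothetical parameters:

```python
def best_segment(scores, lo, hi):
    """Kadane's algorithm on scores[lo:hi]; returns (best_sum, start, end), end exclusive."""
    best = (float("-inf"), lo, lo)
    run_sum, run_start = 0, lo
    for i in range(lo, hi):
        if run_sum <= 0:
            # A non-positive running sum cannot help: restart the segment here.
            run_sum, run_start = scores[i], i
        else:
            run_sum += scores[i]
        if run_sum > best[0]:
            best = (run_sum, run_start, i + 1)
    return best


def enriched_segments(signal, baseline, min_score):
    """Non-overlapping segments whose summed excess over baseline is >= min_score."""
    scores = [s - baseline for s in signal]
    found = []

    def recurse(lo, hi):
        if hi <= lo:
            return
        total, start, end = best_segment(scores, lo, hi)
        if total < min_score:
            return
        found.append((start, end, total))
        recurse(lo, start)  # search the region left of the hit
        recurse(end, hi)    # and the region right of it

    recurse(0, len(scores))
    return sorted(found)
```

    For the signal `[0, 0, 5, 5, 0, 0, 3, 0]` with baseline 1 and `min_score=1.5`, this reports a strong segment covering positions 2 to 3 and a weaker one at position 6.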

  7. Deep whole-genome sequencing of 100 southeast Asian Malays.

    Science.gov (United States)

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
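    The 30× figure is an average depth. Under the usual Lander–Waterman idealization, mean depth is C = N·L/G for N reads of length L over a genome of size G, and the expected fraction of bases with no reads is about e^(−C). A minimal sketch; the 3.1-Gb genome size and 100-bp read length in the usage note are assumptions, not values from the abstract, and real data deviate from the Poisson model (for example through GC bias):

```python
import math


def mean_depth(num_reads, read_length, genome_size):
    """Average sequencing depth C = N * L / G."""
    return num_reads * read_length / genome_size


def reads_needed(target_depth, read_length, genome_size):
    """Smallest number of reads giving at least target_depth on average."""
    return math.ceil(target_depth * genome_size / read_length)


def fraction_uncovered(depth):
    """Poisson approximation to the expected fraction of bases with zero reads."""
    return math.exp(-depth)
```

    Under those assumptions, `reads_needed(30, 100, 3_100_000_000)` is 930,000,000 reads per individual. The result illustrates why deep sequencing, unlike low-pass sequencing, leaves almost no base unobserved on average.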

  8. Whole genome sequencing of clinical isolates of Giardia lamblia.

    Science.gov (United States)

    Hanevik, K; Bakken, R; Brattbakk, H R; Saghaug, C S; Langeland, N

    2015-02-01

    Clinical isolates from protozoan parasites such as Giardia lamblia are at present practically impossible to culture. By using simple cyst purification methods, we show that Giardia whole genome sequencing of clinical stool samples is possible. Immunomagnetic separation after sucrose gradient flotation gave superior results compared to sucrose gradient flotation alone. The method enables detailed analysis of a wide range of genes of interest for genotyping, virulence and drug resistance. Copyright © 2014 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  9. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using either ... a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model, results showed increases in accuracy of up to two percentage points for production traits in both Holstein and Jersey animals by including the extra variants in the analysis, and an extra 1.5 percentage points...

  10. Estimating telomere length from whole genome sequence data.

    Science.gov (United States)

    Ding, Zhihao; Mangino, Massimo; Aviv, Abraham; Spector, Tim; Durbin, Richard

    2014-05-01

    Telomeres play a key role in replicative ageing and undergo age-dependent attrition in vivo. Here, we report a novel method, TelSeq, to measure average telomere length from whole genome or exome shotgun sequence data. In 260 leukocyte samples, we show that TelSeq results correlate with Southern blot measurements of the mean length of terminal restriction fragments (mTRFs) and display age-dependent attrition comparably well as mTRFs. © The Author(s) 2014. Published by Oxford University Press.
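    The abstract does not describe TelSeq's internals. As a simplified sketch, assuming telomeric reads are recognized by counting the vertebrate telomere repeat TTAGGG (or its reverse complement CCCTAA), one can measure the fraction of reads carrying at least k repeats; the threshold k = 7 is a hypothetical default here. Turning that fraction into a length in base pairs requires additional normalization that the abstract does not give, so it is omitted:

```python
def telomeric_repeat_count(read):
    """Count TTAGGG or CCCTAA motifs in a read, taking whichever strand has more."""
    read = read.upper()
    # str.count is non-overlapping, which is fine: the hexamer cannot overlap itself.
    return max(read.count("TTAGGG"), read.count("CCCTAA"))


def telomeric_read_fraction(reads, k=7):
    """Fraction of reads containing at least k telomeric repeats."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if telomeric_repeat_count(r) >= k)
    return hits / len(reads)
```

    In a sample with shorter telomeres, fewer reads pass the threshold, which is the signal a sequence-based estimator can relate to Southern blot mTRF values.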

  11. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer, S. E.; Dunn, J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain B31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  12. A novel whole genome amplification method using type IIS restriction enzymes to create overhangs with random sequences.

    Science.gov (United States)

    Pan, Xiaoming; Wan, Baihui; Li, Chunchuan; Liu, Yu; Wang, Jing; Mou, Haijin; Liang, Xingguo

    2014-08-20

    Ligation-mediated polymerase chain reaction (LM-PCR) is a whole genome amplification (WGA) method in which genomic DNA is cleaved into numerous fragments, and all of the fragments are then amplified by PCR after a universal end sequence is attached. However, these fragments can self-ligate, which may bias amplification and restrict the method's application. To decrease the probability of self-ligation, we used type IIS restriction enzymes to digest genomic DNA into fragments with 4–5-nt overhangs of random sequence. After an adapter with random end sequences is ligated to these fragments, PCR is carried out and almost all DNA sequences present are amplified. In this study, the whole genome of Vibrio parahaemolyticus was amplified and the amplification efficiency was evaluated by quantitative PCR. The results suggested that our approach could provide sufficient genomic DNA of good quality to meet the requirements of various genetic analyses. Copyright © 2014. Published by Elsevier B.V.
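    The overhangs are "random" because a type IIS enzyme cuts outside its recognition site, so the single-stranded ends consist of whatever genomic bases lie at the cut. A minimal sketch using FokI, GGATG(9/13), as an example enzyme (the abstract does not name the enzymes used); only forward-strand sites are scanned, and the overhang is reported as top-strand sequence:

```python
import re


def type_iis_overhangs(seq, site="GGATG", top_cut=9, bottom_cut=13):
    """4-nt overhang sequences left by a type IIS enzyme at forward-strand sites.

    Defaults describe FokI, GGATG(9/13): the top strand is cut 9 nt and the bottom
    strand 13 nt past the 3' end of the site, so the overhang bases are genomic
    sequence rather than part of the recognition site.
    """
    seq = seq.upper()
    overhangs = []
    for m in re.finditer(f"(?={re.escape(site)})", seq):
        site_end = m.start() + len(site)
        start, end = site_end + top_cut, site_end + bottom_cut
        if end <= len(seq):  # skip sites whose cut would fall off the sequence
            overhangs.append(seq[start:end])
    return overhangs
```

    Because the overhang bases differ from site to site, an adapter library with random ends can pair with nearly every fragment, which is the property the method relies on to reduce self-ligation bias.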

  13. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is currently becoming the method of choice for characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regard to accuracy, resolution and analysis speed in comparison to several other methods including serotyping, PCR, pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multivirulence-locus sequence typing (MVLST), which have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major foodborne diseases in humans. Backward compatibility of WGS with former methods can be maintained by extracting the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes capable of differentiating 12 serovars, and national reference laboratories still perform serotyping and PCR-based serogrouping as a first-level classification method for Listeria monocytogenes surveillance. Whole-genome-sequence-based core genome MLST analysis of an L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and it was possible to group them into the IIa, IIc, IVb or IIb clusters, which were generated by minimum spanning tree (MST) and neighbor joining (NJ) tree data analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
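    Core genome MLST compares isolates by their allele calls at a fixed set of core loci; the pairwise count of differing alleles is the input to minimum spanning tree (MST) clustering. A minimal sketch, assuming hypothetical allele profiles given as equal-length lists with `None` for a missing call, and the common (but here assumed) convention of ignoring loci missing in either isolate. The MST uses plain Prim's algorithm without the tie-breaking rules that dedicated cgMLST tools apply:

```python
def allele_distance(profile_a, profile_b):
    """Loci with different allele calls, ignoring loci missing (None) in either profile."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b)
               if a is not None and b is not None and a != b)


def distance_matrix(profiles):
    """profiles: dict of isolate name -> allele list. Returns (names, matrix)."""
    names = sorted(profiles)
    return names, [[allele_distance(profiles[x], profiles[y]) for y in names]
                   for x in names]


def minimum_spanning_tree(names, dist):
    """Prim's algorithm on a dense distance matrix; returns (name_a, name_b, distance) edges."""
    n = len(names)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge linking each node to the tree
    parent = [None] * n
    best[0] = 0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] is not None:
            edges.append((names[parent[u]], names[u], dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges
```

    Isolates of the same serogroup would be expected to sit on short MST branches, which is the clustering pattern the study reports.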

  14. Copy number and loss of heterozygosity detected by SNP array of formalin-fixed tissues using whole-genome amplification.

    Directory of Open Access Journals (Sweden)

    Angela Stokes

    The requirement for large amounts of good-quality DNA for whole-genome applications prohibits their use for small, laser capture micro-dissected (LCM) and/or rare clinical samples, which are also often formalin-fixed and paraffin-embedded (FFPE). Whole-genome amplification of DNA from these samples could potentially overcome these limitations. However, little is known about the artefacts introduced by amplification of FFPE-derived DNA with regard to genotyping and subsequent copy number and loss of heterozygosity (LOH) analyses. Using a ligation adaptor amplification method, we present data from a total of 22 Affymetrix SNP 6.0 experiments, using matched pairs of amplified and non-amplified DNA from 10 LCM FFPE normal and dysplastic oral epithelial tissues, and an internal method control. An average of 76.5% of SNPs were called in both matched amplified and non-amplified DNA samples, and concordance was a promising 82.4%. Paired analysis for copy number, LOH, and both combined showed the following: copy number changes were reduced in amplified DNA but were 99.5% concordant when detected, with amplifications being the changes most likely to be 'missed'; only 30% of non-amplified LOH changes were identified in amplified pairs; and when copy number and LOH were combined, ∼50% of gene changes detected in the unamplified DNA were also detected in the amplified DNA, and within these changes 86.5% were concordant for both copy number and LOH status. However, amplification also introduces changes: ∼20% of changes in the amplified DNA are not detected in the non-amplified DNA. An integrative network biology approach revealed that changes in amplified DNA of dysplastic oral epithelium localize to topologically critical regions of the human protein-protein interaction network, suggesting their functional implication in the pathobiology of this disease. Taken together, our results support the use of amplification of FFPE-derived DNA, provided sufficient samples are used
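The two headline numbers in this abstract, the share of SNPs called in both members of a matched pair and the concordance among those SNPs, can be computed as sketched below. This is a minimal illustration under assumptions: genotypes are encoded as strings such as "AB" with None marking a no-call, and concordance is taken over SNPs called in both samples. The paper's exact definitions may differ, and the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[str]  # e.g. "AA", "AB", "BB"; None marks a no-call


def paired_call_metrics(
    reference: Sequence[Genotype], amplified: Sequence[Genotype]
) -> Tuple[float, float]:
    """Return (fraction of SNPs called in both samples, concordance among those SNPs)."""
    if len(reference) != len(amplified):
        raise ValueError("genotype vectors must cover the same SNPs")
    # Keep only SNPs that received a genotype call in both samples.
    both = [
        (r, a)
        for r, a in zip(reference, amplified)
        if r is not None and a is not None
    ]
    called_in_both = len(both) / len(reference) if reference else 0.0
    concordance = sum(r == a for r, a in both) / len(both) if both else 0.0
    return called_in_both, concordance
```

With toy calls `["AA", "AB", "BB", None, "AB"]` (non-amplified) and `["AA", "AA", "BB", "AB", None]` (amplified), three of five SNPs are called in both, and two of those three agree.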

  15. Dirofilaria immitis JYD-34 isolate: whole genome analysis

    Directory of Open Access Journals (Sweden)

    Catherine Bourguinat

    2017-11-01

    Abstract Background Macrocyclic lactone (ML) anthelmintics are used for chemoprophylaxis of heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms despite apparent compliance with recommended chemoprophylaxis using approved preventives have led to such cases being considered suspected lack of efficacy (LOE). Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and were confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced-susceptibility isolates, or to known ML-susceptible isolates. Methods In this study, the 90 Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools to the pooled whole genomes of four well-characterized susceptible D. immitis populations, one susceptible Missouri laboratory isolate, and the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST), which allow the genetic structure of each population (isolate) to be compared at the level of single nucleotide polymorphisms (SNPs) across the genome, were calculated. Forty-one previously reported SNPs that appeared to differentiate susceptible isolates from LOE and confirmed reduced-susceptibility isolates were also investigated in the JYD-34 isolate.
Results The FST analysis and the analysis of the 41 SNPs that appeared to differentiate reduced-susceptibility from fully susceptible isolates confirmed that the JYD-34 isolate has a genome similar to previously

  16. Dirofilaria immitis JYD-34 isolate: whole genome analysis.

    Science.gov (United States)

    Bourguinat, Catherine; Lefebvre, Francois; Sandoval, Johanna; Bondesen, Brenda; Moreno, Yovany; Prichard, Roger K

    2017-11-09

    Macrocyclic lactone (ML) anthelmintics are used for chemoprophylaxis of heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms despite apparent compliance with recommended chemoprophylaxis using approved preventives have led to such cases being considered suspected lack of efficacy (LOE). Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and were confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced-susceptibility isolates, or to known ML-susceptible isolates. In this study, the 90 Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools to the pooled whole genomes of four well-characterized susceptible D. immitis populations, one susceptible Missouri laboratory isolate, and the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST), which allow the genetic structure of each population (isolate) to be compared at the level of single nucleotide polymorphisms (SNPs) across the genome, were calculated. Forty-one previously reported SNPs that appeared to differentiate susceptible isolates from LOE and confirmed reduced-susceptibility isolates were also investigated in the JYD-34 isolate.
The FST analysis and the analysis of the 41 SNPs that appeared to differentiate reduced-susceptibility from fully susceptible isolates confirmed that the JYD-34 isolate has a genome similar to previously investigated LOE isolates, and isolates confirmed to
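The abstract does not say which FST estimator was used, and studies of pooled sequencing data usually apply corrections for pool size and read depth. As a simplified, hedged illustration only, the sketch below computes the textbook heterozygosity-based form FST = (HT - HS) / HT for one biallelic SNP from two populations' allele frequencies, then takes an unweighted mean over SNPs. The function names and frequencies are hypothetical.

```python
from statistics import mean


def snp_fst(p1: float, p2: float) -> float:
    """FST = (H_T - H_S) / H_T for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity if the two populations were merged in equal proportions.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    # A SNP that is monomorphic across both populations carries no differentiation signal.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def genome_fst(freqs1, freqs2):
    """Unweighted mean of per-SNP FST over paired frequency lists (a simplification)."""
    return mean(snp_fst(a, b) for a, b in zip(freqs1, freqs2))
```

Identical frequencies give 0, and fixation for opposite alleles (0 versus 1) gives 1. Averaging ratios as done here is only one convention; a ratio of summed numerators and denominators is a common alternative.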

  17. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  18. Whole-genome sequencing to control antimicrobial resistance

    Science.gov (United States)

    Köser, Claudio U.; Ellington, Matthew J.; Peacock, Sharon J.

    2014-01-01

    Following recent improvements in sequencing technologies, whole-genome sequencing (WGS) is positioned to become an essential tool in the control of antibiotic resistance, a major threat in modern healthcare. WGS has already found numerous applications in this area, ranging from the development of novel antibiotics and diagnostic tests through to antibiotic stewardship of currently available drugs via surveillance and the elucidation of the factors that allow the emergence and persistence of resistance. Numerous proof-of-principle studies have also highlighted the value of WGS as a tool for day-to-day infection control and, for some pathogens, as a primary diagnostic tool to detect antibiotic resistance. However, appropriate data analysis platforms will need to be developed before routine WGS can be introduced on a large scale. PMID:25096945

  19. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    Science.gov (United States)

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. It has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.

  20. Whole genome sequencing: an efficient approach to ensuring food safety

    Science.gov (United States)

    Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.

    2017-09-01

    Whole genome sequencing (WGS) is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and traditional typing techniques is that WGS allows all genes to be included in the analysis, rather than a well-defined subset of genes or variable intergenic regions. WGS can also improve understanding of the contamination and colonization routes of foodborne pathogens within the food production environment, and can enable efficient tracking of pathogens’ entry routes and distribution from farm to consumer. Tracking foodborne pathogens along the food processing-distribution-retail-consumer continuum is of the utmost importance for outbreak investigations and for rapid action in controlling and preventing foodborne outbreaks. Therefore, WGS will likely consolidate most of the numerous workflows used in public health laboratories to characterize foodborne pathogens into one efficient workflow.

  1. Whole genome sequencing in clinical and public health microbiology

    Science.gov (United States)

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  2. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    BACKGROUND: Genomics studies are being revolutionized by next-generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived specifically to address the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired-end spacing. Thousands of read datasets were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different assemblers, including Newbler, ABySS, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by aligning the assemblies to the original genome from which the reads were simulated. The results were presented on a website, which includes a data graphing tool, all created to help users rapidly compare the feasibility and effectiveness of different sequencing and assembly strategies before testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, along with conclusions about some specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  3. Computational operon prediction in whole-genomes and metagenomes.

    Science.gov (United States)

    Zaidi, Syed Shujaat Ali; Zhang, Xuegong

    2017-07-01

    Microbial diversity in unique environmental settings enables abrupt responses catalysed by altered gene regulation and the formation of gene clusters called operons. Operons increase bacterial adaptability, which in turn increases bacterial survival. This review article presents the emergence of computational operon prediction methods for whole microbial genomes and metagenomes, and discusses their strengths and limitations. Most whole-genome operon prediction methods struggle to generalize to unrelated genomes. The applicability of universal whole-genome operon prediction methods to metagenomic data is an interesting yet less investigated question. We have evaluated the potential of various operon prediction features for genomic and metagenomic data. Most operon prediction methods with high accuracy have been compiled into databases. Despite their high predictive performance, the data in many databases are not completely consistent for similar species. We performed a correlation analysis between the computationally predicted operon databases and experimentally validated data for Escherichia coli, Bacillus subtilis and Mycobacterium tuberculosis. Operon predictions for most less-characterized microbes cannot be verified because experimentally validated operons are lacking. Generating validated information for other microbes would also make it possible to test the reliability of operon databases for less-annotated microbes. Advances in sequencing technologies (such as long sequencing reads and improved contig size) and the development of better analysis methods will help researchers overcome the technological hurdles, further improve operon predictions and make better use of operonic information. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  4. Effects of DNA mass on multiple displacement whole genome amplification and genotyping performance

    Directory of Open Access Journals (Sweden)

    Haque Kashif A

    2005-09-01

    Abstract Background Whole genome amplification (WGA) promises to eliminate practical limitations on molecular genetic analysis associated with genomic DNA (gDNA) quantity. We evaluated the performance of multiple displacement amplification (MDA) WGA using gDNA extracted from lymphoblastoid cell lines (N = 27), with starting gDNA inputs of 1–200 ng into the WGA reaction. Yield and composition analysis of whole genome amplified DNA (wgaDNA) was performed using three DNA quantification methods (OD, PicoGreen® and RT-PCR). Two panels, N = 15 STR (AmpFlSTR® Identifiler® panel) and N = 49 SNP (TaqMan®) genotyping assays, were run on each gDNA and wgaDNA sample in duplicate. gDNA and wgaDNA masses of 1, 4 and 20 ng were used in the SNP assays to evaluate the effects of DNA mass on SNP genotyping assay performance. A total of N = 6,880 STR and N = 56,448 SNP genotype attempts provided adequate power to detect differences in STR and SNP genotyping performance between gDNA and wgaDNA, and among wgaDNA produced from a range of gDNA template inputs. Results The proportion of double-stranded wgaDNA and human-specific PCR-amplifiable wgaDNA increased with increased gDNA input into the WGA reaction. Increased amounts of gDNA input into the WGA reaction improved wgaDNA genotyping performance. Genotype completion or genotype concordance rates of wgaDNA produced from all gDNA input levels were observed to be reduced compared to gDNA, although the reduction was not always statistically significant. Reduced wgaDNA genotyping performance was primarily due to increased variance of allelic amplification, resulting in loss of heterozygosity or increased undetermined genotypes. MDA WGA produces wgaDNA from no-template control samples; such samples exhibited substantial false-positive genotyping rates. Conclusion The amount of gDNA input into the MDA WGA reaction is a critical determinant of genotyping performance of wgaDNA. At least 10 ng of
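One way to quantify the allelic-imbalance effect described in this abstract, where a true heterozygote is called homozygous after amplification, is a heterozygote dropout rate. The sketch below is a hedged illustration, not the authors' method: it assumes two-character genotype strings such as "AB", treats None as a no-call, and counts only SNPs that are heterozygous in the reference gDNA and called in the wgaDNA. The function name is hypothetical.

```python
def heterozygote_dropout_rate(reference, amplified):
    """Fraction of reference heterozygotes that are called homozygous in amplified DNA.

    Genotypes are two-character strings such as "AB"; None marks a no-call.
    SNPs that are no-calls in either sample are excluded from the denominator.
    """
    het_pairs = [
        (r, a)
        for r, a in zip(reference, amplified)
        if r is not None and r[0] != r[1] and a is not None
    ]
    if not het_pairs:
        return 0.0
    # A dropout is a heterozygous reference call that became homozygous.
    dropped = sum(1 for _, a in het_pairs if a[0] == a[1])
    return dropped / len(het_pairs)
```

Excluding no-calls keeps this rate separate from the "increased undetermined genotypes" that the abstract reports as a second failure mode; the two could be tallied side by side.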

  5. Application of Whole-Genome Sequencing to an Unusual Outbreak of Invasive Group A Streptococcal Disease.

    Science.gov (United States)

    Galloway-Peña, Jessica; Clement, Meredith E; Sharma Kuinkel, Batu K; Ruffin, Felicia; Flores, Anthony R; Levinson, Howard; Shelburne, Samuel A; Moore, Zack; Fowler, Vance G

    2016-01-01

    Whole-genome analysis was applied to investigate atypical point-source transmission of 2 invasive group A streptococcal (GAS) infections. Isolates were serotype M4, ST39, and genetically indistinguishable. Comparison with MGAS10750 revealed nonsynonymous polymorphisms in ropB and increased speB transcription. This study demonstrates the usefulness of whole-genome analyses for GAS outbreaks.

  6. Accuracy of genomic prediction using imputed whole-genome sequence data in white layers

    NARCIS (Netherlands)

    Heidaritabar, M.; Calus, M.P.L.; Megens, H.J.; Vereijken, A.; Groenen, M.A.M.; Bastiaansen, J.W.M.

    2016-01-01

    There is an increasing interest in using whole-genome sequence data in genomic selection breeding programmes. Prediction of breeding values is expected to be more accurate when whole-genome sequence is used, because the causal mutations are assumed to be in the data. We performed genomic

  7. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  8. Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle

    NARCIS (Netherlands)

    Binsbergen, van R.; Bink, M.C.A.M.; Calus, M.P.L.; Eeuwijk, van F.A.; Hayes, B.J.; Hulsegge, B.; Veerkamp, R.F.

    2014-01-01

    Background The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina

  9. Prospects of whole-genome sequence data in animal and plant breeding

    NARCIS (Netherlands)

    Binsbergen, van Rianne

    2017-01-01

    The rapid decrease in costs of DNA sequencing implies that whole-genome sequence data will be widely available in the coming few years. Whole-genome sequence data includes all base-pairs on the genome that show variation in the sequenced population. Consequently, it is assumed that the causal

  10. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  11. Signatures of selection in tilapia revealed by whole genome resequencing.

    Science.gov (United States)

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can accelerate genetic improvement. However, selection signatures, from both artificial and natural selection, have been identified at the whole genome level in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that LD block sizes ranged from 10-100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and for accelerating genetic improvement of tilapia.

  12. Genomic V exons from whole genome shotgun data in reptiles.

    Science.gov (United States)

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

  13. Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.

    Science.gov (United States)

    Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao

    2017-04-01

    Chinese clearhead icefish, Protosalanx hyalocranius, is a representative icefish species with economic importance and a distinctive appearance. Because of its great economic value in China, the fish was introduced into Lake Dianchi and several other lakes from Lake Taihu half a century ago. Like the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as a transparent body and nearly scaleless skin. Here, we provide the whole genome sequence of this surface-dwelling fish and a draft genome assembly, aiming to explore the molecular mechanisms behind its biological features of interest. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with a scaffold N50 of 1.163 Mb. Genome completeness was estimated to be 98.39% using CEGMA evaluation. Finally, we annotated 19,884 protein-coding genes and observed that repeat sequences account for 24.43% of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in the Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms underlying the cavefish-like characters.
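The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of that definition follows; the function name and toy lengths are hypothetical, and assembly-evaluation tools may differ in details such as minimum scaffold length cut-offs.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For toy lengths 2 through 10 (54 bases in total), the running sum reaches 27, exactly half, at the scaffold of length 8, so the N50 is 8.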

  14. Whole-Genome Sequencing for National Surveillance of Shigella flexneri

    Directory of Open Access Journals (Sweden)

    Marie A. Chattaway

    2017-09-01

    National surveillance of Shigella flexneri ensures the rapid detection of outbreaks to facilitate public health investigation and intervention strategies. In this study, we used whole-genome sequencing (WGS) to type S. flexneri in order to detect linked cases and support epidemiological investigations. We prospectively analyzed 330 isolates of S. flexneri received at the Gastrointestinal Bacteria Reference Unit at Public Health England between August 2015 and January 2016. Traditional phenotypic and WGS sub-typing methods were compared. PCR was carried out on isolates exhibiting phenotypic/genotypic discrepancies in serotype. Phylogenetic relationships between isolates were analyzed from WGS data using single nucleotide polymorphism (SNP) typing to facilitate cluster detection. For 306/330 (93%) isolates, the serotype derived from the genome was concordant with phenotypic serology. Discrepant results between the phenotypic and genotypic tests were attributed to novel O-antigen synthesis/modification gene combinations, or to indels in O-antigen synthesis/modification genes that rendered them dysfunctional. SNP typing identified 36 clusters of two or more isolates. WGS provided microbiological evidence of epidemiologically linked clusters and detected novel O-antigen synthesis/modification gene combinations associated with two outbreaks. WGS provided reliable and robust data for monitoring trends in the incidence of different serotypes over time. SNP typing can be used to facilitate outbreak investigations in real time, thereby informing surveillance strategies and providing opportunities for timely public health interventions.
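The abstract does not state the SNP threshold or linkage rule used to define clusters. As a hedged illustration, single-linkage clustering on pairwise SNP distances, reporting only clusters of two or more isolates as the abstract does, can be sketched with a union-find structure. The threshold value, distance table and isolate names are hypothetical; pairs absent from the table are treated as unlinked.

```python
def snp_clusters(distances, isolates, threshold):
    """Single-linkage clusters: isolates are joined when pairwise SNP distance <= threshold.

    distances: dict mapping frozenset({a, b}) -> SNP count for a pair of isolates.
    Returns clusters of two or more isolates, each as a sorted list.
    """
    parent = {i: i for i in isolates}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, d in distances.items():
        if d <= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for i in isolates:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)
```

Single linkage chains isolates together: with a hypothetical threshold of 5, isolates that are 2 and 4 SNPs apart in sequence end up in one cluster even if the two ends differ by 6 SNPs, which is one reason surveillance schemes document their linkage rule explicitly.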

  15. GPCR genes are preferentially retained after whole genome duplication.

    Directory of Open Access Journals (Sweden)

    Jenia Semyonov

    One of the most interesting questions in biology is whether certain pathways have been favored during evolution and, if so, what properties could cause such a preference. Due to the lack of experimental evidence, it remains unclear whether select gene families have been preferentially retained over time after duplication in metazoan organisms. Here, by syntenic mapping of nonchemosensory G protein-coupled receptor genes (nGPCRs), which represent half the receptome for transmembrane signaling in vertebrate genomes, we found that, as opposed to the 8-15% retention rate for whole genome duplication (WGD)-derived gene duplicates across the entire pufferfish genome, more than 27.8% of WGD-derived nGPCRs that interact with a nonpeptide ligand were retained after WGD in the pufferfish Tetraodon nigroviridis. In addition, we show that concurrent duplication of cognate ligand genes by WGD could impose selection on nGPCRs that interact with a polypeptide ligand. Against a probability of less than 2.25% for parallel retention of a pair of WGD-derived ligands and a pair of cognate receptor duplicates, we found more than 8.9% retention of WGD-derived ligand-nGPCR pairs, threefold greater than one would expect. These results demonstrate that gene retention is not uniform after WGD in vertebrates, and suggest a Darwinian selection of GPCR-mediated intercellular communication in metazoan organisms.

  16. Current Developments in Prokaryotic Single Cell Whole Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Goudeau, Danielle; Nath, Nandita; Ciobanu, Doina; Cheng, Jan-Fang; Malmstrom, Rex

    2014-03-14

    Our approach to prokaryotic single-cell Whole Genome Amplification at the JGI continues to evolve. To increase both the quality and number of single-cell genomes produced, we explore all aspects of the process from cell sorting to sequencing. For example, we now use specialized reagents, acoustic liquid handling, and reduced reaction volumes to eliminate non-target DNA contamination in WGA reactions. More specifically, we use a cleaner commercial WGA kit from Qiagen that employs a UV decontamination procedure initially developed at the JGI, and we use the Labcyte Echo for tipless liquid transfer to set up 2 µL reactions. Acoustic liquid handling also dramatically reduces reagent costs. In addition, we are exploring new cell lysis methods, including treatment with Proteinase K, lysozyme, and detergents, to complement standard alkaline lysis and allow more efficient disruption of a wider range of cells. Incomplete lysis represents a major hurdle for WGA on some environmental samples, especially rhizosphere, peatland, and other soils. Finding effective lysis strategies that are also compatible with WGA is challenging, and we are currently assessing the impact of various strategies on genome recovery.

  17. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The work focuses on systematically organizing sequence-related attributes gathered by a variety of algorithms, primary information from experimental data, and information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, and the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes), are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects, can be accessed through the MIPS web server (http://mips.gsf.de).

  18. Whole genomes redefine the mutational landscape of pancreatic cancer

    Science.gov (United States)

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K.; Kassahn, Karin S.; Bailey, Peter; Johns, Amber L.; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C. J.; Robertson, Alan J.; Fadlullah, Muhammad Z. H.; Bruxner, Tim J. C.; Christ, Angelika N.; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J.; Fink, J. Lynn; Holmes, Oliver; Kazakoff, Stephen H.; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J.; Lee, Hong C.; Jones, Marc D.; Nagrial, Adnan M.; Humphris, Jeremy; Chantrill, Lorraine A.; Chin, Venessa; Steinmann, Angela M.; Mawson, Amanda; Humphrey, Emily S.; Colvin, Emily K.; Chou, Angela; Scarlett, Christopher J.; Pinho, Andreia V.; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S.; Kench, James G.; Pettitt, Jessica A.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B.; Graham, Janet S.; Niclou, Simone P.; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A.; Gill, Anthony J.; Eshleman, James R.; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A.; Pearson, John V.; Biankin, Andrew V.; Grimmond, Sean M.

    2015-01-01

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded. PMID:25719666

  19. Evidence for an ancient whole genome duplication in the cycad lineage.

    Directory of Open Access Journals (Sweden)

    Danielle Roodt

    In contrast to the many whole genome duplication events recorded for angiosperms (flowering plants), whole genome duplications in gymnosperms (non-flowering seed plants) seem to be much rarer. Although ancient whole genome duplications have been reported for most gymnosperm lineages as well, some are still contested and need to be confirmed. For instance, data for ginkgo, and particularly for cycads, have remained inconclusive so far, likely due to the quality of the available data and flaws in the analysis. We extracted and sequenced RNA from both the cycad Encephalartos natalensis and Ginkgo biloba. This was followed by transcriptome assembly, after which these data were used to build paralog age distributions. Based on these distributions, we identified remnants of an ancient whole genome duplication in both cycads and ginkgo. The most parsimonious explanation would be that this whole genome duplication event was shared between both species and occurred prior to their divergence, about 300 million years ago.
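
    Paralog age distributions are commonly built from synonymous substitution rates (Ks) between paralog pairs. The abstract does not say which age proxy was used, so treating Ks as the proxy is our assumption. The minimal sketch below bins Ks values and looks for a secondary peak away from the pile-up of young, small-scale duplicates; such a peak is the signature usually read as an ancient WGD. Real analyses fit mixture models and correct for Ks saturation. The bin width, the cut-off and the toy values are all hypothetical.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count paralog pairs per Ks bin on [0, max_ks)."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0 <= ks < max_ks:
            # min() guards against float rounding pushing ks past the last bin
            counts[min(int(ks / bin_width), n_bins - 1)] += 1
    return counts

def secondary_peak(counts, bin_width=0.1, min_ks=0.3):
    """Centre of the most populated bin at Ks >= min_ks (skips young duplicates)."""
    start = int(round(min_ks / bin_width))
    best = max(range(start, len(counts)), key=lambda i: counts[i])
    return (best + 0.5) * bin_width

# Hypothetical toy Ks values: many young duplicates plus an older cluster near 0.9.
toy_ks = [0.02, 0.03, 0.05, 0.08, 0.12, 0.15, 0.20,
          0.86, 0.91, 0.93, 0.94, 0.96, 0.98, 1.30, 1.60]
counts = ks_histogram(toy_ks)
peak = secondary_peak(counts)
print(f"secondary peak at Ks ~ {peak:.2f}")
```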

  20. Digital Droplet Multiple Displacement Amplification (ddMDA) for Whole Genome Sequencing of Limited DNA Samples.

    Directory of Open Access Journals (Sweden)

    Minsoung Rhee

    Multiple displacement amplification (MDA) is a widely used technique for amplifying DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples) before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram amounts of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA) technique in which partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase, thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled more uniform amplification coverage over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent to ~3/1000 of an E. coli genome per droplet), ddMDA achieves a 65-fold increase in coverage in de novo assembly and a more than 20-fold increase in specificity (percentage of reads mapping to E. coli), compared to conventional tube MDA. ddMDA offers a powerful method for many applications, including medical diagnostics, forensics, and environmental microbiology.
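
    The "~3/1000 of an E. coli genome per droplet" figure follows from the template concentration, the mass of one genome copy and the droplet volume. The abstract gives no droplet volume. The sketch therefore back-calculates the volume that would yield 0.003 genomes per droplet at 0.1 pg/μL. It assumes the K-12 MG1655 genome length and about 650 g/mol per double-stranded base pair; the resulting ~0.15 nL is our inference, consistent with "sub-nanoliter", not a reported value.

```python
AVOGADRO = 6.02214076e23      # 1/mol
BP_MASS_G_PER_MOL = 650.0     # approximate mass of one double-stranded base pair
ECOLI_GENOME_BP = 4_641_652   # E. coli K-12 MG1655; the strain used is an assumption

def genome_mass_fg(genome_bp):
    """Mass of one double-stranded genome copy in femtograms."""
    return genome_bp * BP_MASS_G_PER_MOL / AVOGADRO * 1e15

def genomes_per_droplet(conc_pg_per_ul, droplet_nl, genome_bp=ECOLI_GENOME_BP):
    """Mean genome copies per droplet at a given template concentration."""
    # 1 pg/uL = 1000 fg / 1000 nL = 1 fg/nL, so the numeric value carries over.
    conc_fg_per_nl = conc_pg_per_ul
    return conc_fg_per_nl * droplet_nl / genome_mass_fg(genome_bp)

def implied_droplet_volume_nl(conc_pg_per_ul, genomes_per_drop, genome_bp=ECOLI_GENOME_BP):
    """Droplet volume (nL) that would give the stated genome copies per droplet."""
    return genomes_per_drop * genome_mass_fg(genome_bp) / conc_pg_per_ul

print(f"E. coli genome mass ~ {genome_mass_fg(ECOLI_GENOME_BP):.2f} fg")
print(f"implied droplet volume ~ {implied_droplet_volume_nl(0.1, 0.003):.2f} nL")
```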

  1. High Whole-Genome Sequence Diversity of Human Papillomavirus Type 18 Isolates

    Directory of Open Access Journals (Sweden)

    Pascal van der Weele

    2018-02-01

    Background: The most commonly found human papillomavirus (HPV) types in cervical cancer are HPV16 and HPV18. Genome variants of these types have been associated with differential carcinogenic potential. To date, only a handful of studies have described HPV18 whole genome sequencing results. Here we describe HPV18 variant diversity and conservation of persistent infections in a longitudinal retrospective cohort study. Methods: Cervical self-samples were obtained annually over four years and genotyped on the SPF10-DEIA-LiPA25 platform. Clearing and persistent HPV18-positive infections were selected, amplified in two overlapping fragments, and sequenced using 32 sequencing primers. Results: Complete viral genomes were obtained from 25 participants with persistent and 26 participants with clearing HPV18 infections, resulting in 52 unique HPV18 genomes. Sublineage A3 was predominant in this population. The consensus viral genome was completely conserved over time in persistent infections, with one exception, in which different HPV18 variants were identified in follow-up samples. Conclusions: This study identified a diverse set of HPV18 variants. In persistent infections, the consensus viral genome is conserved. The identification of only one HPV18 infection with different major variants in follow-up suggests that this may be a rare event. This dataset adds 52 HPV18 genome variants to GenBank, more than doubling the currently available HPV18 information resource, and all but one variant are unique additions.

  2. Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

    Science.gov (United States)

    Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

    2017-07-12

    Massively parallel single-cell genome sequencing is required to further understand genetic diversity in complex biological systems. Whole genome amplification (WGA) is the first step in single-cell sequencing, but its throughput and accuracy are insufficient on conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single-cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables high-throughput acquisition of contamination-free, cell-specific sequence reads from single cells (21,000 single cells/h), improving sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals that of conventional techniques, with superior sequence quality. In addition, we demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single-cell sequencing. sd-MDA is promising for flexible and scalable use in single-cell sequencing.
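
    Random encapsulation of cells into droplets is commonly modelled as Poisson loading. The abstract does not state the model or the exact counts, so the 20,000 cells and 2,000,000 droplets below are hypothetical values within its "tens of thousands" and "millions" ranges. The sketch shows why a low mean occupancy keeps almost every occupied droplet down to a single cell.

```python
import math

def poisson_pmf(k, lam):
    """Probability that a droplet holds exactly k cells at mean occupancy lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def multi_cell_fraction(n_cells, n_droplets):
    """Fraction of occupied droplets that hold two or more cells."""
    lam = n_cells / n_droplets               # mean cells per droplet
    p_occupied = 1 - poisson_pmf(0, lam)
    p_single = poisson_pmf(1, lam)
    return (p_occupied - p_single) / p_occupied

lam = 20_000 / 2_000_000                     # hypothetical loading
print(f"lambda = {lam}, empty droplets = {poisson_pmf(0, lam):.1%}, "
      f"multi-cell among occupied = {multi_cell_fraction(20_000, 2_000_000):.2%}")
```

    At this loading about 99% of droplets stay empty, and only about 0.5% of occupied droplets hold more than one cell. The trade-off is that purity comes at the cost of many cell-free droplets.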

  3. A systematic evaluation of whole genome amplification of bisulfite-modified DNA

    Directory of Open Access Journals (Sweden)

    Bundo Miki

    2012-11-01

    Background: Studying DNA methylation profiles in detail should be the first step in epigenetic research. Although sodium bisulfite modification of genomic DNA is the gold standard method for DNA methylation analysis, this method results in the loss of most of the DNA material. Whole genome amplification (WGA) of bisulfite-modified DNA is expected to provide a rich source of material, but its validity has not been thoroughly evaluated. In this study, we evaluated the extent of biased amplification in the WGA of bisulfite-modified DNA and the reproducibility of independent WGA reactions. We performed multiple displacement amplification-based WGA three separate times. Each experiment included two reactions using 10 or 50 ng of bisulfite-modified DNA as template. DNA methylation levels were compared between WGA products and the original bisulfite-modified DNA at about 450,000 CpG sites. Results: Using a sufficient amount of bisulfite-modified DNA for WGA was critical for downstream application. Considerable deviations from the original bisulfite-modified DNA were found in the middle range of DNA methylation levels. Hyper- and hypomethylation deviations were equally distributed, which suggested that the deviation at each CpG site occurred randomly. Averaging the data from independently amplified WGA products dramatically improved the overall quality. Conclusions: WGA of bisulfite-modified DNA could be a valuable tool for epigenetic research, but careful experimental design and data interpretation are required.
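
    Averaging helps because, as the abstract notes, the deviation at each CpG site appears random, and random errors partly cancel across independent replicates. The sketch below uses hypothetical methylation levels (fraction methylated, 0 to 1) for three CpG sites and three replicate WGA reactions. These are illustrative toy values, not data from the study; the mid-range site is given the largest deviations.

```python
from statistics import mean

def mean_abs_deviation(values, reference):
    """Mean absolute difference between WGA-derived and original methylation levels."""
    return mean(abs(v - r) for v, r in zip(values, reference))

original = [0.10, 0.50, 0.90]            # bisulfite-modified DNA before WGA (toy values)
replicates = [                           # three independent WGA reactions (toy values)
    [0.12, 0.58, 0.88],
    [0.08, 0.44, 0.91],
    [0.11, 0.47, 0.92],
]

averaged = [mean(site) for site in zip(*replicates)]   # per-CpG average across replicates

for i, rep in enumerate(replicates, start=1):
    print(f"replicate {i}: MAD = {mean_abs_deviation(rep, original):.3f}")
print(f"averaged:    MAD = {mean_abs_deviation(averaged, original):.3f}")
```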

  4. Use of routinely collected amniotic fluid for whole-genome expression analysis of polygenic disorders.

    Science.gov (United States)

    Nagy, Gyula Richárd; Győrffy, Balázs; Galamb, Orsolya; Molnár, Béla; Nagy, Bálint; Papp, Zoltán

    2006-11-01

    Neural tube defects related to polygenic disorders are the second most common birth defects in the world, but no molecular biologic tests are available to analyze the genes involved in the pathomechanism of these disorders. We explored the use of routinely collected amniotic fluid to characterize the differential gene expression profiles of polygenic disorders. We used oligonucleotide microarrays to analyze amniotic fluid samples obtained from pregnant women carrying fetuses with neural tube defects diagnosed during ultrasound examination. The control samples were obtained from pregnant women who underwent routine genetic amniocentesis because of advanced maternal age (>35 years). We also investigated specific folate-related genes because maternal periconceptional folic acid supplementation has been found to have a protective effect with respect to neural tube defects. Fetal mRNA from amniocytes was successfully isolated, amplified, labeled, and hybridized to whole-genome transcript arrays. We detected differential gene expression profiles between cases and controls. Highlighted genes such as SLA, LST1, and BENE might be important in the development of neural tube defects. None of the specific folate-related genes were in the top 100 associated transcripts. This pilot study demonstrated that a routinely collected amount of amniotic fluid (as small as 6 mL) can provide sufficient RNA to successfully hybridize to expression arrays. Analysis of the differences in fetal gene expressions might help us decipher the complex genetic background of polygenic disorders.

  5. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges, such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast genome (>30 Mbp). The pezizomycotine (filamentous ascomycete) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. A modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. About 12% of these preliminary annotations suffer a potential frameshift error, somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  6. SNP detection for massively parallel whole-genome resequencing.

    Science.gov (United States)

    Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong; Yang, Huanming; Wang, Jian; Kristiansen, Karsten; Wang, Jun

    2009-06-01

    Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly as compared to that obtained using a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.
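
    The abstract describes combining data quality and error information into a Bayesian per-base score. The sketch below is a heavily simplified illustration of Bayesian diploid genotype calling from base qualities. It is not the authors' model, which also accounts for alignment and technology-specific error patterns. The prior rates in `genotype_prior` are hypothetical placeholders for the dbSNP-informed prior the abstract mentions.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"

def phred_to_error(q):
    """Phred quality -> probability that the base call is wrong."""
    return 10 ** (-q / 10)

def base_likelihood(base, allele, err):
    """P(observed base | true allele); errors spread evenly over the 3 other bases."""
    return 1 - err if base == allele else err / 3

def genotype_prior(ref, het_rate=1e-3, hom_alt_rate=5e-4, other_rate=1e-5):
    """Hypothetical prior favouring the reference; a dbSNP-aware prior would
    raise het_rate / hom_alt_rate at known polymorphic sites."""
    weights = {}
    for g in combinations_with_replacement(BASES, 2):
        if g == (ref, ref):
            weights[g] = 1.0
        elif ref in g:                 # heterozygous ref/alt
            weights[g] = het_rate / 3
        elif g[0] == g[1]:             # homozygous non-reference
            weights[g] = hom_alt_rate / 3
        else:                          # heterozygous, two non-reference alleles
            weights[g] = other_rate
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

def call_genotype(observations, ref):
    """observations: list of (base, phred_quality) at one site.
    Returns (best genotype, posterior probability, Phred-scaled consensus quality)."""
    log_post = {}
    for g, prior in genotype_prior(ref).items():
        ll = math.log(prior)
        for base, q in observations:
            err = phred_to_error(q)
            # Diploid site: each read comes from either chromosome with probability 1/2.
            ll += math.log(0.5 * base_likelihood(base, g[0], err)
                           + 0.5 * base_likelihood(base, g[1], err))
        log_post[g] = ll
    top = max(log_post.values())                      # normalise in log space
    z = sum(math.exp(v - top) for v in log_post.values())
    post = {g: math.exp(v - top) / z for g, v in log_post.items()}
    best = max(post, key=post.get)
    tail = 1.0 - post[best]
    qual = 99.0 if tail <= 0 else min(99.0, -10 * math.log10(tail))
    return best, post[best], qual

het_site = [("A", 30)] * 5 + [("G", 30)] * 5
print(call_genotype(het_site, ref="A"))
```

    With five high-quality A and five G calls at a site whose reference is A, the heterozygous ('A', 'G') genotype dominates the posterior. The quality is capped at 99, which is an arbitrary choice made here for display.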

  7. Parent and Public Interest in Whole Genome Sequencing

    Science.gov (United States)

    Dodson, Daniel S.; Goldenberg, Aaron J.; Davis, Matthew M.; Singer, Dianne C.; Tarini, Beth A.

    2015-01-01

    Objective: To assess the baseline interest of the public in whole genome sequencing (WGS) for themselves, parents' interest in WGS for their youngest children, and factors associated with such interest. Methods: A random sample of adults from a probability-based nationally representative online panel was surveyed. All participants were provided basic information about WGS and then asked their interest in WGS for themselves. Those participants who self-identified as parents were asked about their interest in WGS for their children. The order in which parents were asked about their interest in WGS for themselves and their child was randomized. The relationship between parent/child characteristics and interest in WGS was examined. Results: Overall response rate was 62% (55% among parents). 58.6% of the total population (parents and non-parents) was interested in WGS for themselves. Similarly, 61.8% of parents were interested in WGS for themselves and 57.8% were interested in WGS for their youngest children. Of note, 84.7% of parents showed an identical interest level in WGS for themselves and their youngest children. Mothers as a whole, and parents whose youngest children had ≥2 health conditions, had significantly more interest in WGS for themselves and their youngest children, while those with conservative political ideologies had considerably less. Conclusions: While U.S. adults have varying interest levels in WGS, parents appear to have similar interests in genome testing for themselves and their youngest children. As WGS technology becomes available in the clinic and private market, clinicians should be prepared to discuss WGS risks and benefits with their patients. PMID:25765282

  8. Parent and public interest in whole-genome sequencing.

    Science.gov (United States)

    Dodson, Daniel S; Goldenberg, Aaron J; Davis, Matthew M; Singer, Dianne C; Tarini, Beth A

    2015-01-01

    The aim of this study was to assess the baseline interest of the public in whole-genome sequencing (WGS) for oneself, parents' interest in WGS for their youngest children, and factors associated with such interest. A random sample of adults from a probability-based nationally representative online panel was surveyed. All participants were provided basic information about WGS and then asked about their interest in WGS for themselves. Those participants who were parents were additionally asked about their interest in WGS for their children. The order in which parents were asked about their interest in WGS for themselves and for their child was randomized. The relationship between parent/child characteristics and interest in WGS was examined. The overall response rate was 62% (55% among parents). 58.6% of the total population (parents and nonparents) was interested in WGS for themselves. Similarly, 61.8% of the parents were interested in WGS for themselves and 57.8% were interested in WGS for their youngest children. Of note, 84.7% of the parents showed an identical interest level in WGS for themselves and their youngest children. Mothers as a group and parents whose youngest children had ≥2 health conditions had significantly more interest in WGS for themselves and their youngest children, while those with conservative political ideologies had considerably less. While US adults have varying interest levels in WGS, parents appear to have similar interests in genome testing for themselves and their youngest children. As WGS technology becomes available in the clinic and private market, clinicians should be prepared to discuss WGS risks and benefits with their patients. © 2015 S. Karger AG, Basel.

  9. A Whole Genome Association Study on Meat Palatability in Hanwoo

    Directory of Open Access Journals (Sweden)

    K.-E. Hyeong

    2014-09-01

    A whole genome association (WGA) study was carried out to find quantitative trait loci (QTL) for sensory evaluation traits in Hanwoo. Carcass samples of 250 Hanwoo steers were collected from the National Agricultural Cooperative Livestock Research Institute, Ansung, Gyeonggi province, Korea, between 2011 and 2012 and genotyped with the Affymetrix Bovine Axiom Array 640K single nucleotide polymorphism (SNP) chip. Of the SNPs on the chip, a total of 322,160 were retained after quality control tests. After adjusting for the effects of age, slaughter-year-season, and polygenic effects using a genomic relationship matrix, the corrected phenotypes for the sensory evaluation measurements were regressed on each SNP using a simple additive linear regression model. A total of 1,631 SNPs were detected for color, aroma, tenderness, juiciness and palatability at the 0.1% comparison-wise level. Among the significant SNPs, the best set of 52 SNP markers was chosen using a forward regression procedure at the 0.05 level, comprising sets of 8, 14, 11, 10, and 9 SNPs for the respective sensory evaluation traits. The sets of significant SNPs explained 18% to 31% of phenotypic variance. Three SNPs were pleiotropic: AX-26703353 and AX-26742891, located at 101 and 110 Mb of BTA6, respectively, influenced tenderness, juiciness and palatability, while AX-18624743 at 3 Mb of BTA10 affected tenderness and palatability. Our results suggest that some QTL for sensory measures are segregating in a Hanwoo steer population. Additional WGA studies on fatty acid and nutritional components, as well as sensory panels, are in progress to characterize the genetic architecture of meat quality and palatability in Hanwoo.
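
    This record describes a per-SNP scan in which the corrected phenotype is regressed on each SNP under an additive model. That scan can be sketched as an ordinary least-squares fit of phenotype on allele dosage coded 0/1/2. The sketch makes three assumptions. First, phenotypes are taken as already corrected for age, slaughter-year-season and polygenic effects. Second, the p-value uses a normal approximation to the t distribution, which is reasonable at n ≈ 250 but is still an approximation. Third, the ten-animal toy data are hypothetical.

```python
import math

def additive_snp_test(dosages, phenotypes):
    """OLS of phenotype on allele dosage (0/1/2): returns (effect, t, approx. p)."""
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(phenotypes) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no variation in dosage")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx                      # allele substitution effect
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return slope, math.inf, 0.0
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))   # two-sided, normal approximation
    return slope, t, p

ALPHA = 0.001  # the 0.1% comparison-wise level used in the study

dosages = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
tenderness = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 0.9, 2.0, 2.9, 2.1]   # hypothetical
effect, t, p = additive_snp_test(dosages, tenderness)
print(f"effect = {effect:.2f}, t = {t:.1f}, significant = {p < ALPHA}")
```

    In a genome-wide scan this function would be called once per SNP that passed quality control. Each p-value is then compared against the comparison-wise threshold.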

  10. Whole genome analysis of epidemiologically closely related Staphylococcus aureus isolates.

    Directory of Open Access Journals (Sweden)

    Maarten Schijffelen

    The transition of bacteria from colonizers to pathogens is accompanied by a drastic change in expression profiles. These changes may be due to environmental signals or to mutational changes. We therefore compared the whole genome sequences of four sets of S. aureus isolates. Three sets were from the same patients: the isolates of each pair (S1800/S1805, S2396/S2395, S2398/S2397; in each case an isolate from colonization and an isolate from infection, respectively) were obtained within <30 days of each other, and the isolate from infection caused skin infections. The isolates were then compared for differences in gene content and SNPs. In addition, a set of isolates from a colonized pig and a farmer from the same farm at the same time (S0462 and S0460) was analyzed. The isolate pair S1800/S1805 showed a difference in a prophage, but prophages are easily lost or acquired. However, S1805 contained an integrative conjugative element not present in S1800. In addition, 92 SNPs were present in a variety of genes, and the isolates S1800 and S1805 were not considered a pair. Between S2395/S2396 two SNPs were present: one in an intergenic region and one a synonymous mutation in a putative membrane protein. Between S2397/S2398 only one synonymous mutation, in a putative lipoprotein, was found. The two farm isolates were very similar and showed 12 SNPs in genes belonging to a number of different functional categories. However, we cannot pinpoint any gene that explains the change from carrier status to infection. The data indicate that differences exist between the infection and colonizing isolates of S2395/S2396 and S2397/S2398, as well as between isolates from different hosts, but that S1800/S1805 are not clonal.

  11. Whole genome amplification and real-time PCR in forensic casework

    Directory of Open Access Journals (Sweden)

    Asili Paola

    2009-04-01

    Background: WGA (whole genome amplification) in forensic genetics can eliminate the technical limitations arising from low amounts of genomic DNA (gDNA). However, it has not been used to date because any amplification bias generated may complicate the interpretation of results. Our aim in this paper was to assess the applicability of multiple displacement amplification (MDA) to forensic SNP genotyping by performing a comparative analysis of genomic and amplified DNA samples. A 26-SNP TaqMan panel specifically designed for low copy number (LCN) and/or severely degraded genomic DNA was typed on 100 genomic as well as amplified DNA samples. Results: Aliquots containing 1, 0.1 and 0.01 ng each of 100 DNA samples were typed for a 26-SNP panel. Similar aliquots of the same DNA samples underwent MDA before being typed for the same panel. Genomic DNA samples showed a 0% PCR failure rate for all three dilutions, whilst the PCR failure rate of the amplified DNA samples was 0% for the 1 ng and 0.1 ng dilutions and 0.077% for the 0.01 ng dilution. The genotyping results of both the amplified and genomic DNA samples were also compared with reference genotypes of the same samples obtained by direct sequencing. The genomic DNA samples showed genotype concordance rates of 100% for all three dilutions, while the concordance rates of the amplified DNA samples were 100% for the 1 ng and 0.1 ng dilutions and 99.923% for the 0.01 ng dilution. Moreover, ten artificially degraded DNA samples, which gave no results when analyzed by current forensic methods, were also amplified by MDA and genotyped with 100% concordance. Conclusion: We investigated the suitability of MDA material for forensic SNP typing. Comparative analysis of amplified and genomic DNA samples showed that a large number of SNPs could be accurately typed starting from just 0.01 ng of template. We found that the MDA genotyping call and accuracy rates were only slightly lower than those for genomic DNA.

  12. Whole genome amplification and real-time PCR in forensic casework

    Science.gov (United States)

    Giardina, Emiliano; Pietrangeli, Ilenia; Martone, Claudia; Zampatti, Stefania; Marsala, Patrizio; Gabriele, Luciano; Ricci, Omero; Solla, Gianluca; Asili, Paola; Arcudi, Giovanni; Spinella, Aldo; Novelli, Giuseppe

    2009-01-01

    Background: WGA (whole genome amplification) in forensic genetics can eliminate the technical limitations arising from low amounts of genomic DNA (gDNA). However, it has not been used to date because any amplification bias generated may complicate the interpretation of results. Our aim in this paper was to assess the applicability of multiple displacement amplification (MDA) to forensic SNP genotyping by performing a comparative analysis of genomic and amplified DNA samples. A 26-SNP TaqMan panel specifically designed for low copy number (LCN) and/or severely degraded genomic DNA was typed on 100 genomic as well as amplified DNA samples. Results: Aliquots containing 1, 0.1 and 0.01 ng each of 100 DNA samples were typed for a 26-SNP panel. Similar aliquots of the same DNA samples underwent MDA before being typed for the same panel. Genomic DNA samples showed a 0% PCR failure rate for all three dilutions, whilst the PCR failure rate of the amplified DNA samples was 0% for the 1 ng and 0.1 ng dilutions and 0.077% for the 0.01 ng dilution. The genotyping results of both the amplified and genomic DNA samples were also compared with reference genotypes of the same samples obtained by direct sequencing. The genomic DNA samples showed genotype concordance rates of 100% for all three dilutions, while the concordance rates of the amplified DNA samples were 100% for the 1 ng and 0.1 ng dilutions and 99.923% for the 0.01 ng dilution. Moreover, ten artificially degraded DNA samples, which gave no results when analyzed by current forensic methods, were also amplified by MDA and genotyped with 100% concordance. Conclusion: We investigated the suitability of MDA material for forensic SNP typing. Comparative analysis of amplified and genomic DNA samples showed that a large number of SNPs could be accurately typed starting from just 0.01 ng of template. We found that the MDA genotyping call and accuracy rates were only slightly lower than those for genomic DNA. Indeed, when 10 pg of
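
    The 0.077% failure rate and the 99.923% concordance at the 0.01 ng dilution are both consistent with 2 affected typings out of 2,600 (100 samples × 26 SNPs). The abstract does not report the raw count. The value 2 is therefore our inference from the rounded percentages, which the short check below reproduces.

```python
SAMPLES = 100
SNPS_PER_PANEL = 26
typings = SAMPLES * SNPS_PER_PANEL   # 2,600 genotype calls per dilution

def pct(count, total):
    """Express count as a percentage of total."""
    return 100 * count / total

failed = 2  # inferred from the rounded 0.077%, not reported directly
print(f"failure rate = {pct(failed, typings):.3f}%")
print(f"concordance  = {100 - pct(failed, typings):.3f}%")
```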

  13. Whole genome comparative analysis of four Georgian grape cultivars.

    Science.gov (United States)

    Tabidze, V; Pipia, I; Gogniashvili, M; Kunelauri, N; Ujmajuridze, L; Pirtskhalava, M; Vishnepolsky, B; Hernandez, A G; Fields, C J; Beridze, Tengiz

    2017-12-01

    Grapevine is one of the most important fruit species in the world. Comparative genome sequencing of grape cultivars is very important for interpreting the grape genome and understanding its evolution. The genomes of four Georgian grape cultivars (Chkhaveri, Saperavi, Meskhetian green, and Rkatsiteli), belonging to different haplogroups, were resequenced. The shotgun genomic libraries of the grape cultivars were sequenced on an Illumina HiSeq. Pinot noir nuclear, mitochondrial, and chloroplast DNA were used as the reference. The mitochondrial DNA of Chkhaveri closely matches the reference Pinot noir mitochondrial DNA, with the exception of 16 SNPs. The numbers of SNPs in mitochondrial DNA from Saperavi, Meskhetian green, and Rkatsiteli were 764, 702, and 822, respectively. Nuclear DNA differs from the reference by 1,800,675 nt in Chkhaveri, 1,063,063 nt in Meskhetian green, 2,174,995 nt in Saperavi, and 5,011,513 nt in Rkatsiteli. Unlike its mtDNA, Pinot noir chromosomal DNA is closer to Meskhetian green than to the other cultivars. The substantial differences in the number of SNPs in the mitochondrial and nuclear DNA of the Chkhaveri and Pinot noir cultivars are explained by backcrossing or introgression of their wild predecessors before or during the process of domestication. Annotation of the chromosomal DNA of the Georgian grape cultivars by MEGANTE, a web-based annotation system, shows 66,745 predicted genes (Chkhaveri: 17,409; Saperavi: 17,021; Meskhetian green: 18,355; and Rkatsiteli: 13,960). Among them, 106 predicted genes and 43 pseudogenes of terpene synthase (TPS) genes were found on chromosomes 12, 18 random (18R), and 19. Four novel TPS genes not present in the reference Pinot noir DNA were detected. Two of them, germacrene A synthase (chromosome 18R) and (-)-germacrene D synthase (chromosome 19), can be identified as putatively full-length proteins. This work represents the first attempt at a comparative whole genome analysis of different haplogroups

  14. Clinical interpretation and implications of whole-genome sequencing.

    Science.gov (United States)

    Dewey, Frederick E; Grove, Megan E; Pan, Cuiping; Goldstein, Benjamin A; Bernstein, Jonathan A; Chaib, Hassan; Merker, Jason D; Goldfeder, Rachel L; Enns, Gregory M; David, Sean P; Pakdaman, Neda; Ormond, Kelly E; Caleshu, Colleen; Kingham, Kerry; Klein, Teri E; Whirl-Carrillo, Michelle; Sakamoto, Kenneth; Wheeler, Matthew T; Butte, Atul J; Ford, James M; Boxer, Linda; Ioannidis, John P A; Yeung, Alan C; Altman, Russ B; Assimes, Themistocles L; Snyder, Michael; Ashley, Euan A; Quertermous, Thomas

    2014-03-12

    Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. The objectives were to examine the coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data, and the resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. This exploratory study included 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings, and five physicians proposed initial clinical follow-up based on them. The main outcome measures were genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, the burden of inherited disease risk and pharmacogenomic findings, and the burden and interrater agreement of proposed clinical follow-up. Depending on the sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per variant, yielded moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of variants cataloged as disease causing in mutation databases as variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in...

  15. TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence - TCGA

    Science.gov (United States)

    Carolyn Hutter, Ph.D., Program Director of NHGRI's Division of Genomic Medicine, discusses the expansion of TCGA's Pan-Cancer efforts to include the Pan-Cancer Analysis of Whole Genomes (PAWG) project.

  16. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria

    NARCIS (Netherlands)

    Ellington, M.J.; Ekelund, O.; Aarestrup, F.M.; Canton, R.; Doumith, M.; Giske, C.; Grundman, H.; Hasman, H.; Holden, M.T.G.; Hopkins, K.L.; Iredell, J.; Kahlmeter, G.; Köser, C.U.; MacGowan, A.; Mevius, D.; Mulvey, M.; Naas, T.; Peto, T.; Rolain, J.M.; Samuelsen; Woodford, N.

    2017-01-01

    Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility testing...

  17. Towards a whole-genome sequence for rye (Secale cereale L.)

    National Research Council Canada - National Science Library

    Bauer, Eva; Schmutzer, Thomas; Barilar, Ivan; Mascher, Martin; Gundlach, Heidrun; Martis, Mihaela-Maria; Twardziok, Sven O; Hackauf, Bernd; Gordillo, Andres; Wilde, Peer; Schmidt, Malthe; Korzun, Viktor; Mayer, Klaus F. X; Schmid, Karl; Schoen, Chris-Carolin; Scholz, Uwe

    2017-01-01

    We report on a whole-genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe...

  18. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing

    National Research Council Canada - National Science Library

    Helman, Elena; Lawrence, Michael S; Stewart, Chip; Sougnez, Carrie; Getz, Gad; Meyerson, Matthew

    2014-01-01

    .... Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part...

  19. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected...... this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome sequence data alongside the 54k SNP set....

  20. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate...

  1. HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data

    OpenAIRE

    Nariai, Naoki; Kojima, Kaname; Saito, Sakae; Mimori, Takahiro; Sato, Yukuto; Kawai, Yosuke; Yamaguchi-Kabata, Yumi; Yasuda, Jun; Nagasaki, Masao

    2015-01-01

    Background Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphisms of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data. Results We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimiz...

  2. Identification of molecular phenotypic descriptors of breast capsular contracture formation using informatics analysis of the whole genome transcriptome.

    Science.gov (United States)

    Kyle, Daniel J T; Harvey, Alison G; Shih, Barbara; Tan, Kian T; Chaudhry, Iskander H; Bayat, Ardeshir

    2013-01-01

    Breast capsular contracture formation following silicone implant augmentation/reconstruction is a common complication that remains poorly understood. The aim of this study was to identify potential biomarkers implicated in breast capsular contracture formation by using, for the first time, whole genome arrays. Biopsy samples were taken from 18 patients (23 breast capsules) with Baker Grade I-II (Control) and Baker Grade III-IV (Contracted). Whole genome microarrays were performed and six significantly dysregulated genes were selected for further validation with quantitative reverse transcriptase polymerase chain reaction and immunohistochemistry. Hematoxylin and eosin staining was also carried out to compare the histological characteristics of control and contracted samples. Microarray results showed that aggrecan, tissue inhibitor of metalloproteinase 4 (TIMP4), and tumor necrosis factor superfamily (ligand) member 11 were significantly down-regulated in contracted capsules, while matrix metallopeptidase 12, serum amyloid A 1, and interleukin 8 (IL8) were significantly up-regulated. The dysregulation of aggrecan, tumor necrosis factor superfamily (ligand) member 11, TIMP4, and IL8 was validated by quantitative reverse transcriptase polymerase chain reaction (p contracture formation. IL8 and TIMP4 may serve as potential key diagnostic, therapeutic, and prognostic biomarkers in capsular contracture formation. © 2013 by the Wound Healing Society.

  3. Comparing whole-genome sequencing with Sanger sequencing for spa typing of methicillin-resistant Staphylococcus aureus.

    Science.gov (United States)

    Bartels, Mette Damkjær; Petersen, Andreas; Worning, Peder; Nielsen, Jesper Boye; Larner-Svensson, Hanna; Johansen, Helle Krogh; Andersen, Leif Percival; Jarløv, Jens Otto; Boye, Kit; Larsen, Anders Rhod; Westh, Henrik

    2014-12-01

    spa typing of methicillin-resistant Staphylococcus aureus (MRSA) has traditionally been done by PCR amplification and Sanger sequencing of the spa repeat region. At Hvidovre Hospital, Denmark, whole-genome sequencing (WGS) of all MRSA isolates has been performed routinely since January 2013, and an in-house analysis pipeline determines the spa types. Due to national surveillance, all MRSA isolates are sent to Statens Serum Institut, where the spa type is determined by PCR and Sanger sequencing. The purpose of this study was to evaluate the reliability of the spa types obtained by 150-bp paired-end Illumina WGS. MRSA isolates from new MRSA patients in 2013 (n = 699) in the capital region of Denmark were included. We found a 97% agreement between spa types obtained by the two methods. All isolates achieved a spa type by both methods. Nineteen isolates differed in spa types by the two methods, in most cases due to the lack of 24-bp repeats in the whole-genome-sequenced isolates. These related but incorrect spa types should have no consequence in outbreak investigations, since all epidemiologically linked isolates, regardless of spa type, will be included in the single nucleotide polymorphism (SNP) analysis. This will reveal the close relatedness of the spa types. In conclusion, our data show that WGS is a reliable method to determine the spa type of MRSA. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  4. Whole genome sequencing as the ultimate tool to diagnose tuberculosis.

    Science.gov (United States)

    van Soolingen, Dick; Jajou, Rana; Mulder, Arnout; de Neeling, Han

    2016-12-01

    In the past two decades, DNA techniques have been increasingly used in the laboratory diagnosis of tuberculosis (TB). The (sub)species of the Mycobacterium tuberculosis complex are usually identified using reverse line blot techniques. Resistance is predicted by detecting mutations in genes associated with resistance. Nevertheless, all cases are still subjected to cumbersome phenotypic resistance testing. A strain-characteristic DNA fingerprint, used to investigate the epidemiology of TB, is produced by 24-locus variable number tandem repeat (VNTR) typing. However, most of the molecular techniques used in the diagnosis of TB can eventually be replaced by whole genome sequencing (WGS). Many international TB reference laboratories are currently working on the introduction of WGS; however, standardization in the international context is lacking. The European Centre for Disease Prevention and Control in Stockholm, Sweden, organizes an annual round of quality control for VNTR typing and in 2015, for the first time, also for WGS. In this first proficiency study, only three of eight international TB laboratories produced WGS results in line with those of the reference laboratory. The whole process of DNA isolation, purification, quantification, sequencing, and analysis/interpretation of data is still under development. In this presentation, many aspects that influence the quality and interpretation of WGS results will be covered. The turnaround time, analysis, and utility of WGS will be discussed. Moreover, experiences with the use of WGS in the molecular epidemiology of TB in the Netherlands are detailed. It can be concluded that many difficulties still have to be overcome. In the current state of the art, bacteria still have to be cultured to obtain DNA of sufficient quality and quantity for successful WGS. The quality of sequencing has improved significantly over the past 7 years, and the detection of mutations has therefore become more reliable.

  5. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli

    DEFF Research Database (Denmark)

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-producing Escherichia coli (VTEC). In Denmark, the Statens Serum Institut (SSI) routinely receives all... ...was established from WGS, enabling discrimination between sporadic and outbreak isolates. Overall, WGS typing produced results faster and at a lower cost than the current routine. Therefore, WGS typing is a superior alternative to conventional typing strategies. This approach may also be applied to typing...

  6. Comparing Whole-Genome Sequencing with Sanger Sequencing for spa Typing of Methicillin-Resistant Staphylococcus aureus

    DEFF Research Database (Denmark)

    Bartels, Mette Damkjaer; Petersen, Andreas; Worning, Peder

    2014-01-01

    spa typing of methicillin-resistant Staphylococcus aureus (MRSA) has traditionally been done by PCR amplification and Sanger sequencing of the spa repeat region. At Hvidovre Hospital, Denmark, whole-genome sequencing (WGS) of all MRSA isolates has been performed routinely since January 2013, and an in-house analysis pipeline determines the spa types. Due to national surveillance, all MRSA isolates are sent to Statens Serum Institut, where the spa type is determined by PCR and Sanger sequencing. The purpose of this study was to evaluate the reliability of the spa types obtained by 150-bp paired-end Illumina WGS. MRSA isolates from new MRSA patients in 2013 (n = 699) in the capital region of Denmark were included. We found a 97% agreement between spa types obtained by the two methods. All isolates achieved a spa type by both methods. Nineteen isolates differed in spa types by the two methods, in most...

  7. Wgssat: A High-Throughput Computational Pipeline for Mining and Annotation of SSR Markers from Whole Genomes.

    Science.gov (United States)

    Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, N S; Jena, J K; Kushwaha, Basdeo

    2017-09-16

    Mining and characterization of SSR markers from whole genomes provide valuable information about the biological significance of SSR distribution and also facilitate the development of markers for genetic analysis. WGS-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline, developed using Java (NetBeans) and Perl scripts, that simplifies the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, ncRNA, core genes, repeats, and SSRs from whole genomes, followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic, and core gene regions), along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes, and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139,057 SSRs, of which 113,703 SSR primer pairs were uniquely amplified in silico onto the Takifugu rubripes (fugu) genome. Of the 113,703 mined SSRs, 81,463 were from genic regions (4,286 exonic and 77,177 intronic), 7 from RNA, and 267 from core genes of fugu, whereas 105,641 SSRs and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT has been tested under Ubuntu Linux. The source code, documentation, user manual, example dataset, and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr. © The American Genetic Association 2017. All rights reserved.
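    The SSR-mining step that WGSSAT automates can be illustrated with a small, generic sketch. The Python below finds perfect (uninterrupted) mono- to hexanucleotide repeats in a DNA string using regular expressions. It is not WGSSAT's code, which the authors built with Java and Perl, and the minimum repeat counts in the hypothetical `DEFAULT_MIN_REPEATS` table are assumptions chosen only for illustration.

```python
import re

# Hypothetical minimum repeat counts per motif length (mono- to hexanucleotide).
# These thresholds are an assumption for illustration, not WGSSAT's defaults.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('ATAT' is not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_repeats=DEFAULT_MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        # Capture a k-mer as group 2, then require at least (min_rep - 1) more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Drop hits such as a poly-A run reported as an 'AA' dinucleotide repeat.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(1)) // k))
    return sorted(hits)
```

    For example, `find_ssrs("GG" + "AT" * 8 + "CC" + "CAG" * 6 + "T")` returns `[(2, 'AT', 8), (20, 'CAG', 6)]`. Compound and imperfect repeats, which production SSR miners usually also report, are ignored in this sketch.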

  8. Whole-genome sequences of Chlamydia trachomatis directly from clinical samples without culture

    Science.gov (United States)

    Seth-Smith, Helena M.B.; Harris, Simon R.; Skilton, Rachel J.; Radebe, Frans M.; Golparian, Daniel; Shipitsyna, Elena; Duy, Pham Thanh; Scott, Paul; Cutcliffe, Lesley T.; O’Neill, Colette; Parmar, Surendra; Pitt, Rachel; Baker, Stephen; Ison, Catherine A.; Marsh, Peter; Jalal, Hamid; Lewis, David A.; Unemo, Magnus; Clarke, Ian N.; Parkhill, Julian; Thomson, Nicholas R.

    2013-01-01

    The use of whole-genome sequencing as a tool for the study of infectious bacteria is of growing clinical interest. Chlamydia trachomatis is responsible for sexually transmitted infections and the blinding disease trachoma, which affect hundreds of millions of people worldwide. Recombination is widespread within the genome of C. trachomatis, thus whole-genome sequencing is necessary to understand the evolution, diversity, and epidemiology of this pathogen. Culture of C. trachomatis has, until now, been a prerequisite to obtain DNA for whole-genome sequencing; however, as C. trachomatis is an obligate intracellular pathogen, this procedure is technically demanding and time consuming. Discarded clinical samples represent a large resource for sequencing the genomes of pathogens, yet clinical swabs frequently contain very low levels of C. trachomatis DNA and large amounts of contaminating microbial and human DNA. To determine whether it is possible to obtain whole-genome sequences from bacteria without the need for culture, we have devised an approach that combines immunomagnetic separation (IMS) for targeted bacterial enrichment with multiple displacement amplification (MDA) for whole-genome amplification. Using IMS-MDA in conjunction with high-throughput multiplexed Illumina sequencing, we have produced the first whole bacterial genome sequences direct from clinical samples. We also show that this method can be used to generate genome data from nonviable archived samples. This method will prove a useful tool in answering questions relating to the biology of many difficult-to-culture or fastidious bacteria of clinical concern. PMID:23525359

  9. Whole genome sequencing in the prevention and control of Staphylococcus aureus infection.

    Science.gov (United States)

    Price, J R; Didelot, X; Crook, D W; Llewelyn, M J; Paul, J

    2013-01-01

    Staphylococcus aureus remains a leading cause of hospital-acquired infection, but weaknesses inherent in currently available typing methods impede effective infection prevention and control. The high resolution offered by whole genome sequencing has the potential to revolutionise our understanding and management of S. aureus infection. This review aims to outline the practicalities of whole genome sequencing and to discuss how it might shape future infection control practice. We review conventional typing methods and compare these with the potential offered by whole genome sequencing. In contrast with conventional methods, whole genome sequencing discriminates down to single nucleotide differences, allows accurate characterisation of transmission events and outbreaks, and additionally provides information about the genetic basis of phenotypic characteristics, including antibiotic susceptibility and virulence. However, translating its potential into routine practice will depend on affordability, acceptable turnaround times, and the creation of a reliable, standardised bioinformatic infrastructure. Whole genome sequencing has the potential to provide a universal test that facilitates outbreak investigation, enables the detection of emerging strains, and predicts their clinical importance. Copyright © 2012 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  10. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia.

    Directory of Open Access Journals (Sweden)

    Ulziijargal Gurjav

    Australia has a low tuberculosis (TB) incidence rate, with most cases occurring among recent immigrants. Given the suboptimal cluster resolution achieved with 24-locus mycobacterial interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed TB cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low-burden settings.

  11. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly

    Energy Technology Data Exchange (ETDEWEB)

    Shou, S. [Univ. Wisc.-Madison]; Kvikstad, E. [Univ. Wisc.-Madison]; Kile, A. [Univ. Wisc.-Madison]; Severin, J.; Forrest, D. [Univ. Wisc.-Madison]; Runnheim, R. [Univ. Wisc.-Madison]; Churas, C. [Univ. Wisc.-Madison]; Hickman, J. W. [Univ. Wisc.-Madison]; Mackenzie, C. [University of Texas–Houston Medical School]; Choudhary, M. [University of Texas–Houston Medical School]; Donohue, T. [Univ. Wisc.-Madison]; Kaplan, S. [University of Texas–Houston Medical School]; Schwartz, D. C. [Univ. Wisc.-Madison]

    2003-09-01

    Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.

  12. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for the papaya genome sequence. Transgenic construct- and event-specific sequences were identified as belonging to a GM papaya developed to resist infection by Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled the identification of unknown transgenic construct- and event-specific sequences in GM papaya and the development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Whole-Genome Sequences of Two Borrelia afzelii and Two Borrelia garinii Lyme Disease Agent Isolates

    Energy Technology Data Exchange (ETDEWEB)

    Casjens, S.R.; Dunn, J.; Mongodin, E. F.; Qiu, W.-G.; Luft, B. J.; Fraser-Liggett, C. M.; Schutzer, S. E.

    2011-12-01

    Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04.

  14. A randomization test for controlling population stratification in whole-genome association studies.

    Science.gov (United States)

    Kimmel, Gad; Jordan, Michael I; Halperin, Eran; Shamir, Ron; Karp, Richard M

    2007-11-01

    Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experiments, our method achieves higher power and significantly better control over false-positive rates than do existing methods. In addition, it can be easily applied to whole-genome association studies.
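    Kimmel et al. describe their method as akin to a standard permutation test. For readers unfamiliar with that baseline, the Python sketch below shows a plain case/control label-permutation test for a single SNP. It is not the authors' method: it uses a simple, assumed statistic (the absolute difference in mean allele dosage, with genotypes coded 0/1/2) and shuffles labels freely, so on its own it does not account for population structure or linkage disequilibrium, which is what the authors' conditioned randomization test is designed to handle.

```python
import random
from statistics import mean


def assoc_stat(dosages, is_case):
    """Absolute difference in mean allele dosage (0/1/2) between cases and controls."""
    cases = [d for d, c in zip(dosages, is_case) if c]
    controls = [d for d, c in zip(dosages, is_case) if not c]
    return abs(mean(cases) - mean(controls))


def permutation_pvalue(dosages, is_case, n_perm=10_000, seed=0):
    """Plain label-permutation p-value for one SNP (no stratification correction)."""
    rng = random.Random(seed)
    observed = assoc_stat(dosages, is_case)
    labels = list(is_case)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling breaks any genotype-phenotype link but keeps the case/control counts.
        rng.shuffle(labels)
        if assoc_stat(dosages, labels) >= observed:
            extreme += 1
    # Add-one correction: the observed labelling counts as one of the permutations.
    return (extreme + 1) / (n_perm + 1)
```

    For a SNP where all 10 cases carry dosage 2 and all 10 controls carry dosage 0, `permutation_pvalue(..., n_perm=2000)` returns a value close to its minimum of 1/2001, whereas a SNP with the same genotype in every individual returns 1.0. Under stratification, such a test can report small p-values for SNPs that merely track ancestry, which is the problem the abstract addresses.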

  15. Whole-Genome de novo Sequencing Of Quail And Grey Partridge

    DEFF Research Database (Denmark)

    Holm, Lars-Erik; Panitz, Frank; Burt, Dave

    2011-01-01

    Developments in sequencing methods have made it possible to perform whole genome de novo sequencing of species without large commercial interests. Within the EU-financed QUANTOMICS project (KBBE-2A-222664), we have performed de novo sequencing of quail (Coturnix coturnix) and grey partridge...... comparative studies towards the chicken genome and will aid in identifying evolutionarily conserved sequences within the Galliformes. The obtained sequences from quail and partridge represent a first step towards generating the whole genome sequence for these species. The continuation of establishing the genome...

  16. Landscape of somatic mutations in 560 breast cancer whole-genome sequences

    NARCIS (Netherlands)

    S. Nik-Zainal (Serena); H. Davies (Helen); J. Staaf (Johan); M. Ramakrishna (Manasa); D. Glodzik (Dominik); X. Zou (Xueqing); I. Martincorena (Inigo); L.B. Alexandrov (Ludmil); S. Martin (Sandra); D.C. Wedge (David); P. van Loo (Peter); Y.S. Ju (Young Seok); M. Smid (Marcel); A.B. Brinkman (Arie B.); S. Morganella (Sandro); Aure, M.R. (Miriam R.); Lingjærde, O.C. (Ole Christian); A. Langerød (Anita); Ringnér, M. (Markus); Ahn, S.-M. (Sung-Min); S. Boyault (Sandrine); Brock, J.E. (Jane E.); A. Broeks (Annegien); A. Butler (Adam); C. Desmedt (Christine); L.Y. Dirix (Luc); S. Dronov (Serge); A. Fatima (Aquila); J.A. Foekens (John); M. Gerstung (Moritz); J. Hooijer; Jang, S.J. (Se Jin); Jones, D.R. (David R.); H.-Y. Kim (Hyung-Yong); King, T.A. (Tari A.); Krishnamurthy, S. (Savitri); Lee, H.J. (Hee Jin); Lee, J.-Y. (Jeong-Yeon); Y. Li (Yilong); S. McLaren (Stuart); D. Menzies; Mustonen, V. (Ville); S. O'Meara (Sarah); I. Pauporté (Iris); X. Pivot (Xavier); C.A. Purdie (Colin A.); J.W. Raine (John); Ramakrishnan, K. (Kamna); F.G. Rodriguez-Gonzalez (F. German); Romieu, G. (Gilles); A.M. Sieuwerts (Anieta); Simpson, P.T. (Peter T.); Shepherd, R. (Rebecca); L.A. Stebbings (Lucy); Stefansson, O.A. (Olafur A.); J. Teague (Jon); Tommasi, S. (Stefania); I. Treilleux (Isabelle); G. van den Eynden; P.B. Vermeulen; A. Vincent-Salomon (Anne); L.R. Yates (Lucy); C. Caldas (Carlos); L.J. van 't Veer (Laura); A. Tutt (Andrew); S. Knappskog (Stian); Tan, B.K.T. (Benita Kiat Tee); J. Jonkers (Jos); Å. Borg (Åke); Ueno, N.T. (Naoto T.); C. Sotiriou (Christos); Viari, A. (Alain); P.A. Futreal (Andrew); P.J. Campbell (Peter); P.N. Span (Paul); S.J. van Laere (Steven); S. Lakhani (Sunil); J. Eyfjord; A.M. Thompson (Alastair M.); E. Birney (Ewan); H. Stunnenberg (Henk); M.J. Vijver (Marc ); J.W.M. Martens (John); A.-L. Borresen-Dale (Anne-Lise); A.L. Richardson (Andrea); G. Kong (Gu); G. Thomas (Gilles); M.R. Stratton (Michael)

    2016-01-01

    We analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding...

  17. Determination of Elizabethkingia Diversity by MALDI-TOF Mass Spectrometry and Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Eriksen, Helle Brander; Gumpert, Heidi; Faurholt, Cecilie Haase

    2017-01-01

    In a hospital-acquired infection with multidrug-resistant Elizabethkingia, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry and 16S rRNA gene analysis identified the pathogen as Elizabethkingia miricola. Whole-genome sequencing, genus-level core genome analysis...

  18. Diagnosis of Capnocytophaga canimorsus Sepsis by Whole-Genome Next-Generation Sequencing.

    Science.gov (United States)

    Abril, Maria K; Barnett, Adam S; Wegermann, Kara; Fountain, Eric; Strand, Andrew; Heyman, Benjamin M; Blough, Britton A; Swaminathan, Aparna C; Sharma-Kuinkel, Batu; Ruffin, Felicia; Alexander, Barbara D; McCall, Chad M; Costa, Sylvia F; Arcasoy, Murat O; Hong, David K; Blauwkamp, Timothy A; Kertesz, Michael; Fowler, Vance G; Kraft, Bryan D

    2016-09-01

    We report the case of a 60-year-old man with septic shock due to Capnocytophaga canimorsus that was diagnosed in 24 hours by a novel whole-genome next-generation sequencing assay. This technology shows great promise in identifying fastidious pathogens, and, if validated, it has profound implications for infectious disease diagnosis.

  19. The effect of whole genome amplification on samples originating from more than one donor

    DEFF Research Database (Denmark)

    Thacker, C.R.; Balogh, M.K.; Børsting, Claus

    2006-01-01

    In this study, the GenomiPhi(TM) DNA Amplification Kit (Amersham Biosciences) was used to investigate the potential of whole genome amplification (WGA) when considering samples originating from more than one donor. DNA was extracted from blood samples, quantified and normalised before being mixed...

  20. Toxicological effects of benzo[a]pyrene on DNA methylation of whole genome in ICR mice.

    Science.gov (United States)

    Zhao, L; Zhang, S; An, X; Tan, W; Pang, D; Ouyang, H

    2015-10-30

    It is well known that alterations in DNA methylation, an important regulator of gene transcription, lead to cancer. Therefore, a change in the level of whole-genome DNA methylation has been considered a biomarker of carcinogenesis. Previously, a large number of experimental results in genetic toxicology have shown that benzo[a]pyrene can cause DNA mutation and fragmentation. However, there have been few, if any, studies of alterations in genomic DNA methylation resulting directly from benzo[a]pyrene exposure. In this paper, possible mechanisms by which benzo[a]pyrene alters whole-genome DNA methylation were investigated in ICR mice. The blood, liver, pancreas, skin, lung and bladder of ICR mice were removed and examined at a fixed interval (6 hours) after benzo[a]pyrene exposure, and whole-genome DNA methylation levels were determined by high-performance liquid chromatography (HPLC). The results exhibited tissue specificity: the level of whole-genome DNA methylation decreased significantly in blood and liver, but not in pancreas, lung, skin or bladder. This study investigated the direct relationship between aberrant DNA methylation levels and benzo[a]pyrene exposure, which might help clarify the toxicological mechanism of benzo[a]pyrene from an epigenetic perspective.

  1. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    NARCIS (Netherlands)

    Yuen, Ryan K C; Merico, Daniele; Bookman, Matt; Howe, Jennifer L.; Thiruvahindrapuram, Bhooma; Patel, Rohan V.; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A.; Walker, Susan; Marshall, Christian R.; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L.; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J.; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R.; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J.; Wei, John; Xu, Lizhen; Tasse, Anne Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie Mackinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M.; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H.; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A.; Parr, Jeremy R.; Spence, Sarah J.; Vorstman, Jacob; Frey, Brendan J.; Robinson, James T.; Strug, Lisa J.; Fernandez, Bridget A.; Elsabbagh, Mayada; Carter, Melissa T.; Hallmayer, Joachim; Knoppers, Bartha M.; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H.; Glazer, David; Pletcher, Mathew T.; Scherer, Stephen W.

    2017-01-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information,

  2. Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data

    NARCIS (Netherlands)

    Farmery, James H. R.; Smith, Mike L.; Lynch, Andy G.; Huissoon, Aarnoud; Furnell, Abigail; Mead, Adam; Levine, Adam P.; Manzur, Adnan; Thrasher, Adrian; Greenhalgh, Alan; Parker, Alasdair; Sanchis-Juan, Alba; Richter, Alex; Gardham, Alice; Lawrie, Allan; Sohal, Aman; Creaser-Myers, Amanda; Frary, Amy; Greinacher, Andreas; Themistocleous, Andreas; Peacock, Andrew J.; Marshall, Andrew; Mumford, Andrew; Rice, Andrew; Webster, Andrew; Brady, Angie; Koziell, Ania; Manson, Ania; Chandra, Anita; Hensiek, Anke; Veld, Anna Huis In't; Maw, Anna; Kelly, Anne M.; Moore, Anthony; Vonk Noordegraaf, Anton; Attwood, Antony; Herwadkar, Archana; Ghofrani, Ardi; Houweling, Arjan C.; Girerd, Barbara; Furie, Bruce; Treacy, Carmen M.; Millar, Carolyn M.; Sewell, Carrock; Roughley, Catherine; Titterton, Catherine; Williamson, Catherine; Hadinnapola, Charaka; Deshpande, Charu; Toh, Cheng-Hock; Bacchelli, Chiara; Patch, Chris; Geet, Chris Van; Babbs, Christian; Bryson, Christine; Penkett, Christopher J.; Rhodes, Christopher J.; Watt, Christopher; Bethune, Claire; Booth, Claire; Lentaigne, Claire; McJannet, Coleen; Church, Colin; French, Courtney; Samarghitean, Crina; Halmagyi, Csaba; Gale, Daniel; Greene, Daniel; Hart, Daniel; Allsup, David; Bennett, David; Edgar, David; Kiely, David G.; Gosal, David; Perry, David J.; Keeling, David; Montani, David; Shipley, Debbie; Whitehorn, Deborah; Fletcher, Debra; Krishnakumar, Deepa; Grozeva, Detelina; Kumararatne, Dinakantha; Thompson, Dorothy; Josifova, Dragana; Maher, Eamonn; Wong, Edwin K. S.; Murphy, Elaine; Dewhurst, Eleanor; Louka, Eleni; Rosser, Elisabeth; Chalmers, Elizabeth; Colby, Elizabeth; Drewe, Elizabeth; McDermott, Elizabeth; Thomas, Ellen; Staples, Emily; Clement, Emma; Matthews, Emma; Wakeling, Emma; Oksenhendler, Eric; Turro, Ernest; Reid, Evan; Wassmer, Evangeline; Raymond, F. Lucy; Hu, Fengyuan; Kennedy, Fiona; Soubrier, Florent; Flinter, Frances; Kovacs, Gabor; Polwarth, Gary; Ambegaonkar, Gautum; Arno, Gavin; Hudson, Gavin; Woods, Geoff; Coghlan, Gerry; Hayman, Grant; Arumugakani, Gururaj; Schotte, Gwen; Cook, H. Terry; Alachkar, Hana; Lango Allen, Hana; Lango-Allen, Hana; Stark, Hannah; Stauss, Hans; Schulze, Harald; Boggard, Harm J.; Baxendale, Helen; Dolling, Helen; Firth, Helen; Gall, Henning; Watson, Henry; Longhurst, Hilary; Markus, Hugh S.; Watkins, Hugh; Simeoni, Ilenia; Emmerson, Ingrid; Roberts, Irene; Quinti, Isabella; Wanjiku, Ivy; Gibbs, J. Simon R.; Thaventhiran, James; Whitworth, James; Hurst, Jane; Collins, Janine; Suntharalingam, Jay; Payne, Jeanette; Thachil, Jecko; Martin, Jennifer M.; Martin, Jennifer; Carmichael, Jenny; Maimaris, Jesmeen; Paterson, Joan; Pepke-Zaba, Joanna; Heemskerk, Johan W. M.; Gebhart, Johanna; Davis, John; Pasi, John; Bradley, John R.; Wharton, John; Stephens, Jonathan; Rankin, Julia; Anderson, Julie; Vogt, Julie; von Ziegenweldt, Julie; Rehnstrom, Karola; Megy, Karyn; Talks, Kate; Peerlinck, Kathelijne; Yates, Katherine; Freson, Kathleen; Stirrups, Kathleen; Gomez, Keith; Smith, Kenneth G. C.; Carss, Keren; Rue-Albrecht, Kevin; Gilmour, Kimberley; Masati, Larahmie; Scelsi, Laura; Southgate, Laura; Ranganathan, Lavanya; Ginsberg, Lionel; Devlin, Lisa; Willcocks, Lisa; Ormondroyd, Liz; Lorenzo, Lorena; Harper, Lorraine; Allen, Louise; Daugherty, Louise; Chitre, Manali; Kurian, Manju; Humbert, Marc; Tischkowitz, Marc; Bitner-Glindzicz, Maria; Erwood, Marie; Scully, Marie; Veltman, Marijke; Caulfield, Mark; Layton, Mark; McCarthy, Mark; Ponsford, Mark; Toshner, Mark; Bleda, Marta; Wilkins, Martin; Mathias, Mary; Reilly, Mary; Afzal, Maryam; Brown, Matthew; Rondina, Matthew; Stubbs, Matthew; Haimel, Matthias; Lees, Melissa; Laffan, Michael A.; Browning, Michael; Gattens, Michael; Richards, Michael; Michaelides, Michel; Lambert, Michele P.; Makris, Mike; de Vries, Minka; Mahdi-Rogers, Mohamed; Saleem, Moin; Thomas, Moira; Holder, Muriel; Eyries, Mélanie; Clements-Brod, Naomi; Canham, Natalie; Dormand, Natalie; Zuydam, Natalie Van; Kingston, Nathalie; Ghali, Neeti; Cooper, Nichola; Morrell, Nicholas W.; Yeatman, Nigel; Roy, Noémi; Shamardina, Olga; Alavijeh, Omid S.; Gresele, Paolo; Nurden, Paquita; Chinnery, Patrick; Deegan, Patrick; Yong, Patrick; Man, Patrick Yu Wai; Corris, Paul A.; Calleja, Paul; Gissen, Paul; Bolton-Maggs, Paula; Rayner-Matthews, Paula; Ghataorhe, Pavandeep K.; Gordins, Pavel; Stein, Penelope; Collins, Peter; Dixon, Peter; Kelleher, Peter; Ancliff, Phil; Yu, Ping; Tait, R. Campbell; Linger, Rachel; Doffinger, Rainer; Machado, Rajiv; Kazmi, Rashid; Sargur, Ravishankar; Favier, Remi; Tan, Rhea; Liesner, Ri; Antrobus, Richard; Sandford, Richard; Scott, Richard; Trembath, Richard; Horvath, Rita; Hadden, Rob; MackenzieRoss, Rob V.; Henderson, Robert; MacLaren, Robert; James, Roger; Ghurye, Rohit; DaCosta, Rosa; Hague, Rosie; Mapeta, Rutendo; Armstrong, Ruth; Noorani, Sadia; Murng, Sai; Santra, Saikat; Tuna, Salih; Johnson, Sally; Chong, Sam; Lear, Sara; Walker, Sara; Goddard, Sarah; Mangles, Sarah; Westbury, Sarah; Mehta, Sarju; Hackett, Scott; Nejentsev, Sergey; Moledina, Shahin; Bibi, Shahnaz; Meehan, Sharon; Othman, Shokri; Revel-Vilk, Shoshana; Holden, Simon; McGowan, Simon; Staines, Simon; Savic, Sinisa; Burns, Siobhan; Grigoriadou, Sofia; Papadia, Sofia; Ashford, Sofie; Schulman, Sol; Ali, Sonia; Park, Soo-Mi; Davies, Sophie; Stock, Sophie; Ali, Souad; Deevi, Sri V. V.; Gräf, Stefan; Ghio, Stefano; Wort, Stephen J.; Jolles, Stephen; Austin, Steve; Welch, Steve; Meacham, Stuart; Rankin, Stuart; Walker, Suellen; Seneviratne, Suranjith; Holder, Susan; Sivapalaratnam, Suthesh; Richardson, Sylvia; Kuijpers, Taco; Bariana, Tadbir K.; Bakchoul, Tamam; Everington, Tamara; Renton, Tara; Young, Tim; Aitman, Timothy; Warner, Timothy Q.; Vale, Tom; Hammerton, Tracey; Pollock, Val; Matser, Vera; Cookson, Victoria; Clowes, Virginia; Qasim, Waseem; Wei, Wei; Erber, Wendy N.; Ouwehand, Willem H.; Astle, William; Egner, William; Turek, Wojciech; Henskens, Yvonne; Tan, Yvonne

    2018-01-01

    Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously

  3. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background Whole genome sequencing enables a high-resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of

  4. Genomic Epidemiology: Whole-Genome-Sequencing–Powered Surveillance and Outbreak Investigation of Foodborne Bacterial Pathogens

    DEFF Research Database (Denmark)

    Deng, Xiangyu; den Bakker, Henk C.; Hendriksen, Rene S.

    2016-01-01

    -called next-generation sequencing (NGS) technologies that have made whole-genome sequencing (WGS) of foodborne bacterial pathogens a realistic and superior alternative to traditional subtyping methods. Routine, real-time, and widespread application of WGS in food safety and public health is on the horizon...

  5. Whole Genome Selection Project Involving 2,000 Industry AI Sires

    Science.gov (United States)

    Whole genome selection (WGS) uses markers spanning the genome to predict genetic merit for economically important traits. WGS may increase the rate of genetic progress through improved accuracy and reduced generation interval especially for traits that cannot be measured on breeding animals. In cont...

  6. The effect of rare alleles on estimated genomic relationships from whole genome sequence data

    NARCIS (Netherlands)

    Eynard, S.E.; Windig, J.J.; Leroy, G.; Binsbergen, van R.; Calus, M.P.L.

    2015-01-01

    Relationships between individuals and inbreeding coefficients are commonly used for breeding decisions, but may be affected by the type of data used for their estimation. The proportion of variants with low Minor Allele Frequency (MAF) is larger in whole genome sequence (WGS) data compared to Single

  7. Characterization of C. jejuni and C. coli broiler isolates by whole genome sequencing

    DEFF Research Database (Denmark)

    Cantero, G.; Correa-Fiz, F.; Ronco, Troels

    vast majority of infections, which may subsequently lead to serious neuropathologies such as Guillain-Barré syndrome. The aim of this study was to take advantage of whole genome sequencing (WGS) to characterize in depth a subset of 16 C. jejuni and C. coli isolates from broilers from five farms....

  8. Whole-genome sequence of Bacillus solimangrovi GH 2-4T, isolated from mangrove soil.

    Science.gov (United States)

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-12-01

    Bacillus solimangrovi GH 2-4T was isolated from mangrove soil, subjected to whole genome sequencing on the HiSeq platform and annotated using RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession MJEH00000000.

  9. Whole genome analysis of Klebsiella pneumoniae T2-1-1 from human oral cavity.

    Science.gov (United States)

    Chan, Kok-Gan; Yin, Wai-Fong; Chan, Xin-Yue

    2016-03-01

    Klebsiella pneumoniae T2-1-1 was isolated from human tongue debris, subjected to whole genome sequencing on the HiSeq platform and annotated using RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession JAQL00000000.

  10. Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle

    NARCIS (Netherlands)

    Binsbergen, van R.; Calus, M.P.L.; Bink, M.C.A.M.; Eeuwijk, van F.A.; Schrooten, C.; Veerkamp, R.F.

    2015-01-01

    Background In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to

  11. The use of whole genome sequence data to estimate genetic relationships including rare alleles information

    NARCIS (Netherlands)

    Eynard, S.E.; Windig, J.J.; Leroy, G.; Verrier, E.; Hiemstra, S.J.; Binsbergen, van R.; Calus, M.P.L.

    2014-01-01

    Whole genome sequencing technologies are rapidly developing. In some ways, the speed of this development has outstripped our capacity to use this type of data in selection strategies, especially in livestock diversity conservation. In this study, relationship matrices were computed for 118 Holstein

  12. Whole-Genome Sequence of the Spodoptera frugiperda Sf9 Insect Cell Line

    OpenAIRE

    Nandakumar, Subhiksha; Ma, Hailun; Khan, Arifa S

    2017-01-01

    ABSTRACT The draft whole-genome sequence of the Spodoptera frugiperda Sf9 insect cell line was obtained using long-read PacBio sequence technology and Canu assembly. The final assembled genome consisted of 451 Mbp in 4,577 contigs, with 12,716× mean coverage and a G+C content of 36.53%.

  13. Whole-genome sequence of Aeromonas hydrophila strain AH-1 (Serotype O11)

    OpenAIRE

    Forn-Cuní, Gabriel; Tomás, Juan M.; Merino, Susana

    2016-01-01

    Aeromonas hydrophila is an emerging pathogen of aquatic and terrestrial animals, including humans. Here, we report the whole-genome sequence of the septicemic A. hydrophila AH-1 strain, belonging to the serotype O11, and the first mesophilic Aeromonas with surface layer (S-layer) to be sequenced.

  14. The whole genome sequence assembly of the soybean aphid, Aphis glycines

    Science.gov (United States)

    Aphids are emerging as model organisms for both basic and applied research. Of the estimated 5,000 species, only two aphids have published whole genome sequences: the pea aphid Acyrthosiphon pisum, and the Russian wheat aphid, Diuraphis noxia. The soybean aphid (Aphis glycines) is an extreme special...

  15. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy

    NARCIS (Netherlands)

    Bouwman, A.C.; Veerkamp, R.F.

    2014-01-01

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken

  16. Whole-genome sequence of Bacillus solimangrovi GH 2-4T, isolated from mangrove soil

    OpenAIRE

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-01-01

    Bacillus solimangrovi GH 2-4T was isolated from mangrove soil, subjected to whole genome sequencing on the HiSeq platform and annotated using RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession MJEH00000000.

  17. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data

    National Research Council Canada - National Science Library

    Oesper, Layla; Satas, Gryte; Raphael, Benjamin J

    2014-01-01

    .... We describe an algorithm called THetA2 that infers the composition of a tumor sample-including not only tumor purity but also the number and content of tumor subpopulations-directly from both whole-genome (WGS) and whole-exome (WXS...

  18. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer

    NARCIS (Netherlands)

    Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi

    Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and

  19. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans)

    OpenAIRE

    Tran, Phuong N.; Tan, Nicholas E. H.; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J.; Dailey, Lucas K.; Hudson, André O.; Savka, Michael A.

    2015-01-01

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members of other genera were found in the interior vine tissue of poison ivy.

  20. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans).

    Science.gov (United States)

    Tran, Phuong N; Tan, Nicholas E H; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J; Dailey, Lucas K; Hudson, André O; Savka, Michael A

    2015-11-19

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members of other genera were found in the interior vine tissue of poison ivy. Copyright © 2015 Tran et al.

  1. Whole-Genome Scans Provide Evidence of Adaptive Evolution in Malawian Plasmodium falciparum Isolates

    DEFF Research Database (Denmark)

    Ocholla, Harold; Preston, Mark D; Mipando, Mwapatsa

    2014-01-01

    BACKGROUND:  Selection by host immunity and antimalarial drugs has driven extensive adaptive evolution in Plasmodium falciparum and continues to produce ever-changing landscapes of genetic variation. METHODS:  We performed whole-genome sequencing of 69 P. falciparum isolates from Malawi and used...

  2. Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced

    Science.gov (United States)

    Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...

  3. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  4. Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

    Directory of Open Access Journals (Sweden)

    Jong-Sung Lim

    2012-03-01

    Recently, DNA sequence variation analysis and gene expression profiling have been widely used in genome biology and genetics. Their application to genome studies has developed particularly with the introduction of next-generation DNA sequencers (NGS), the Roche/454 and Illumina/Solexa systems, along with bioinformatic technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing and mate-pair sequencing data are important for constructing de novo assemblies of novel genomes. For effective de novo assembly, multiplatform NGS data with at least 2× genome coverage from Roche/454 and 30× from Illumina/Solexa are necessary. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, reducing the cost of DNA sequencing. Whole-genome expression profile data are useful for genome systems biology, allowing quantification of the RNAs expressed from a whole-genome transcriptome in a given tissue sample. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20× (Roche/454) and 50× (Illumina/Solexa) of the estimated transcriptome is effective for creating novel expressed reference sequences. However, an average of only 30× transcriptome coverage with Illumina/Solexa short reads is sufficient for expression quantification against a reference expressed sequence tag (EST) sequence.
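    The coverage requirements above are stated as mean sequencing depth (for example 30× for Illumina/Solexa). As an illustration only, the minimal Python sketch below uses the standard Lander-Waterman relation C = N × L / G (mean coverage C from N reads of length L over a genome of size G) to translate a target depth into a read count. The function names, the 100 Mbp genome size and the read lengths are hypothetical values chosen for the arithmetic, not figures from the survey.

```python
import math


def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Lander-Waterman mean depth: C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Invert C = N * L / G to get the minimum read count N, rounded up."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    genome = 100_000_000  # hypothetical 100 Mbp genome
    # Hypothetical read lengths: ~400 bp longer reads vs ~100 bp short reads.
    print(reads_needed(2, 400, genome))   # 2x with longer reads -> 500000
    print(reads_needed(30, 100, genome))  # 30x with short reads -> 30000000
```

    Mean depth does not capture how evenly reads are distributed across the genome, so real projects typically budget above the computed minimum.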

  5. Comparative analysis of copy number detection by whole-genome BAC and oligonucleotide array CGH

    Directory of Open Access Journals (Sweden)

    Bejjani Bassem A

    2010-06-01

    Abstract. Background: Microarray-based comparative genomic hybridization (aCGH) is a powerful diagnostic tool for the detection of DNA copy number gains and losses associated with chromosome abnormalities, many of which are below the resolution of conventional chromosome analysis. It has been presumed that whole-genome oligonucleotide (oligo) arrays identify more clinically significant copy-number abnormalities than whole-genome bacterial artificial chromosome (BAC) arrays, yet this has not been systematically studied in a clinical diagnostic setting. Results: To determine the difference in detection rate between similarly designed BAC and oligo arrays, we developed whole-genome BAC and oligonucleotide microarrays and validated them in a side-by-side comparison of 466 consecutive clinical specimens submitted to our laboratory for aCGH. Of the 466 cases studied, 67 (14.3%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome BAC array, and 73 (15.6%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome oligo array. However, because both platforms identified copy number variants of unclear clinical significance, we designed a systematic method for the interpretation of copy number alterations and tested an additional 3,443 cases by BAC array and 3,096 cases by oligo array. Of those cases tested on the BAC array, 17.6% were found to have a copy-number abnormality of potential clinical significance, whereas the detection rate increased to 22.5% for the cases tested by oligo array. In addition, we validated the oligo array for detection of mosaicism and found that it could routinely detect mosaicism at levels of 30% and greater. Conclusions: Although BAC arrays have faster turnaround times, the increased detection rate of oligo arrays makes them attractive for clinical cytogenetic testing.

  6. Canaries in the coal mine: Personal and professional impact of undergoing whole genome sequencing on medical professionals.

    Science.gov (United States)

    Zierhut, Heather; McCarthy Veach, Patricia; LeRoy, Bonnie

    2015-11-01

    Public interest in personal whole genome sequencing is increasing. The technology is publicly available and is being used as an educational tool in higher education. Empirical evidence regarding its utility is vital. The goals of this study were to characterize the process of whole genome sequencing in a population of medical and basic science professionals undergoing whole genome sequencing as a part of an educational symposium. Thirty-eight individuals completed one or more surveys from the time of informed consent for whole genome sequencing to 3 months post-symposium. The four surveys assessed demographics, decision-making, communication, decision regret, and personal and professional impact. The most prevalent motivation to participate was professional enhancement, followed by curiosity about the technology, and personal health benefits. The most important initial impact concerned medical implications. Over time, however, impact on professional development was greater than on personal health. Anticipated reactions to receiving whole genome sequencing results generally matched participants' actual reactions and decision regret remained low over time. Benefits and risks of whole genome sequencing included medically actionable results and misunderstanding by healthcare providers. Whole genome sequencing generally had a positive impact professionally and personally on participants. Further education of providers and the public about whole genome sequencing and psychosocial support is warranted. © 2015 Wiley Periodicals, Inc.

  7. Monodisperse Picoliter Droplets for Low-Bias and Contamination-Free Reactions in Single-Cell Whole Genome Amplification.

    Directory of Open Access Journals (Sweden)

    Yohei Nishikawa

    Whole genome amplification (WGA) is essential for obtaining genome sequences from single bacterial cells because the quantity of template DNA contained in a single cell is very low. Multiple displacement amplification (MDA), using Phi29 DNA polymerase and random primers, is the most widely used method for single-cell WGA. However, single-cell MDA usually results in uneven genome coverage because of amplification bias, background amplification of contaminating DNA, and formation of chimeras by linking of non-contiguous chromosomal regions. Here, we present a novel MDA method, termed droplet MDA, that minimizes amplification bias and amplification of contaminants by using picoliter-sized droplets for compartmentalized WGA reactions. Extracted DNA fragments from a lysed cell in MDA mixture are divided into 10⁵ droplets (67 pL) within minutes via flow through simple microfluidic channels. Compartmentalized genome fragments can be individually amplified in these droplets without the risk of encounter with reagent-borne or environmental contaminants. Following quality assessment of WGA products from single Escherichia coli cells, we showed that droplet MDA minimized unexpected amplification and improved the percentage of genome recovery from 59% to 89%. Our results demonstrate that microfluidic-generated droplets show potential as an efficient tool for amplification of low-input DNA for single-cell genomics and greatly reduce the cost and labor required to determine nearly complete genome sequences of uncultured bacteria from environmental samples.

  8. A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease.

    Science.gov (United States)

    Smedley, Damian; Schubach, Max; Jacobsen, Julius O B; Köhler, Sebastian; Zemojtel, Tomasz; Spielmann, Malte; Jäger, Marten; Hochheiser, Harry; Washington, Nicole L; McMurry, Julie A; Haendel, Melissa A; Mungall, Christopher J; Lewis, Suzanna E; Groza, Tudor; Valentini, Giorgio; Robinson, Peter N

    2016-09-01

    The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants with specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated with specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease. Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  9. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast

    DEFF Research Database (Denmark)

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L.

    2015-01-01

    interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering for construction of improved strains. Here we used high-throughput microfluidics for the screening of yeast libraries, generated by UV ... mutagenesis. Several screening and sorting rounds resulted in the selection of eight yeast clones with significantly improved secretion of recombinant α-amylase. Efficient secretion was genetically stable in the selected clones. We performed whole-genome sequencing of the eight clones and identified 330 ... to construct efficient cell factories for protein secretion. The combined use of microfluidics screening and whole-genome sequencing to map the mutations associated with the improved phenotype can easily be adapted for other products and cell types to identify novel engineering targets, and this approach could...

  10. Diversity and Evolution of Mycobacterium tuberculosis: Moving to Whole-Genome-Based Approaches

    Science.gov (United States)

    Niemann, Stefan; Supply, Philip

    2014-01-01

    Genotyping of clinical Mycobacterium tuberculosis complex (MTBC) strains has become a standard tool for epidemiological tracing and for the investigation of the local and global strain population structure. Of special importance is the analysis of the expansion of multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains. Classical genotyping and, more recently, whole-genome sequencing have revealed that the strains of the MTBC are more diverse than previously anticipated. Globally, several phylogenetic lineages can be distinguished whose geographical distribution is markedly variable. Strains of particular (sub)lineages, such as Beijing, seem to be more virulent and associated with enhanced resistance levels and fitness, likely fueling their spread in certain world regions. The upcoming generalization of whole-genome sequencing approaches is expected to provide more comprehensive insights into the molecular and epidemiological mechanisms involved and to lead to better diagnostic and therapeutic tools. PMID:25190252

  11. Tolerance of Whole-Genome Doubling Propagates Chromosomal Instability and Accelerates Cancer Genome Evolution

    DEFF Research Database (Denmark)

    Dewhurst, Sally M.; McGranahan, Nicholas; Burrell, Rebecca A.

    2014-01-01

    The contribution of whole-genome doubling to chromosomal instability (CIN) and tumor evolution is unclear. We use long-term culture of isogenic tetraploid cells from a stable diploid colon cancer progenitor to investigate how a genome-doubling event affects genome stability over time. Rare cells ...... [discovery data: hazard ratio (HR), 4.70, 95% confidence interval (CI), 1.04–21.37; validation data: HR, 1.59, 95% CI, 1.05–2.42]. These data highlight an important role for the tolerance of genome doubling in driving cancer genome evolution.

  12. Downsizing genomic medicine: approaching the ethical complexity of whole-genome sequencing by starting small.

    Science.gov (United States)

    Sharp, Richard R

    2011-03-01

    As we look to a time when whole-genome sequencing is integrated into patient care, it is possible to anticipate a number of ethical challenges that will need to be addressed. The most intractable of these concern informed consent and the responsible management of very large amounts of genetic information. Given the range of possible findings, it remains unclear to what extent it will be possible to obtain meaningful patient consent to genomic testing. Equally unclear is how clinicians will disseminate the enormous volume of genetic information produced by whole-genome sequencing. Toward developing practical strategies for managing these ethical challenges, we propose a research agenda that approaches multiplexed forms of clinical genetic testing as natural laboratories in which to develop best practices for managing the ethical complexities of genomic medicine.

  13. Whole genome multilocus sequence typing as an epidemiologic tool for Yersinia pestis.

    Science.gov (United States)

    Kingry, Luke C; Rowe, Lori A; Respicio-Kingry, Laurel B; Beard, Charles B; Schriefer, Martin E; Petersen, Jeannine M

    2016-04-01

    Human plague is a severe and often fatal zoonotic disease caused by Yersinia pestis. For public health investigations of human cases, nonintensive whole genome molecular typing tools, capable of defining epidemiologic relationships, are advantageous. Whole genome multilocus sequence typing (wgMLST) is a recently developed methodology that simplifies genomic analyses by transforming millions of base pairs of sequence into character data for each gene. We sequenced 13 US Y. pestis isolates with known epidemiologic relationships. Sequences were assembled de novo, and multilocus sequence typing alleles were assigned by comparison against 3979 open reading frames from the reference strain CO92. Allele-based cluster analysis accurately grouped the 13 isolates, as well as 9 publicly available Y. pestis isolates, by their epidemiologic relationships. Our findings indicate wgMLST is a simplified, sensitive, and scalable tool for epidemiologic analysis of Y. pestis strains. Published by Elsevier Inc.

  14. Whole genome sequence of Enterobacter ludwigii type strain EN-119T, isolated from clinical specimens.

    Science.gov (United States)

    Li, Gengmi; Hu, Zonghai; Zeng, Ping; Zhu, Bing; Wu, Lijuan

    2015-04-01

    Enterobacter ludwigii strain EN-119(T) is the type strain of E. ludwigii, which belongs to the E. cloacae complex (Ecc). This strain was first reported and named in 2005 and has since been found in many hospitals. In this paper, the whole genome of this strain was sequenced. The total genome size of EN-119(T) is 4,952,770 bp, with 4578 coding sequences, 88 tRNAs and 10 rRNAs. The genome sequence of EN-119(T) is the first whole genome sequence of E. ludwigii and will further our understanding of Ecc. © FEMS 2015. All rights reserved.

  15. Efficiency of methylated DNA immunoprecipitation bisulphite sequencing for whole-genome DNA methylation analysis.

    Science.gov (United States)

    Jeong, Hae Min; Lee, Sangseon; Chae, Heejoon; Kim, RyongNam; Kwon, Mi Jeong; Oh, Ensel; Choi, Yoon-La; Kim, Sun; Shin, Young Kee

    2016-08-01

    We compared four common methods for measuring DNA methylation levels and recommended the most efficient method in terms of cost and coverage. The DNA methylation status of liver and stomach tissues was profiled using four different methods, whole-genome bisulphite sequencing (WG-BS), targeted bisulphite sequencing (Targeted-BS), methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA immunoprecipitation bisulphite sequencing (MeDIP-BS). We calculated DNA methylation levels using each method and compared the results. MeDIP-BS yielded the most similar DNA methylation profile to WG-BS, with 20 times less data, suggesting remarkable cost savings and coverage efficiency compared with the other methods. MeDIP-BS is a practical cost-effective method for analyzing whole-genome DNA methylation that is highly accurate at base-pair resolution.

  16. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin

    2013-01-01

    BACKGROUND: Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations, were analyzed. RESULTS: A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertions/deletions were identified. Among the SNPs detected, 25.5% were not described previously. We found that artificial selection during domestication led to more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genomic regions appear to be affected by artificial selection for preferred agricultural traits...

  17. Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors.

    Science.gov (United States)

    MacLeod, Iona M; Larkin, Denis M; Lewin, Harris A; Hayes, Ben J; Goddard, Mike E

    2013-09-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to the present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best-fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.

  18. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    Energy Technology Data Exchange (ETDEWEB)

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are both important and challenging because of the extent of microbial evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.

  19. Whole-genome shotgun sequence of phenazine-producing endophytic Streptomyces kebangsaanensis SUK12

    OpenAIRE

    Juwairiah Remali; Kok-Keong Loke; Chyan Leong Ng; Wan Mohd Aizat; John Tiong; Noraziah Mohamad Zin

    2017-01-01

    Streptomyces sp. produces bioactive compounds with a broad spectrum of activities. Streptomyces kebangsaanensis SUK12 has been identified as a novel endophytic bacterium isolated from the ethnomedicinal plant Portulaca oleracea, and was found to produce the phenazine class of biologically active antimicrobial metabolites. The potential use of the phenazines has led to our research interest in determining the genome sequence of Streptomyces kebangsaanensis SUK12. This Whole Genome Shotgun project has...

  20. Whole-Genome Analysis in Korean Patients with Autoimmune Myasthenia Gravis

    OpenAIRE

    Na, Sang-Jun; Lee, Ji Hyun; Kim, So Won; Kim, Dae-Seong; Shon, Eun Hee; Park, Hyung Jun; Shin, Ha Young; Kim, Seung Min; Choi, Young-Chul

    2014-01-01

    Purpose: The underlying cause of myasthenia gravis (MG) is unknown, although it likely involves a genetic component. However, no common genetic variants have been unequivocally linked to autoimmune MG. We sought to identify the genetic variants associated with an increased or decreased risk of developing MG in samples from a Korean Multicenter MG Cohort. Materials and Methods: To determine new genetic targets related to autoimmune MG, a whole genome-based single nucleotide polymorphisms (SNP) a...

  1. Demographic history and biologically relevant genetic variation of Native Mexicans inferred from whole-genome sequencing

    OpenAIRE

    Romero-Hidalgo, Sandra; Ochoa-Leyva, Adrián; Garcíarrubio, Alejandro; Acuña-Alonzo, Victor; Antúnez-Argüelles, Erika; Balcazar-Quintero, Martha; Barquera-Lozano, Rodrigo; Carnevale, Alessandra; Cornejo-Granados, Fernanda; Fernández-López, Juan Carlos; García-Herrera, Rodrigo; García-Ortíz, Humberto; Granados-Silvestre, Ángeles; Granados, Julio; Guerrero-Romero, Fernando

    2017-01-01

    Understanding the genetic structure of Native American populations is important to clarify their diversity, demographic history, and to identify genetic factors relevant for biomedical traits. Here, we show a demographic history reconstruction from 12 Native American whole genomes belonging to six distinct ethnic groups representing the three main described genetic clusters of Mexico (Northern, Southern, and Maya). Effective population size estimates of all Native American groups remained bel...

  2. High Depth, Whole-Genome Sequencing of Cholera Isolates from Haiti and the Dominican Republic

    Science.gov (United States)

    2012-09-11

    glycerol at −80 degrees C. Illumina-based whole genome sequencing. We extracted DNA from V. cholerae strains using Qiagen DNeasy (Qiagen, Valencia, CA...distinct DNA mismatch repair proteins, and two mutations in two outer membrane proteins, OmpV and OmpH. In order to identify purifying or positive...spontaneously passed human stool samples of patients with a diagnosis of cholera. All patients received standard medical treatment for cholera

  3. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis

    OpenAIRE

    Riaz, Tiayyba; Shehzad, Wasim; Viari, Alain; Pompanon, François; Taberlet, Pierre; Coissac, Eric

    2011-01-01

    Using non-conventional markers, DNA metabarcoding allows biodiversity assessment from complex substrates. In this article, we present ecoPrimers, a software for identifying new barcode markers and their associated PCR primers. ecoPrimers scans whole genomes to find such markers without a priori knowledge. ecoPrimers optimizes two quality indices measuring taxonomical range and discrimination to select the most efficient markers from a set of reference sequences, according to specific experime...

  4. Attitudes of African Americans toward Return of Results from Exome and Whole Genome Sequencing

    OpenAIRE

    Yu, Joon-Ho; Crouch, Julia; Jamal, Seema M.; Tabor, Holly K.; Bamshad, Michael J.

    2013-01-01

    Exome sequencing and whole genome sequencing (ES/WGS) present patients and research participants with the opportunity to benefit from a broad scope of genetic results of clinical and personal utility. Yet, this potential for benefit also risks disenfranchising populations such as African Americans (AAs) that are already underrepresented in genetic research and utilize genetic tests at lower rates than other populations. Understanding a diverse range of perspectives on consenting for ES/WGS an...

  5. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

    Science.gov (United States)

    MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.

    2013-01-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to the present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best-fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals. PMID:23842528

  6. A Randomization Test for Controlling Population Stratification in Whole-Genome Association Studies

    OpenAIRE

    Kimmel, Gad; Jordan, Michael I.; Halperin, Eran; Shamir, Ron; Karp, Richard M.

    2007-01-01

    Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experime...

  7. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing

    OpenAIRE

    Helman, Elena; Lawrence, Michael S.; Stewart, Chip; Sougnez, Carrie; Getz, Gad; Meyerson, Matthew

    2014-01-01

    Retrotransposons constitute a major source of genetic variation, and somatic retrotransposon insertions have been reported in cancer. Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part of The Cancer Genome Atlas (TCGA) Pan-Cancer Project. In addition to novel germline polymorphisms, we find 810 somatic retrotransposon insertions primarily in lung squa...

  8. Comparative performance of two whole-genome capture methodologies on ancient DNA Illumina libraries

    OpenAIRE

    Ávila-Arcos, María C.; Sandoval-Velasco, Marcela; Schroeder, Hannes; Carpenter, Meredith L.; Malaspinas, Anna-Sapfo; Wales, Nathan; Peñaloza, Fernando; Bustamante, Carlos D.; Gilbert, M. Thomas P.

    2015-01-01

    Application of whole genome capture (WGC) methods to ancient DNA (aDNA) promises to increase the efficiency of ancient genome sequencing. We compared the performance of two recent WGC methods in enriching human aDNA within Illumina libraries built using both double-stranded and single-stranded build protocols. Although both methods effectively enriched aDNA, we observed consistent differences between the methods, providing the opportunity to further explore parameters influencing WGC experiments. ...

  9. Real-Time Whole-Genome Sequencing for Surveillance of Listeria monocytogenes, France

    OpenAIRE

    Moura, Alexandra; Tourdjman, Mathieu; Leclercq, Alexandre; Hamelin, Estelle; Laurent, Edith; Fredriksen, Nathalie; Van Cauteren, Dieter; Bracq-Dieye, Hélène; Thouvenot, Pierre; Vales, Guillaume; Tessaud-Rita, Nathalie; Maury, Mylène M.; Alexandru, Andreea; Criscuolo, Alexis; Quevillon, Emmanuel

    2017-01-01

    During 2015–2016, we evaluated the performance of whole-genome sequencing (WGS) as a routine typing tool. Its added value for microbiological and epidemiologic surveillance of listeriosis was compared with that for pulsed-field gel electrophoresis (PFGE), the current standard method. A total of 2,743 Listeria monocytogenes isolates collected as part of routine surveillance were characterized in parallel by PFGE and core genome multilocus sequence typing (cgMLST) extracted from WGS. We investi...

  10. Prospective Whole-Genome Sequencing Enhances National Surveillance of Listeria monocytogenes

    OpenAIRE

    Kwong, Jason C.; Mercoulia, Karolina; Tomita, Takehiro; Easton, Marion; Li, Hua Y.; Bulach, Dieter M.; Stinear, Timothy P.; Seemann, Torsten; Howden, Benjamin P.

    2016-01-01

    Whole-genome sequencing (WGS) has emerged as a powerful tool for comparing bacterial isolates in outbreak detection and investigation. Here we demonstrate that WGS performed prospectively for national epidemiologic surveillance of Listeria monocytogenes has the capacity to be superior to our current approaches using pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable-number tandem-repeat analysis (MLVA), binary typing, and serotyping. Initially 423 ...

  11. Rapid Identification of Potential Drugs for Diabetic Nephropathy Using Whole-Genome Expression Profiles of Glomeruli

    Directory of Open Access Journals (Sweden)

    Jingsong Shi

    2016-01-01

    Objective. To investigate potential drugs for diabetic nephropathy (DN) using whole-genome expression profiles and the Connectivity Map (CMAP). Methodology. Eighteen Chinese Han DN patients and six normal controls were included in this study. Whole-genome expression profiles of microdissected glomeruli were measured using the Affymetrix human U133 plus 2.0 chip. Differentially expressed genes (DEGs) between late stage and early stage DN samples and the CMAP database were used to identify potential drugs for DN using bioinformatics methods. Results. (1) A total of 1065 DEGs (FDR 1.5 were found in late stage DN patients compared with early stage DN patients. (2) Piperlongumine, 15d-PGJ2 (15-delta prostaglandin J2), vorinostat, and trichostatin A were predicted to be the most promising potential drugs for DN, acting as NF-κB inhibitors, histone deacetylase inhibitors (HDACIs), PI3K pathway inhibitors, or PPARγ agonists, respectively. Conclusion. Using whole-genome expression profiles and the CMAP database, we rapidly predicted potential DN drugs, and therapeutic potential was confirmed by previously published studies. Animal experiments and clinical trials are needed to confirm both the safety and efficacy of these drugs in the treatment of DN.

  12. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    Science.gov (United States)

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. Copyright © 2016 Teng et al.

  13. Whole-genome sequencing and analysis of the Malaysian cynomolgus macaque (Macaca fascicularis) genome.

    Science.gov (United States)

    Higashino, Atsunori; Sakate, Ryuichi; Kameoka, Yosuke; Takahashi, Ichiro; Hirata, Makoto; Tanuma, Reiko; Masui, Tohru; Yasutomi, Yasuhiro; Osada, Naoki

    2012-07-02

    The genetic background of the cynomolgus macaque (Macaca fascicularis) is made complex by the high genetic diversity, population structure, and gene introgression from the closely related rhesus macaque (Macaca mulatta). Herein we report the whole-genome sequence of a Malaysian cynomolgus macaque male with more than 40-fold coverage, which was determined using a resequencing method based on the Indian rhesus macaque genome. We identified approximately 9.7 million single nucleotide variants (SNVs) between the Malaysian cynomolgus and the Indian rhesus macaque genomes. Compared with humans, a smaller nonsynonymous/synonymous SNV ratio in the cynomolgus macaque suggests more effective removal of slightly deleterious mutations. Comparison of two cynomolgus (Malaysian and Vietnamese) and two rhesus (Indian and Chinese) macaque genomes, including previously published macaque genomes, suggests that Indochinese cynomolgus macaques have been more affected by gene introgression from rhesus macaques. We further identified 60 nonsynonymous SNVs that completely differentiated the cynomolgus and rhesus macaque genomes, and that could be important candidate variants for determining species-specific responses to drugs and pathogens. The demographic inference using the genome sequence data revealed that Malaysian cynomolgus macaques have experienced at least three population bottlenecks. This list of whole-genome SNVs will be useful for many future applications, such as an array-based genotyping system for macaque individuals. High-quality whole-genome sequencing of the cynomolgus macaque genome may aid studies on finding genetic differences that are responsible for phenotypic diversity in macaques and may help control genetic backgrounds among individuals.

  14. Targeted analysis of whole genome sequence data to diagnose genetic cardiomyopathy.

    Science.gov (United States)

    Golbus, Jessica R; Puckelwartz, Megan J; Dellefave-Castillo, Lisa; Fahrenbach, John P; Nelakuditi, Viswateja; Pesce, Lorenzo L; Pytel, Peter; McNally, Elizabeth M

    2014-12-01

    Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of >50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift toward comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1 to 14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and segregation analysis, where available. Three of 3 previously identified primary mutations were detected by this analysis. In 6 subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and had additional pathological correlation to provide evidence for causality. For 2 subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. These pilot data demonstrate that ≈30 to 40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes. © 2014 American Heart Association, Inc.

  15. Personalized oncogenomics: clinical experience with malignant peritoneal mesothelioma using whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Brandon S Sheffield

    Peritoneal mesothelioma is a rare and sometimes lethal malignancy that presents a clinical challenge for both diagnosis and management. Recent studies have led to a better understanding of the molecular biology of peritoneal mesothelioma. Translation of the emerging data into better treatments and outcome is needed. From two patients with peritoneal mesothelioma, we derived whole genome sequences, RNA expression profiles, and targeted deep sequencing data. Molecular data were made available for translation into a clinical treatment plan. Treatment responses and outcomes were later examined in the context of molecular findings. Molecular studies presented here provide the first reported whole genome sequences of peritoneal mesothelioma. Mutations in known mesothelioma-related genes NF2, CDKN2A, LATS2, amongst others, were identified. Activation of MET-related signaling pathways was demonstrated in both cases. A hypermutated phenotype was observed in one case (434 vs. 18 single nucleotide variants) and was associated with a favourable outcome despite sarcomatoid histology and multifocal disease. This study represents the first report of whole genome analyses of peritoneal mesothelioma, a key step in the understanding and treatment of this disease.

  16. Selective Whole-Genome Amplification Is a Robust Method That Enables Scalable Whole-Genome Sequencing of Plasmodium vivax from Unprocessed Clinical Samples

    Directory of Open Access Journals (Sweden)

    Annie N. Cowell

    2017-02-01

    Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens.

  17. Target-triggered catalytic hairpin assembly and TdT-catalyzed DNA polymerization for amplified electronic detection of thrombin in human serums.

    Science.gov (United States)

    Shi, Kai; Dou, Baoting; Yang, Jianmei; Yuan, Ruo; Xiang, Yun

    2017-01-15

    Specific and sensitive detection of protein biomarkers is of great importance in biomedical and bioanalytical applications. In this work, a dual amplified signal enhancement approach based on the integration of catalytic hairpin assembly (CHA) and terminal deoxynucleotidyl transferase (TdT)-mediated in situ DNA polymerization has been developed for highly sensitive and label-free electrochemical detection of thrombin in human serum. The presence of the target thrombin leads to the unfolding and capture of a large number of hairpin signal probes with free 3'-OH termini on the sensor electrode. Subsequently, TdT can catalyze the elongation of the signal probes and the formation of many G-quadruplex sequence replicates in the presence of dGTP and dATP at a molar ratio of 6:4. These G-quadruplex sequences bind hemin and generate a drastically amplified current response for sensitive detection of thrombin in a completely label-free fashion. The sensor shows a linear range of 0.5 pM to 10.0 nM and a detection limit of 0.12 pM for thrombin. Moreover, the developed sensor can selectively discriminate the target thrombin from other non-target proteins and can be employed to monitor thrombin in human serum samples. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria: report from the EUCAST Subcommittee

    DEFF Research Database (Denmark)

    Ellington, M J; Ekelund, O; Aarestrup, Frank Møller

    2017-01-01

    Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility...

  19. Novel Degenerate PCR Method for Whole-Genome Amplification Applied to Peru Margin (ODP Leg 201) Subsurface Samples

    National Research Council Canada - National Science Library

    Martino, Amanda J; Rhodes, Matthew E; Biddle, Jennifer F; Brandt, Leah D; Tomsho, Lynn P; House, Christopher H

    2012-01-01

    A degenerate polymerase chain reaction (PCR)-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples...

  20. Algorithms to Model Single Gene, Single Chromosome, and Whole Genome Copy Number Changes Jointly in Tumor Phylogenetics: e1003740

    National Research Council Canada - National Science Library

    Salim Akhter Chowdhury; Stanley E Shackney; Kerstin Heselmeyer-Haddad; Thomas Ried; Alejandro A Schäffer; Russell Schwartz

    2014-01-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome...

  1. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics

    National Research Council Canada - National Science Library

    Chowdhury, Salim Akhter; Shackney, Stanley E; Heselmeyer-Haddad, Kerstin; Ried, Thomas; Schäffer, Alejandro A; Schwartz, Russell

    2014-01-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome...

  2. Evidence and evolutionary analysis of ancient whole-genome duplication in barley predating the divergence from rice

    OpenAIRE

    Grosse Ivo; Waugh Robbie; Graner Andreas; Thiel Thomas; Close Timothy J; Stein Nils

    2009-01-01

    Background: Well-preserved genomic colinearity among agronomically important grass species such as rice, maize, Sorghum, wheat and barley provides access to whole-genome structure information even in species lacking a reference genome sequence. We investigated footprints of whole-genome duplication (WGD) in barley that shaped the cereal ancestor genome by analyzing shared synteny with rice using a ~2000 gene-based barley genetic map and the rice genome reference sequence. Results: Base...

  3. Whole-Genome Sequencing Allows for Improved Identification of Persistent Listeria monocytogenes in Food-Associated Environments

    OpenAIRE

    Stasiewicz, Matthew J.; Oliver, Haley F.; Wiedmann, Martin; den Bakker, Henk C.

    2015-01-01

    While the food-borne pathogen Listeria monocytogenes can persist in food-associated environments, there are no whole-genome sequence (WGS)-based methods to differentiate persistent from sporadic strains. Whole-genome sequencing of 188 isolates from a longitudinal study of L. monocytogenes in retail delis was used to (i) apply single-nucleotide polymorphism (SNP)-based phylogenetics for subtyping of L. monocytogenes, (ii) use SNP counts to differentiate persistent from repeatedly reintroduced ...

  4. Novel Degenerate PCR Method for Whole-Genome Amplification Applied to Peru Margin (ODP Leg 201) Subsurface Samples.

    Science.gov (United States)

    Martino, Amanda J; Rhodes, Matthew E; Biddle, Jennifer F; Brandt, Leah D; Tomsho, Lynn P; House, Christopher H

    2012-01-01

    A degenerate polymerase chain reaction (PCR)-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples. While optimized here for use with Roche 454 technology, the general framework presented may be applicable to other next generation sequencing systems as well (e.g., Illumina, Ion Torrent). The method, which we have called random amplification metagenomic PCR (RAMP), involves the use of specific primers from Roche 454 amplicon sequencing, modified by the addition of a degenerate region at the 3' end. It utilizes a PCR reaction, which resulted in no amplification from blanks, even after 50 cycles of PCR. After efforts to optimize experimental conditions, the method was tested with DNA extracted from cultured E. coli cells, and genome coverage was estimated after sequencing on three different occasions. Coverage did not vary greatly with the different experimental conditions tested, and was around 62% with a sequencing effort equivalent to a theoretical genome coverage of 14.10×. The GC content of the sequenced amplification product was within 2% of the predicted values for this strain of E. coli. The method was also applied to DNA extracted from marine subsurface samples from ODP Leg 201 site 1229 (Peru Margin), and results of a taxonomic analysis revealed microbial communities dominated by Proteobacteria, Chloroflexi, Firmicutes, Euryarchaeota, and Crenarchaeota, among others. These results were similar to those obtained previously for those samples; however, variations in the proportions of taxa identified illustrate well the generally accepted view that community analysis is sensitive to both the amplification technique used and the method of assigning sequences to taxonomic groups. Overall, we find that RAMP represents a valid methodology for amplifying metagenomes from low-biomass samples.

  5. Novel degenerate PCR method for whole genome amplification applied to Peru Margin (ODP Leg 201) subsurface samples

    Directory of Open Access Journals (Sweden)

    Amanda Martino

    2012-01-01

    A degenerate PCR-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples. The method, which we have called Random Amplification Metagenomic PCR (RAMP), involves the use of specific primers from Roche 454 amplicon sequencing, modified by the addition of a degenerate region at the 3’ end. It utilizes a PCR reaction, which resulted in no amplification from blanks, even after 50 cycles of PCR. After efforts to optimize experimental conditions, the method was tested with DNA extracted from cultured E. coli cells, and genome coverage was estimated after sequencing on three different occasions. Coverage did not vary greatly with the different experimental conditions tested, and was around 62% with a sequencing effort equivalent to a theoretical genome coverage of 14.10X. The GC content of the sequenced amplification product was within 2% of the predicted values for this strain of E. coli. The method was also applied to DNA extracted from marine subsurface samples from ODP Leg 201 site 1229 (Peru Margin), and results of a taxonomic analysis revealed microbial communities dominated by Proteobacteria, Chloroflexi, Firmicutes, Euryarchaeota, and Crenarchaeota, among others. These results were similar to those obtained previously for those samples; however, variations in the proportions of taxa show that community analysis can be sensitive to both the amplification technique used and the method of assigning sequences to taxonomic groups. Overall, we find that RAMP represents a valid methodology for amplifying metagenomes from low biomass samples.

  6. Taxonomic revision of Harveyi clade bacteria (family Vibrionaceae) based on analysis of whole genome sequences.

    Science.gov (United States)

    Urbanczyk, Henryk; Ogura, Yoshitoshi; Hayashi, Tetsuya

    2013-07-01

    Use of inadequate methods for classification of bacteria in the so-called Harveyi clade (family Vibrionaceae, Gammaproteobacteria) has led to incorrect assignment of strains and proliferation of synonymous species. In order to resolve taxonomic ambiguities within the Harveyi clade and to test the usefulness of whole genome sequence data for classification of Vibrionaceae, draft genome sequences of 12 strains were determined and analysed. The sequencing included type strains of seven species: Vibrio sagamiensis NBRC 104589(T), Vibrio azureus NBRC 104587(T), Vibrio harveyi NBRC 15634(T), Vibrio rotiferianus LMG 21460(T), Vibrio campbellii NBRC 15631(T), Vibrio jasicida LMG 25398(T), and Vibrio owensii LMG 25443(T). Draft genome sequences of strain LMG 25430, previously designated the type strain of [Vibrio communis], and two strains (MWB 21 and 090810c) from the 'beijerinckii' lineage were also determined. Whole genomes of two additional strains (ATCC 25919 and 200612B) that previously could not be assigned to any Harveyi clade species were also sequenced. Analysis of the genome sequence data revealed a clear case of synonymy between V. owensii and [V. communis], confirming an earlier proposal to synonymize both species. Both strains from the 'beijerinckii' lineage were classified as V. jasicida, while the strains ATCC 25919 and 200612B were classified as V. owensii and V. campbellii, respectively. We also found that two strains, AND4 and Ex25, are closely related to Harveyi clade bacteria, but could not be assigned to any species of the family Vibrionaceae. The use of whole genome sequence data for the taxonomic classification of the Harveyi clade bacteria and other members of the family Vibrionaceae is also discussed.

  7. Whole genome sequence study of cannabis dependence in two independent cohorts.

    Science.gov (United States)

    Gizer, Ian R; Bizon, Chris; Gilder, David A; Ehlers, Cindy L; Wilhelmsen, Kirk C

    2017-01-23

    Recent advances in genome-wide sequencing techniques and analytical methods allow for more comprehensive examinations of the genome than microarray-based genome-wide association studies (GWAS). The present report provides the first application of whole genome sequencing (WGS) to identify low-frequency variants involved in cannabis dependence across two independent cohorts. The present study used low-coverage whole genome sequence data to conduct set-based association and enrichment analyses of low-frequency variation in protein-coding regions as well as regulatory regions in relation to cannabis dependence. Two cohorts were studied: a population-based Native American tribal community consisting of 697 participants nested within large multi-generational pedigrees and a family-based sample of 1832 predominantly European ancestry participants largely nested within nuclear families. Participants in both samples were assessed for Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV) lifetime cannabis dependence, with 168 and 241 participants receiving a positive diagnosis in each sample, respectively. Sequence kernel association tests identified one protein-coding region, C1orf110, and one regulatory region in the MEF2B gene that achieved significance in a meta-analysis of both samples. A regulatory region within the PCCB gene, a gene previously associated with schizophrenia, exhibited a suggestive association. Finally, regions within or near genes with multiple splice variants or involved in cell adhesion or potassium channel activity showed significant enrichment for association with cannabis dependence. This initial study demonstrates the potential utility of low-pass whole genome sequencing for identifying genetic variants involved in the etiology of cannabis use disorders. © 2017 Society for the Study of Addiction.

  8. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma.

    Science.gov (United States)

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-02-04

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, the complex SVs across the whole genome of esophageal squamous cell carcinoma (ESCC), and the mutational mechanisms underlying them, remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs, using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures to be the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) through different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes, which occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations revealed a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as an oncogene in ESCC. Our findings reveal the role of molecular defects such as chromothripsis and BFB in the malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  9. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling.

    Science.gov (United States)

    Meinel, Thomas; Krause, Antje

    2012-01-01

    In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes have led to a huge diversity in published phylogenies. Comparison of those and, moreover, identification of the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on in total 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.

  10. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.

    Directory of Open Access Journals (Sweden)

    Elizabeth M Driebe

    Staphylococcus aureus is an important clinical pathogen worldwide and understanding this organism's phylogeny and, in particular, the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade with clear groupings of the major clonal complexes of CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages were different from each other. While the CC8 lineage rate was similar to previous studies, the CC5 lineage was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss.

  11. Whole-genome resequencing of 100 healthy individuals using DNA pooling.

    Science.gov (United States)

    Wang, Xiaobin; Sui, Weiguo; Wu, Weiqing; Hou, Xianliang; Ou, Minglin; Xiang, Yueying; Dai, Yong

    2016-11-01

    With the advent of next-generation sequencing technology, the cost of sequencing has decreased significantly. However, sequencing costs remain high for large-scale studies. In the present study, DNA pooling was applied as a cost-effective strategy for sequencing, and the results of whole-genome resequencing of 100 healthy individuals using DNA pooling are presented. To minimise the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples and then subjected to whole-genome sequencing using four lanes for each library, resulting in at least 30-fold haploid coverage for each sample. The NCBI human genome build 37 (hg19) was used as the reference genome, and the short reads were aligned to it, achieving 99.84% coverage. The average sequencing depth was 32.76×. In total, ~3 million single-nucleotide polymorphisms were identified, of which 99.88% were in the NCBI dbSNP database. Furthermore, ~600,000 small insertions/deletions, 500,000 structural variants, 5,000 copy number variations and 13,000 single nucleotide variants were identified. In the present study, whole genomes were sequenced for a small sample of subjects from southern China for the first time. New variant sites were identified by comparison with the reference sequence, adding new knowledge of human genome variation to human genomic databases. Finally, the regions in which variation is distributed were illustrated by analyzing various types of variant sites, such as single-nucleotide polymorphisms.

  12. P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.

    Science.gov (United States)

    Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang

    2017-03-14

    An increasing number of studies have used whole genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping bisulfite-treated sequences to the whole genome has been the main method of studying DNA cytosine methylation. However, almost all current tools suffer from inaccuracy and long run times. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve this problem. Using optimal complex alignment computation and Smith-Waterman dynamic programming, Hint-Hunt can analyze and predict DNA methylation status. However, when Hint-Hunt was used to predict DNA methylation status on large-scale datasets, it still suffered from slow speed and low temporal-spatial efficiency. To solve the problems of Smith-Waterman dynamic programming and low temporal-spatial efficiency, we further designed a deep parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deployed and evaluated Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt achieves deep acceleration on the heterogeneous CPU and Intel Xeon Phi platform, taking full advantage of both multi-core (CPU) and many-core (Phi) processors.

  13. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    Energy Technology Data Exchange (ETDEWEB)

    Fröhlich, Eleonore, E-mail: eleonore.froehlich@medunigraz.at [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Meindl, Claudia; Wagner, Karin [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Leitinger, Gerd [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Institute for Cell Biology, Histology and Embryology, Medical University of Graz, Harrachgasse 21, 8010 Graz (Austria); Roblegg, Eva [Institute of Pharmaceutical Sciences, Department of Pharmaceutical Technology, Karl-Franzens-University of Graz, Universitätsplatz 1, 8010 Graz (Austria)

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and for short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify other or additional targets than commonly used cell-based assays for NP action. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. - Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions regulated by microarray were confirmed by cell-based assay.

  14. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak.

    Science.gov (United States)

    Gardy, Jennifer L; Johnston, James C; Ho Sui, Shannan J; Cook, Victoria J; Shah, Lena; Brodkin, Elizabeth; Rempel, Shirley; Moore, Richard; Zhao, Yongjun; Holt, Robert; Varhol, Richard; Birol, Inanc; Lem, Marcus; Sharma, Meenu K; Elwood, Kevin; Jones, Steven J M; Brinkman, Fiona S L; Brunham, Robert C; Tang, Patrick

    2011-02-24

    An outbreak of tuberculosis occurred over a 3-year period in a medium-size community in British Columbia, Canada. The results of mycobacterial interspersed repetitive unit-variable-number tandem-repeat (MIRU-VNTR) genotyping suggested the outbreak was clonal. Traditional contact tracing did not identify a source. We used whole-genome sequencing and social-network analysis in an effort to describe the outbreak dynamics at a higher resolution. We sequenced the complete genomes of 32 Mycobacterium tuberculosis outbreak isolates and 4 historical isolates (from the same region but sampled before the outbreak) with matching genotypes, using short-read sequencing. Epidemiologic and genomic data were overlaid on a social network constructed by means of interviews with patients to determine the origins and transmission dynamics of the outbreak. Whole-genome data revealed two genetically distinct lineages of M. tuberculosis with identical MIRU-VNTR genotypes, suggesting two concomitant outbreaks. Integration of social-network and phylogenetic analyses revealed several transmission events, including those involving "superspreaders." Both lineages descended from a common ancestor and had been detected in the community before the outbreak, suggesting a social, rather than genetic, trigger. Further epidemiologic investigation revealed that the onset of the outbreak coincided with a recorded increase in crack cocaine use in the community. Through integration of large-scale bacterial whole-genome sequencing and social-network analysis, we show that a socioenvironmental factor--most likely increased crack cocaine use--triggered the simultaneous expansion of two extant lineages of M. tuberculosis that was sustained by key members of a high-risk social network. Genotyping and contact tracing alone did not capture the true dynamics of the outbreak. (Funded by Genome British Columbia and others.).

  15. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples this could further reduce diagnostic time and thereby improve control and treatment. A major bottleneck is the availability of fast and reliable bioinformatics...... microbiology, WGS of isolated bacteria and by directly sequencing on pellets from the urine. A rapid method for analyzing the sequence data was developed. Bacteria were cultivated from 19 samples, but only in pure culture from 17. WGS improved the identification of the cultivated bacteria and almost complete...

  16. Whole-genome shotgun sequence of phenazine-producing endophytic Streptomyces kebangsaanensis SUK12

    Directory of Open Access Journals (Sweden)

    Juwairiah Remali

    2017-09-01

    Streptomyces species produce bioactive compounds with a broad spectrum of activities. Streptomyces kebangsaanensis SUK12 has been identified as a novel endophytic bacterium isolated from the ethnomedicinal plant Portulaca oleracea, and was found to produce the phenazine class of biologically active antimicrobial metabolites. The potential use of the phenazines has led to our research interest in determining the genome sequence of Streptomyces kebangsaanensis SUK12. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession number PRJNA269542. The raw sequence data are available [https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP105770].

  17. The Promise of Whole Genome Pathogen Sequencing for the Molecular Epidemiology of Emerging Aquaculture Pathogens

    Science.gov (United States)

    Bayliss, Sion C.; Verner-Jeffreys, David W.; Bartie, Kerry L.; Aanensen, David M.; Sheppard, Samuel K.; Adams, Alexandra; Feil, Edward J.

    2017-01-01

    Aquaculture is the fastest growing food-producing sector, and the sustainability of this industry is critical both for global food security and economic welfare. The management of infectious disease represents a key challenge. Here, we discuss the opportunities afforded by whole genome sequencing of bacterial and viral pathogens of aquaculture to mitigate disease emergence and spread. We outline, by way of comparison, how sequencing technology is transforming the molecular epidemiology of pathogens of public health importance, emphasizing the importance of community-oriented databases and analysis tools. PMID:28217117

  18. Long insert whole genome sequencing for copy number variant and translocation detection

    OpenAIRE

    Liang, Winnie S.; Aldrich, Jessica; Tembe, Waibhav; Kurdoglu, Ahmet; Cherni, Irene; Phillips, Lori; Reiman, Rebecca; Baker, Angela; Weiss, Glen J.; Carpten, John D.; Craig, David W.

    2013-01-01

    As next-generation sequencing continues to have an expanding presence in the clinic, the identification of the most cost-effective and robust strategy for identifying copy number changes and translocations in tumor genomes is needed. We hypothesized that performing shallow whole genome sequencing (WGS) of 900–1000-bp inserts (long insert WGS, LI-WGS) improves our ability to detect these events, compared with shallow WGS of 300–400-bp inserts. A priori analyses show that LI-WGS requires less s...

  19. Refining QTL with high-density SNP genotyping and whole genome sequence in three cattle breeds

    DEFF Research Database (Denmark)

    Sahana, Goutam; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2012-01-01

    Genome-wide association study was carried out in Nordic Holsteins, Nordic Red and Jersey breeds for functional traits using BovineHD Genotyping BeadChip (Illumina, San Diego, CA). The association analyses were carried out using both linear mixed model approach and a Bayesian variable selection method. Principal components were used to account for population structure. The QTL segregating in all three breeds were selected and a few of the most significant ones were followed in further analyses. The polymorphisms in the identified QTL regions were imputed using 90 whole genome sequences......

  20. Whole-genome shotgun sequence of phenazine-producing endophytic Streptomyces kebangsaanensis SUK12.

    Science.gov (United States)

    Remali, Juwairiah; Loke, Kok-Keong; Ng, Chyan Leong; Aizat, Wan Mohd; Tiong, John; Zin, Noraziah Mohamad

    2017-09-01

    Streptomyces spp. produce bioactive compounds with a broad spectrum of activities. Streptomyces kebangsaanensis SUK12 has been identified as a novel endophytic bacterium isolated from the ethnomedicinal plant Portulaca oleracea, and was found to produce the phenazine class of biologically active antimicrobial metabolites. The potential use of the phenazines has led to our research interest in determining the genome sequence of Streptomyces kebangsaanensis SUK12. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession number PRJNA269542. The raw sequence data are available [https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP105770].

  1. Whole genome sequence of Pseudomonas aeruginosa F9676, an antagonistic bacterium isolated from rice seed.

    Science.gov (United States)

    Shi, Zhenyuan; Ren, Deyong; Hu, Shikai; Hu, Xingming; Wu, Liwen; Lin, Haiyan; Hu, Jiang; Zhang, Guangheng; Guo, Longbiao

    2015-10-10

    Pseudomonas aeruginosa is a bacterial species that can be isolated from diverse ecological niches. P. aeruginosa strain F9676 was first isolated from a rice seed sample in 2003. It showed strong antagonism against several plant pathogens. In this study, whole genome sequencing was carried out. The total genome size of F9676 is 6,368,008 bp, with 5586 coding genes (CDS), 67 tRNAs and 3 rRNAs. The genome sequence of F9676 may shed light on antagonism in P. aeruginosa. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Whole genome analysis provides evidence for porcine-to-simian interspecies transmission of rotavirus-A.

    Science.gov (United States)

    Navarro, Ryan; Aung, Meiji Soe; Cruz, Katalina; Ketzis, Jennifer; Gallagher, Christa Ann; Beierschmitt, Amy; Malik, Yashpal Singh; Kobayashi, Nobumichi; Ghosh, Souvik

    2017-04-01

    We report here whole genome analysis of a porcine rotavirus-A (RVA) strain RVA/Pig-wt/KNA/ET8B/2015/G5P[13] detected in a diarrheic piglet, and nearly whole genome (except for VP4 gene) analysis of a simian RVA strain RVA/Simian-wt/KNA/08979/2015/G5P[X] detected in a non-diarrheic African green monkey (AGM) on the island of St. Kitts, Caribbean region. Strain ET8B exhibited a G5-P[13]-I5-R1-C1-M1-A8-N1-T7-E1-H1 genotype constellation that was identical to those of Brazilian porcine RVA G5P[13] strains RVA/Pig-wt/BRA/ROTA01/2013/G5P[13] and RVA/Pig-wt/BRA/ROTA07/2013/G5P[13], the only porcine G5P[13] RVAs that have been analyzed for the whole genome so far. Phylogenetically, all the 11 gene segments of ET8B were closely related to those of porcine and porcine-like human RVAs within the respective genotypes. Although the porcine G5P[13] RVAs exhibited identical genotype constellations, ET8B did not appear to share common evolutionary pathways with the Brazilian porcine G5P[13] RVAs. Interestingly, the VP2, VP3, VP6, VP7, and NSP1-NSP5 genes of simian RVA strain 08979 were closely related to those of porcine and porcine-like human RVA strains, exhibiting 99%-100% nucleotide sequence identities to cognate genes of co-circulating porcine RVA strain ET8B. On the other hand, the VP1 of 08979 appeared to be genetically divergent from porcine and human RVAs within the R1 genotype, and its exact origin could not be ascertained. Taken together, these observations suggested that simian strain 08979 might have been derived from interspecies transmission events involving transmission of ET8B-like RVAs from pigs to AGMs. In St. Kitts, AGMs often stray from the wild into livestock farms. Therefore, it may be possible that the AGM acquired the infection from a pig farm on the island. To our knowledge, this is the first report on detection of porcine-like RVAs in monkeys. Also, the present study is the first to report whole genomic analysis of a porcine RVA strain from the Caribbean.

  3. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci

    DEFF Research Database (Denmark)

    Rasmusen, L. H.; Dargis, R.; Iversen, Katrine Højholt

    2016-01-01

    Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients...... with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were...

  4. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation

    DEFF Research Database (Denmark)

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography their contribution to panda population...... dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two...

  5. Whole genome shotgun sequence of Bacillus amyloliquefaciens TF28, a biocontrol entophytic bacterium.

    Science.gov (United States)

    Zhang, Shumei; Jiang, Wei; Li, Jing; Meng, Liqiang; Cao, Xu; Hu, Jihua; Liu, Yushuai; Chen, Jingyu; Sha, Changqing

    2016-01-01

    Bacillus amyloliquefaciens TF28 is a biocontrol endophytic bacterium that is capable of inhibiting a broad range of plant pathogenic fungi. The strain has the potential to be developed into a biocontrol agent for use in agriculture. Here we report the whole-genome shotgun sequence of the strain. The genome size of B. amyloliquefaciens TF28 is 3,987,635 bp, which consists of 3754 protein-coding genes, 65 tandem repeat sequences, 47 minisatellite DNA, 2 microsatellite DNA, 63 tRNA, 7 rRNA, 6 sRNA, 3 prophage and CRISPR domains.

  6. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    OpenAIRE

    Ruibang Luo; Yiu-Lun Wong; Wai-Chun Law; Lap-Kei Lee; Jeanno Cheung; Chi-Man Liu; Tak-Wah Lam

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome ...

  7. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome.

    Science.gov (United States)

    Chapman, Jarrod A; Mascher, Martin; Buluç, Aydın; Barry, Kerrie; Georganas, Evangelos; Session, Adam; Strnadova, Veronika; Jenkins, Jerry; Sehgal, Sunish; Oliker, Leonid; Schmutz, Jeremy; Yelick, Katherine A; Scholz, Uwe; Waugh, Robbie; Poland, Jesse A; Muehlbauer, Gary J; Stein, Nils; Rokhsar, Daniel S

    2015-01-31

    Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.

  8. Whole-genome sequencing for identification of the source in hospital-acquired Legionnaires' disease

    DEFF Research Database (Denmark)

    Rosendahl Madsen, A M; Holm, A; Jensen, T G

    2017-01-01

    Acquisition of Legionnaires' disease is a serious complication of hospitalization. Rapid determination of whether or not the infection is caused by strains of Legionella pneumophila in the hospital environment is crucial to avoid further cases. This study investigated the use of whole-genome sequencing to identify the source of infection in hospital-acquired Legionnaires' disease. Phylogenetic analyses showed close relatedness between one patient isolate and a strain found in hospital water, confirming suspicion of nosocomial infection. It was found that whole-genome sequencing can be a useful......

  9. Whole genome sequencing as a means to assess pathogenic mutations in medical genetics and cancer.

    Science.gov (United States)

    Royer-Bertrand, Beryl; Rivolta, Carlo

    2015-04-01

    The past decade has seen the emergence of next-generation sequencing (NGS) technologies, which have revolutionized the field of human molecular genetics. With NGS, significant portions of the human genome can now be assessed by direct sequence analysis, highlighting normal and pathological variants of our DNA. Recent advances have also allowed the sequencing of complete genomes, by a method referred to as whole genome sequencing (WGS). In this work, we review the use of WGS in medical genetics, with specific emphasis on the benefits and the disadvantages of this technique for detecting genomic alterations leading to Mendelian human diseases and to cancer.

  10. A green-cotyledon/stay-green mutant exemplifies the ancient whole-genome duplications in soybean.

    Science.gov (United States)

    Nakano, Michiharu; Yamada, Tetsuya; Masuda, Yu; Sato, Yutaka; Kobayashi, Hideki; Ueda, Hiroaki; Morita, Ryouhei; Nishimura, Minoru; Kitamura, Keisuke; Kusaba, Makoto

    2014-10-01

    The recent whole-genome sequencing of soybean (Glycine max) revealed that soybean experienced whole-genome duplications 59 million and 13 million years ago, and it has an octoploid-like genome in spite of its diploid nature. We analyzed a natural green-cotyledon mutant line, Tenshin-daiseitou. The physiological analysis revealed that Tenshin-daiseitou shows a non-functional stay-green phenotype in senescent leaves, which is similar to that of the mutant of Mendel's green-cotyledon gene I, the ortholog of SGR in pea. The identification of gene mutations and genetic segregation analysis suggested that defects in GmSGR1 and GmSGR2 were responsible for the green-cotyledon/stay-green phenotype of Tenshin-daiseitou, which was confirmed by RNA interference (RNAi) transgenic soybean experiments using GmSGR genes. The characterized green-cotyledon double mutant d1d2 was found to have the same mutations, suggesting that GmSGR1 and GmSGR2 are D1 and D2. Among the examined d1d2 strains, the d1d2 strain K144a showed a lower Chl a/b ratio in mature seeds than other strains but not in senescent leaves, suggesting a seed-specific genetic factor of the Chl composition in K144a. Analysis of the soybean genome sequence revealed four genomic regions with microsynteny to the Arabidopsis SGR1 region, which included the GmSGR1 and GmSGR2 regions. The other two regions contained GmSGR3a/GmSGR3b and GmSGR4, respectively, which might be pseudogenes or genes with a function that is unrelated to Chl degradation during seed maturation and leaf senescence. These GmSGR genes were thought to be produced by the two whole-genome duplications, and they provide a good example of such whole-genome duplication events in the evolution of the soybean genome. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  11. Identification of emergent blaCMY-2-carrying Proteus mirabilis lineages by whole-genome sequencing

    Directory of Open Access Journals (Sweden)

    M. Mac Aogáin

    2016-01-01

    Whole-genome sequencing of 24 Proteus mirabilis isolates revealed the clonal expansion of two cefoxitin-resistant strains among patients with community-onset infection. These strains harboured blaCMY-2 within a chromosomally located integrative and conjugative element and exhibited multidrug resistance phenotypes. A predominant strain, identified in 18 patients, also harboured the PGI-1 genomic island and associated resistance genes, accounting for its broader antibiotic resistance profile. The identification of these novel multidrug-resistant strains among community-onset infections suggests that they are endemic to this region and represent emergent P. mirabilis lineages of clinical significance.

  12. Fiber Amplifiers

    DEFF Research Database (Denmark)

    Rottwitt, Karsten

    2017-01-01

    The chapter provides a discussion of optical fiber amplifiers and through three sections provides a detailed treatment of three types of optical fiber amplifiers, erbium doped fiber amplifiers (EDFA), Raman amplifiers, and parametric amplifiers. Each section comprises the fundamentals including...... the basic physics and relevant in-depth theoretical modeling, amplifiers characteristics and performance data as a function of specific operation parameters. Typical applications in fiber optic communication systems and the improvement achievable through the use of fiber amplifiers are illustrated....

  13. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

  14. Comparative whole genome sequence analysis of wild-type and cidofovir-resistant monkeypoxvirus

    Directory of Open Access Journals (Sweden)

    Huggins John

    2010-05-01

    We performed whole genome sequencing of a cidofovir {[(S)-1-(3-hydroxy-2-phosphonylmethoxypropyl)cytosine] [HPMPC]}-resistant (CDV-R) strain of Monkeypoxvirus (MPV). Whole-genome comparison with the wild-type (WT) strain revealed 55 single-nucleotide polymorphisms (SNPs) and one tandem-repeat contraction. Over one-third of all identified SNPs were located within genes comprising the poxvirus replication complex, including the DNA polymerase, RNA polymerase, mRNA capping methyltransferase, DNA processivity factor, and poly-A polymerase. Four polymorphic sites were found within the DNA polymerase gene. DNA polymerase mutations observed at positions 314 and 684 in MPV were consistent with CDV-R loci previously identified in Vaccinia virus (VACV). These data suggest the mechanism of CDV resistance may be highly conserved across Orthopoxvirus (OPV) species. SNPs were also identified within virulence genes such as the A-type inclusion protein, serine protease inhibitor-like protein SPI-3, Schlafen ATPase and thymidylate kinase, among others. Aberrant chain extension induced by CDV may lead to diverse alterations in gene expression and viral replication that may result in both adaptive and attenuating mutations. Defining the potential contribution of substitutions in the replication complex and RNA processing machinery reported here may yield further insight into CDV resistance and may augment current therapeutic development strategies.

  15. HAPLOWSER: a whole-genome haplotype browser for personal genome and metagenome.

    Science.gov (United States)

    Kim, Jong Hyun; Kim, Woo-Cheol; Waterman, Michael S; Park, Sanghyun; Li, Lei M

    2009-09-15

    Haplotype assembly is becoming a very important tool in genome sequencing of human and other organisms. Although haplotypes were previously inferred from genome assemblies, there has never been a comparative haplotype browser that depicts a global picture of whole-genome alignments among haplotypes of different organisms. We introduce a whole-genome HAPLotype brOWSER (HAPLOWSER), providing evolutionary perspectives from multiple aligned haplotypes and functional annotations. Haplowser enables the comparison of haplotypes from metagenomes, and associates conserved regions or the bases at the conserved regions with functional annotations and custom tracks. The associations are quantified for further analysis and presented as pie charts. Functional annotations and custom tracks that are projected onto haplotypes are saved as multiple files in FASTA format. Haplowser provides a user-friendly interface, and can display alignments of haplotypes with functional annotations at any resolution. Haplowser, written in Java, supports multiple platforms including Windows and Linux. Haplowser is publicly available at http://embio.yonsei.ac.kr/haplowser .

  16. Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku

    Science.gov (United States)

    Baym, Michael; Shaket, Lev; Anzai, Isao A.; Adesina, Oluwakemi; Barstow, Buz

    2016-01-01

    Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet traditionally, their construction has required an extraordinary technical effort. Here we report a method for the construction and purification of a curated whole-genome collection of single-gene transposon disruption mutants termed Knockout Sudoku. Using simple combinatorial pooling, a highly oversampled collection of mutants is condensed into a next-generation sequencing library in a single day, a 30- to 100-fold improvement over prior methods. The identities of the mutants in the collection are then solved by a probabilistic algorithm that uses internal self-consistency within the sequencing data set, followed by rapid algorithmically guided condensation to a minimal representative set of mutants, validation, and curation. Starting from a progenitor collection of 39,918 mutants, we compile a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes that is functionally validated by high-throughput kinetic measurements of quinone reduction. PMID:27830751

  17. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota eMorota

    2014-10-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  18. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts.

    Science.gov (United States)

    Guo, Yi-Cheng; Zhang, Lin; Dai, Shao-Xing; Li, Wen-Xing; Zheng, Jun-Juan; Li, Gong-Hua; Huang, Jing-Fei

    2016-01-01

    Dekkera yeasts have often been considered as alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya). However, the Dekkera yeasts, which were separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD) but still occupy a niche similar to S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in the transcription and translation process exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified 12 vacuolar H+-ATPase (V-ATPase) function genes under positive selection, which assist in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights into the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage.

  19. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts.

    Directory of Open Access Journals (Sweden)

    Yi-Cheng Guo

    Dekkera yeasts have often been considered as alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya). However, the Dekkera yeasts, which were separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD) but still occupy a niche similar to S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in the transcription and translation process exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified 12 vacuolar H+-ATPase (V-ATPase) function genes under positive selection, which assist in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights into the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage.

  20. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence.

    Science.gov (United States)

    Dewey, Frederick E; Chen, Rong; Cordero, Sergio P; Ormond, Kelly E; Caleshu, Colleen; Karczewski, Konrad J; Whirl-Carrillo, Michelle; Wheeler, Matthew T; Dudley, Joel T; Byrnes, Jake K; Cornejo, Omar E; Knowles, Joshua W; Woon, Mark; Sangkuhl, Katrin; Gong, Li; Thorn, Caroline F; Hebert, Joan M; Capriotti, Emidio; David, Sean P; Pavlovic, Aleksandra; West, Anne; Thakuria, Joseph V; Ball, Madeleine P; Zaranek, Alexander W; Rehm, Heidi L; Church, George M; West, John S; Bustamante, Carlos D; Snyder, Michael; Altman, Russ B; Klein, Teri E; Butte, Atul J; Ashley, Euan A

    2011-09-01

    Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

  1. Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data.

    Science.gov (United States)

    Dewey, Frederick E; Grove, Megan E; Priest, James R; Waggott, Daryl; Batra, Prag; Miller, Clint L; Wheeler, Matthew; Zia, Amin; Pan, Cuiping; Karzcewski, Konrad J; Miyake, Christina; Whirl-Carrillo, Michelle; Klein, Teri E; Datta, Somalee; Altman, Russ B; Snyder, Michael; Quertermous, Thomas; Ashley, Euan A

    2015-10-01

    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.

  2. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  3. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    Science.gov (United States)

    Robins-Browne, Roy M; Holt, Kathryn E; Ingle, Danielle J; Hocking, Dianna M; Yang, Ji; Tauschek, Marija

    2016-01-01

    The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods.

  4. Molecular characterization of avian polyomavirus isolated from psittacine birds based on the whole genome sequence analysis.

    Science.gov (United States)

    Katoh, Hiroshi; Ohya, Kenji; Une, Yumi; Yamaguchi, Tsuyoshi; Fukushi, Hideto

    2009-07-02

    Seven avian polyomaviruses (APVs) were isolated from seven psittacine birds of four species, and their whole genome sequences were genetically analyzed. Compared with the sequence of the BFDV1 strain, the sequences of the seven APV isolates had nucleotide substitutions at 63 loci, and a high level of conservation of the amino acid sequence of each viral protein (VP1, VP2, VP3, VP4, and t/T antigen) was predicted. An A-to-T nucleotide substitution was observed in the non-control region of all seven APV sequences in comparison with the BFDV1 strain. Two C-to-T nucleotide substitutions were also detected in non-coding regions of one isolate. A phylogenetic analysis of the whole genome sequences indicated that sequences from the same bird species were closely related. APV has been reported to have distinct tropism for cell cultures of various avian species. The present study indicated that a single amino acid substitution at position 221 in VP2 was essential for propagation in chicken embryonic fibroblast culture and that this substitution was promoted by propagation in budgerigar embryonic fibroblast culture. In two isolates, three consecutive amino acids appeared to be deleted in VP4; however, this deletion had little effect on virus propagation.

  5. Analysis of a Streptococcus pyogenes puerperal sepsis cluster by use of whole-genome sequencing.

    Science.gov (United States)

    Ben Zakour, Nouri L; Venturini, Carola; Beatson, Scott A; Walker, Mark J

    2012-07-01

    Between June and November 2010, a concerning rise in the number of cases of puerperal sepsis, a postpartum pelvic bacterial infection contracted by women after childbirth, was observed in the New South Wales, Australia, hospital system. Group A streptococcus (GAS; Streptococcus pyogenes) isolates PS001 to PS011 were recovered from nine patients. Pulsed-field gel electrophoresis and emm sequence typing revealed that GAS of emm1.40, emm75.0, emm77.0, emm89.0, and emm89.9 were each recovered from a single patient, ruling out a single source of infection. However, emm28.8 GAS were recovered from four different patients. To investigate the relatedness of these emm28 isolates, whole-genome sequencing was undertaken and the genome sequences were compared to the genome sequence of the emm28.4 reference strain, MGAS6180. A total of 186 single nucleotide polymorphisms were identified, for which the phylogenetic reconstruction indicated an outbreak of a polyclonal nature. While two isolates collected from different hospitals were not closely related, isolates from two puerperal sepsis patients from the same hospital were indistinguishable, suggesting patient-to-patient transmission or infection from a common source. The results of this study indicate that traditional typing protocols, such as pulsed-field gel electrophoresis, may not be sensitive enough to allow fine epidemiological discrimination of closely related bacterial isolates. Whole-genome sequencing presents a valid alternative that allows accurate fine-scale epidemiological investigation of bacterial infectious disease.
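
Fine-scale comparisons like this one often reduce to pairwise SNP distances between isolates. The following is a minimal Python sketch, assuming a core-genome alignment of equal-length sequences already exists and that "N" and "-" mark uncalled positions. The isolate names are hypothetical, and this is not the study's pipeline, which compared isolates against the MGAS6180 reference.

```python
from itertools import combinations

UNCALLED = set("N-")  # assumed markers for ambiguous bases and alignment gaps


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both bases are called and they differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in UNCALLED and b not in UNCALLED
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric pairwise SNP distances for every pair of isolates."""
    names = sorted(alignment)
    dist = {(n, n): 0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = snp_distance(alignment[x], alignment[y])
    return dist
```

Isolates at distance 0 or near 0 would correspond to the "indistinguishable" pairs described above.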

  6. Whole genome sequence typing to investigate the Apophysomyces outbreak following a tornado in Joplin, Missouri, 2011.

    Science.gov (United States)

    Etienne, Kizee A; Gillece, John; Hilsabeck, Remy; Schupp, Jim M; Colman, Rebecca; Lockhart, Shawn R; Gade, Lalitha; Thompson, Elizabeth H; Sutton, Deanna A; Neblett-Fanfair, Robyn; Park, Benjamin J; Turabelidze, George; Keim, Paul; Brandt, Mary E; Deak, Eszter; Engelthaler, David M

    2012-01-01

    Case reports of Apophysomyces spp. in immunocompetent hosts have resulted from traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011, a tornado occurred in Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group; this indicated multiple genotypes from a single or multiple sources. Though no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces.

  7. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    Science.gov (United States)

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution [1,2]. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes [3,4]. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  8. Pathway Processor: A Tool for Integrating Whole-Genome Expression Results into Metabolic Networks

    Science.gov (United States)

    Grosu, Paul; Townsend, Jeffrey P.; Hartl, Daniel L.; Cavalieri, Duccio

    2002-01-01

    We have developed a new tool to visualize expression data on metabolic pathways and to evaluate which metabolic pathways are most affected by transcriptional changes in whole-genome expression experiments. Using the Fisher Exact Test, the method scores biochemical pathways according to the probability that as many or more genes in a pathway would be significantly altered in a given experiment by chance alone. This method has been validated on diauxic shift experiments and reproduces well-known effects of carbon source on yeast metabolism. The analysis is implemented with Pathway Analyzer, one of the tools of Pathway Processor, a new statistical package for the analysis of whole-genome expression data. Results from multiple experiments can be compared, reducing the analysis from the full set of individual genes to a limited number of pathways of interest. The pathways are visualized with OpenDX, an open-source visualization software package, and the relationship between genes in the pathways can be examined in detail using Expression Mapper, the second program of the package. This program features a graphical output displaying differences in expression on metabolic charts of the biochemical pathways to which the open reading frames are assigned. [Supplementary materials are available at http://www.cgr.harvard.edu/cavalieri/pp.html and http://www.genome.org.] PMID:12097350
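
The pathway score described above, the probability that as many or more genes in a pathway are altered by chance, is the upper tail of a hypergeometric distribution, which is what a one-sided Fisher exact test computes. The following is a minimal Python sketch with hypothetical argument names. It does not reproduce Pathway Processor's code.

```python
from math import comb


def pathway_enrichment_p(n_genes: int, n_altered: int, pathway_size: int,
                         altered_in_pathway: int) -> float:
    """P(X >= altered_in_pathway) for X ~ Hypergeometric(n_genes, n_altered, pathway_size).

    Equivalent to a one-sided Fisher exact test for over-representation: the chance
    that a random set of `pathway_size` genes contains at least this many of the
    `n_altered` significantly changed genes.
    """
    if not (0 <= altered_in_pathway <= pathway_size <= n_genes and 0 <= n_altered <= n_genes):
        raise ValueError("inconsistent counts")
    upper = min(n_altered, pathway_size)
    hits = sum(
        comb(n_altered, x) * comb(n_genes - n_altered, pathway_size - x)
        for x in range(altered_in_pathway, upper + 1)
    )
    return hits / comb(n_genes, pathway_size)
```

Exact integer arithmetic via `math.comb` keeps the tail sum free of rounding until the final division.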

  9. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin

    2016-08-01

    Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid, there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and a 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next generation sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.
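
Because the abstract frames enrichment as improving the human-to-pathogen ratio, one way to express fold enrichment is as the change in the microbial:human read ratio. That definition is an assumption here; the study may have measured enrichment differently. Under this definition, fold enrichment does not translate directly into the share of reads that are microbial, and this minimal Python sketch shows the arithmetic with exact fractions.

```python
from fractions import Fraction


def fold_enrichment(microbial_before: int, human_before: int,
                    microbial_after: int, human_after: int) -> Fraction:
    """Fold change in the microbial:human read ratio (assumed definition)."""
    return Fraction(microbial_after, human_after) / Fraction(microbial_before, human_before)


def microbial_fraction_after(start_fraction: Fraction, fold: int) -> Fraction:
    """Microbial share of reads after the microbial:human ratio is multiplied by `fold`."""
    ratio = start_fraction / (1 - start_fraction) * fold
    return ratio / (1 + ratio)
```

For a hypothetical specimen in which 0.1% of reads are microbial, a 481-fold ratio increase gives `microbial_fraction_after(Fraction(1, 1000), 481) == Fraction(13, 40)`, or 32.5% microbial reads. The share rises less than the fold change once microbial DNA is no longer rare.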

  10. Expansion by whole genome duplication and evolution of the sox gene family in teleost fish.

    Science.gov (United States)

    Voldoire, Emilien; Brunet, Frédéric; Naville, Magali; Volff, Jean-Nicolas; Galiana, Delphine

    2017-01-01

    It is now recognized that several rounds of whole genome duplication (WGD) have occurred during the evolution of vertebrates, but the link between WGDs and phenotypic diversification remains unresolved. In this study, we investigated the impact of the teleost-specific WGD on the evolution of the sox gene family in teleost fishes. The sox gene family, which encodes transcription factors, plays an essential role in the morphology, physiology and behavior of vertebrates, including teleosts, currently the largest group of vertebrates. We first reconstructed the evolution of all sox genes identified in eleven teleost genomes using a comparative genomic approach including phylogenetic and synteny analyses. Compared with tetrapods, we observed a substantial expansion of the sox family: 58% (11/19) of sox genes are duplicated in teleost genomes. Furthermore, all duplicated sox genes, except the sox17 paralogs, are derived from the teleost-specific WGD. Then, focusing on five sox genes and analyzing the evolution of their coding and non-coding sequences as well as their expression patterns in fish embryos and adult tissues, we demonstrated that these paralogs followed lineage-specific evolutionary trajectories in teleost genomes. This work, based on whole genome data from multiple teleost species, supports the contribution of WGDs to the expansion of gene families, as well as to the emergence of genomic differences between lineages that might promote genetic and phenotypic diversity in teleosts.

  11. A comprehensive whole-genome integrated cytogenetic map for the alpaca (Lama pacos).

    Science.gov (United States)

    Avila, Felipe; Baily, Malorie P; Perelman, Polina; Das, Pranab J; Pontius, Joan; Chowdhary, Renuka; Owens, Elaine; Johnson, Warren E; Merriwether, David A; Raudsepp, Terje

    2014-01-01

    Genome analysis of the alpaca (Lama pacos, LPA) has progressed slowly compared to other domestic species. Here, we report the development of the first comprehensive whole-genome integrated cytogenetic map for the alpaca using fluorescence in situ hybridization (FISH) and CHORI-246 BAC library clones. The map comprises 230 linearly ordered markers distributed among all 36 alpaca autosomes and the sex chromosomes. For the first time, markers were assigned to LPA14, 21, 22, 28, and 36. Additionally, 86 genes from 15 alpaca chromosomes were mapped in the dromedary camel (Camelus dromedarius, CDR), demonstrating exceptional synteny and linkage conservation between the 2 camelid genomes. Cytogenetic mapping of 191 protein-coding genes improved and refined the known Zoo-FISH homologies between camelids and humans: we discovered new homologous synteny blocks (HSBs) corresponding to HSA1-LPA/CDR11, HSA4-LPA/CDR31 and HSA7-LPA/CDR36, and revised the location of breakpoints for others. Overall, gene mapping was in good agreement with the Zoo-FISH and revealed remarkable evolutionary conservation of gene order within many human-camelid HSBs. Most importantly, 91 FISH-mapped markers effectively integrated the alpaca whole-genome sequence and the radiation hybrid maps with physical chromosomes, thus facilitating the improvement of the sequence assembly and the discovery of genes of biological importance. © 2015 S. Karger AG, Basel.

  12. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for constructing genome trees from whole genome sequences have been proposed. However, a reasonable method for constructing phylogenetic trees from complete genome sequences has not yet been established. We have developed a method for constructing phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes (cyanobacteria, Chlorobi, proteobacteria, Chloroflexi and Firmicutes) and of nonphotosynthetic organisms, including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  13. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs that had opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample ranged from 39.29 to 49.36 million. Approximately 188 genes were differentially expressed in adipose tissue and were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, metabolism of fatty acids, retinol, caffeine and arachidonic acid, and immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  14. Whole-genome sequence comparison as a method for improving bacterial species definition.

    Science.gov (United States)

    Zhang, Wen; Du, Pengcheng; Zheng, Han; Yu, Weiwen; Wan, Li; Chen, Chen

    2014-01-01

    We compared pairs of 1,226 bacterial strains with whole genome sequences and calculated their average nucleotide identity (ANI) between genomes to determine whether whole genome comparison can be directly used for bacterial species definition. We found that genome comparisons of two bacterial strains from the same species (SGC) have a significantly higher ANI than those of two strains from different species (DGC), and that the ANI between the query and the reference genomes can be used to determine whether two genomes come from the same species. Bacterial species definition based on ANI with a cut-off value of 0.92 matched well (81.5%) with the current bacterial species definition. The ANI value was shown to be consistent with the standard for traditional bacterial species definition, and it could be used in bacterial taxonomy for species definition. A new bioinformatics program (ANItools) was also provided in this study for users to obtain the ANI value of any pair of bacterial genomes (http://genome.bioinfo-icdc.org/). This program can match a query strain to all bacterial genomes and identify the highest ANI value of the strain at the species, genus and family levels, respectively, providing valuable insights for species definition.
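
The ANI-based species test can be sketched in two steps. First, average the identities of aligned genome fragments. Second, compare the best-scoring reference against the 0.92 cut-off reported above. This minimal Python sketch assumes per-fragment identities and aligned fractions have already been obtained from some aligner. The 0.7 coverage filter, the function names and the reference labels are illustrative assumptions, not ANItools' documented behavior.

```python
def average_nucleotide_identity(fragment_hits, min_aligned_fraction=0.7):
    """Mean identity over query fragments whose best hit covers enough of the fragment.

    fragment_hits: iterable of (identity, aligned_fraction) pairs, one per query
    fragment, with both values on a 0..1 scale (assumed input format).
    """
    kept = [identity for identity, aligned in fragment_hits if aligned >= min_aligned_fraction]
    if not kept:
        raise ValueError("no fragment passed the coverage filter")
    return sum(kept) / len(kept)


def closest_reference(ani_by_reference, cutoff=0.92):
    """Return (reference, ANI, same_species) for the best-matching reference genome."""
    reference, ani = max(ani_by_reference.items(), key=lambda item: item[1])
    return reference, ani, ani >= cutoff
```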

  15. Whole genome analysis of linezolid resistance in Streptococcus pneumoniae reveals resistance and compensatory mutations

    Directory of Open Access Journals (Sweden)

    Légaré Danielle

    2011-10-01

    Background: Several mutations were present in the genome of Streptococcus pneumoniae linezolid-resistant strains, but the role of several of these mutations had not been experimentally tested. To analyze the role of these mutations, we reconstituted resistance by serial whole genome transformation of a novel resistant isolate into two strains with a sensitive background. We sequenced the parent mutant and two independent transformants exhibiting similar minimum inhibitory concentrations of linezolid. Results: Comparative genomic analyses revealed that transformants acquired G2576T transversions in every gene copy of 23S rRNA and that the number of altered copies correlated with the level of linezolid resistance and cross-resistance to florfenicol and chloramphenicol. One of the transformants also acquired a mutation present in the parent mutant leading to the overexpression of an ABC transporter (spr1021). The acquisition of these mutations conferred a fitness cost, however, which was further enhanced by the acquisition of a mutation in an RNA methyltransferase implicated in resistance. Interestingly, the fitness of the transformants could be restored in part by the acquisition of altered copies of the L3 and L16 ribosomal proteins and by mutations leading to the overexpression of the spr1887 ABC transporter that were present in the original linezolid-resistant mutant. Conclusions: Our results demonstrate the usefulness of whole genome approaches in detecting major determinants of resistance as well as compensatory mutations that alleviate the fitness cost associated with resistance.

  16. Computel: computation of mean telomere length from whole-genome next-generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Lilit Nersisyan

    Telomeres are the ends of eukaryotic chromosomes, consisting of consecutive short repeats that protect chromosome ends from degradation. Telomeres shorten with each cell division, leading to replicative cell senescence. Deregulation of telomere length homeostasis is associated with the development of various age-related diseases and cancers. A number of experimental techniques exist for telomere length measurement; however, until recently, the absence of tools for extracting telomere lengths from high-throughput sequencing data has significantly obscured the association of telomere length with molecular processes in normal and diseased conditions. We have developed Computel, a program in R for computing mean telomere length from whole-genome next-generation sequencing data. Computel is open source, and is freely available at https://github.com/lilit-nersisyan/computel. It utilizes a short-read alignment-based approach and integrates various popular tools for sequencing data analysis. We validated it with synthetic and experimental data, and compared its performance with the previously available software. The results have shown that Computel outperforms existing software in accuracy, independence of results from sequencing conditions, stability against inherent sequencing errors, and ability to distinguish pure telomeric sequences from interstitial telomeric repeats. By providing a highly reliable methodology for determining telomere lengths from whole-genome sequencing data, Computel should help to elucidate the role of telomeres in cellular health and disease.
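
A simpler, count-based estimate shows the idea behind deriving telomere length from whole-genome reads. This is not Computel's alignment-based algorithm, and Computel itself is written in R. The sketch assumes the vertebrate TTAGGG repeat, a threshold of 7 consecutive repeats to call a read telomeric, and a genome coverage measured against a haploid human reference with 46 chromosome ends (23 chromosomes x 2). All of these choices are illustrative assumptions.

```python
TELOMERE_REPEAT = "TTAGGG"
TELOMERE_REPEAT_RC = "CCCTAA"  # reverse complement, for reads from the other strand


def is_telomeric(read: str, min_repeats: int = 7) -> bool:
    """True if the read contains at least `min_repeats` consecutive telomere repeats."""
    read = read.upper()
    return (TELOMERE_REPEAT * min_repeats in read
            or TELOMERE_REPEAT_RC * min_repeats in read)


def mean_telomere_length(reads, genome_coverage: float,
                         chromosome_ends: int = 46, min_repeats: int = 7) -> float:
    """Estimate mean telomere length (bp) per chromosome end.

    Telomeric bases in the library are divided by the genome-wide coverage
    (relative to a haploid reference) and by the number of chromosome ends
    in that haploid reference.
    """
    telomeric_bases = sum(len(r) for r in reads if is_telomeric(r, min_repeats))
    return telomeric_bases / (genome_coverage * chromosome_ends)
```

This shortcut counts every telomeric read as fully telomeric and cannot separate interstitial telomeric repeats from true chromosome ends. According to the abstract, Computel's alignment-based approach handles that distinction better.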

  17. Epigenetic regulation of subgenome dominance following whole genome triplication in Brassica rapa.

    Science.gov (United States)

    Cheng, Feng; Sun, Chao; Wu, Jian; Schnable, James; Woodhouse, Margaret R; Liang, Jianli; Cai, Chengcheng; Freeling, Michael; Wang, Xiaowu

    2016-07-01

    Subgenome dominance is an important phenomenon observed in allopolyploids after whole genome duplication, in which one subgenome retains more genes and more often contributes the higher-expressing copy of paralogous genes. To dissect the mechanism of subgenome dominance, we systematically investigated the relationships of gene expression, transposable element (TE) distribution and small RNA targeting, relating to the multicopy paralogous genes generated from whole genome triplication in Brassica rapa. The subgenome dominance was found to be regulated by a relatively stable factor established previously, then inherited by and shared among B. rapa varieties. In addition, we found a biased distribution of TEs between flanking regions of paralogous genes. Furthermore, the 24-nt small RNAs target TEs and are negatively correlated with the dominant expression of individual paralogous gene pairs. The biased distribution of TEs among subgenomes and the targeting of 24-nt small RNAs together produce the dominant expression phenomenon at a subgenome scale. Based on these findings, we propose a bucket hypothesis to illustrate subgenome dominance and hybrid vigor. Our findings and hypothesis are valuable for the evolutionary study of polyploids, and may shed light on studies of hybrid vigor, which is common to most species. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  18. Kernel-based whole-genome prediction of complex traits: a review

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics. PMID:25360145
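
Gaussian-kernel regression, known as RKHS regression in animal breeding, is one kernel approach of the kind the review discusses. This minimal pure-Python sketch fits it in kernel ridge form. It assumes markers are coded 0/1/2, uses a bandwidth equal to the number of markers by default, and uses an arbitrary penalty `lam`. All three are illustrative assumptions rather than recommendations from the review. The sketch is only practical for small numbers of individuals.

```python
import math


def gaussian_kernel(x, z, bandwidth):
    """exp(-||x - z||^2 / bandwidth) between two marker vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / bandwidth)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_kernel_ridge(genotypes, phenotypes, lam=1.0, bandwidth=None):
    """Fit f(x) = mean + sum_i alpha_i k(x, x_i), with alpha = (K + lam I)^-1 (y - mean)."""
    n = len(genotypes)
    h = bandwidth if bandwidth is not None else float(len(genotypes[0]))
    mean_y = sum(phenotypes) / n
    system = [[gaussian_kernel(xi, xj, h) + (lam if i == j else 0.0)
               for j, xj in enumerate(genotypes)]
              for i, xi in enumerate(genotypes)]
    alpha = solve(system, [y - mean_y for y in phenotypes])

    def predict(x):
        return mean_y + sum(a * gaussian_kernel(x, xi, h) for a, xi in zip(alpha, genotypes))

    return predict
```

A small `lam` lets the fit interpolate the training phenotypes, while a large `lam` shrinks predictions toward the phenotypic mean. In practice the penalty and bandwidth would be tuned on validation data.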

  19. Two recombinant human interferon-beta 1a pharmaceutical preparations produce a similar transcriptional response determined using whole genome microarray analysis.

    Science.gov (United States)

    Prync, A E Sterin; Yankilevich, P; Barrero, P R; Bello, R; Marangunich, L; Vidal, A; Criscuolo, M; Benasayag, L; Famulari, A L; Domínguez, R O; Kauffman, M A; Diez, R A

    2008-02-01

    Recombinant human interferon-beta (IFN-β) is a well-established treatment for multiple sclerosis (MS). The regulatory process for marketing authorization of biosimilars is currently under debate in certain countries. In the EU, the EMEA has clearly defined the process, including overarching and product-specific guidelines, which include clinical testing. Biosimilarity needs to be based on comparability criteria, including at least molecular characterization, biological activity relevant to the therapeutic effect, and relative bioavailability ("bioequivalence"). In complex diseases such as MS, where the effect of treatment is not directly measurable, in vitro tools can provide additional data to support comparability. Genomic microarray assays might be useful for comparing multisource biopharmaceuticals. The aim of the present study was to compare the pharmacodynamic genomic effects (in terms of transcriptional regulation) of two recombinant human IFN-β1a preparations on lymphocytes of multiple sclerosis patients using a whole genome microarray assay. We performed ex vivo whole genome expression profiling of the effect of two preparations of IFN-β1a on non-adherent mononuclear cells from five relapsing-remitting MS patients, analyzed with microarrays (CodeLink Human Whole Genome). Patients' blood was drawn, and PBMCs were isolated and cultured under three different conditions: culture medium (control), 1,000 U/ml of IFN-β1a (BLASTOFERON, Bio Sidus) and 1,000 U/ml of IFN-β1a (REBIF, Serono). RNA was purified from non-adherent cells (mostly lymphocytes), amplified and hybridized. Raw data were generated by CodeLink proprietary software. Data normalization, quality control and analysis of differential gene expression between treatments were done using linear models for microarray data. Functional annotation analysis of IFN-β1a MS treatment transcription was done using DAVID. Out of the approximately 45,000 human sequences examined, no evidence of differential

  20. Whole genome sequencing as a tool to investigate a cluster of seven cases of listeriosis in Austria and Germany, 2011–2013

    Science.gov (United States)

    Schmid, D; Allerberger, F; Huhulescu, S; Pietzka, A; Amar, C; Kleta, S; Prager, R; Preußel, K; Aichinger, E; Mellmann, A; Raoult, D

    2014-01-01

    A cluster of seven human cases of listeriosis occurred in Austria and in Germany between April 2011 and July 2013. The Listeria monocytogenes serovar (SV) 1/2b isolates shared pulsed-field gel electrophoresis (PFGE) and fluorescent amplified fragment length polymorphism (fAFLP) patterns indistinguishable from those from five food producers. The seven human isolates, a control strain with a different PFGE/fAFLP profile and ten food isolates were subjected to whole genome sequencing (WGS) in a blinded fashion. A gene-by-gene comparison (multilocus sequence typing (MLST)+) was performed, and the resulting whole genome allelic profiles were compared using SeqSphere+ software version 1.0. On analysis of 2298 genes, the four human outbreak isolates from 2012 to 2013 had different alleles at ≤6 genes, i.e. differed by ≤6 genes from each other; the dendrogram placed these isolates in between five Austrian unaged soft cheese isolates from producer A (≤19-gene difference from the human cluster) and two Austrian ready-to-eat meat isolates from producer B (≤8-gene difference from the human cluster). Both food products appeared on grocery bills prospectively collected by these outbreak cases after hospital discharge. Epidemiological results on food consumption and MLST+ clearly separated the three cases in 2011 from the four 2012–2013 outbreak cases (≥48 different genes). We showed that WGS is capable of discriminating L. monocytogenes SV1/2b clones not distinguishable by PFGE and fAFLP. The listeriosis outbreak described clearly underlines the potential of sequence-based typing methods to offer enhanced resolution and comparability of typing systems for public health applications. PMID:24698214
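
The gene-by-gene comparison reduces to counting loci at which two isolates carry different allele numbers. A difference threshold can then group isolates. This minimal Python sketch assumes allelic profiles are dictionaries mapping locus names to allele numbers, with None for uncalled loci. Single-linkage grouping, the locus names and the thresholds are illustrative assumptions, not SeqSphere+ behavior.

```python
def allele_differences(profile_a: dict, profile_b: dict) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def single_linkage_clusters(profiles: dict, max_differences: int) -> list:
    """Group isolates linked by chains of pairs with <= max_differences differing loci."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving keeps lookups short
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allele_differences(profiles[a], profiles[b]) <= max_differences:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

A small threshold would keep closely related isolates, such as the four 2012-2013 isolates above that differed by 6 or fewer genes, in one group. Isolates separated by many loci, such as the 2011 cases at 48 or more differing genes, would fall into separate groups.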

  1. Whole genome sequencing as a tool to investigate a cluster of seven cases of listeriosis in Austria and Germany, 2011-2013.

    Science.gov (United States)

    Schmid, D; Allerberger, F; Huhulescu, S; Pietzka, A; Amar, C; Kleta, S; Prager, R; Preußel, K; Aichinger, E; Mellmann, A

    2014-05-01

    A cluster of seven human cases of listeriosis occurred in Austria and in Germany between April 2011 and July 2013. The Listeria monocytogenes serovar (SV) 1/2b isolates shared pulsed-field gel electrophoresis (PFGE) and fluorescent amplified fragment length polymorphism (fAFLP) patterns indistinguishable from those from five food producers. The seven human isolates, a control strain with a different PFGE/fAFLP profile and ten food isolates were subjected to whole genome sequencing (WGS) in a blinded fashion. A gene-by-gene comparison (multilocus sequence typing (MLST)+) was performed, and the resulting whole genome allelic profiles were compared using SeqSphere+ software version 1.0. On analysis of 2298 genes, the four human outbreak isolates from 2012 to 2013 had different alleles at ≤6 genes, i.e. differed by ≤6 genes from each other; the dendrogram placed these isolates in between five Austrian unaged soft cheese isolates from producer A (≤19-gene difference from the human cluster) and two Austrian ready-to-eat meat isolates from producer B (≤8-gene difference from the human cluster). Both food products appeared on grocery bills prospectively collected by these outbreak cases after hospital discharge. Epidemiological results on food consumption and MLST+ clearly separated the three cases in 2011 from the four 2012-2013 outbreak cases (≥48 different genes). We showed that WGS is capable of discriminating L. monocytogenes SV1/2b clones not distinguishable by PFGE and fAFLP. The listeriosis outbreak described clearly underlines the potential of sequence-based typing methods to offer enhanced resolution and comparability of typing systems for public health applications. © 2014 The Authors Clinical Microbiology and Infection © 2014 European Society of Clinical Microbiology and Infectious Diseases.

  2. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity
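    Applying a read-frequency cut-off like the 30% threshold reported above amounts to a filter over variant calls. The sketch below is illustrative only. The `VariantCall` record, the inclusive comparison (frequency ≥ cut-off) and the example counts are assumptions, and it does not reproduce the authors' in-house pipeline.

```python
from dataclasses import dataclass

@dataclass
class VariantCall:
    position: int     # genome coordinate (hypothetical values below)
    ref: str
    alt: str
    alt_reads: int    # reads supporting the alternate base
    total_reads: int  # total reads covering the position

    @property
    def frequency(self) -> float:
        """Fraction of reads supporting the alternate base (0.0 if uncovered)."""
        return self.alt_reads / self.total_reads if self.total_reads else 0.0

def confident_variants(calls, cutoff=0.30):
    """Keep calls whose alternate-read frequency reaches the cut-off."""
    return [c for c in calls if c.frequency >= cutoff]

calls = [
    VariantCall(1001, "C", "T", alt_reads=45, total_reads=50),  # 0.90: near-fixed
    VariantCall(2002, "A", "G", alt_reads=12, total_reads=60),  # 0.20: below cut-off
    VariantCall(3003, "G", "A", alt_reads=21, total_reads=60),  # 0.35: minority variant kept
    VariantCall(4004, "T", "C", alt_reads=0, total_reads=0),    # no coverage
]
for call in confident_variants(calls):
    print(call.position, round(call.frequency, 2))
```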

  3. Whole genome methylation profiles as independent markers of survival in stage IIIC melanoma patients

    Directory of Open Access Journals (Sweden)

    Sigalotti Luca

    2012-09-01

    Full Text Available Abstract Background The clinical course of cutaneous melanoma (CM) can differ significantly for patients with identical stages of disease, defined clinico-pathologically, and no molecular markers differentiate patients with such a diverse prognosis. This study aimed to define the prognostic value of whole genome DNA methylation profiles in stage III CM. Methods Genome-wide methylation profiles were evaluated by the Illumina Human Methylation 27 BeadChip assay in short-term neoplastic cell cultures from 45 stage IIIC CM patients. Unsupervised K-means partitioning clustering was exploited to sort patients into 2 groups based on their methylation profiles. Methylation patterns related to the discovered groups were determined using the nearest shrunken centroid classification algorithm. The impact of genome-wide methylation patterns on overall survival (OS) was assessed using Cox regression and Kaplan-Meier analyses. Results Unsupervised K-means partitioning by whole genome methylation profiles identified classes with significantly different OS in stage IIIC CM patients. Patients with a “favorable” methylation profile had increased OS (P = 0.001, log-rank = 10.2) by Kaplan-Meier analysis. Median OS of stage IIIC patients with a “favorable” vs. “unfavorable” methylation profile were 31.5 and 10.4 months, respectively. The 5-year OS for stage IIIC patients with a “favorable” methylation profile was 41.2% as compared to 0% for patients with an “unfavorable” methylation profile. Among the variables examined by multivariate Cox regression analysis, classification defined by methylation profile was the only predictor of OS (Hazard Ratio = 2.41, for “unfavorable” methylation profile; 95% Confidence Interval: 1.02-5.70; P = 0.045). A 17-gene methylation signature able to correctly assign prognosis (overall error rate = 0) in stage IIIC patients on the basis of distinct methylation-defined groups was also identified
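    Unsupervised K-means partitioning into two groups, as used above, can be sketched in plain Python. This is a generic Lloyd's-algorithm implementation applied to hypothetical per-patient methylation beta values. Starting from the first k profiles is a simplification; k-means++ seeding or multiple restarts are more usual. The sketch does not reproduce the study's pipeline, which went on to apply nearest shrunken centroids and survival models.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # simplistic deterministic start
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical beta values (0 = unmethylated, 1 = methylated) at three CpG sites
beta_values = [
    [0.10, 0.20, 0.15],
    [0.12, 0.25, 0.10],
    [0.80, 0.90, 0.85],
    [0.75, 0.95, 0.90],
    [0.15, 0.18, 0.20],
]
labels, centroids = kmeans(beta_values, k=2)
print(labels)
```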

  4. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic

  5. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Directory of Open Access Journals (Sweden)

    Alexander T Dilthey

    2016-10-01

    Full Text Available Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a
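    The third step above picks the most likely pair of alleles under a likelihood model. The sketch below illustrates that step with a deliberately simplified model. Each read is assumed to come from either allele with probability 1/2, so the log-likelihood of a genotype is the sum over reads of log(0.5·P(read | a1) + 0.5·P(read | a2)). The per-read likelihoods and the allele labels are placeholders. HLA*PRG derives its read likelihoods from graph alignments, and its model is more detailed than this sketch.

```python
import math
from itertools import combinations_with_replacement

def genotype_log_likelihood(read_likelihoods, a1, a2):
    """Sum over reads of log(0.5 * P(read | a1) + 0.5 * P(read | a2))."""
    total = 0.0
    for lik in read_likelihoods:
        p = 0.5 * lik.get(a1, 0.0) + 0.5 * lik.get(a2, 0.0)
        if p <= 0.0:
            return -math.inf  # a read this genotype cannot explain at all
        total += math.log(p)
    return total

def best_allele_pair(read_likelihoods, alleles):
    """Return the unordered allele pair (homozygotes included) with maximal likelihood."""
    best_pair, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(alleles, 2):
        ll = genotype_log_likelihood(read_likelihoods, a1, a2)
        if ll > best_ll:
            best_pair, best_ll = (a1, a2), ll
    return best_pair, best_ll

# Placeholder alleles and hypothetical P(read | allele) values for four reads
alleles = ["A*01:01", "A*02:01", "A*03:01"]
reads = [
    {"A*01:01": 0.90, "A*02:01": 0.05, "A*03:01": 0.05},
    {"A*01:01": 0.05, "A*02:01": 0.90, "A*03:01": 0.05},
    {"A*01:01": 0.80, "A*02:01": 0.10, "A*03:01": 0.10},
    {"A*01:01": 0.10, "A*02:01": 0.85, "A*03:01": 0.05},
]
pair, log_lik = best_allele_pair(reads, alleles)
print(pair, round(log_lik, 3))
```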

  6. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy.

    Science.gov (United States)

    Bouwman, Aniek C; Veerkamp, Roel F

    2014-10-03

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken breeders, who have to choose wisely how to spend their sequencing efforts over all the breeds or lines they evaluate. Sequence data from cattle breeds were used, because relatively many individuals from several breeds are currently sequenced within the 1,000 Bull Genomes project. The advantage of whole-genome sequence data is that it carries the causal mutations, but the question is whether it is possible to impute the causal variants accurately. This study therefore focussed on imputation accuracy of variants with low minor allele frequency and breed-specific variants. Imputation accuracy was assessed for chromosomes 1 and 29 as the correlation between observed and imputed genotypes. For chromosome 1, the average imputation accuracy was 0.70 with a reference population of 20 Holstein, and increased to 0.83 when the reference population was increased by including 3 other dairy breeds with 20 animals each. When the same number of animals from the Holstein breed was added, the accuracy improved to 0.88, while adding the 3 other breeds to the reference population of 80 Holstein improved the average imputation accuracy marginally to 0.89. For chromosome 29, the average imputation accuracy was lower. Some variants benefitted from the inclusion of other breeds in the reference population, initially determined by the MAF of the variant in each breed, but even Holstein-specific variants did gain imputation accuracy from the multi-breed reference population. This study shows that splitting sequencing effort over multiple breeds and combining the reference populations is a good strategy for imputation from high-density SNP panels towards whole-genome sequence when reference
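    The abstract defines imputation accuracy as the correlation between observed and imputed genotypes. The sketch below computes a Pearson correlation per variant and then averages across variants. The per-variant averaging and the skipping of monomorphic variants (where the correlation is undefined) are assumptions about how an "average accuracy" might be formed. The genotypes are hypothetical 0/1/2 codes and dosages.

```python
import math

def pearson(observed, imputed):
    """Pearson correlation between observed genotypes and imputed dosages."""
    n = len(observed)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equal-length vectors with at least two entries")
    mo, mi = sum(observed) / n, sum(imputed) / n
    cov = sum((o - mo) * (i - mi) for o, i in zip(observed, imputed))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_i = sum((i - mi) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        return math.nan  # undefined when either vector is constant
    return cov / math.sqrt(var_o * var_i)

def average_accuracy(variants):
    """Mean per-variant correlation, skipping variants where it is undefined."""
    scores = [pearson(obs, imp) for obs, imp in variants]
    scores = [s for s in scores if not math.isnan(s)]
    return sum(scores) / len(scores) if scores else math.nan

# Hypothetical (observed, imputed) genotype vectors for three variants
variants = [
    ([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]),
    ([0, 1, 2], [0.1, 0.9, 1.8]),
    ([0, 0, 0], [0.1, 0.0, 0.0]),  # monomorphic in the sample: skipped
]
print(round(average_accuracy(variants), 4))
```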

  7. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and to monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available
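    One of Rainbow's listed improvements is splitting large sequence files for load balancing. At its core this is grouping FASTQ records, four lines each, into fixed-size chunks. The following is a minimal generator-based sketch of that idea, not Rainbow's implementation. The chunk size is an arbitrary parameter, and the reads are synthetic.

```python
from itertools import islice

def fastq_records(lines):
    """Yield FASTQ records as 4-line lists: header, sequence, '+', qualities."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record

def split_fastq(lines, reads_per_chunk):
    """Group records into chunks of at most reads_per_chunk reads each."""
    chunk = []
    for record in fastq_records(lines):
        chunk.append(record)
        if len(chunk) == reads_per_chunk:
            yield chunk
            chunk = []
    if chunk:  # final, possibly smaller, chunk
        yield chunk

# Five synthetic reads split into chunks of two
lines = []
for i in range(5):
    lines += [f"@read{i}", "ACGT", "+", "IIII"]
chunks = list(split_fastq(lines, reads_per_chunk=2))
print([len(c) for c in chunks])
```

    Because both functions are generators, a real file handle can be passed as `lines` without loading the whole file into memory.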

  8. Whole-Genome Thermodynamic Analysis Reduces siRNA Off-Target Effects

    Science.gov (United States)

    Chen, Xi; Liu, Peng; Chou, Hui-Hsien

    2013-01-01

    Small interfering RNAs (siRNAs) are important tools for knocking down targeted genes, and have been widely applied to biological and biomedical research. To design siRNAs, two important aspects must be considered: the potency in knocking down target genes and the off-target effect on any nontarget genes. Although many studies have produced useful tools to design potent siRNAs, off-target prevention has mostly been delegated to sequence-level alignment tools such as BLAST. We hypothesize that whole-genome thermodynamic analysis can identify potential off-targets with higher precision and help us avoid siRNAs that may have strong off-target effects. To validate this hypothesis, two siRNA sets were designed to target three human genes: IDH1, ITPR2 and TRIM28. They were selected from the output of two popular siRNA design tools, siDirect and siDesign. Both siRNA design tools have incorporated sequence-level screening to avoid off-targets, thus their output is believed to be optimal. However, one of the sets we tested has off-target genes predicted by Picky, a whole-genome thermodynamic analysis tool. Picky can identify off-target genes that may hybridize to a siRNA within a user-specified melting temperature range. Our experiments validated that some off-target genes predicted by Picky can indeed be inhibited by siRNAs. Similar experiments were performed using commercially available siRNAs and a few off-target genes were also found to be inhibited as predicted by Picky. In summary, we demonstrate that whole-genome thermodynamic analysis can identify off-target genes that are missed in sequence-level screening. Because Picky prediction is deterministic according to thermodynamics, if a siRNA candidate has no Picky predicted off-targets, it is unlikely to cause off-target effects. Therefore, we recommend including Picky as an additional screening step in siRNA design. PMID:23484018

  9. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Bellod Cisneros, Jose Luis

    2016-01-01

    web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes...... and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services...... and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https...

  10. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  11. A Danish Salmonella Bareilly outbreak investigated by the use of whole genome sequencing

    DEFF Research Database (Denmark)

    Torpdahl, M.; Kiil, K.; Litrup, E.

    2013-01-01

    In 2012, we saw an increase in the Salmonella serotype Bareilly isolated from human infections. Bareilly is a rare serotype in Denmark, isolated from human infections between 2 and 9 times annually over the last 10 years. As a routine for rare serotypes, we use PFGE as the molecular method...... and broilers differed by two bands. When using PFGE in outbreak investigation, there are some interpretative implications that have to be considered. There are differences in how important band changes are when defining clusters of different serotypes. Some outbreaks have been reported to include PFGE profiles...... with several band changes and others are defined by one PFGE profile thereby excluding closely related profiles. We decided to investigate whether whole genome sequencing (WGS) could resolve this issue and be useful in outbreak investigations. Several analyses were performed, including a SNP tree based...

  12. Whole-genome sequencing of a malignant granular cell tumor with metabolic response to pazopanib

    Science.gov (United States)

    Wei, Lei; Liu, Song; Conroy, Jeffrey; Wang, Jianmin; Papanicolau-Sengos, Antonios; Glenn, Sean T.; Murakami, Mitsuko; Liu, Lu; Hu, Qiang; Conroy, Jacob; Miles, Kiersten Marie; Nowak, David E.; Liu, Biao; Qin, Maochun; Bshara, Wiam; Omilian, Angela R.; Head, Karen; Bianchi, Michael; Burgher, Blake; Darlak, Christopher; Kane, John; Merzianu, Mihai; Cheney, Richard; Fabiano, Andrew; Salerno, Kilian; Talati, Chetasi; Khushalani, Nikhil I.; Trump, Donald L.; Johnson, Candace S.; Morrison, Carl D.

    2015-01-01

    Granular cell tumors are an uncommon soft tissue neoplasm. Malignant granular cell tumors comprise...... T transitions, particularly when immediately preceded by a 5′ G. A loss-of-function mutation was detected in a newly recognized tumor suppressor candidate, BRD7. No mutations were found in known targets of pazopanib. However, we identified a receptor tyrosine kinase pathway mutation in GFRA2 that warrants further evaluation. To the best of our knowledge, this is only the second reported case of a malignant granular cell tumor exhibiting a response to pazopanib, and the first whole-genome sequencing of this uncommon tumor type. The findings provide insight into the genetic basis of malignant granular cell tumors and identify potential targets for further investigation. PMID:27148567

  13. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation

    DEFF Research Database (Denmark)

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography their contribution to panda population...... dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two...... panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years....

  14. Whole-genome regression and prediction methods applied to plant and animal breeding.

    Science.gov (United States)

    de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L

    2013-02-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
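    Ridge regression (often called RR-BLUP in breeding) is among the simplest of the parametric WGR models mentioned above, and it handles the large-p, small-n setting directly. Because β = Xᵀ(XXᵀ + λI)⁻¹y equals (XᵀX + λI)⁻¹Xᵀy, the fit can be obtained by solving an n × n system instead of a p × p one. Below is a dependency-free sketch under those assumptions. It centres the marker codes by column, and the value λ = 1.0 is hand-picked. In practice λ is tuned or derived from variance components. The genotypes and phenotypes are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge_wgr(X, y, lam):
    """Ridge marker effects for phenotypes y (length n) on markers X (n x p)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[k] for row in X) / n for k in range(p)]
    Xc = [[row[k] - col_means[k] for k in range(p)] for row in X]
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    # n x n matrix K = Xc Xc^T + lam * I, positive definite for lam > 0
    K = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, yc)
    beta = [sum(Xc[i][k] * alpha[i] for i in range(n)) for k in range(p)]
    return mean_y, col_means, beta

def predict(x_new, mean_y, col_means, beta):
    """Genomic prediction for a new individual's marker codes."""
    return mean_y + sum((x - m) * b for x, m, b in zip(x_new, col_means, beta))

# Hypothetical data: 4 individuals, 5 markers coded 0/1/2 (p > n)
X = [
    [0, 1, 2, 1, 0],
    [2, 1, 0, 1, 1],
    [1, 1, 1, 0, 2],
    [0, 2, 1, 1, 0],
]
y = [10.0, 12.0, 11.0, 9.5]
mean_y, col_means, beta = ridge_wgr(X, y, lam=1.0)
print([round(b, 3) for b in beta])
print(round(predict([1, 1, 1, 1, 1], mean_y, col_means, beta), 3))
```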

  15. Clinical decision support for whole genome sequence information leveraging a service-oriented architecture: a prototype.

    Science.gov (United States)

    Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time.

  16. Evaluation of whole genome sequencing for outbreak detection of Salmonella enterica

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Nielsen, Eva M.; Kaas, Rolf Sommer

    2014-01-01

    Salmonella enterica is a common cause of minor and large foodborne outbreaks. To achieve successful and nearly ‘real-time’ monitoring and identification of outbreaks, reliable sub-typing is essential. Whole genome sequencing (WGS) shows great promise for use as a routine epidemiological typing...... analyses were also compared to PFGE, revealing that WGS typing outperformed the traditional method. In conclusion, for S. Typhimurium, SNP analysis and the nucleotide difference approach applied to WGS data seem to be the superior methods for epidemiological typing compared to other phylogenetic...... analytic approaches that may be used on WGS. These approaches were also superior to the more classical typing method, PFGE. Our study also indicates that WGS alone is insufficient to determine whether strains are related or unrelated to outbreaks. This still requires the combination of epidemiological...
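    The nucleotide-difference approach counts the positions at which isolates differ in a shared alignment. The sketch below assumes the sequences are already aligned to the same length, and it skips gaps and Ns at a position. The threshold for flagging close pairs is a hypothetical parameter. As the abstract stresses, a small distance alone does not establish outbreak membership without epidemiological data.

```python
from itertools import combinations

def snp_distance(seq_a, seq_b, ignore="-N"):
    """Count aligned positions where two sequences differ, skipping gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )

def close_pairs(alignment, threshold):
    """List isolate pairs whose SNP distance is at or below the threshold."""
    pairs = []
    for x, y in combinations(sorted(alignment), 2):
        d = snp_distance(alignment[x], alignment[y])
        if d <= threshold:
            pairs.append((x, y, d))
    return pairs

# Hypothetical aligned core-genome fragments
alignment = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "TCGAACGNAC",
}
print(close_pairs(alignment, threshold=2))
```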

  17. Bioinformatics Workflow for Clinical Whole Genome Sequencing at Partners HealthCare Personalized Medicine

    Directory of Open Access Journals (Sweden)

    Ellen A. Tsai

    2016-02-01

    Full Text Available Effective implementation of precision medicine will be enhanced by a thorough understanding of each patient’s genetic composition to better treat his or her presenting symptoms or mitigate the onset of disease. This ideally includes the sequence information of a complete genome for each individual. At Partners HealthCare Personalized Medicine, we have developed a clinical process for whole genome sequencing (WGS) with application in both healthy individuals and those with disease. In this manuscript, we will describe our bioinformatics strategy to efficiently process and deliver genomic data to geneticists for clinical interpretation. We describe the handling of data from FASTQ to the final variant list for clinical review for the final report. We will also discuss our methodology for validating this workflow and the cost implications of running WGS.

  18. SPlinted Ligation Adapter Tagging (SPLAT), a novel library preparation method for whole genome bisulphite sequencing

    Science.gov (United States)

    Manlig, Erika; Wahlberg, Per

    2017-01-01

    Abstract Sodium bisulphite treatment of DNA combined with next generation sequencing (NGS) is a powerful combination for the interrogation of genome-wide DNA methylation profiles. Library preparation for whole genome bisulphite sequencing (WGBS) is challenging due to side effects of the bisulphite treatment, which leads to extensive DNA damage. Recently, a new generation of methods for bisulphite sequencing library preparation have been devised. They are based on initial bisulphite treatment of the DNA, followed by adaptor tagging of single stranded DNA fragments, and enable WGBS using low quantities of input DNA. In this study, we present a novel approach for quick and cost effective WGBS library preparation that is based on splinted adaptor tagging (SPLAT) of bisulphite-converted single-stranded DNA. Moreover, we validate SPLAT against three commercially available WGBS library preparation techniques, two of which are based on bisulphite treatment prior to adaptor tagging and one is a conventional WGBS method. PMID:27899585

  19. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context......

  20. Accurate and robust prediction of genetic relationship from whole-genome sequences.

    Directory of Open Access Journals (Sweden)

    Hong Li

    Full Text Available Computing the genetic relationship between two humans is important to studies in genetics, genomics, genealogy, and forensics. Relationship algorithms may be sensitive to noise, such as that arising from sequencing errors or imperfect reference genomes. We developed an algorithm for estimation of genetic relationship by averaged blocks (GRAB) that is designed for whole-genome sequencing (WGS) data. GRAB segments the genome into blocks, calculates the fraction of blocks sharing identity, and then uses a classification tree to infer 1st- to 5th-degree relationships and unrelated individuals. We evaluated GRAB on simulated and real sequenced families, and compared it with other software. GRAB achieves similar performance, and does not require knowledge of population background or phasing. GRAB can be used in workflows for identifying unreported relationships, validating reported relationships in family-based studies, and detection of sample-tracking errors or duplicate inclusion. The software is available at familygenomics.systemsbiology.net/grab.
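    The block-sharing statistic at the heart of GRAB can be approximated as shown below. This is a hypothetical simplification, not GRAB's actual definition. Blocks are fixed runs of variant sites, and genotypes are biallelic alternate-allele counts (0, 1, 2). A block counts as shared when the two individuals share at least one allele at every site in it; only opposite homozygotes (|g1 − g2| = 2) share no allele. GRAB's subsequent classification tree is not reproduced.

```python
def shared_block_fraction(geno_a, geno_b, block_size):
    """Fraction of consecutive variant blocks in which two individuals share
    at least one allele at every site (biallelic 0/1/2 genotype coding)."""
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must cover the same sites")
    if block_size < 1:
        raise ValueError("block_size must be positive")
    n_blocks = shared = 0
    for start in range(0, len(geno_a), block_size):
        pairs = zip(geno_a[start:start + block_size], geno_b[start:start + block_size])
        n_blocks += 1
        if all(abs(g1 - g2) < 2 for g1, g2 in pairs):
            shared += 1
    return shared / n_blocks if n_blocks else 0.0

# Hypothetical genotypes at eight variant sites, split into two blocks of four
person_a = [0, 1, 2, 1, 2, 2, 0, 1]
person_b = [0, 2, 2, 1, 0, 2, 0, 1]
print(shared_block_fraction(person_a, person_b, block_size=4))
```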

  1. Identification of genomic regions associated with female fertility in Danish Jersey using whole genome sequence data

    DEFF Research Database (Denmark)

    Höglund, Johanna; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2015-01-01

    (AIS), 56-day non-return rate (NRR), number of days from first to last insemination (IFL), and number of days between calving and first insemination (ICF). The objective of this study was to identify associations between sequence variants and fertility traits in Jersey cattle based on 1,225 Jersey sires from Denmark with official breeding values for female fertility traits. The association analyses were carried out in two steps: first the cattle genome was scanned for quantitative trait loci using a sire model for FTI using imputed whole genome sequence variants; second the significant...... for cows on BTA20, BTA23 and BTA25, IFL for heifers on BTA7 and QTL9-2 on BTA9, NRR for heifers on BTA7 and BTA23, and NRR for cows on BTA23. Conclusion: The genome wide association study presented here revealed 6 genomic regions associated with FTI. Screening these 6 QTL regions for the underlying female...

  2. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Science.gov (United States)

    Antwerpen, Markus H; Prior, Karola; Mellmann, Alexander; Höppner, Sebastian; Splettstoesser, Wolf D; Harmsen, Dag

    2015-01-01

    The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered a category A select agent with the potential to be misused in bioterrorism. Molecular typing based on DNA sequence, such as canSNP typing or MLVA, has become the accepted standard for this organism. Because of the organism's highly clonal nature, current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole-genome data from 15 sequenced F. tularensis ssp. holarctica strains, and apply it to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Owing to the high resolution of MLST+, we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.
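
    A gene-by-gene comparison such as MLST+ ultimately reduces each strain to an allele number per gene and counts the genes at which two strains differ. The sketch below assumes allele profiles have already been called, uses hypothetical gene names, and skips loci that are missing in either strain. This pairwise-ignore rule is one common convention, not necessarily the one used in the study.

```python
from typing import Dict, Optional

# locus name -> allele number; None marks a locus that could not be called
AlleleProfile = Dict[str, Optional[int]]

def allelic_distance(a: AlleleProfile, b: AlleleProfile) -> int:
    """Count loci at which two strains carry different allele numbers,
    ignoring loci that are absent or uncalled in either strain."""
    shared = a.keys() & b.keys()
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )
```

    Because every gene contributes at most one difference, recombination or hypervariable regions inside a single gene do not inflate the distance. This is one reason gene-by-gene schemes are attractive for clonal organisms.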

  3. Sequence variants from whole genome sequencing a large group of Icelanders.

    Science.gov (United States)

    Gudbjartsson, Daniel F; Sulem, Patrick; Helgason, Hannes; Gylfason, Arnaldur; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Kong, Augustine; Helgason, Agnar; Masson, Gisli; Magnusson, Olafur Th; Thorsteinsdottir, Unnur; Stefansson, Kari

    2015-01-01

    We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to a depth of at least 10X and by chip genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X, and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed by either dbSNP (build 137) or the Exome Sequencing Project (ESP) while only 60 and 20% of rare (DAFgenome, have been observed in the public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to published reports.
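
    The transition/transversion (Ts/Tv) ratio mentioned above is a standard sanity check on SNP call sets. Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) changes, and every other substitution is a transversion. A minimal sketch, assuming variants are given as (REF, ALT) base pairs, might look like this. Indels and ambiguous bases are simply skipped, and the function name is illustrative.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    return (ref in PURINES and alt in PURINES) or (ref in PYRIMIDINES and alt in PYRIMIDINES)

def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Transitions divided by transversions over single-base substitutions."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip indels, ambiguous bases and non-variant records
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ts / tv
```

    Human whole-genome SNV sets typically give a ratio of roughly 2, whereas random errors push the ratio toward 0.5 (there are twice as many possible transversions as transitions). A call set that matches published values is therefore reassuring.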

  4. Whole genome sequence of Pantoea ananatis R100, an antagonistic bacterium isolated from rice seed.

    Science.gov (United States)

    Wu, Liwen; Liu, Ruifang; Niu, Yaofang; Lin, Haiyan; Ye, Weijun; Guo, Longbiao; Hu, Xingming

    2016-05-10

    Pantoea ananatis is a bacterial species that was first reported as a plant pathogen; more recently, several papers have also described its biocontrol ability. In 2003, P. ananatis R100, which showed strong antagonism against several plant pathogens, was isolated from rice seeds. In this study, the whole-genome sequence of this strain was determined using SMRT Cell technology. The total genome size of R100 is 4,857,861 bp, with 4,659 coding genes (CDS), 82 tRNAs and 22 rRNAs. The genome sequence of R100 may shed light on research into the antagonistic activity of P. ananatis. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  5. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation

    DEFF Research Database (Denmark)

    Michaelson, Jacob J.; Shi, Yujian; Gujral, Madhusudan

    2012-01-01

    De novo mutation plays an important role in autism spectrum disorders (ASDs). Notably, pathogenic copy number variants (CNVs) are characterized by high mutation rates. We hypothesize that hypermutability is a property of ASD genes and may also include nucleotide-substitution hot spots. We investigated global patterns of germline mutation by whole-genome sequencing of monozygotic twins concordant for ASD and their parents. Mutation rates varied widely throughout the genome (by 100-fold) and could be explained by intrinsic characteristics of DNA sequence and chromatin structure. Dense clusters of mutations within individual genomes were attributable to compound mutation or gene conversion. Hypermutability was a characteristic of genes involved in ASD and other diseases. In addition, genes impacted by mutations in this study were associated with ASD in independent exome-sequencing data sets. Our...

  6. Revisiting the genotyping scheme for varicella-zoster viruses based on whole-genome comparisons.

    Science.gov (United States)

    Jensen, Nancy J; Rivailler, Pierre; Tseng, Hung Fu; Quinlivan, Mark L; Radford, Kay; Folster, Jennifer; Harpaz, Rafael; LaRussa, Philip; Jacobsen, Steven; Scott Schmid, D

    2017-06-01

    We report whole-genome sequences (WGSs) for four varicella-zoster virus (VZV) samples from a shingles study conducted by Kaiser Permanente of Southern California. Comparative genomics and phylogenetic analysis of all published VZV WGSs revealed that strain KY037798 is in clade IX, which shall henceforth be designated clade 9. Previously published single nucleotide polymorphisms (SNP)-based genotyping schemes fail to discriminate between clades 6 and VIII and employ positions that are not clade-specific. We provide an updated list of clade-specific positions that supersedes the list determined at the 2008 VZV nomenclature meeting. Finally, we propose a new targeted genotyping scheme that will discriminate the circulating VZV clades with at least a twofold redundancy. Genotyping strategies using a limited set of targeted SNPs will continue to provide an efficient 'first pass' method for VZV strain surveillance as vaccination programmes for varicella and zoster influence the dynamics of VZV transmission.
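
    The proposed targeted scheme discriminates circulating clades "with at least a twofold redundancy". The sketch below shows how such a scheme could assign a clade. It assumes each clade is defined by a list of clade-specific (position, base) markers and that a call needs at least two concordant markers. The marker table, clade labels and `assign_clade` are hypothetical placeholders, not the published VZV positions.

```python
from typing import Dict, List, Tuple

# Hypothetical marker table: clade label -> (genome position, clade-specific base).
# These positions are placeholders, NOT published VZV clade-specific positions.
MARKERS: Dict[str, List[Tuple[int, str]]] = {
    "clade_X": [(1000, "A"), (2000, "T"), (2500, "G")],
    "clade_Y": [(3000, "G"), (4000, "C"), (4500, "A")],
}

def assign_clade(sample: Dict[int, str],
                 markers: Dict[str, List[Tuple[int, str]]] = MARKERS,
                 min_hits: int = 2) -> str:
    """Return the single clade supported by at least `min_hits` markers
    (min_hits=2 mimics twofold redundancy); otherwise 'unassigned'."""
    hits = {
        clade: sum(1 for pos, base in sites if sample.get(pos) == base)
        for clade, sites in markers.items()
    }
    supported = [clade for clade, n in hits.items() if n >= min_hits]
    return supported[0] if len(supported) == 1 else "unassigned"
```

    Requiring two independent markers means that one sequencing error or one homoplastic SNP cannot by itself move a sample into the wrong clade. Samples supported by more than one clade are reported as unassigned rather than guessed.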

  7. A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter

    Directory of Open Access Journals (Sweden)

    Samuel K. Sheppard

    2012-04-01

    Full Text Available Campylobacteriosis remains a major human public health problem worldwide. Genetic analyses of Campylobacter isolates, and particularly molecular epidemiology, have been central to the study of this disease, especially the characterization of Campylobacter genotypes isolated from human infection, farm animals, and retail food. These studies have demonstrated that Campylobacter populations are highly structured, with distinct genotypes associated with particular wild or domestic animal sources, and that chicken meat is the most likely source of most human infection in countries such as the UK. The availability of multiple whole-genome sequences from Campylobacter isolates presents the prospect of identifying the genes or allelic variants responsible for host association and increased human disease risk, but the diversity of Campylobacter genomes presents challenges for such analyses. We present a gene-by-gene approach for investigating the genetic basis of phenotypes in diverse bacteria such as Campylobacter, implemented with the BIGSdb software on the pubMLST.org/campylobacter website.
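
    Gene-by-gene approaches rest on one simple bookkeeping rule: every distinct sequence seen at a locus receives its own allele number. A simplified sketch of that rule follows. It is not BIGSdb's implementation. `AlleleRegistry` is a hypothetical class, and the locus names in the usage example serve only as labels, with made-up sequences.

```python
import hashlib
from typing import Dict

class AlleleRegistry:
    """Assigns sequential allele numbers to distinct sequences, per locus
    (a simplified illustration of gene-by-gene allele designation)."""

    def __init__(self) -> None:
        # locus -> {sequence digest -> allele number}
        self._alleles: Dict[str, Dict[str, int]] = {}

    def assign(self, locus: str, sequence: str) -> int:
        """Return the existing allele number for this sequence, or the next unused one."""
        digest = hashlib.sha256(sequence.upper().encode()).hexdigest()
        table = self._alleles.setdefault(locus, {})
        if digest not in table:
            table[digest] = len(table) + 1
        return table[digest]
```

    Once every isolate is expressed as a vector of allele numbers, variation in thousands of genes can be compared without whole-genome alignment. This property is what makes the approach practical for genomically diverse bacteria.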

  8. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates.

    Science.gov (United States)

    Berthelot, Camille; Brunet, Frédéric; Chalopin, Domitille; Juanchich, Amélie; Bernard, Maria; Noël, Benjamin; Bento, Pascal; Da Silva, Corinne; Labadie, Karine; Alberti, Adriana; Aury, Jean-Marc; Louis, Alexandra; Dehais, Patrice; Bardou, Philippe; Montfort, Jérôme; Klopp, Christophe; Cabau, Cédric; Gaspin, Christine; Thorgaard, Gary H; Boussaha, Mekki; Quillet, Edwige; Guyomard, René; Galiana, Delphine; Bobe, Julien; Volff, Jean-Nicolas; Genêt, Carine; Wincker, Patrick; Jaillon, Olivier; Roest Crollius, Hugues; Guiguen, Yann

    2014-04-22

    Vertebrate evolution has been shaped by several rounds of whole-genome duplications (WGDs) that are often suggested to be associated with adaptive radiations and evolutionary innovations. Due to an additional round of WGD, the rainbow trout genome offers a unique opportunity to investigate the early evolutionary fate of a duplicated vertebrate genome. Here we show that after 100 million years of evolution the two ancestral subgenomes have remained extremely collinear, despite the loss of half of the duplicated protein-coding genes, mostly through pseudogenization. In striking contrast is the fate of miRNA genes that have almost all been retained as duplicated copies. The slow and stepwise rediploidization process characterized here challenges the current hypothesis that WGD is followed by massive and rapid genomic reorganizations and gene deletions.

  9. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

    KAUST Repository

    Coll, Francesc

    2015-05-27

    Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data complexity has restricted their clinical application. A library (1,325 mutations) predictive of DR for 15 anti-tuberculosis drugs was compiled and validated for 11 of them using genomic-phenotypic data from 792 strains. A rapid online ‘TB-Profiler’ tool was developed to report DR and strain-type profiles directly from raw sequences. Using our DR mutation library, in silico diagnostic accuracy was superior to some commercial diagnostics and alternative databases. The library will facilitate sequence-based drug-susceptibility testing.
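
    Conceptually, reporting drug resistance from a mutation library is a lookup: every variant detected in an isolate is checked against the library, and matches are grouped by drug. The sketch below assumes variants have already been called and named as `gene_change`, a hypothetical naming scheme. It uses three well-known resistance mutations purely as illustrative entries; it does not reproduce the 1,325-mutation library or TB-Profiler's code.

```python
from typing import Dict, Iterable, List

# Illustrative entries only; not the published TB-Profiler mutation library.
MUTATION_LIBRARY: Dict[str, str] = {
    "rpoB_S450L": "rifampicin",
    "katG_S315T": "isoniazid",
    "embB_M306V": "ethambutol",
}

def predict_resistance(variants: Iterable[str],
                       library: Dict[str, str] = MUTATION_LIBRARY) -> Dict[str, List[str]]:
    """Group detected variants that appear in the library by the drug they affect."""
    report: Dict[str, List[str]] = {}
    for variant in variants:
        drug = library.get(variant)
        if drug is not None:
            report.setdefault(drug, []).append(variant)
    return report
```

    Variants absent from the library, such as lineage markers, are silently ignored. The accuracy of this kind of prediction therefore depends entirely on how complete and well validated the library is, which is why the abstract emphasises validation against phenotypic data.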

  10. Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high incidence setting

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Soborg, B; Koch, A

    2016-01-01

    In East Greenland, a dramatic increase in tuberculosis (TB) incidence has been observed in recent years. Classical genotyping suggests a genetically similar Mycobacterium tuberculosis (Mtb) strain population as the cause; however, precise transmission patterns are unclear. We performed whole-genome sequencing (WGS) of Mtb isolates from 98% of culture-positive TB cases over 21 years (n = 182), which revealed four genomic clusters of the Euro-American lineage (mainly sub-lineage 4.8 (n = 134)). The time to the most recent common ancestor of lineage 4.8 strains was found to be 100 years. This sub...... and the uniformity of circulating Mtb strains indicated that the majority of East Greenlandic TB cases originated from one or a few strains introduced within the last century. The study thus shows the consequences of even short interruptions in TB control efforts in areas that previously had a high TB incidence...

  11. Real-Time Whole-Genome Sequencing for Surveillance of Listeria monocytogenes, France.

    Science.gov (United States)

    Moura, Alexandra; Tourdjman, Mathieu; Leclercq, Alexandre; Hamelin, Estelle; Laurent, Edith; Fredriksen, Nathalie; Van Cauteren, Dieter; Bracq-Dieye, Hélène; Thouvenot, Pierre; Vales, Guillaume; Tessaud-Rita, Nathalie; Maury, Mylène M; Alexandru, Andreea; Criscuolo, Alexis; Quevillon, Emmanuel; Donguy, Marie-Pierre; Enouf, Vincent; de Valk, Henriette; Brisse, Sylvain; Lecuit, Marc

    2017-09-01

    During 2015-2016, we evaluated the performance of whole-genome sequencing (WGS) as a routine typing tool. Its added value for microbiological and epidemiologic surveillance of listeriosis was compared with that for pulsed-field gel electrophoresis (PFGE), the current standard method. A total of 2,743 Listeria monocytogenes isolates collected as part of routine surveillance were characterized in parallel by PFGE and core genome multilocus sequence typing (cgMLST) extracted from WGS. We investigated PFGE and cgMLST clusters containing human isolates. Discrimination of isolates was significantly higher by cgMLST than by PFGE (pWGS-based typing should replace PFGE as the primary typing method for L. monocytogenes.
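
    The abstract compares cgMLST clusters with PFGE clusters but does not say how the cgMLST clusters were defined. One common approach, assumed here, is single-linkage grouping: isolates end up in the same cluster when they are linked by a chain of pairwise allelic distances at or below a chosen threshold. The sketch assumes complete allele profiles. The threshold in the usage example is arbitrary, not the one used in French surveillance.

```python
from itertools import combinations
from typing import Dict, List

def allelic_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Number of shared loci with different allele numbers."""
    return sum(1 for locus in a.keys() & b.keys() if a[locus] != b[locus])

def single_linkage_clusters(profiles: Dict[str, Dict[str, int]],
                            threshold: int) -> List[List[str]]:
    """Union-find over all isolate pairs within `threshold` allelic differences."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

    With a threshold of 1, three isolates in which each differs from the next by one allele form a single cluster, even though the first and last differ by two. This chaining behaviour is characteristic of single linkage and is worth keeping in mind when interpreting cluster sizes.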

  12. Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs)

    Science.gov (United States)

    Sims, Gregory E.; Kim, Sung-Hou

    2011-01-01

    A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenetic distances. We present two phylogenies that accentuate different aspects of E. coli/Shigella genomic evolution: (i) one based on the compositions of all possible features of length l = 24 (∼8.4 million features), which are likely to reveal the phenetic grouping and relationship among the organisms, and (ii) the other based on the compositions of core features with low frequency and low variability (∼0.56 million features), which account for ∼69% of all commonly shared features among the 38 taxa examined and are likely to carry a genome-wide lineal evolutionary signal. Shigella appears as a single clade when all possible features are used without filtering of noncore features. However, results using core features show that Shigella consists of at least two distantly related subclades, implying that the subclades evolved into a single clade because of a high degree of convergence influenced by mobile genetic elements and niche adaptation. In both FFP trees, the basal group of the E. coli/Shigella phylogeny is the B2 phylogroup, which contains primarily uropathogenic strains, suggesting that the E. coli/Shigella ancestor was likely a facultative or opportunistic pathogen. The extant commensal strains diverged relatively late and appear to be the result of reductive evolution of genomes. We also identify clade-distinguishing features and their associated genomic regions within each phylogroup. Such features may provide useful information for understanding the evolution of the groups and for quick diagnostic identification of each phylogroup. PMID:21536867
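
    The core of an FFP comparison fits in a few lines. Each genome is turned into a normalised frequency profile of its length-l features, and pairs of profiles are compared with a divergence measure. The abstract does not name the measure. This sketch assumes the Jensen-Shannon divergence, which is commonly paired with FFP. It uses a toy feature length rather than l = 24 and omits the core-feature filtering described above.

```python
import math
from collections import Counter
from typing import Dict

def feature_frequency_profile(seq: str, l: int) -> Dict[str, float]:
    """Relative frequency of every length-l substring (feature) in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}

def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """JS divergence in bits: 0 for identical profiles, 1 for disjoint ones."""
    js = 0.0
    for feature in p.keys() | q.keys():
        pf, qf = p.get(feature, 0.0), q.get(feature, 0.0)
        m = (pf + qf) / 2
        if pf > 0:
            js += 0.5 * pf * math.log2(pf / m)
        if qf > 0:
            js += 0.5 * qf * math.log2(qf / m)
    return js
```

    A matrix of pairwise divergences computed this way can then be passed to any distance-based tree-building method, such as neighbour joining, to obtain a whole-genome phylogeny without aligning the genomes.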

  13. Uncovering the novel characteristics of Asian honey bee, Apis cerana, by whole genome sequencing.

    Science.gov (United States)

    Park, Doori; Jung, Je Won; Choi, Beom-Soon; Jayakodi, Murukarthick; Lee, Jeongsoo; Lim, Jongsung; Yu, Yeisoo; Choi, Yong-Soo; Lee, Myeong-Lyeol; Park, Yoonseong; Choi, Ik-Young; Yang, Tae-Jin; Edwards, Owain R; Nah, Gyoungju; Kwon, Hyung Wook

    2015-01-02

    The honey bee is an important model system for increasing understanding of the molecular and neural mechanisms underlying social behaviors, relevant to both the agricultural industry and basic science. The western honey bee, Apis mellifera, has served as a model species, and its genome sequence has been published. In contrast, the genome of the Asian honey bee, Apis cerana, has not yet been sequenced. A. cerana has been raised in Asian countries for thousands of years and has brought considerable economic benefits to the apicultural industry. A. cerana has divergent biological traits compared to A. mellifera, and it has played a key role in maintaining biodiversity in eastern and southern Asia. Here we report the first whole-genome sequence of A. cerana. Using de novo assembly methods, we produced a 238 Mbp draft of the A. cerana genome and predicted 10,651 genes. A. cerana-specific genes were analyzed to better understand the novel characteristics of this honey bee species. Seventy-two percent of the A. cerana-specific genes had more than one GO term, and 1,696 enzymes were categorized into 125 pathways. Genes involved in chemoreception and immunity were carefully identified and compared to those from other sequenced insect models. These included 10 gustatory receptors, 119 odorant receptors, 10 ionotropic receptors, and 160 immune-related genes. This first report of the whole-genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication. These important tools will contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and help anticipate its future evolutionary trajectory.

  14. Views of American OB/GYNs on the ethics of prenatal whole-genome sequencing.

    Science.gov (United States)

    Bayefsky, Michelle J; White, Amina; Wakim, Paul; Hull, Sara Chandros; Wasserman, David; Chen, Stephanie; Berkman, Benjamin E

    2016-12-01

    Given public demand for genetic information, the potential to perform prenatal whole-genome sequencing (PWGS) non-invasively in the future, and the decreasing costs of whole-genome sequencing, it is likely that OB/GYN practice will include PWGS. The goal of this project was to explore OB/GYNs' views on the ethical issues surrounding PWGS and their preparedness for counseling patients on its use. A national survey was administered to 2,500 members of the American Congress of Obstetricians and Gynecologists. A total of 1,114 respondents completed the survey (response rate = 45%). OB/GYNs are most concerned about ordering non-medical fetal genetic information, are worried about increasing parental anxiety, and feel it is appropriate to be directive when counseling parents about PWGS. Furthermore, most OB/GYNs have limited knowledge of genetics, rely heavily on genetic counselors and would like more guidance regarding the clinical adoption of PWGS. OB/GYNs do not completely accept or reject PWGS, but a substantial number have significant ethical and practical concerns. They are most concerned about issues that will directly affect their practices and interactions with patients, such as increasing parental anxiety and costs of care. Professional guidance would be instrumental in directing the adoption of PWGS and alleviating the ethical burden that PWGS places on individual OB/GYNs. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  15. Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

    Directory of Open Access Journals (Sweden)

    Abiyad Baig

    2015-11-01

    Full Text Available Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here we report the analysis of whole-genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but whose genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from the divergent genomes with those from normal S. suis reduced the size of the core genome from 793 to only 397 genes. Divergence was clear when phylogenetic analysis was performed on the reduced set of core genes and on MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates and normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of the 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that the divergent isolates were more closely related to S. suis. Six of the nine divergent isolates possessed an S. suis-like capsule region with variation in capsular gene sequences, but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of the S. suis species, but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole-genome analysis of more isolates are warranted to understand the diversity of the species.
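
    The shrinking core genome reported above follows from a simple definition: the core genome is the set of genes present in every genome compared, so adding divergent genomes can only keep it the same size or make it smaller. The sketch assumes that genes have already been grouped into orthologue clusters with shared identifiers, for example by protein-similarity clustering, which it does not do itself. The genome and gene names are hypothetical.

```python
from typing import Dict, Set

def core_genome(gene_sets: Dict[str, Set[str]]) -> Set[str]:
    """Orthologue-cluster IDs present in every genome (empty input -> empty set)."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()
```

    For example, two "normal" genomes sharing `g1`, `g2` and `g3` have a three-gene core, and adding a divergent genome that carries only `g1` and `g5` reduces the core to `g1` alone. The same effect, at much larger scale, underlies the drop from 793 to 397 core genes.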

  16. Practical Issues in Implementing Whole-Genome-Sequencing in Routine Diagnostic Microbiology.

    Science.gov (United States)

    Rossen, John W A; Friedrich, Alexander W; Moran-Gilad, Jacob

    2017-11-05

    Next-generation sequencing (NGS) is increasingly being used in clinical microbiology. Like every new technology adopted in microbiology, the integration of NGS into clinical and routine workflows needs to be carefully managed. This review, based on the literature and expert opinion, examines the practical aspects of implementing bacterial whole-genome sequencing (WGS) in routine diagnostic laboratories. We discuss when and how to integrate WGS into the routine workflow of the clinical laboratory. In addition, because microbiology laboratories have to adhere to various national and international regulations and criteria for their accreditation, we deliberate on quality control issues for using WGS in microbiology, including the importance of proficiency testing. Furthermore, we describe the current and future place of this technology in the diagnostic hierarchy of microbiology, as well as the necessity of maintaining backwards compatibility with already established methods. Finally, we speculate on whether WGS can entirely replace routine microbiology in the future, and on the tension between the fact that most sequencers are designed to process multiple samples in parallel, whereas optimal diagnosis favours processing samples one by one. Special reference is made to the cost and turnaround time of WGS in diagnostic laboratories. Further development is required to improve the WGS workflow, particularly to shorten turnaround time, reduce costs and streamline downstream data analyses. Only when these processes reach maturity will reliance on WGS for routine patient management and infection control management become feasible, enabling the transformation of clinical microbiology into a genome-based and personalised diagnostic field. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  17. The Whole-Genome and Transcriptome of the Manila Clam (Ruditapes philippinarum).

    Science.gov (United States)

    Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong

    2017-06-01

    The Manila clam, Ruditapes philippinarum, is an important bivalve species in aquaculture worldwide, including in Korea. The aquaculture production of R. philippinarum is under threat from diverse environmental factors, including viruses, microorganisms, parasites, and water conditions, and production has subsequently declined. In spite of its importance as a marine resource, the reference genome of R. philippinarum for comprehensive genetic studies is largely unexplored. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide basic data for advanced studies in selective breeding and disease control in order to obtain successful aquaculture systems. An approximately 2.56 Gb high-quality whole genome was assembled using various library construction methods. A total of 108,034 protein-coding gene models were predicted, and repetitive elements, including simple sequence repeats, as well as noncoding RNAs were identified to further understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with other bivalve marine invertebrates revealed that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis of three different tissues to support genome annotation and identified 41,275 annotated transcripts. The R. philippinarum genome resource will markedly advance a wide range of potential genetic studies, serve as a reference genome for comparative analysis of bivalve species, and help unravel the mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly

    Science.gov (United States)

    Li, Heng

    2012-01-01

    Motivation: Eugene Myers in his string graph paper suggested that in a string graph or, equivalently, a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs. Results: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of the information in the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method to 35-fold human resequencing data, we showed that in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly-based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. On the methodological side, we propose the FMD-index for forward–backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches, and one-pass construction of unitigs from an FMD-index. Availability: http://github.com/lh3/fermi Contact: hengli@broadinstitute.org PMID:22569178
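
    The FMD-index builds on the ordinary FM-index. The sketch below implements only the plain, unidirectional FM-index: a Burrows-Wheeler transform plus the C and Occ tables, with backward search to count exact occurrences of a pattern. As we understand it, fermi's FMD-index additionally indexes the reverse complement of every sequence so that matches can be extended in both directions. That extension is not shown here, and the naive rotation sort is suitable only for toy inputs.

```python
from typing import Dict, List, Tuple

FMIndex = Tuple[Dict[str, int], Dict[str, List[int]], int]

def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting all rotations (toy-sized inputs only).
    `text` must end with a unique sentinel '$' that sorts before A/C/G/T."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def build_fm_index(text: str) -> FMIndex:
    last = bwt(text)
    alphabet = sorted(set(last))
    c_table: Dict[str, int] = {}  # C[c]: number of characters smaller than c
    running = 0
    for ch in alphabet:
        c_table[ch] = running
        running += last.count(ch)
    # occ[c][i]: number of occurrences of c in last[:i]
    occ = {ch: [0] * (len(last) + 1) for ch in alphabet}
    for i, symbol in enumerate(last):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (symbol == ch)
    return c_table, occ, len(last)

def count_occurrences(pattern: str, index: FMIndex) -> int:
    """Backward search: shrink the suffix-array interval [lo, hi) while
    reading the pattern from right to left."""
    c_table, occ, n = index
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

    Each backward step costs two table lookups whatever the size of the text. This is why FM-index variants can report exact matches over an entire read set without scanning it.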

  19. Whole genome sequencing and methylome analysis of the wild guinea pig.

    Science.gov (United States)

    Weyrich, Alexandra; Schüllermann, Tino; Heeger, Felix; Jeschek, Marie; Mazzoni, Camila J; Chen, Wei; Schumann, Kathrin; Fickel, Joerns

    2014-11-28

    DNA methylation is a heritable mechanism that acts in response to environmental changes, lifestyle and diseases by influencing gene expression in eukaryotes. Epigenetic studies of wild organisms are essential to understanding the role of epigenetics in, for example, adaptive processes across the great variety of ecological niches. However, strategies to address such questions on a methylome scale are largely missing. In this study we present such a strategy and describe a whole-genome sequence and methylome analysis of the wild guinea pig. We generated a full wild guinea pig (Cavia aperea) genome sequence with enhanced coverage of methylated regions, benefiting from the available sequence of the domesticated relative Cavia porcellus. This new genome sequence was then used as a reference to map the sequence reads of bisulfite-treated wild guinea pig sequencing libraries, in order to investigate DNA methylation patterns at nucleotide-specific resolution using the method described here, named 'DNA-enrichment-bisulfite-sequencing' (MEBS). The results achieved using MEBS matched those of standard methods in other mammalian model species. The technique is cost-efficient and combines methylation enrichment with nucleotide-specific resolution, even when no whole-genome sequence is available. MEBS can therefore easily be applied to extend methylation enrichment studies to a nucleotide-specific level. The approach is suited to studying the methylomes of not-yet-sequenced mammals at single-nucleotide resolution. The strategy is transferable to other mammalian species by using the nuclear genome sequence of a close relative. It is therefore of interest for studies on a variety of wild species that seek to answer evolutionary, adaptational, ecological or medical questions through epigenetic mechanisms.
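
    Nucleotide-level methylation calls from bisulfite reads rely on one chemical fact. Bisulfite converts unmethylated cytosines to uracil, which is read as T, while methylated cytosines are protected and still read as C. A minimal sketch of the per-site estimate follows. It assumes forward-strand reads aligned to a reference cytosine and ignores incomplete conversion and C/T polymorphisms. It is a generic calculation, not the MEBS pipeline.

```python
from typing import Iterable

def methylation_level(read_bases: Iterable[str]) -> float:
    """Fraction methylated at one reference cytosine, from the bases that
    forward-strand bisulfite reads show there: C = protected (methylated),
    T = converted (unmethylated); any other base is uninformative."""
    c = t = 0
    for base in read_bases:
        base = base.upper()
        if base == "C":
            c += 1
        elif base == "T":
            t += 1
    if c + t == 0:
        raise ValueError("no informative reads at this position")
    return c / (c + t)
```

    For instance, three reads showing C and one showing T give a methylation level of 0.75 at that position. Because the estimate is a proportion, its precision depends directly on read depth, which is one motivation for enriching methylated regions before sequencing.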

  20. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    Science.gov (United States)

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq), and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and their potential applications in conservation biology. WGR offers unprecedented marker density and surveys a wide diversity of genetic variation not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing its power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not-too-distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields, including conservation biology. © 2017 John Wiley & Sons Ltd.
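
    Low-depth resequencing (lcWGR) is usually analysed with genotype likelihoods rather than hard genotype calls, because a few reads cannot reliably distinguish a heterozygote from a homozygote. The sketch below assumes a biallelic site, independent reads and a single uniform per-base error rate, a deliberately simple textbook model. Real tools also model base qualities and mapping errors, and `genotype_likelihoods` is an illustrative name.

```python
from typing import List, Sequence

def genotype_likelihoods(bases: Sequence[str], ref: str, alt: str,
                         error: float = 0.01) -> List[float]:
    """P(observed bases | G) for G = 0, 1, 2 copies of the alt allele,
    assuming independent reads and a uniform per-base error rate."""
    likelihoods = []
    for g in (0, 1, 2):
        # probability that a single read shows the alt allele given genotype g
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        lik = 1.0
        for base in bases:
            if base == alt:
                lik *= p_alt
            elif base == ref:
                lik *= 1 - p_alt
            # bases matching neither allele are ignored in this simple model
        likelihoods.append(lik)
    return likelihoods
```

    With one reference and one alternative read and a 1% error rate, the heterozygote has likelihood 0.25 against about 0.0099 for either homozygote. With two alternative reads, however, the heterozygote still keeps a likelihood of about 0.25 against roughly 0.98 for the homozygote. Carrying these likelihoods forward into population-level estimates, instead of forcing a call, is what makes lcWGR usable.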

  1. Light whole genome sequence for SNP discovery across domestic cat breeds

    Directory of Open Access Journals (Sweden)

    Driscoll Carlos

    2010-06-01

    Full Text Available Abstract Background The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models, as well as through the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues of human scourges (cancer, SARS, and AIDS, respectively). However, to realize this biomedical potential, a high-density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. Description To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats and combined these data with the publicly available 2X cat whole-genome sequence. All sequence reads were assembled together to form a 3X whole-genome assembly, allowing the discovery of over three million SNPs. To reduce potential false-positive SNPs due to the low-coverage assembly, a low upper limit was placed on sequence coverage and a high lower limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (a female Abyssinian, a female American shorthair, a male Cornish Rex, a female European Burmese, a female Persian, a female Siamese and a male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between the African wildcat and domestic cat breeds. An empirical sample of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. Conclusions These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.
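
    The two-sided filter described above can be expressed compactly. A candidate site is kept only if its read depth stays at or below an upper limit and its discrepant base meets a minimum quality. One common reason for capping depth in a low-coverage assembly, assumed here rather than stated in the abstract, is that unusually deep sites often come from collapsed repeats or paralogues. The default thresholds and the `CandidateSite` record are hypothetical placeholders, not the values used in the study.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CandidateSite:
    position: int
    depth: int             # reads covering the site in the assembly
    alt_base_quality: int  # Phred quality of the discrepant base

def filter_candidates(sites: List[CandidateSite], max_depth: int = 6,
                      min_quality: int = 30) -> List[CandidateSite]:
    """Keep sites with depth <= max_depth and discrepant-base quality >= min_quality.
    Both default thresholds are illustrative placeholders."""
    return [s for s in sites if s.depth <= max_depth and s.alt_base_quality >= min_quality]
```

    The filters act in opposite directions on purpose. The depth cap removes sites that are suspiciously well covered, while the quality floor removes sites whose evidence is too weak. Together they trade some sensitivity for the high validation rate reported above.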

  2. Whole-genome array CGH evaluation for replacing prenatal karyotyping in Hong Kong.

    Directory of Open Access Journals (Sweden)

    Anita S Y Kan

    OBJECTIVE: To evaluate the effectiveness of whole-genome array comparative genomic hybridization (aCGH) in prenatal diagnosis in Hong Kong. METHODS: Array CGH was performed on 220 samples recruited prospectively as the first-tier test study. In addition, 150 prenatal samples with abnormal fetal ultrasound findings found to have normal karyotypes were analyzed as a 'further-test' study using NimbleGen CGX-135K oligonucleotide arrays. RESULTS: Array CGH findings were concordant with conventional cytogenetic results, with the exception of one case of triploidy. In the first-tier test study, aCGH detected clinically significant copy number variants (CNVs) in 20% (44/220) of samples, of which 21 were common aneuploidies and 23 were other chromosomal imbalances. In 3.2% (7/220) of samples, CNVs were detected by aCGH but not by conventional cytogenetics. In the 'further-test' study, the additional diagnostic yield of detecting chromosome imbalance was 6% (9/150). The overall detection rate for CNVs of unclear clinical significance was 2.7% (10/370), with 0.9% found to be de novo. Eleven loci of common CNVs were found in the local population. CONCLUSION: Whole-genome aCGH offered a higher-resolution diagnostic capacity than conventional karyotyping for prenatal diagnosis, either as a first-tier test or as a 'further-test' for pregnancies with fetal ultrasound anomalies. We propose replacing conventional cytogenetics with aCGH for all pregnancies undergoing invasive diagnostic procedures after excluding common aneuploidies and triploidies by quantitative fluorescent PCR. Conventional cytogenetics can be reserved for visualization of clinically significant CNVs.
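The yields reported above are simple proportions, and the totals are internally consistent (21 + 23 = 44, and 220 + 150 = 370 for the overall CNV rate). A short Python check that recomputes the percentages from the counts in the abstract; the helper name `percent` is hypothetical.

```python
def percent(detected: int, total: int) -> float:
    """Proportion of samples with a finding, as a percentage to one decimal place."""
    return round(100 * detected / total, 1)

first_tier, further_test = 220, 150
common_aneuploidies, other_imbalances = 21, 23
significant = common_aneuploidies + other_imbalances   # 44

print(percent(significant, first_tier))                # 20.0
print(percent(7, first_tier))                          # 3.2
print(percent(9, further_test))                        # 6.0
print(percent(10, first_tier + further_test))          # 2.7
```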

  3. A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum deduced from comparative whole genome analysis.

    Directory of Open Access Journals (Sweden)

    Peter G Kroth

    BACKGROUND: Diatoms are unicellular algae responsible for approximately 20% of global carbon fixation. Their evolution by secondary endocytobiosis resulted in a more complex cellular structure and metabolism than in algae with primary plastids. METHODOLOGY/PRINCIPAL FINDINGS: The whole genome sequence of the diatom Phaeodactylum tricornutum has recently been completed. We identified and annotated genes for enzymes involved in carbohydrate pathways based on extensive EST support and comparison to the whole genome sequence of a second diatom, Thalassiosira pseudonana. Protein localization to mitochondria was predicted based on identified similarities to mitochondrial localization motifs in other eukaryotes, whereas protein localization to plastids was based on the presence of signal peptide motifs in combination with plastid localization motifs previously shown to be required in diatoms. We identified genes potentially involved in C4-like photosynthesis in P. tricornutum and, on the basis of sequence-based putative localization of relevant proteins, discuss possible differences in carbon-concentrating mechanisms and CO2 fixation between the two diatoms. We also identified genes encoding enzymes involved in photorespiration, with one interesting exception: glycerate kinase was not found in either P. tricornutum or T. pseudonana. Various Calvin cycle enzymes were found in up to five different isoforms, distributed between plastids, mitochondria and the cytosol. Diatoms store energy either as lipids or as chrysolaminaran (a beta-1,3-glucan) outside of the plastids. We identified various beta-glucanases and large membrane-bound glucan synthases. Interestingly, most of the glucanases appear to contain C-terminal anchor domains that may attach the enzymes to membranes.
CONCLUSIONS/SIGNIFICANCE: Here we present a detailed synthesis of carbohydrate metabolism in diatoms based on the genome sequences of Thalassiosira pseudonana and Phaeodactylum tricornutum.

  4. Whole genomic constellation of the first human G8 rotavirus strain detected in Japan.

    Science.gov (United States)

    Agbemabiese, Chantal Ama; Nakagomi, Toyoko; Doan, Yen Hai; Nakagomi, Osamu

    2015-10-01

    Human G8 Rotavirus A (RVA) strains are commonly detected in Africa but are rarely detected in Japan and elsewhere in the world. In this study, the whole genome sequence of the first human G8 RVA strain, designated AU109 and isolated from a child with acute gastroenteritis in 1994, was determined in order to understand how the strain was generated, including the host species origin of its genes. The genotype constellation of AU109 was G8-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2. Phylogenetic analyses of the 11 genome segments revealed that its VP7 and VP1 genes were closely related to those of a Hungarian human G8P[14] RVA strain, and these genes shared their most recent common ancestors in 1988 and 1982, respectively. AU109 possessed an NSP2 gene closely related to those of Chinese sheep and goat RVA strains. The remaining eight genome segments were closely related to those of Japanese human G2P[4] strains that circulated around 1985-1990. Bayesian evolutionary analyses revealed that the NSP2 gene of AU109 and those of the Chinese sheep and goat RVA strains diverged from a common ancestor around 1937. In conclusion, AU109 was generated through a genetic reassortment event in which Japanese DS-1-like G2P[4] strains circulating around 1985-1990 obtained the VP7, VP1 and NSP2 genes from unknown ruminant G8 RVA strains. These observations highlight the need for comprehensive examination of the whole genomes of RVA strains from less explored host species. Copyright © 2015 Elsevier B.V. All rights reserved.
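The genotype constellation above assigns one genotype per genome segment, in the conventional order VP7-VP4-VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5. A small Python sketch (helper names hypothetical) that parses a constellation string and compares AU109 with a typical DS-1-like G2P[4] constellation (assumed here; it is not spelled out in the abstract). It also shows the limit of genotype labels: only VP7 differs, because the ruminant-derived VP1 and NSP2 genes still fall in the R2 and N2 genotypes, so their origin had to come from the phylogenetic analyses.

```python
from typing import Dict, List

# Conventional segment order of the RVA genotype constellation.
SEGMENTS = ["VP7", "VP4", "VP6", "VP1", "VP2", "VP3",
            "NSP1", "NSP2", "NSP3", "NSP4", "NSP5"]

def parse_constellation(s: str) -> Dict[str, str]:
    """Map each segment to its genotype, e.g. 'G8-P[4]-...' -> {'VP7': 'G8', ...}."""
    parts = s.split("-")
    if len(parts) != len(SEGMENTS):
        raise ValueError(f"expected {len(SEGMENTS)} genotypes, got {len(parts)}")
    return dict(zip(SEGMENTS, parts))

def differing_segments(a: str, b: str) -> List[str]:
    """Segments whose genotype labels differ between two constellations."""
    pa, pb = parse_constellation(a), parse_constellation(b)
    return [seg for seg in SEGMENTS if pa[seg] != pb[seg]]

au109 = "G8-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2"
ds1_like = "G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2"  # assumed DS-1-like backbone
print(differing_segments(au109, ds1_like))  # ['VP7']
```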

  5. Whole-genome typing and characterization of blaVIM19-harbouring ST383 Klebsiella pneumoniae by PFGE, whole-genome mapping and WGS.

    Science.gov (United States)

    Sabirova, Julia S; Xavier, Basil Britto; Coppens, Jasmine; Zarkotou, Olympia; Lammens, Christine; Janssens, Lore; Burggrave, Ronald; Wagner, Trevor; Goossens, Herman; Malhotra-Kumar, Surbhi

    2016-06-01

    We utilized whole-genome mapping (WGM) and WGS to characterize 12 clinical carbapenem-resistant Klebsiella pneumoniae strains (TGH1-TGH12). All strains were screened for carbapenemase genes by PCR, and typed by MLST, PFGE (XbaI) and WGM (AflII) (OpGen, USA). WGS (Illumina) was performed on TGH8 and TGH10. Reads were de novo assembled and annotated [SPAdes, Rapid Annotation Subsystem Technology (RAST)]. Contigs were aligned directly, and after in silico AflII restriction, with corresponding WGMs (MapSolver, OpGen; BioNumerics, Applied Maths). All 12 strains were ST383. Of the 12 strains, 11 were carbapenem resistant, 7 harboured blaKPC-2 and 11 harboured blaVIM-19. Varying the parameters for assigning WGM clusters showed that these were comparable to STs and to the eight PFGE types or subtypes (difference of three or more bands). A 95% similarity coefficient assigned all 12 WGMs to a single cluster, whereas a 99% similarity coefficient (or ≥10 unmatched-fragment difference) assigned the 12 WGMs to eight (sub)clusters. Based on a difference of three or more bands between PFGE profiles, the Simpson's diversity indices (SDIs) of WGM (0.94, Jackknife pseudo-values CI: 0.883-0.996) and PFGE (0.93, Jackknife pseudo-values CI: 0.828-1.000) were similar (P = 0.649). However, the discriminatory power of WGM was significantly higher (SDI: 0.94, Jackknife pseudo-values CI: 0.883-0.996) than that of PFGE profiles typed on a difference of seven or more bands (SDI: 0.53, Jackknife pseudo-values CI: 0.212-0.849) (P = 0.007). This study demonstrates the application of WGM to understanding the epidemiology of hospital-associated K. pneumoniae. Utilizing a combination of WGM and WGS, we also present here the first longitudinal genomic characterization of the highly dynamic carbapenem-resistant ST383 K. pneumoniae clone that is rapidly gaining importance in Europe. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial
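The SDI values compared above measure discriminatory power. Under the usual Hunter-Gaston definition (assumed here; the abstract does not give the formula), the index is the probability that two isolates drawn at random without replacement are assigned to different types. A minimal Python sketch; the jackknife confidence intervals reported above are not reproduced, and the study's cluster sizes are not given, so the example uses only the two limiting cases.

```python
from collections import Counter
from typing import Hashable, Sequence

def simpsons_diversity(types: Sequence[Hashable]) -> float:
    """Simpson's index of diversity (Hunter-Gaston form).

    1 - sum n_j (n_j - 1) / (N (N - 1)), where n_j is the number of
    isolates assigned to type j and N is the total number of isolates.
    """
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same_type_pairs / (n * (n - 1))

print(simpsons_diversity(["A"] * 12))            # 0.0: all 12 in one cluster
print(simpsons_diversity(list("ABCDEFGHIJKL")))  # 1.0: every isolate distinct
```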

  6. A Site Specific Model And Analysis Of The Neutral Somatic Mutation Rate In Whole-Genome Cancer Data

    DEFF Research Database (Denmark)

    Bertl, Johanna; Guo, Qianyun; Rasmussen, Malene Juul

    2017-01-01

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation ra...

  7. Whole Genome Sequencing of High-Risk Families to Identify New Mutational Mechanisms of Breast Cancer Predisposition

    Science.gov (United States)

    2015-12-01

    ... Our approach integrated whole genome sequencing with experimental biology and with application and development of...pathogenicity of genetic variants.

  8. ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Ye, Jia; Li, Songgang

    2005-01-01

    We describe an algorithm, ReAS, to recover ancestral sequences for transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must exist at high copy numbers across the genome and must not be so old that they are no longer recognizable...

  9. Direct DNA Extraction from Mycobacterium tuberculosis Frozen Stocks as a Reculture-Independent Approach to Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Zallet, J; Lillebaek, T

    2015-01-01

    Culturing before DNA extraction represents a major time-consuming step in whole-genome sequencing of slow-growing bacteria, such as Mycobacterium tuberculosis. We report a workflow to extract DNA from frozen isolates without reculturing. Prepared libraries and sequence data were comparable...

  10. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their preci...

  11. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland.

    Science.gov (United States)

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten; Lefort, François

    2016-10-06

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered a potential and valuable resource of biological control agents and biofertilizers for agriculture. Copyright © 2016 Crovadore et al.

  12. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus.

    Science.gov (United States)

    Lee, Yookyung; Lim, Sooyeon; Rhee, Moon-Soo; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-03-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus, collected in Yeong-dong, Chungcheongbuk-do, South Korea, and was subjected to whole genome sequencing on the HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000.

  13. Whole-genome sequence of Dermabacter vaginalis AD1-86T, isolated from vaginal fluid of Korean woman.

    Science.gov (United States)

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-12-01

    Dermabacter vaginalis AD1-86T was isolated from the vaginal fluid of a Korean woman. Whole genome sequencing analysis was conducted using a PacBio RS II platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_CP012117.

  14. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle

    NARCIS (Netherlands)

    Veerkamp, Roel F.; Bouwman, Aniek C.; Schrooten, Chris; Calus, Mario P.L.

    2016-01-01

    Background: Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a

  15. A whole genome sequence of ‘Candidatus Liberibacter asiaticus’ from Guangdong, China, where HLB was first described

    Science.gov (United States)

    Citrus Huanglongbing (HLB, yellow shoot disease) has been endemic in Guangdong Province, China, for >100 years. “Candidatus Liberibacter asiaticus” (CLas) is a putative pathogen of HLB and currently unculturable. Here, a draft whole genome sequence of CLas strain A4 from Guangdong is presented. Stra...

  16. Whole-Genome Sequences of Two Campylobacter coli Isolates from the Antimicrobial Resistance Monitoring Program in Colombia.

    Science.gov (United States)

    Bernal, Johan F; Donado-Godoy, Pilar; Valencia, María Fernanda; León, Maribel; Gómez, Yolanda; Rodríguez, Fernando; Agarwala, Richa; Landsman, David; Mariño-Ramírez, Leonardo

    2016-03-17

    Campylobacter coli, along with Campylobacter jejuni, is a major agent of gastroenteritis and acute enterocolitis in humans. We report the whole-genome sequences of two multidrug-resistant C. coli strains isolated from the Colombian poultry chain. The isolates contain a variety of antimicrobial resistance genes for aminoglycosides, lincosamides, fluoroquinolones, and tetracycline. Copyright © 2016 Bernal et al.

  17. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Thorup Nielsen, Mette

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely...

  18. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections.

    Directory of Open Access Journals (Sweden)

    Pimlapas Leekitcharoenphon

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections.

  19. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants

    NARCIS (Netherlands)

    Pandit, Aridaman; de Boer, Rob J

    2014-01-01

    BACKGROUND: Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers

  20. Implementation of Nationwide Real-time Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and Investigation

    OpenAIRE

    Jackson, Brendan R.; Tarr, Cheryl; Strain, Errol; Jackson, Kelly A.; Conrad, Amanda; Carleton, Heather; Katz, Lee S.; Stroika, Steven; Gould, L. Hannah; Mody, Rajal K.; Silk, Benjamin J.; Beal, Jennifer; Chen, Yi; Timme, Ruth; Doyle, Matthew

    2016-01-01

    Implementation of whole-genome sequencing (WGS)–based surveillance for Listeria monocytogenes in 2013 greatly improved detection and investigation of listeriosis outbreaks in the United States. Lessons from this intervention can guide WGS-based surveillance for other foodborne pathogens.

  1. Rapid Bacterial Whole-Genome Sequencing to Enhance Diagnostic and Public Health Microbiology

    Science.gov (United States)

    Reuter, Sandra; Ellington, Matthew J.; Cartwright, Edward J. P.; Köser, Claudio U.; Török, M. Estée; Gouliouris, Theodore; Harris, Simon R.; Brown, Nicholas M.; Holden, Matthew T. G.; Quail, Mike; Parkhill, Julian; Smith, Geoffrey P.; Bentley, Stephen D.; Peacock, Sharon J.

    2014-01-01

    IMPORTANCE The latest generation of benchtop DNA sequencing platforms can provide an accurate whole-genome sequence (WGS) for a broad range of bacteria in less than a day. These could be used to more effectively contain the spread of multidrug-resistant pathogens. OBJECTIVE To compare WGS with standard clinical microbiology practice for the investigation of nosocomial outbreaks caused by multidrug-resistant bacteria, the identification of genetic determinants of antimicrobial resistance, and typing of other clinically important pathogens. DESIGN, SETTING, AND PARTICIPANTS A laboratory-based study of hospital inpatients with a range of bacterial infections at Cambridge University Hospitals NHS Foundation Trust, a secondary and tertiary referral center in England, comparing WGS with standard diagnostic microbiology using stored bacterial isolates and clinical information. MAIN OUTCOMES AND MEASURES Specimens were taken and processed as part of routine clinical care, and cultured isolates stored and referred for additional reference laboratory testing as necessary. Isolates underwent DNA extraction and library preparation prior to sequencing on the Illumina MiSeq platform. Bioinformatic analyses were performed by persons blinded to the clinical, epidemiologic, and antimicrobial susceptibility data. RESULTS We investigated 2 putative nosocomial outbreaks, one caused by vancomycin-resistant Enterococcus faecium and the other by carbapenem-resistant Enterobacter cloacae; WGS accurately discriminated between outbreak and nonoutbreak isolates and was superior to conventional typing methods. We compared WGS with standard methods for the identification of the mechanism of carbapenem resistance in a range of gram-negative bacteria (Acinetobacter baumannii, E cloacae, Escherichia coli, and Klebsiella pneumoniae). This demonstrated concordance between phenotypic and genotypic results, and the ability to determine whether resistance was attributable to the presence of
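WGS-based discrimination between outbreak and nonoutbreak isolates, as described above, is commonly done by counting single-nucleotide differences between isolates in a core-genome alignment. A minimal Python sketch of that step, assuming an existing multiple alignment; it is not necessarily the pipeline used in the study, the isolate names and sequences are hypothetical, and any SNP cutoff for calling isolates related would be a separate, assumed parameter.

```python
from itertools import combinations
from typing import Dict, Tuple

MISSING = {"N", "-"}  # uncalled base or alignment gap

def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both isolates have a called base and they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1 for x, y in zip(a.upper(), b.upper())
        if x != y and x not in MISSING and y not in MISSING
    )

def pairwise_snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates."""
    return {
        (i, j): snp_distance(alignment[i], alignment[j])
        for i, j in combinations(sorted(alignment), 2)
    }

# Toy core-genome alignment of three hypothetical isolates.
aln = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "TCGAACNA"}
print(pairwise_snp_distances(aln))
# {('iso1', 'iso2'): 1, ('iso1', 'iso3'): 3, ('iso2', 'iso3'): 2}
```

Positions with an `N` or gap in either isolate are skipped rather than counted as differences.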

  2. Effective normalization for copy number variation detection from whole genome sequencing.

    Science.gov (United States)

    Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka

    2012-01-01

    Whole genome sequencing enables a high-resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. A number of tools exist to infer copy number variation (CNV) in the genome. These tools, while validated, also include a number of parameters that can be configured for the genome data being analyzed. These algorithms allow normalization to account for individual- and population-specific effects on individual genome CNV estimates, but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Relative to GC content normalization, mappability normalization yields Jaccard indices in the 0.07–0.3 range, whereas control genome normalization yields Jaccard index values around 0.4. The most critical impact of using mappability as a normalization factor is a substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles and substantial agreement in variable gene and CNV region calls
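The concordance figures above are Jaccard indices between CNV call sets. One common way to compute them, assumed here since the abstract does not say whether overlap was counted in bases or in calls, is the number of bases covered by both call sets divided by the number covered by either. A minimal Python sketch; in practice gains and losses would usually be compared separately, and the call-set names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end); half-open, 0-based

def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def covered_bases(intervals: List[Interval]) -> int:
    return sum(end - start for _, start, end in merge_intervals(intervals))

def overlap_bases(a: List[Interval], b: List[Interval]) -> int:
    # Both sets are merged first, so no base is counted twice.
    merged_b = merge_intervals(b)
    total = 0
    for ca, sa, ea in merge_intervals(a):
        for cb, sb, eb in merged_b:
            if ca == cb:
                total += max(0, min(ea, eb) - max(sa, sb))
    return total

def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A and B| / |A or B|."""
    inter = overlap_bases(a, b)
    union = covered_bases(a) + covered_bases(b) - inter
    return inter / union if union else 0.0

gc_calls = [("chr1", 0, 100), ("chr2", 0, 10)]
map_calls = [("chr1", 50, 150)]
print(jaccard(gc_calls, map_calls))  # 0.3125 (50 shared bases / 160 covered)
```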

  3. Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples.

    Directory of Open Access Journals (Sweden)

    Craig April

    2009-12-01

    We have developed a gene expression assay (Whole-Genome DASL) capable of generating whole-genome gene expression profiles from degraded samples such as formalin-fixed, paraffin-embedded (FFPE) specimens. We demonstrated a similar level of sensitivity in gene detection between matched fresh-frozen (FF) and FFPE samples, with the number and overlap of probes detected in the FFPE samples being approximately 88% and 95% of that in the corresponding FF samples, respectively; 74% of the differentially expressed probes overlapped between the FF and FFPE pairs. The WG-DASL assay is also able to detect 1.3- to 1.5-fold and 1.5- to 2-fold changes in intact and FFPE samples, respectively. The dynamic range of the assay is approximately 3 logs. Comparing the WG-DASL assay with an in vitro transcription-based labeling method yielded fold-change correlations of R² approximately 0.83, while fold-change comparisons with quantitative RT-PCR assays yielded R² approximately 0.86 and R² approximately 0.55 for intact and FFPE samples, respectively. Additionally, the WG-DASL assay yielded high self-correlations (R² > 0.98) with low intact RNA inputs ranging from 1 ng to 100 ng; reproducible expression profiles were also obtained with 250 pg total RNA (R² approximately 0.92), with approximately 71% of the probes detected in 100 ng total RNA also detected at the 250 pg level. When FFPE samples were assayed, 1 ng total RNA yielded self-correlations of R² approximately 0.80, while still maintaining a correlation of R² approximately 0.75 with standard FFPE inputs (200 ng). Taken together, these results show that the WG-DASL assay provides a reliable platform for genome-wide expression profiling in archived materials. It also has utility in clinical settings where only limited quantities of sample may be available (e.g., microdissected material) or when minimally invasive procedures are performed (e.g., biopsied specimens).

  4. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.

    Science.gov (United States)

    Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S

    2015-06-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle, plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed, and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (the same marker selected for more than one trait or breed), filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model with the QTL markers included in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data than with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. 
Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome

  5. Whole genome identification of Mycobacterium tuberculosis vaccine candidates by comprehensive data mining and bioinformatic analyses

    Directory of Open Access Journals (Sweden)

    Sadoff Jerald C

    2008-05-01

    Background: Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects ~8 million people annually, culminating in ~2 million deaths. Moreover, about one third of the population is latently infected, 10% of whom develop disease during their lifetime. Currently approved prophylactic TB vaccines (BCG and derivatives thereof) are of variable efficacy in adult protection against pulmonary TB (0%–80%) and are directed essentially against early-phase infection. Methods: A genome-scale dataset was constructed by analyzing published data on (1) global gene expression studies under conditions that simulate intra-macrophage stress, dormancy, persistence and/or reactivation; and (2) cellular and humoral immunity, and vaccine potential. This information was compiled along with revised annotation/bioinformatic characterization of selected gene products and in silico mapping of T-cell epitopes. Protocols for scoring, ranking and prioritization of the antigens were developed and applied. Results: Cross-matching of literature and in silico-derived data, in conjunction with the prioritization scheme and biological rationale, allowed for selection of 189 putative vaccine candidates from the entire genome. Within the 189 set, the relative distribution of antigens in 3 functional categories differs significantly from their distribution in the whole genome, with a reduction in the Conserved hypothetical category (due to improved annotation) and enrichment in the Lipid and Virulence categories. Other prominent representatives in the 189 set are the PE/PPE proteins; iron sequestration, nitroreductases and proteases, all within the Intermediary metabolism and respiration category; and ESX secretion systems, resuscitation promoting factors and lipoproteins, all within the Cell wall category. Application of a ranking scheme based on qualitative and quantitative scores resulted in a list of 45 best-scoring antigens, of which 74% belong to the dormancy
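The abstract reports that the functional-category distribution of the 189 candidates differs significantly from that of the whole genome, but does not name the test. A one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is one standard choice for asking whether a category is over-represented in a candidate set. A minimal Python sketch under that assumption; the numbers in the example are toy values, not counts from the study.

```python
from math import comb

def enrichment_p_value(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: genes in the genome, K: genes in the functional category,
    n: genes in the candidate set, k: candidate genes in the category.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Toy numbers only: a 10-gene "genome" with 5 category genes and 5 candidates,
# all of which fall in the category.
print(enrichment_p_value(5, N=10, K=5, n=5))  # 1/252, about 0.00397
```

Depletion (as for the Conserved hypothetical category) would use the lower tail, P(X <= k), instead.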

  6. Whole-genome sequencing reveals the mechanisms for evolution of streptomycin resistance in Lactobacillus plantarum.

    Science.gov (United States)

    Zhang, Fuxin; Gao, Jiayuan; Wang, Bini; Huo, Dongxue; Wang, Zhaoxia; Zhang, Jiachao; Shao, Yuyu

    2018-01-31

    In this research, we investigated the evolution of streptomycin resistance in Lactobacillus plantarum ATCC14917, which was passaged in medium containing a gradually increasing concentration of streptomycin. After 25 d, the minimum inhibitory concentration (MIC) of L. plantarum ATCC14917 had reached 131,072 µg/mL, which was 8,192-fold higher than the MIC of the original parent isolate. The highly resistant L. plantarum ATCC14917 isolate was then passaged in antibiotic-free medium to determine the stability of resistance. The MIC value of the L. plantarum ATCC14917 isolate decreased to 2,048 µg/mL after 35 d but remained constant thereafter, indicating that resistance was irreversible even in the absence of selection pressure. Whole-genome sequencing of parent isolates, control isolates, and isolates following passage was used to study the resistance mechanism of L. plantarum ATCC14917 to streptomycin and adaptation in the presence and absence of selection pressure. Five mutated genes (single nucleotide polymorphisms and structural variants) were verified in highly resistant L. plantarum ATCC14917 isolates, which were related to ribosomal protein S12, LPXTG-motif cell wall anchor domain protein, LrgA family protein, Ser/Thr phosphatase family protein, and a hypothetical protein that may correlate with resistance to streptomycin. After passage in streptomycin-free medium, only the mutant gene encoding ribosomal protein S12 remained; the other 4 mutant genes had reverted to the wild type as found in the parent isolate. Although the MIC value of L. plantarum ATCC14917 was reduced in the absence of selection pressure, it remained 128-fold higher than the MIC value of the parent isolate, indicating that ribosomal protein S12 may play an important role in streptomycin resistance. Using the mobile elements database, we demonstrated that streptomycin resistance-related genes in L. plantarum ATCC14917 were not located on mobile elements. This research offers a way of
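The MIC figures above are internally consistent. The 8,192-fold increase to 131,072 µg/mL implies a parent MIC of 16 µg/mL (derived here, not stated in the abstract), and the reverted MIC of 2,048 µg/mL is then exactly the reported 128-fold above the parent. Because MICs are measured in twofold dilutions, the short Python check below also expresses each fold change as a number of doubling steps; the helper names are hypothetical.

```python
import math

def fold_change(mic_after: float, mic_before: float) -> float:
    return mic_after / mic_before

def twofold_steps(fold: float) -> float:
    """Number of twofold (doubling) dilution steps spanned by a fold change."""
    return math.log2(fold)

evolved_mic = 131_072            # µg/mL after 25 d of selection
reverted_mic = 2_048             # µg/mL after passage without streptomycin
parent_mic = evolved_mic / 8_192 # implied by the reported 8,192-fold increase

print(parent_mic)                                            # 16.0
print(twofold_steps(fold_change(evolved_mic, parent_mic)))   # 13.0
print(fold_change(reverted_mic, parent_mic))                 # 128.0
print(twofold_steps(fold_change(reverted_mic, parent_mic)))  # 7.0
```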

  7. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    Science.gov (United States)

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially (P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest that the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  8. Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications

    Directory of Open Access Journals (Sweden)

    Asadollahi Mohammad A

    2010-12-01

    Full Text Available Abstract Background Rapid and efficient design and construction of microbial cell factories are made possible by the enabling technology of metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, where selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood well enough to enable further metabolic engineering. Results In this work we show that whole genome high-throughput sequencing and annotation can be used to identify single nucleotide polymorphisms (SNPs) between Saccharomyces cerevisiae strains S288c and CEN.PK113-7D. The yeast strain S288c was the first eukaryote sequenced, serving as the reference genome for the Saccharomyces Genome Database, while CEN.PK113-7D is a preferred laboratory strain for industrial biotechnology research. A total of 13,787 high-quality SNPs were detected between the two strains (reference strain: S288c). Considering only metabolic genes (782 of 5,596 annotated genes), a total of 219 metabolism-specific SNPs are distributed across 158 metabolic genes, with 85 of the SNPs being nonsynonymous (i.e., encoding amino acid modifications). Amongst the metabolic SNPs detected, there was pathway enrichment in the galactose uptake pathway (GAL1, GAL10) and the ergosterol biosynthetic pathway (ERG8, ERG9). Physiological characterization confirmed a strong deficiency in galactose uptake and metabolism in S288c compared to CEN.PK113-7D, and similarly, ergosterol content in CEN.PK113-7D was significantly higher in both glucose and galactose supplemented cultivations compared to S288c. Furthermore, DNA microarray profiling of S288c and CEN.PK113-7D in both glucose and galactose batch cultures did not provide a clear hypothesis for major phenotypes observed, suggesting that

  9. Whole-genome duplication and the functional diversification of teleost fish hemoglobins.

    Science.gov (United States)

    Opazo, Juan C; Butts, G Tyler; Nery, Mariana F; Storz, Jay F; Hoffmann, Federico G

    2013-01-01

    Subsequent to the two rounds of whole-genome duplication that occurred in the common ancestor of vertebrates, a third genome duplication occurred in the stem lineage of teleost fishes. This teleost-specific genome duplication (TGD) is thought to have provided genetic raw materials for the physiological, morphological, and behavioral diversification of this highly speciose group. The extreme physiological versatility of teleost fish is manifest in their diversity of blood-gas transport traits, which reflects the myriad solutions that have evolved to maintain tissue O₂ delivery in the face of changing metabolic demands and environmental O₂ availability during different ontogenetic stages. During the course of development, regulatory changes in blood-O₂ transport are mediated by the expression of multiple, functionally distinct hemoglobin (Hb) isoforms that meet the particular O₂-transport challenges encountered by the developing embryo or fetus (in viviparous or oviparous species) and by free-swimming larvae and adults. The main objective of the present study was to assess the relative contributions of whole-genome duplication, large-scale segmental duplication, and small-scale gene duplication in producing the extraordinary functional diversity of teleost Hbs. To accomplish this, we integrated phylogenetic reconstructions with analyses of conserved synteny to characterize the genomic organization and evolutionary history of the globin gene clusters of teleosts. These results were then integrated with available experimental data on functional properties and developmental patterns of stage-specific gene expression. Our results indicate that multiple α- and β-globin genes were present in the common ancestor of gars (order Lepisosteiformes) and teleosts. The comparative genomic analysis revealed that teleosts possess a dual set of TGD-derived globin gene clusters, each of which has undergone lineage-specific changes in gene content via repeated duplication and

  10. Rapid whole-genome sequencing for detection and characterization of microorganisms directly from clinical samples.

    Science.gov (United States)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Ponten, Thomas; Lund, Ole; Svendsen, Christina Aaby; Frimodt-Møller, Niels; Aarestrup, Frank M

    2014-01-01

    Whole-genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples, this could further reduce diagnostic times and thereby improve control and treatment. A major bottleneck is the availability of fast and reliable bioinformatic tools. This study was conducted to evaluate the applicability of WGS directly on clinical samples and to develop easy-to-use bioinformatic tools for the analysis of sequencing data. Thirty-five random urine samples from patients with suspected urinary tract infections were examined using conventional microbiology, WGS of isolated bacteria, and direct sequencing on pellets from the urine samples. A rapid method for analyzing the sequence data was developed. Bacteria were cultivated from 19 samples but in pure cultures from only 17 samples. WGS improved the identification of the cultivated bacteria, and almost complete agreement was observed between phenotypic and predicted antimicrobial susceptibilities. Complete agreement was observed between species identification, multilocus sequence typing, and phylogenetic relationships for Escherichia coli and Enterococcus faecalis isolates when the results of WGS of cultured isolates and urine samples were directly compared. Sequencing directly from the urine enabled bacterial identification in polymicrobial samples. Additional putative pathogenic strains were observed in some culture-negative samples. WGS directly on clinical samples can provide clinically relevant information and drastically reduce diagnostic times. This may prove very useful, but the need for data analysis is still a hurdle to clinical implementation. To overcome this problem, a publicly available bioinformatic tool was developed in this study.

  11. Kuwaiti population subgroup of nomadic Bedouin ancestry—Whole genome sequence and analysis

    Directory of Open Access Journals (Sweden)

    Sumi Elsa John

    2015-03-01

    Full Text Available The Kuwaiti native population comprises three distinct genetic subgroups of Persian, “city-dwelling” Saudi Arabian tribe, and nomadic “tent-dwelling” Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry and owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Arabian Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious ‘novel’ variants lie in genes associated with autosomal recessive disorders characteristic of the region.

  12. Diversity through duplication: whole-genome sequencing reveals novel gene retrocopies in the human population.

    Science.gov (United States)

    Richardson, Sandra R; Salvador-Palomeque, Carmen; Faulkner, Geoffrey J

    2014-05-01

    Gene retrocopies are generated by reverse transcription and genomic integration of mRNA. As such, retrocopies present an important exception to the central dogma of molecular biology, and have substantially impacted the functional landscape of the metazoan genome. While an estimated 8,000-17,000 retrocopies exist in the human genome reference sequence, the extent of variation between individuals in terms of retrocopy content has remained largely unexplored. Three recent studies by Abyzov et al., Ewing et al. and Schrider et al. have exploited 1,000 Genomes Project Consortium data, as well as other sources of whole-genome sequencing data, to uncover novel gene retrocopies. Here, we compare the methods and results of these three studies, highlight the impact of retrocopies in human diversity and genome evolution, and speculate on the potential for somatic gene retrocopies to impact cancer etiology and genetic diversity among individual neurons in the mammalian brain. © 2014 The Authors. Bioessays published by WILEY Periodicals, Inc.

  13. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Full Text Available Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as of smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.

  14. Using whole genome sequencing to study American foulbrood epidemiology in honeybees

    Science.gov (United States)

    Ågren, Joakim; Schäfer, Marc Oliver

    2017-01-01

    American foulbrood (AFB), caused by Paenibacillus larvae, is a devastating disease in honeybees. In most countries, the disease is controlled through compulsory burning of symptomatic colonies causing major economic losses in apiculture. The pathogen is endemic to honeybees world-wide and is readily transmitted via the movement of hive equipment or bees. Molecular epidemiology of AFB currently largely relies on placing isolates in one of four ERIC-genotypes. However, a more powerful alternative is multi-locus sequence typing (MLST) using whole-genome sequencing (WGS), which allows for high-resolution studies of disease outbreaks. To evaluate WGS as a tool for AFB-epidemiology, we applied core genome MLST (cgMLST) on isolates from a recent outbreak of AFB in Sweden. The high resolution of the cgMLST allowed the different bacterial clones involved in the disease outbreak to be identified and the source of infection to be traced. The source was found to be a beekeeper who had sold bees to two other beekeepers, proving the epidemiological link between them. No such conclusion could have been made using conventional MLST or ERIC-typing. This is the first time that WGS has been used to study the epidemiology of AFB. The results show that the technique is very powerful for high-resolution tracing of AFB outbreaks. PMID:29140998
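
    Core genome MLST compares isolates by counting the core-genome loci at which their allele numbers differ. The following is a minimal sketch of that distance, not the study's pipeline. It assumes allele calling has already been done, so that each isolate is a hypothetical dict mapping locus name to allele number, with None for loci that could not be called. It also assumes loci missing in either isolate are skipped; real cgMLST schemes and tools may treat missing data differently. All locus and isolate names are made up.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Hypothetical allele profile: locus name -> allele number (None = not called)
Profile = Dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count shared loci where both isolates have a call and the allele numbers differ.

    Loci missing (None) in either isolate are ignored (an assumed convention).
    """
    shared = set(a) & set(b)
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def distance_matrix(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], int]:
    """Pairwise allele distances for every unordered pair of isolates."""
    return {
        (x, y): allele_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    example = {
        "isolate_A": {"locus1": 1, "locus2": 4, "locus3": 7, "locus4": 2},
        "isolate_B": {"locus1": 1, "locus2": 4, "locus3": 7, "locus4": None},
        "isolate_C": {"locus1": 3, "locus2": 5, "locus3": 7, "locus4": 9},
    }
    for pair, d in distance_matrix(example).items():
        print(pair, d)
```

    Isolates separated by zero or very few allele differences would be candidates for the same outbreak clone. The threshold for "few" depends on the organism and scheme and is not given in the abstract.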

  15. Using whole genome sequencing to study American foulbrood epidemiology in honeybees.

    Directory of Open Access Journals (Sweden)

    Joakim Ågren

    Full Text Available American foulbrood (AFB), caused by Paenibacillus larvae, is a devastating disease in honeybees. In most countries, the disease is controlled through compulsory burning of symptomatic colonies causing major economic losses in apiculture. The pathogen is endemic to honeybees world-wide and is readily transmitted via the movement of hive equipment or bees. Molecular epidemiology of AFB currently largely relies on placing isolates in one of four ERIC-genotypes. However, a more powerful alternative is multi-locus sequence typing (MLST) using whole-genome sequencing (WGS), which allows for high-resolution studies of disease outbreaks. To evaluate WGS as a tool for AFB-epidemiology, we applied core genome MLST (cgMLST) on isolates from a recent outbreak of AFB in Sweden. The high resolution of the cgMLST allowed the different bacterial clones involved in the disease outbreak to be identified and the source of infection to be traced. The source was found to be a beekeeper who had sold bees to two other beekeepers, proving the epidemiological link between them. No such conclusion could have been made using conventional MLST or ERIC-typing. This is the first time that WGS has been used to study the epidemiology of AFB. The results show that the technique is very powerful for high-resolution tracing of AFB outbreaks.

  16. Whole Genome Sequencing-Based Mapping and Candidate Identification of Mutations from Fixed Zebrafish Tissue

    Directory of Open Access Journals (Sweden)

    Nicholas E. Sanchez

    2017-10-01

    Full Text Available As forward genetic screens in zebrafish become more common, the number of mutants that cannot be identified by gross morphology or through transgenic approaches, such as many nervous system defects, has also increased. Screening for these difficult-to-visualize phenotypes demands techniques such as whole-mount in situ hybridization (WISH) or antibody staining, which require tissue fixation. To date, fixed tissue has not been amenable to generating libraries for whole genome sequencing (WGS). Here, we describe a method for using genomic DNA from fixed tissue and a bioinformatics suite for WGS-based mapping of zebrafish mutants. We tested our protocol using two known zebrafish mutant alleles, gpr126st49 and egr2bfh227, both of which cause myelin defects. As further proof of concept we mapped a novel mutation, stl64, identified in a zebrafish WISH screen for myelination defects. We linked stl64 to chromosome 1 and identified a candidate nonsense mutation in the F-box and WD repeat domain containing 7 (fbxw7) gene. Importantly, stl64 mutants phenocopy previously described fbxw7vu56 mutants, and knockdown of fbxw7 in wild-type animals produced similar defects, demonstrating that stl64 disrupts fbxw7. Together, these data show that our mapping protocol can map and identify causative lesions in mutant screens that require tissue fixation for phenotypic analysis.

  17. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  18. Comparative whole-genome analysis reveals artificial selection effects on Ustilago esculenta genome.

    Science.gov (United States)

    Ye, Zihong; Pan, Yao; Zhang, Yafen; Cui, Haifeng; Jin, Gulei; McHardy, Alice C; Fan, Longjiang; Yu, Xiaoping

    2017-07-19

    Ustilago esculenta infects Zizania latifolia and induces swelling of the host stem, producing a popular vegetable called Jiaobai in China. Long-standing artificial selection has maximized the occurrence of favourable Jiaobai, thereby maintaining the plant-fungus interaction and driving the evolution of the fungus from plant pathogen toward endophyte. In this study, the whole genome of U. esculenta was sequenced and the transcriptomes of the fungus and its host were analysed. The 20.2 Mb U. esculenta draft genome contains 6,654 predicted genes, including genes related to mating, primary metabolism and secreted proteins, and shares high similarity with related smut fungi. However, U. esculenta relies on RNA silencing rather than repeat-induced point mutation (RIP) for genome defence and has more introns per gene, indicating a relatively slow evolutionary rate. The fungus also lacks some genes in amino acid biosynthesis pathways; these gaps are filled by up-regulated host genes, and the fungus has developed a distinct amino acid response mechanism to balance the infection-resistance interaction. In addition, U. esculenta has lost some surface sensors, important virulence factors and host range-related effectors, which helps maintain its economically valuable endophytic lifestyle. The elucidation of U. esculenta genomic information and expression profiles contributes to more comprehensive insights not only into the molecular mechanisms underlying artificial selection but also into smut fungus-host interactions. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  19. Whole-genome sequencing of Berkshire (European native pig) provides insights into its origin and domestication.

    Science.gov (United States)

    Li, Mingzhou; Tian, Shilin; Yeung, Carol K L; Meng, Xuehong; Tang, Qianzi; Niu, Lili; Wang, Xun; Jin, Long; Ma, Jideng; Long, Keren; Zhou, Chaowei; Cao, Yinchuan; Zhu, Li; Bai, Lin; Tang, Guoqing; Gu, Yiren; Jiang, An'an; Li, Xuewei; Li, Ruiqiang

    2014-04-14

    Domesticated organisms have experienced strong selective pressures directed at genes or genomic regions controlling traits of biological, agricultural or medical importance. The genomes of native and domesticated pigs provide a unique opportunity for tracing the history of domestication and identifying signatures of artificial selection. Here we used whole-genome sequencing to explore the genetic relationships between the European native Berkshire pig and breeds distributed worldwide, and to identify genomic footprints left by selection during the domestication of Berkshire. Numerous genes containing nonsynonymous SNPs fall into olfactory-related categories, which are part of a rapidly evolving superfamily in the mammalian genome. Phylogenetic analyses revealed a deep phylogenetic split between European and Asian pigs rather than between domestic and wild pigs. Admixture analysis revealed a higher proportion of Chinese genetic material in Berkshire pigs, which is consistent with the historical record regarding their origin. Selective sweep analyses revealed strong signatures of selection affecting genomic regions that harbor genes underlying economic traits such as disease resistance, pork yield, fertility, tameness and body length. These discoveries confirm the historical origin of the Berkshire pig through genome-wide analysis and illustrate how domestication has shaped the patterns of genetic variation.

  20. Evaluation of artificial selection in Standard Poodles using whole-genome sequencing.

    Science.gov (United States)

    Friedenberg, Steven G; Meurs, Kathryn M; Mackay, Trudy F C

    2016-12-01

    Identifying regions of artificial selection within dog breeds may provide insights into genetic variation that underlies breed-specific traits or diseases, particularly if these traits or disease predispositions are fixed within a breed. In this study, we searched for runs of homozygosity (ROH) and calculated the d_i statistic (which is based on F_ST) to identify regions of artificial selection in Standard Poodles using high-coverage whole-genome sequencing data from 15 Standard Poodles and 49 dogs across seven other breeds. We identified consensus ROH regions ≥1 Mb in length and common to at least ten Standard Poodles, covering 0.6% of the genome, and d_i regions that most distinguish Standard Poodles from other breeds, covering 3.7% of the genome. Within these regions, we identified enriched gene pathways related to olfaction, digestion, and taste, as well as pathways related to adrenal hormone biosynthesis, T cell function, and protein ubiquitination that could contribute to the pathogenesis of some Poodle-prevalent autoimmune diseases. We also validated variants related to hair coat and skull morphology that have previously been identified as being under selective pressure in Poodles, and flagged additional polymorphisms in genes such as ITGA2B, CBX4, and TNXB that may represent strong candidates for other common Poodle disorders.
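
    A run of homozygosity is a stretch of a chromosome on which an individual carries two identical alleles at consecutive variant sites. The following is a deliberately simplified per-sample scan with the ≥1 Mb length cut-off. It is not the study's method: it assumes that any heterozygous call breaks a run, that missing calls are skipped, and that there is no SNP-density or error-tolerance filter of the kind dedicated ROH tools typically apply. Building consensus regions across ten or more dogs would be a separate step and is not shown.

```python
from typing import List, Optional, Tuple


def runs_of_homozygosity(
    calls: List[Tuple[int, str]],
    min_length_bp: int = 1_000_000,
) -> List[Tuple[int, int]]:
    """Return (start, end) of maximal homozygous runs spanning at least min_length_bp.

    calls: (position, VCF-style GT string such as "0/0", "0|1", "1/1"),
    sorted by position along one chromosome.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end: Optional[int] = None

    def close_run() -> None:
        # Keep the current run only if it is long enough.
        if start is not None and end is not None and end - start >= min_length_bp:
            runs.append((start, end))

    for pos, gt in calls:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing call: neither extends nor breaks the current run
        if len(set(alleles)) == 1:  # homozygous: extend (or open) the run
            if start is None:
                start = pos
            end = pos
        else:  # heterozygous: this simplified scan ends the run immediately
            close_run()
            start = end = None
    close_run()
    return runs
```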

  1. Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits.

    Science.gov (United States)

    Kessner, Darren; Novembre, John

    2015-04-01

    Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTL) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTL provides qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTL under selection affects the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50-100%) can be explained by detected QTL in as little as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates. Copyright © 2015 by the Genetics Society of America.
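
    The following is a toy forward simulation of a selection experiment on a quantitative trait. It illustrates the idea of modeling individuals, trait values and selection explicitly; it is not the authors' simulation framework. It assumes additive allele effects, Gaussian environmental noise, truncation selection of the top fraction of individuals, and random mating among the selected parents, with selfing not excluded. It also assumes free recombination between loci, whereas the study models recombination along whole genomes. All parameter names and defaults are illustrative.

```python
import random
from typing import List


def simulate_truncation_selection(
    effects: List[float],
    init_freq: float = 0.5,
    pop_size: int = 200,
    keep_fraction: float = 0.2,
    generations: int = 20,
    env_sd: float = 1.0,
    seed: int = 1,
) -> List[List[float]]:
    """Return result[g][j] = frequency of allele 1 at locus j in generation g (g = 0 is the founder population)."""
    rng = random.Random(seed)
    n_loci = len(effects)

    # An individual is a pair of haplotypes; each haplotype is a list of 0/1 alleles.
    pop = [
        [[int(rng.random() < init_freq) for _ in range(n_loci)] for _ in range(2)]
        for _ in range(pop_size)
    ]

    def allele_freqs(population):
        n_copies = 2 * len(population)
        return [
            sum(ind[0][j] + ind[1][j] for ind in population) / n_copies
            for j in range(n_loci)
        ]

    def gamete(parent):
        # Free recombination: each locus is drawn independently from either parental haplotype.
        return [parent[rng.randrange(2)][j] for j in range(n_loci)]

    history = [allele_freqs(pop)]
    n_keep = max(2, int(keep_fraction * pop_size))
    for _ in range(generations):
        # Trait value = additive genetic value + Gaussian environmental noise.
        phenotypes = [
            sum(e * (ind[0][j] + ind[1][j]) for j, e in enumerate(effects))
            + rng.gauss(0.0, env_sd)
            for ind in pop
        ]
        # Truncation selection: keep the individuals with the highest trait values.
        ranked = sorted(range(pop_size), key=lambda i: phenotypes[i], reverse=True)
        parents = [pop[i] for i in ranked[:n_keep]]
        # Next generation from random mating among the selected parents.
        pop = [
            [gamete(rng.choice(parents)), gamete(rng.choice(parents))]
            for _ in range(pop_size)
        ]
        history.append(allele_freqs(pop))
    return history
```

    For example, `simulate_truncation_selection([1.0, 0.0])` tracks one causal locus and one neutral locus. A power analysis would repeat such runs many times and ask how often the causal locus can be distinguished from neutral drift.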

  2. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    Directory of Open Access Journals (Sweden)

    Ruibang Luo

    2014-06-01

    Full Text Available This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPUs and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of downstream analysis apps. BALSA is available at: http://sourceforge.net/p/balsa.
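
    Sequencing depth can be sanity-checked with the standard expected-coverage formula C = N × L / G. Here N is the number of reads, L is the read length and G is the genome size. The following worked check assumes a haploid human genome of about 3.1 Gb, which is not stated in the abstract. It shows that the quoted ~50-fold figure fits better if "∼750 million paired-end reads" is read as 750 million read pairs (1.5 billion reads) than as 750 million individual reads.

```python
def mean_coverage(num_reads: float, read_length_bp: int, genome_size_bp: float) -> float:
    """Expected mean sequencing depth C = N * L / G."""
    return num_reads * read_length_bp / genome_size_bp


HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size

as_reads = mean_coverage(750e6, 100, HUMAN_GENOME_BP)      # 750 million single reads
as_pairs = mean_coverage(2 * 750e6, 100, HUMAN_GENOME_BP)  # 750 million read pairs
print(f"{as_reads:.1f}x if 750M reads, {as_pairs:.1f}x if 750M read pairs")
# prints: 24.2x if 750M reads, 48.4x if 750M read pairs
```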

  3. Unique features of a Japanese 'Candidatus Liberibacter asiaticus' strain revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Hiroshi Katoh

    Full Text Available Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely, 'Candidatus Liberibacter asiaticus', 'Ca. L. americanus', and 'Ca. L. africanus'. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative 'Ca. L. asiaticus' Japanese isolate Ishi-1 was determined by metagenomic analysis of DNA extracted from 'Ca. L. asiaticus'-infected psyllids and leaf midribs. The 1.19-Mb genome has an average GC content of 36.32%. Annotation revealed 13 operons encoding rRNA and 44 tRNA genes, but no typical bacterial pathogenesis-related genes were located within the genome, similar to the Floridian psy62 and Chinese gxpsy strains. In contrast to other 'Ca. L. asiaticus' strains, the genome of the Japanese Ishi-1 strain lacks a prophage-related region.

  4. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    Science.gov (United States)

    Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-01-01

    We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p = 6 × 10⁻⁴). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302

  5. A proposed clinical decision support architecture capable of supporting whole genome sequence information.

    Science.gov (United States)

    Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen; Kawamoto, Kensaku

    2014-04-04

    Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine.
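
    The proposed architecture separates the genome database, the genome variant knowledge base, the CDS knowledge base, the CDS controller and the EHR. The following is a hypothetical, in-process illustration of how a controller might orchestrate those components when a medication is ordered. In the proposed framework the components would be independently managed services rather than Python objects. Every class, method, variant identifier and rule here is made up for illustration and does not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class GenomeDatabase:
    """Per-patient variant calls, stored separately from the EHR."""
    variants: Dict[str, Set[str]] = field(default_factory=dict)

    def get_variants(self, patient_id: str) -> Set[str]:
        return self.variants.get(patient_id, set())


@dataclass
class VariantKnowledgeBase:
    """Maps variant identifiers to interpreted findings."""
    interpretations: Dict[str, str] = field(default_factory=dict)

    def interpret(self, variants: Set[str]) -> Set[str]:
        return {self.interpretations[v] for v in variants if v in self.interpretations}


@dataclass
class CDSKnowledgeBase:
    """Rules keyed by (finding, ordered drug) -> recommendation text."""
    rules: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def evaluate(self, findings: Set[str], drug: str) -> List[str]:
        return [self.rules[(f, drug)] for f in sorted(findings) if (f, drug) in self.rules]


@dataclass
class EHR:
    """Receives alerts for display in the clinical workflow."""
    alerts: Dict[str, List[str]] = field(default_factory=dict)

    def show_alerts(self, patient_id: str, messages: List[str]) -> None:
        self.alerts.setdefault(patient_id, []).extend(messages)


@dataclass
class CDSController:
    """Orchestrates the independent components; it holds no genome data itself."""
    genome_db: GenomeDatabase
    variant_kb: VariantKnowledgeBase
    cds_kb: CDSKnowledgeBase
    ehr: EHR

    def on_medication_order(self, patient_id: str, drug: str) -> List[str]:
        variants = self.genome_db.get_variants(patient_id)
        findings = self.variant_kb.interpret(variants)
        messages = self.cds_kb.evaluate(findings, drug)
        if messages:
            self.ehr.show_alerts(patient_id, messages)
        return messages
```

    Because the controller only queries the genome database when a rule might fire, the raw genome never has to be copied into the EHR. This mirrors the design feature the abstract highlights.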

  6. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interaction effects among genes and higher-order nonlinear effects. A deep belief network (DBN), one of the architectures used in deep learning, can model data at a high level of abstraction that captures nonlinear effects in the data. This study implemented a DBN to develop a GP model, using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was obtained from the Global Maize Program of CIMMYT (International Maize and Wheat Improvement Center). Based on Pearson correlation, the DBN outperformed the other methods, namely reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for traits presumed to be non-additive. The DBN achieved a correlation of 0.579 (on a scale from −1 to 1).
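
    Genomic prediction accuracy is commonly reported as the Pearson correlation between predicted and observed trait values on held-out lines, which is the metric the comparison above relies on. The following is a minimal standard-library sketch of that metric only. It is not the study's evaluation code, and it assumes predictions for the test lines have already been produced by some model. On Python 3.10+, `statistics.correlation` computes the same quantity.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between predicted and observed trait values (range -1 to 1)."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    if ss_p == 0 or ss_o == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_p * ss_o)
```

    For example, with made-up values, `pearson_r([1, 2, 3, 4], [1, 3, 2, 4])` gives 0.8.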

  7. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis.

    Science.gov (United States)

    Riaz, Tiayyba; Shehzad, Wasim; Viari, Alain; Pompanon, François; Taberlet, Pierre; Coissac, Eric

    2011-11-01

    Using non-conventional markers, DNA metabarcoding allows biodiversity assessment from complex substrates. In this article, we present ecoPrimers, a software for identifying new barcode markers and their associated PCR primers. ecoPrimers scans whole genomes to find such markers without a priori knowledge. ecoPrimers optimizes two quality indices measuring taxonomical range and discrimination to select the most efficient markers from a set of reference sequences, according to specific experimental constraints such as marker length or specifically targeted taxa. The key step of the algorithm is the identification of conserved regions among reference sequences for anchoring primers. We propose an efficient algorithm based on data mining that allows the analysis of huge sets of sequences. We evaluate the efficiency of ecoPrimers by running it on three different sequence sets: mitochondrial, chloroplast and bacterial genomes. Identified barcode markers correspond either to barcode regions already in use for plants or animals, or to new potential barcodes. Results from empirical experiments carried out on a promising new barcode for analyzing vertebrate diversity fully agree with expectations based on bioinformatics analysis. These tests demonstrate the efficiency of ecoPrimers for inferring new barcodes suited to diverse experimental contexts. ecoPrimers is available as an open source project at: http://www.grenoble.prabi.fr/trac/ecoPrimers.
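
    The key step described above, finding regions conserved across reference sequences in which to anchor primers, can be illustrated with a simplified k-mer quorum search. This is a hypothetical sketch, not the data-mining algorithm implemented in ecoPrimers; the function name `conserved_kmers` and the `quorum` parameter (the fraction of reference sequences that must contain a k-mer) are assumptions for illustration:

```python
from collections import Counter


def conserved_kmers(sequences, k, quorum=1.0):
    """k-mers present in at least a `quorum` fraction of the sequences.

    Each k-mer is counted once per sequence, so its count equals the number
    of reference sequences that contain it.
    """
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    threshold = quorum * len(sequences)
    return {kmer for kmer, c in counts.items() if c >= threshold}


# Hypothetical reference fragments
refs = ["ACGTACGT", "TTACGTAA", "GACGTCCC"]
print(sorted(conserved_kmers(refs, 4)))              # ['ACGT']
print(sorted(conserved_kmers(refs, 4, quorum=0.6)))  # ['ACGT', 'CGTA', 'TACG']
```

    In a primer-design setting, two such conserved k-mers flanking a variable stretch would be candidate primer anchors; scoring that stretch for taxonomic range and discrimination is beyond this sketch.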

  8. Supersize me: how whole-genome sequencing and big data are transforming epidemiology.

    Science.gov (United States)

    Kao, Rowland R; Haydon, Daniel T; Lycett, Samantha J; Murcia, Pablo R

    2014-05-01

    In epidemiology, the identification of 'who infected whom' allows us to quantify key characteristics such as incubation periods, heterogeneity in transmission rates, duration of infectiousness, and the existence of high-risk groups. Although such identification is invaluable, the existence of many plausible infection pathways makes it difficult and renders epidemiological contact tracing uncertain, logistically prohibitive, or both. The recent advent of next-generation sequencing technology allows the identification of traceable differences in the pathogen genome that are transforming our ability to understand high-resolution disease transmission, sometimes even down to the host-to-host scale. We review recent examples of the use of pathogen whole-genome sequencing for the purpose of forensic tracing of transmission pathways, focusing on the particular problems where evolutionary dynamics must be supplemented by epidemiological information on the most likely timing of events as well as possible transmission pathways. We also discuss potential pitfalls in the over-interpretation of these data, and highlight the manner in which a confluence of this technology with sophisticated mathematical and statistical approaches has the potential to produce a paradigm shift in our understanding of infectious disease transmission and control. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications.

    Science.gov (United States)

    Tank, David C; Eastman, Jonathan M; Pennell, Matthew W; Soltis, Pamela S; Soltis, Douglas E; Hinchliff, Cody E; Brown, Joseph W; Sessa, Emily B; Harmon, Luke J

    2015-07-01

    Our growing understanding of the plant tree of life provides a novel opportunity to uncover the major drivers of angiosperm diversity. Using a time-calibrated phylogeny, we characterized hot and cold spots of lineage diversification across the angiosperm tree of life by modeling evolutionary diversification using stepwise AIC (MEDUSA). We also tested the whole-genome duplication (WGD) radiation lag-time model, which postulates that increases in diversification tend to lag behind established WGD events. Diversification rates have been incredibly heterogeneous throughout the evolutionary history of angiosperms and reveal a pattern of 'nested radiations' - increases in net diversification nested within other radiations. This pattern in turn generates a negative relationship between clade age and diversity across both families and orders. We suggest that stochastically changing diversification rates across the phylogeny explain these patterns. Finally, we demonstrate significant statistical support for the WGD radiation lag-time model. Across angiosperms, nested shifts in diversification led to an overall increasing rate of net diversification and declining relative extinction rates through time. These diversification shifts are only rarely perfectly associated with WGD events, but commonly follow them after a lag period. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  10. Prokaryotic Phylogenies Inferred from Whole-Genome Sequence and Annotation Data

    Directory of Open Access Journals (Sweden)

    Wei Du

    2013-01-01

    Phylogenetic trees are used to represent the evolutionary relationships among groups of species. In this paper, a novel method for inferring prokaryotic phylogenies using multiple types of genomic information is proposed. The method, called CGCPhy, is based on the distance matrix of orthologous gene clusters between whole-genome pairs. CGCPhy comprises four main steps. First, orthologous genes are determined from sequence similarity, genomic function, and genomic structure information. Second, genes potentially involved in horizontal gene transfer (HGT) events are eliminated; these are taken to be genes that are highly conserved across different species and genes located on fragments with an abnormal genome barcode. Third, we calculate the distance of the orthologous gene clusters between each genome pair in terms of the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is employed to construct phylogenetic trees across different species. CGCPhy was examined on different datasets from 617 complete single-chromosome prokaryotic genomes and achieved practically useful accuracies on different species sets, in agreement with Bergey's taxonomy in quartet topologies. Simulation results show that CGCPhy achieves high average accuracy with a low standard deviation across datasets, suggesting potential for practical phylogenetic analysis.
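
    The final step of CGCPhy applies neighbor joining to a pairwise distance matrix. The following is a minimal, generic sketch of the standard neighbor-joining algorithm; the function name and the small example matrix are hypothetical, and it does not reproduce CGCPhy's cluster-based distance calculation:

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree, as a Newick string, by neighbor joining.

    names: list of taxon labels (at least three).
    dist:  symmetric matrix (list of lists) of pairwise distances.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Choose the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to each remaining node
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes remain: join them at a single central node
    l0 = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    l1 = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    l2 = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{l0:g},{nodes[1]}:{l1:g},{nodes[2]}:{l2:g});"


# Hypothetical five-taxon distance matrix
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, matrix))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

    Ties in the selection criterion are broken by taking the first minimal pair encountered, so tied inputs can yield different topologies depending on taxon order.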

  11. New perspectives on microbial community distortion after whole-genome amplification.

    Directory of Open Access Journals (Sweden)

    Alexander J Probst

    Whole-genome amplification (WGA) has become an important tool for exploring the genomic information of microorganisms in environmental samples with limited biomass; however, potential selective biases during the amplification process are poorly understood. Here, we describe the effects of WGA on 31 different microbial communities from five biotopes, including low-biomass samples from drinking water and groundwater. Our findings provide evidence that microbiome segregation by biotope was possible despite WGA treatment. Nevertheless, samples from different biotopes showed different levels of distortion, with genomic GC content significantly correlated with WGA perturbation. Certain phylogenetic clades showed a homogeneous trend across various sample types; for instance, Alpha- and Betaproteobacteria decreased in abundance after WGA treatment. On the other hand, Enterobacteriaceae, an important biomarker group for fecal contamination in groundwater and drinking water, were strongly affected by WGA treatment without a predictable pattern. These novel results describe the impact of WGA on low-biomass samples and may highlight issues to be aware of when designing future metagenomic studies that require prior WGA treatment.

  12. Use of whole genome shotgun metagenomics: a practical guide for the microbiome-minded physician scientist.

    Science.gov (United States)

    Ma, Jun; Prince, Amanda; Aagaard, Kjersti M

    2014-01-01

    Whole genome shotgun sequencing (WGS) has been increasingly recognized as the most comprehensive and robust approach for metagenomics research. Compared with 16S-based metagenomics, it offers the advantage of species-level taxonomic identification and the estimation of metabolic pathway activities from human and environmental samples. Several large-scale metagenomic projects utilizing WGS have recently been conducted or are currently underway. With the generation of vast amounts of data, the bioinformatic and computational analysis of WGS results becomes vital for the success of a metagenomics study. However, each step in WGS data analysis, including metagenome assembly, gene prediction, taxonomy identification, function annotation, and pathway analysis, is complicated by the sheer amount of data. Algorithms and tools have been developed specifically to handle WGS-generated metagenomics data with the aim of reducing computational time and storage requirements. Here, we present an overview of the current state of WGS-based metagenomics, challenges frequently encountered, and up-to-date solutions. Several applications that are uniquely applicable to microbiome studies in reproductive and perinatal medicine are also discussed. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  13. Whole genome duplications and a 'function' for junk DNA? Facts and hypotheses.

    Directory of Open Access Journals (Sweden)

    Reiner A Veitia

    BACKGROUND: The lack of correlation between genome size and organismal complexity is understood in terms of the massive presence of repetitive and non-coding DNA. This non-coding subgenome has long been called "junk" DNA; however, it might have important functions. Generation of junk DNA depends on the proliferation of selfish DNA elements and on local or global DNA duplication followed by genic non-functionalization. METHODOLOGY/PRINCIPAL FINDINGS: Evidence from genomic analyses and experimental data indicates that whole genome duplications (WGDs) are often followed by a return to the diploid state through DNA deletions and intra- and interchromosomal rearrangements. We use simple theoretical models and simulations to explore how a WGD accompanied by sequence deletions might affect the dosage balance often required among several gene products involved in regulatory processes. We find that genomic deletions leading to changes in nuclear and cell volume might perturb gene dosage balance. CONCLUSIONS/SIGNIFICANCE: The potentially negative impact of DNA deletions can be buffered if deleted genic DNA is, at least temporarily, replaced by repetitive DNA so that the nuclear/cell volume remains compatible with normal living. Thus, we speculate that retention of non-functionalized non-coding DNA, and replacement of deleted DNA through proliferation of selfish elements, might help avoid dosage imbalances in cycles of polyploidization and diploidization, which are particularly frequent in plants.

  14. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis.

    Directory of Open Access Journals (Sweden)

    Mingyu Gan

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcomes in tuberculosis (TB). Traditional genotyping methods have been used to detect mixed MTB infections; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proven highly sensitive and discriminative for studying the population heterogeneity of MTB. Here, we developed a phylogeny-based method to detect MTB mixed infections using WGS data. We collected published WGS data for 782 global MTB strains from a public database. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. A mixed infection was inferred when multiple evolutionary paths were identified by mapping the SNVs of an individual sample to the phylogenomic database. In simulations, our method specifically detected mixed infections when the sequencing depth of the minor strain was as low as 1× coverage and when the genomic distance between the two mixed strains was as small as 16 SNVs. Applying the method to all 782 samples, we detected 47 mixed infections, 45 of which were caused by locally endemic strains. These results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates.
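
    The core idea, that a single strain maps to one evolutionary path while a mixed infection supports several, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the tree is given as a hypothetical child-to-parent dictionary, each branch carries a set of homogeneous marker SNV positions, and a sample is represented by the set of SNV positions it carries. Depth thresholds and allele-frequency modelling are omitted.

```python
def ancestors(node, parent):
    """All ancestors of `node` in a rooted tree given as child -> parent."""
    found = set()
    while node in parent:
        node = parent[node]
        found.add(node)
    return found


def supported_branches(sample_snvs, branch_snvs, min_hits=1):
    """Branches with at least `min_hits` marker SNVs present in the sample."""
    return {branch for branch, markers in branch_snvs.items()
            if len(markers & sample_snvs) >= min_hits}


def is_mixed(sample_snvs, branch_snvs, parent, min_hits=1):
    """True if the supported branches do not all lie on one root-to-tip path."""
    hits = supported_branches(sample_snvs, branch_snvs, min_hits)
    for a in hits:
        for b in hits:
            on_same_path = a == b or a in ancestors(b, parent) or b in ancestors(a, parent)
            if not on_same_path:
                return True
    return False


# Hypothetical tree: root -> L1, root -> L2, L2 -> L2a, L2 -> L2b
parent = {"L1": "root", "L2": "root", "L2a": "L2", "L2b": "L2"}
branch_snvs = {"L1": {101, 102}, "L2": {201}, "L2a": {301}, "L2b": {401}}

print(is_mixed({201, 301}, branch_snvs, parent))       # False: one path
print(is_mixed({201, 301, 401}, branch_snvs, parent))  # True: two paths
```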

  15. Care and cost consequences of pediatric whole genome sequencing compared to chromosome microarray.

    Science.gov (United States)

    Hayeems, Robin Z; Bhawra, Jasmin; Tsiplova, Kate; Meyn, M Stephen; Monfared, Nasim; Bowdin, Sarah; Stavropoulos, D James; Marshall, Christian R; Basran, Raveen; Shuman, Cheryl; Ito, Shinya; Cohn, Iris; Hum, Courtney; Girdea, Marta; Brudno, Michael; Cohn, Ronald D; Scherer, Stephen W; Ungar, Wendy J

    2017-11-20

    The clinical use of whole-genome sequencing (WGS) is expected to alter pediatric medical management. The study aimed to describe the type and cost of healthcare activities following pediatric WGS compared to chromosome microarray (CMA). Healthcare activities prompted by WGS and CMA were ascertained for 101 children with developmental delay over 1 year. Activities following receipt of non-diagnostic CMA were compared to WGS diagnostic and non-diagnostic results. Activities were costed in 2016 Canadian dollars (CDN). Ongoing care accounted for 88.6% of post-test activities. The mean number of lab tests was greater following CMA than WGS (0.55 vs. 0.09; p = 0.007). The mean number of specialist visits was greater following WGS than CMA (0.41 vs. 0; p = 0.016). WGS results (diagnostic vs. non-diagnostic) modified the effect of test type on mean number of activities (p WGS exceeded $557CDN for 10% of cases. In complex pediatric care, CMA prompted additional diagnostic investigations while WGS prompted tailored care guided by genotypic variants. Costs for prompted activities were low for the majority and constitute a small proportion of total test costs. Optimal use of WGS depends on robust evaluation of downstream care and cost consequences.

  16. Prospective use of whole genome sequencing (WGS) detected a multi-country outbreak of Salmonella Enteritidis.

    Science.gov (United States)

    Inns, T; Ashton, P M; Herrera-Leon, S; Lighthill, J; Foulkes, S; Jombart, T; Rehman, Y; Fox, A; Dallman, T; DE Pinna, E; Browning, L; Coia, J E; Edeghere, O; Vivancos, R

    2017-01-01

    Since April 2015, whole genome sequencing (WGS) has been the routine test for Salmonella identification, surveillance and outbreak investigation at the national reference laboratory in England and Wales. In May 2015, an outbreak of Salmonella Enteritidis cases was detected using WGS data and investigated. UK cases were interviewed to obtain a food history and links between suppliers were mapped to produce a food chain network for chicken eggs. The association between the food chain network and the phylogeny was explored using a network comparison approach. Food and environmental samples were taken from premises linked to cases and tested for Salmonella. Within the outbreak single nucleotide polymorphism defined cluster, 136 cases were identified in the UK and 18 in Spain. One isolate from a food containing chicken eggs was within the outbreak cluster. There was a significant association between the chicken egg food chain of UK cases and phylogeny of outbreak isolates. This is the first published Salmonella outbreak to be prospectively detected using WGS. This outbreak in the UK was linked with contemporaneous cases in Spain by WGS. We conclude that UK and Spanish cases were exposed to a common source of Salmonella-contaminated chicken eggs.

  17. Preferences for the provision of whole genome sequencing services among young adults.

    Directory of Open Access Journals (Sweden)

    Christopher H Wade

    As whole genome sequencing (WGS) becomes increasingly available, clinicians will be faced with conveying complex information to individuals at different stages in life. The purpose of this study is to characterize the views of young adults toward obtaining WGS, learning different types of genomic information, and having a choice about which results are disclosed. A mixed-methods descriptive study was conducted with a diverse group of 18- and 19-year-olds (N = 145). Participants watched an informational video about WGS and then completed an online survey. Participants held a positive attitude toward obtaining WGS and learning about a range of health conditions and traits. Increased interest in learning WGS information was significantly associated with anticipated capacity to handle the emotional consequences if a serious risk was found (β = 0.13, P = .04). Young adults wanted the ability to choose what types of genomic risk information would be returned and expressed decreased willingness to undergo WGS if clinicians made these decisions (t(138) = -7.14, P < .01). Qualitative analysis showed that young adults emphasized procedural factors in WGS decision-making and that perceived health benefits of WGS had a substantial role in testing preferences and anticipated usage of WGS results. Clinicians are likely to encounter enthusiasm for obtaining WGS results among young adults and may need to develop strategies for ensuring that this preference is adequately informed.

  18. The implications of whole-genome sequencing in the control of tuberculosis

    Science.gov (United States)

    Lee, Robyn S.

    2015-01-01

    The availability of whole-genome sequencing (WGS) as a tool for the diagnosis and clinical management of tuberculosis (TB) offers considerable promise in the fight against this stubborn epidemic. However, like other new technologies, the best application of WGS remains to be determined, for both conceptual and technical reasons. In this review, we consider the potential value of WGS in the clinical laboratory for the detection of Mycobacterium tuberculosis and the prediction of antibiotic resistance. We also discuss issues pertaining to data generation, interpretation and dissemination, given that WGS has to date been generally performed in research labs where results are not necessarily packaged in a clinician-friendly format. Although WGS is far more accessible now than it was in the past, the transition from a research tool to study TB into a clinical test to manage this disease may require further fine-tuning. Improvements will likely come through iterative efforts that involve both the laboratories ready to move TB into the genomic era and the front-line clinical/public health staff who will be interpreting the results to inform management decisions. PMID:27034776

  19. Preferences for the provision of whole genome sequencing services among young adults.

    Science.gov (United States)

    Wade, Christopher H; Elliott, Kailyn R

    2017-01-01

    As whole genome sequencing (WGS) becomes increasingly available, clinicians will be faced with conveying complex information to individuals at different stages in life. The purpose of this study is to characterize the views of young adults toward obtaining WGS, learning different types of genomic information, and having a choice about which results are disclosed. A mixed-methods descriptive study was conducted with a diverse group of 18- and 19-year-olds (N = 145). Participants watched an informational video about WGS and then completed an online survey. Participants held a positive attitude toward obtaining WGS and learning about a range of health conditions and traits. Increased interest in learning WGS information was significantly associated with anticipated capacity to handle the emotional consequences if a serious risk was found (β = 0.13, P = .04). Young adults wanted the ability to choose what types of genomic risk information would be returned and expressed decreased willingness to undergo WGS if clinicians made these decisions (t(138) = -7.14, P < .01). Qualitative analysis showed that young adults emphasized procedural factors in WGS decision-making and that perceived health benefits of WGS had a substantial role in testing preferences and anticipated usage of WGS results. Clinicians are likely to encounter enthusiasm for obtaining WGS results among young adults and may need to develop strategies for ensuring that this preference is adequately informed.

  20. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic.

    Science.gov (United States)

    Foley, Samantha B; Rios, Jonathan J; Mgbemena, Victoria E; Robinson, Linda S; Hampel, Heather L; Toland, Amanda E; Durham, Leslie; Ross, Theodora S

    2015-01-01

    Despite the potential of whole-genome sequencing (WGS) to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176) and those without (n = 82). Initial analysis of potentially pathogenic variants (PPVs, defined as nonsynonymous variants with allele frequency WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS). Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve accuracy of interpretation of rare variants. While loss-of-function (LoF) variants represented only a small fraction of PPVs, WGS identified additional cancer risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after expanding our analysis to 3209 ClinVar genes. These data illustrate how WGS can be used to improve our ability to discover patients' cancer genetic risks.

  1. Comparative analysis of whole genome sequencing-based telomere length measurement techniques.

    Science.gov (United States)

    Lee, Michael; Napier, Christine E; Yang, Sile F; Arthur, Jonathan W; Reddel, Roger R; Pickett, Hilda A

    2017-02-01

    Telomeres are regions of repetitive DNA at the ends of human chromosomes that function to maintain the integrity of the genome. Telomere attrition is associated with cellular ageing, whilst telomere maintenance is a prerequisite for malignant transformation. Whole genome sequencing (WGS) captures sequence information from the entire genome, including the telomeres, and is increasingly being applied in research and in the clinic. Several bioinformatics tools have been designed to determine telomere content and length from WGS data, and include Motif_counter, TelSeq, Computel, qMotif, and Telomerecat. These tools utilise different approaches to identify, quantify and normalise telomeric reads; however, it is not known how they compare to one another. Here we describe the details and utility of each tool, and directly compare WGS telomere length output with laboratory-based telomere length measurements. In addition, we evaluate the accessibility, practicality, speed, and additional features of each tool. Each tool was tested using a range of telomere read extraction criteria, to determine the optimal parameters for the specific WGS read length. The aim of this article is to improve the accessibility of WGS telomere length measurement tools, which have the potential to be applied to WGS cohorts for clinical as well as research benefit. Copyright © 2016 Elsevier Inc. All rights reserved.
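
    Several of the tools compared above begin by identifying reads rich in the telomeric repeat. As a hypothetical sketch of one such extraction criterion (not the exact method of Motif_counter, TelSeq, Computel, qMotif or Telomerecat), the snippet below counts reads containing at least `k` copies of TTAGGG or its reverse complement CCCTAA and normalizes per million reads; the threshold `k` is an assumed, tunable parameter of the kind varied in the comparison:

```python
TELO_FWD = "TTAGGG"  # human telomeric repeat (G-rich strand)
TELO_REV = "CCCTAA"  # reverse complement, seen on reads from the C-rich strand


def is_telomeric(read, k):
    """True if the read holds at least k non-overlapping copies of either motif."""
    return max(read.count(TELO_FWD), read.count(TELO_REV)) >= k


def telomeric_reads_per_million(reads, k=7):
    """Number of telomeric reads per million reads; k is a tunable threshold."""
    reads = list(reads)
    if not reads:
        return 0.0
    telomeric = sum(1 for read in reads if is_telomeric(read, k))
    return telomeric * 1_000_000 / len(reads)


# Hypothetical reads with 8, 7, 0 and 3 repeat copies respectively
reads = ["TTAGGG" * 8, "CCCTAA" * 7 + "ACGT", "ACGT" * 10, "TTAGGG" * 3]
for k in (3, 7, 9):
    print(k, telomeric_reads_per_million(reads, k))
```

    Raising `k` trades sensitivity for specificity, which is why the optimal threshold depends on read length.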

  2. Intragenic DOK7 deletion detected by whole-genome sequencing in congenital myasthenic syndromes.

    Science.gov (United States)

    Azuma, Yoshiteru; Töpf, Ana; Evangelista, Teresinha; Lorenzoni, Paulo José; Roos, Andreas; Viana, Pedro; Inagaki, Hidehito; Kurahashi, Hiroki; Lochmüller, Hanns

    2017-06-01

    To identify the genetic cause in a patient affected by ptosis and exercise-induced muscle weakness and diagnosed with congenital myasthenic syndromes (CMS) using whole-genome sequencing (WGS). Candidate gene screening and WGS analysis were performed in the case. Allele-specific PCR was subsequently performed to confirm the copy number variation (CNV) that was suspected from the WGS results. In addition to the previously reported frameshift mutation c.1124_1127dup, an intragenic 6,261 bp deletion spanning from the 5' untranslated region to intron 2 of the DOK7 gene was identified by WGS in the patient with CMS. The heterozygous deletion was suspected based on reduced coverage on WGS and confirmed by allele-specific PCR. The breakpoints had microhomology and an inverted repeat, which may have led to the development of the deletion during DNA replication. We report a CMS case with identification of the breakpoints of the intragenic DOK7 deletion using WGS analysis. This case illustrates that CNVs undetected by Sanger sequencing may be identified by WGS and highlights their relevance in the molecular diagnosis of a treatable neurologic condition such as CMS.
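
    A heterozygous deletion in a diploid genome is expected to reduce read depth in the affected region to roughly half the genome-wide level, which is the coverage signal described above. The sketch below flags windows by their depth ratio; the function name, the cutoffs (`het_cutoff`, `hom_cutoff`) and the example depths are hypothetical, and practical callers also model GC bias, mappability and breakpoint evidence:

```python
from statistics import median


def classify_windows(window_depths, baseline_depth, hom_cutoff=0.15, het_cutoff=0.75):
    """Label each window by its depth relative to a diploid baseline.

    A ratio near 0.5 suggests a heterozygous deletion, near 0 a homozygous one.
    The cutoffs are illustrative assumptions, not validated thresholds.
    """
    calls = []
    for depth in window_depths:
        ratio = depth / baseline_depth
        if ratio < hom_cutoff:
            calls.append("hom_del")
        elif ratio < het_cutoff:
            calls.append("het_del")
        else:
            calls.append("normal")
    return calls


# Hypothetical mean depths for consecutive 1 kb windows around a gene
depths = [40, 42, 21, 19, 20, 41, 39]
baseline = median(depths)  # in practice, use the genome-wide median depth
print(classify_windows(depths, baseline))
# ['normal', 'normal', 'het_del', 'het_del', 'het_del', 'normal', 'normal']
```

    A call like this only nominates a candidate region; as in the case above, an orthogonal assay such as allele-specific PCR is needed to confirm it.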

  3. Long insert whole genome sequencing for copy number variant and translocation detection.

    Science.gov (United States)

    Liang, Winnie S; Aldrich, Jessica; Tembe, Waibhav; Kurdoglu, Ahmet; Cherni, Irene; Phillips, Lori; Reiman, Rebecca; Baker, Angela; Weiss, Glen J; Carpten, John D; Craig, David W

    2014-01-01

    As next-generation sequencing continues to expand its presence in the clinic, the most cost-effective and robust strategy for identifying copy number changes and translocations in tumor genomes needs to be identified. We hypothesized that performing shallow whole genome sequencing (WGS) of 900-1000-bp inserts (long insert WGS, LI-WGS) improves our ability to detect these events compared with shallow WGS of 300-400-bp inserts. A priori analyses show that LI-WGS requires less sequencing than short insert WGS to achieve a target physical coverage, and that LI-WGS requires less sequence coverage to detect a heterozygous event with a power of 0.99. We thus developed an LI-WGS library preparation protocol based on Illumina's WGS library preparation protocol and illustrate the feasibility of performing LI-WGS. We additionally applied LI-WGS to three separate tumor/normal DNA pairs collected from patients diagnosed with different cancers to demonstrate its application to actual patient samples for the identification of somatic copy number alterations and translocations. With the evolution of sequencing technologies and bioinformatics analyses, we show that modifications to current approaches may improve our ability to interrogate cancer genomes.
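
    The a priori argument about physical coverage can be made concrete with simple arithmetic. For paired-end reads, each pair contributes roughly its insert length to physical coverage but only twice the read length to sequence coverage, so the sequence coverage needed for a target physical coverage is target × 2 × read length / insert length (genome size cancels). A minimal sketch, assuming a hypothetical 100 bp read length and insert sizes taken from the ranges above:

```python
def sequence_coverage_for_physical(target_physical, insert_size, read_length):
    """Sequence coverage needed to reach `target_physical` physical coverage.

    Assumes paired-end reads in which each pair spans `insert_size` bases
    (counted toward physical coverage) and sequences 2 * `read_length` bases.
    """
    return target_physical * 2 * read_length / insert_size


READ_LENGTH = 100  # hypothetical read length in bp
for label, insert in [("short insert", 350), ("long insert", 950)]:
    seq_cov = sequence_coverage_for_physical(10, insert, READ_LENGTH)
    print(f"{label} ({insert} bp): {seq_cov:.2f}x sequence coverage for 10x physical")
```

    Under these assumptions, the long-insert library reaches 10x physical coverage with about 2.1x sequence coverage, versus about 5.7x for the short-insert library. The power calculation for heterozygous events mentioned in the abstract is not modelled here.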

  4. Kuwaiti population subgroup of nomadic Bedouin ancestry-Whole genome sequence and analysis.

    Science.gov (United States)

    John, Sumi Elsa; Thareja, Gaurav; Hebbar, Prashantha; Behbehani, Kazem; Thanaraj, Thangavel Alphonse; Alsmadi, Osama

    2015-03-01

    The native Kuwaiti population comprises three distinct genetic subgroups of Persian, "city-dwelling" Saudi Arabian tribe, and nomadic "tent-dwelling" Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry; it owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Arabian Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious 'novel' variants lie in genes associated with autosomal recessive disorders characteristic of the region.

  5. Clostridium botulinum Group II Isolate Phylogenomic Profiling Using Whole-Genome Sequence Data.

    Science.gov (United States)

    Weedmark, K A; Mabon, P; Hayden, K L; Lambert, D; Van Domselaar, G; Austin, J W; Corbett, C R

    2015-09-01

    Clostridium botulinum group II isolates (n = 163) from different geographic regions, outbreaks, and neurotoxin types and subtypes were characterized in silico using whole-genome sequence data. Two clusters representing a variety of botulinum neurotoxin (BoNT) types and subtypes were identified by multilocus sequence typing (MLST) and core single nucleotide polymorphism (SNP) analysis. While one cluster included BoNT/B4/F6/E9 and nontoxigenic members, the other comprised a wide variety of different BoNT/E subtype isolates and a nontoxigenic strain. In silico MLST and core SNP methods were consistent in terms of clade-level isolate classification; however, core SNP analysis showed higher resolution capability. Furthermore, core SNP analysis correctly distinguished isolates by outbreak and location. This study illustrated the utility of next-generation sequence-based typing approaches for isolate characterization and source attribution and identified discrete SNP loci and MLST alleles for isolate comparison. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
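
    Core SNP analysis of this kind typically starts from pairwise SNP distances between isolates over a shared core alignment. The sketch below computes such distances; the function name, the treatment of `N` and `-` as missing data, and the three-isolate alignment are hypothetical, and it is not the pipeline used in the study:

```python
def snp_distance_matrix(alignment, missing="N-"):
    """Pairwise SNP distances from a core-genome SNP alignment.

    alignment: dict mapping isolate name -> string, one character per core
    SNP site; all strings must have the same length. Sites where either
    isolate has a character in `missing` are ignored.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    names = sorted(alignment)
    dist = {}
    for a in names:
        for b in names:
            dist[a, b] = sum(
                1
                for x, y in zip(alignment[a], alignment[b])
                if x != y and x not in missing and y not in missing
            )
    return names, dist


# Hypothetical core SNP alignment for three isolates
aln = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "TCGNT"}
names, dist = snp_distance_matrix(aln)
for a in names:
    print(a, [dist[a, b] for b in names])
```

    Isolates from a single outbreak would be expected to show small pairwise distances, and the resulting matrix can be passed to a tree-building method for clade-level classification.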

  6. Long-read, whole-genome shotgun sequence data for five model organisms.

    Science.gov (United States)

    Kim, Kristi E; Peluso, Paul; Babayan, Primo; Yeadon, P Jane; Yu, Charles; Fisher, William W; Chin, Chen-Shan; Rapicavoli, Nicole A; Rank, David R; Li, Joachim; Catcheside, David E A; Celniker, Susan E; Phillippy, Adam M; Bergman, Casey M; Landolin, Jane M

    2014-01-01

    Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research.

  7. Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.

    Science.gov (United States)

    McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael

    2014-08-01

    Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the Guanine+Cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species implicate this process as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event. Copyright © 2014 by the Genetics Society of America.

  8. A field guide to whole-genome sequencing, assembly and annotation

    Science.gov (United States)

    Ekblom, Robert; Wolf, Jochen B W

    2014-01-01

    Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects. PMID:25553065

  9. Whole genome duplication: challenges and considerations associated with sequence orthology assignment in Salmoninae.

    Science.gov (United States)

    Moghadam, H K; Ferguson, M M; Danzmann, R G

    2011-09-01

    To illustrate some of the challenges and considerations in assigning the correct orthology necessary for any comparative genomic investigation among salmonids, sequence data from the non-coding regions of different chromosomes in three members of the subfamily Salmoninae (rainbow trout Oncorhynchus mykiss, Atlantic salmon Salmo salar and Arctic charr Salvelinus alpinus) were compared. By analysing c. 55 distinct loci, corresponding to c. 142 kbp of sequence information per species, 18 duplicated patterns representative of the two sequential rounds of teleost-specific whole genome duplications (i.e. 3R and 4R WGD) were identified. Sequence similarity between 4R paralogues was c. 90%, slightly lower than that between 4R orthologues, whereas similarity between 3R products was c. 60%. Through careful examination of the sequence data, however, only 14 loci could reliably be assigned as true orthologues. Locus-specific trees were constructed through maximum parsimony, maximum likelihood and neighbour-joining methods and were rooted using the information from a close relative, lake whitefish Coregonus clupeaformis. All approaches generated congruent trees supporting the {Coregonus [Salmo (Oncorhynchus, Salvelinus)]} topology. The general phenotypic characteristics of the sequences, however, were highly suggestive of a basal position of Oncorhynchus, raising the hypothesis of an accelerated rate of nucleotide evolution in this species. © 2011 The Authors. Journal of Fish Biology © 2011 The Fisheries Society of the British Isles.

  10. Whole Genome Sequence Analysis of Pig Respiratory Bacterial Pathogens with Elevated Minimum Inhibitory Concentrations for Macrolides.

    Science.gov (United States)

    Dayao, Denise Ann Estarez; Seddon, Jennifer M; Gibson, Justine S; Blackall, Patrick J; Turni, Conny

    2016-10-01

    Macrolides are often used to treat and control bacterial pathogens causing respiratory disease in pigs. This study analyzed the whole genome sequences of one clinical isolate each of Actinobacillus pleuropneumoniae, Haemophilus parasuis, Pasteurella multocida, and Bordetella bronchiseptica, all isolated from Australian pigs, to identify the mechanism underlying the elevated minimum inhibitory concentrations (MICs) for erythromycin, tilmicosin, or tulathromycin. The H. parasuis assembled genome had a nucleotide transition at position 2059 (A to G) in all six copies of the 23S rRNA gene. This mutation has previously been associated with macrolide resistance, but this is the first reported mechanism associated with elevated macrolide MICs in H. parasuis. No known macrolide resistance mechanism was identified in the other three bacterial genomes. However, strA and sul2, aminoglycoside and sulfonamide resistance genes, respectively, were detected in one contiguous sequence (contig 1) of the A. pleuropneumoniae assembled genome. This contig was identical to plasmids previously identified in Pasteurellaceae. This study has provided one possible explanation for elevated MICs to macrolides in H. parasuis. Further studies are necessary to clarify the mechanism causing the unexplained macrolide resistance in other Australian pig respiratory pathogens, including the role of efflux systems, which were detected in all analyzed genomes.

  11. Whole-genome fingerprint of the DNA methylome during human B-cell differentiation

    Science.gov (United States)

    Kulis, Marta; Merkel, Angelika; Heath, Simon; Queirós, Ana C.; Schuyler, Ronald P.; Castellano, Giancarlo; Beekman, Renée; Raineri, Emanuele; Esteve, Anna; Clot, Guillem; Verdaguer-Dot, Nuria; Duran-Ferrer, Martí; Russiñol, Nuria; Vilarrasa-Blasi, Roser; Ecker, Simone; Pancaldi, Vera; Rico, Daniel; Agueda, Lidia; Blanc, Julie; Richardson, David; Clarke, Laura; Datta, Avik; Pascual, Marien; Agirre, Xabier; Prosper, Felipe; Alignani, Diego; Paiva, Bruno; Caron, Gersende; Fest, Thierry; Muench, Marcus O.; Fomin, Marina E.; Lee, Seung-Tae; Wiemels, Joseph L.; Valencia, Alfonso; Gut, Marta; Flicek, Paul; Stunnenberg, Hendrik G.; Siebert, Reiner; Küppers, Ralf; Gut, Ivo G.; Campo, Elías; Martín-Subero, José I.

    2017-01-01

    We analyzed the DNA methylome of ten subpopulations spanning the entire B-cell differentiation program by whole-genome bisulfite sequencing and high-density microarrays. We observed that non-CpG methylation disappeared upon B-cell commitment, whereas CpG methylation changed extensively during B-cell maturation, showing an accumulative pattern and affecting around 30% of all measured CpGs. Early differentiation stages mainly displayed enhancer demethylation, which was associated with upregulation of key B-cell transcription factors and affected multiple genes involved in B-cell biology. Late differentiation stages, in contrast, showed extensive demethylation of heterochromatin and methylation gain of polycomb-repressed areas, and did not affect genes with apparent functional impact in B cells. This signature, which has been previously linked to aging and cancer, was particularly widespread in mature cells with extended life span. Comparing B-cell neoplasms with their normal counterparts, we found that they frequently acquire methylation changes in regions already undergoing dynamic methylation during normal B-cell differentiation. PMID:26053498

  12. The "most wanted" taxa from the human microbiome for whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Anthony A Fodor

    The goal of the Human Microbiome Project (HMP) is to generate a comprehensive catalog of human-associated microorganisms, including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S) sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets to several reference 16S collections to create a 'most wanted' list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, that few novel taxa are represented by these OTUs, and that many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the 'most wanted', and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the 'most wanted' organisms are poorly represented among culture collections, suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.
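    The 'most wanted' idea amounts to ranking taxa that are common across subjects but lack a close reference genome. A minimal sketch, assuming each OTU has already been summarized as (prevalence across subjects, best identity to any reference sequence); the function name and both thresholds are hypothetical and are not the criteria used in the HMP analysis.

```python
def most_wanted(otus, min_prevalence=0.10, max_ref_identity=0.98):
    """Rank OTUs that are prevalent but poorly covered by reference genomes.

    otus maps an OTU id to (prevalence, identity): prevalence is the fraction of
    subjects carrying the OTU, identity is the best 16S identity (0-1) to any
    sequenced reference. Both thresholds are illustrative assumptions.
    """
    candidates = [
        (otu_id, prevalence)
        for otu_id, (prevalence, identity) in otus.items()
        if prevalence >= min_prevalence and identity < max_ref_identity
    ]
    # Most prevalent first: these would be the highest-priority sequencing targets.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [otu_id for otu_id, _ in candidates]


if __name__ == "__main__":
    toy = {
        "otu1": (0.80, 0.99),  # common, but already has a close reference
        "otu2": (0.50, 0.90),
        "otu3": (0.05, 0.85),  # novel, but rare
        "otu4": (0.70, 0.92),
    }
    print(most_wanted(toy))
```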

  13. Whole Genome Expression Profiling and Signal Pathway Screening of MSCs in Ankylosing Spondylitis

    Directory of Open Access Journals (Sweden)

    Yuxi Li

    2014-01-01

    The pathogenesis of dysfunctional immunoregulation of mesenchymal stem cells (MSCs) in ankylosing spondylitis (AS) is thought to be a complex process that involves multiple genetic alterations. In this study, MSCs derived from both healthy donors and AS patients were cultured in normal media or in media mimicking an inflammatory environment. Whole genome expression profiling of 33,351 genes was performed, and differentially expressed genes related to AS were analyzed by GO term analysis and KEGG pathway analysis. Our results showed that in normal media 676 genes were differentially expressed in AS (354 upregulated and 322 downregulated), while in an inflammatory environment 1767 genes were differentially expressed in AS (1230 upregulated and 537 downregulated). GO analysis showed that these genes were mainly related to cellular processes, physiological processes, biological regulation, regulation of biological processes, and binding. In addition, KEGG pathway analysis identified 14 key genes from the MAPK signaling pathway and 8 key genes from the TLR signaling pathway as differentially regulated. The results of qRT-PCR verified the expression variation of the 9 genes mentioned above. Our study found that in an inflammatory environment, AS pathogenesis may be related to activation of the MAPK and TLR signaling pathways.

  14. Whole-genome sequencing of quartet families with autism spectrum disorder.

    Science.gov (United States)

    Yuen, Ryan K C; Thiruvahindrapuram, Bhooma; Merico, Daniele; Walker, Susan; Tammimies, Kristiina; Hoang, Ny; Chrysler, Christina; Nalpathamkalam, Thomas; Pellecchia, Giovanna; Liu, Yi; Gazzellone, Matthew J; D'Abate, Lia; Deneault, Eric; Howe, Jennifer L; Liu, Richard S C; Thompson, Ann; Zarrei, Mehdi; Uddin, Mohammed; Marshall, Christian R; Ring, Robert H; Zwaigenbaum, Lonnie; Ray, Peter N; Weksberg, Rosanna; Carter, Melissa T; Fernandez, Bridget A; Roberts, Wendy; Szatmari, Peter; Scherer, Stephen W

    2015-02-01

    Autism spectrum disorder (ASD) is genetically heterogeneous, with evidence for hundreds of susceptibility loci. Previous microarray and exome-sequencing studies have examined portions of the genome in simplex families (parents and one ASD-affected child) having presumed sporadic forms of the disorder. We used whole-genome sequencing (WGS) of 85 quartet families (parents and two ASD-affected siblings), consisting of 170 individuals with ASD, to generate a comprehensive data resource encompassing all classes of genetic variation (including noncoding variants) and accompanying phenotypes, in apparently familial forms of ASD. By examining de novo and rare inherited single-nucleotide and structural variations in genes previously reported to be associated with ASD or other neurodevelopmental disorders, we found that most (69.4%) of the affected siblings carried different ASD-relevant mutations. These siblings with discordant mutations tended to demonstrate more clinical variability than those who shared a risk variant. Our study emphasizes that substantial genetic heterogeneity exists in ASD, necessitating the use of WGS to delineate all genic and non-genic susceptibility variants in research and in clinical diagnostics.

  15. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder.

    Science.gov (United States)

    C Yuen, Ryan K; Merico, Daniele; Bookman, Matt; L Howe, Jennifer; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-04-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible on a cloud platform and through a controlled-access internet portal. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertions and deletions or copy number variations per ASD subject. We identified 18 new candidate ASD-risk genes and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (P = 6 × 10(-4)). In 294 of 2,620 ASD cases (11.2%), a molecular basis could be determined, and 7.2% of these carried copy number variations and/or chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD.

  16. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic

    Directory of Open Access Journals (Sweden)

    Samantha B. Foley

    2015-01-01

    Despite the potential of whole-genome sequencing (WGS) to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176) and those without (n = 82). Initial analysis of potentially pathogenic variants (PPVs; defined as nonsynonymous variants with allele frequency < 1% in ESP6500) in 163 clinically relevant genes suggested that WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS). Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve the accuracy of interpretation of rare variants. While loss-of-function (LoF) variants represented only a small fraction of PPVs, WGS identified additional cancer-risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after we expanded our analysis to 3209 ClinVar genes. These data illustrate how WGS can be used to improve our ability to discover patients' cancer genetic risks.
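    The PPV definition above is essentially a three-part filter: in the gene list, nonsynonymous, and rare in ESP6500. A minimal sketch, assuming each variant call already carries a gene symbol, a consequence label and an ESP6500 allele frequency; the Variant type, the consequence vocabulary, the treatment of variants absent from ESP6500, and the small gene panel are hypothetical stand-ins, not the study's 163-gene list or pipeline.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: consequence labels follow a simple in-house vocabulary; real
# annotation tools use their own terms (e.g. Sequence Ontology names).
NONSYNONYMOUS = {"missense", "stop_gained", "stop_lost"}


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str
    esp6500_af: Optional[float]  # allele frequency in ESP6500; None if not observed


def is_ppv(variant: Variant, gene_panel: set, max_af: float = 0.01) -> bool:
    """Potentially pathogenic: in the gene panel, nonsynonymous, and AF < max_af."""
    # Assumption: a variant absent from ESP6500 is treated as frequency 0 (novel).
    af = 0.0 if variant.esp6500_af is None else variant.esp6500_af
    return (
        variant.gene in gene_panel
        and variant.consequence in NONSYNONYMOUS
        and af < max_af
    )


if __name__ == "__main__":
    panel = {"BRCA1", "BRCA2", "TP53"}  # hypothetical subset of a clinical gene list
    calls = [
        Variant("BRCA2", "missense", None),
        Variant("BRCA2", "synonymous", None),
        Variant("TP53", "missense", 0.05),
        Variant("GENE_X", "stop_gained", 0.001),
    ]
    print([v for v in calls if is_ppv(v, panel)])
```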

  17. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing.

    Science.gov (United States)

    Helman, Elena; Lawrence, Michael S; Stewart, Chip; Sougnez, Carrie; Getz, Gad; Meyerson, Matthew

    2014-07-01

    Retrotransposons constitute a major source of genetic variation, and somatic retrotransposon insertions have been reported in cancer. Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part of The Cancer Genome Atlas (TCGA) Pan-Cancer Project. In addition to novel germline polymorphisms, we find 810 somatic retrotransposon insertions primarily in lung squamous, head and neck, colorectal, and endometrial carcinomas. Many somatic retrotransposon insertions occur in known cancer genes. We find that high somatic retrotransposition rates in tumors are associated with high rates of genomic rearrangement and somatic mutation. Finally, we developed TranspoSeq-Exome to interrogate an additional 767 tumor samples with hybrid-capture exome data and discovered 35 novel somatic retrotransposon insertions into exonic regions, including an insertion into an exon of the PTEN tumor suppressor gene. The results of this large-scale, comprehensive analysis of retrotransposon movement across tumor types suggest that somatic retrotransposon insertions may represent an important class of structural variation in cancer. © 2014 Helman et al.; Published by Cold Spring Harbor Laboratory Press.

  18. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    Science.gov (United States)

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of GPUs and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes such as alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of downstream analysis applications. BALSA is available at: http://sourceforge.net/p/balsa.
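    The link between read count and fold coverage in the BALSA benchmark follows the usual mean-depth formula, depth = (number of reads × read length) / genome size. A small check, assuming a ~3.1 Gb human genome and reading "~750 million 100 bp paired-end reads" as 750 million read pairs; both are assumptions, not statements from the abstract.

```python
def fold_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


if __name__ == "__main__":
    # Assumption: 750 million read *pairs* (1.5 billion reads) and a ~3.1 Gb genome.
    pairs = 750_000_000
    depth = fold_coverage(2 * pairs, 100, 3_100_000_000)
    print(f"{depth:.1f}x")  # → 48.4x
```

    Counting 750 million single reads instead gives roughly 24x, so the quoted 50-fold depth fits the read-pair reading.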

  19. Whole-Genome Sequencing Uncovers the Genetic Basis of Chronic Mountain Sickness in Andean Highlanders

    Science.gov (United States)

    Zhou, Dan; Udpa, Nitin; Ronen, Roy; Stobdan, Tsering; Liang, Junbin; Appenzeller, Otto; Zhao, Huiwen W.; Yin, Yi; Du, Yuanping; Guo, Lixia; Cao, Rui; Wang, Yu; Jin, Xin; Huang, Chen; Jia, Wenlong; Cao, Dandan; Guo, Guangwu; Gamboa, Jorge L.; Villafuerte, Francisco; Callacondo, David; Xue, Jin; Liu, Siqi; Frazer, Kelly A.; Li, Yingrui; Bafna, Vineet; Haddad, Gabriel G.

    2013-01-01

    The hypoxic conditions at high altitudes present a challenge for survival, causing pressure for adaptation. Interestingly, many high-altitude denizens (particularly in the Andes) are maladapted, with a condition known as chronic mountain sickness (CMS) or Monge disease. To decode the genetic basis of this disease, we sequenced and compared the whole genomes of 20 Andean subjects (10 with CMS and 10 without). We discovered 11 regions genome-wide with significant differences in haplotype frequencies consistent with selective sweeps. In these regions, two genes (an erythropoiesis regulator, SENP1, and an oncogene, ANP32D) had a higher transcriptional response to hypoxia in individuals with CMS relative to those without. We further found that downregulating the orthologs of these genes in flies dramatically enhanced survival rates under hypoxia, demonstrating that suppression of SENP1 and ANP32D plays an essential role in hypoxia tolerance. Our study provides an unbiased framework to identify and validate the genetic basis of adaptation to high altitudes and identifies potentially targetable mechanisms for CMS treatment. PMID:23954164

  20. Whole genome PCR scanning (WGPS) of Coxiella burnetii strains from ruminants.

    Science.gov (United States)

    Sidi-Boumedine, Karim; Adam, Gilbert; Angen, Øysten; Aspán, Anna; Bossers, Alex; Roest, Hendrik-Jan; Prigent, Myriam; Thiéry, Richard; Rousset, Elodie

    2015-01-01

    Coxiella burnetii is the causative agent of Q fever, a zoonosis that spreads from ruminants to humans via the inhalation of aerosols contaminated by livestock birth products. This study aimed to compare the genomes of strains isolated from ruminants by "Whole Genome PCR Scanning (WGPS)" in order to identify genomic differences. C. burnetii isolates from different ruminant hosts were compared to the Nine Mile reference strain using WGPS. The identified genomic regions of differences (RDs) were confirmed by sequencing. A set of 219 primers for amplification of 10 kbp segments covering the entire genome was obtained. The analyses revealed: i) conserved genomic regions, ii) genomic polymorphism, including insertions and deletions, and iii) amplification failures in some cases. WGPS, a descriptive approach, allowed the identification and localization of divergent genetic loci in various strains of C. burnetii, which consisted of deletions, insertions and possibly genomic rearrangements. It also substantiates the role played by the IS1111 element in the genomic plasticity of C. burnetii. We believe that this approach could be combined with new sequencing technologies as a selective/directed sequencing approach, particularly when repeated sequences are present in the analysed genomes. Copyright © 2015 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
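    A whole genome PCR scan of this kind starts from a set of overlapping amplicons tiled across the reference. The sketch below computes such a tiling; the 10 kbp segment length comes from the abstract, while the 500 bp overlap, the function name and the 2 Mb example sequence are assumptions for illustration. Primer design itself is not shown.

```python
def tile_genome(genome_len: int, seg_len: int = 10_000, overlap: int = 500):
    """Return 0-based, half-open (start, end) segments covering [0, genome_len).

    Consecutive segments overlap by `overlap` bp so that no junction is left
    unamplified; the final segment is truncated at the sequence end.
    """
    if genome_len <= 0:
        raise ValueError("genome_len must be positive")
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must be in [0, seg_len)")
    step = seg_len - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + seg_len, genome_len)
        tiles.append((start, end))
        if end == genome_len:
            return tiles
        start += step


if __name__ == "__main__":
    tiles = tile_genome(2_000_000)  # hypothetical 2 Mb sequence, treated as linear
    print(len(tiles), tiles[:2], tiles[-1])
```

    With these assumed parameters a 2 Mb sequence needs 211 segments; a circular chromosome would also need one segment spanning the origin, which this linear sketch omits.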

  1. Whole genome expression profiling in chewing-tobacco-associated oral cancers: a pilot study.

    Science.gov (United States)

    Chakrabarti, Sanjukta; Multani, Shaleen; Dabholkar, Jyoti; Saranath, Dhananjaya

    2015-03-01

    The current study was undertaken to identify differential biomarkers in chewing-tobacco-associated oral cancer tissues in patients of Indian ethnicity. The gene expression profile was analyzed in oral cancer tissues as compared to clinically normal oral buccal mucosa. We examined 30 oral cancer tissues and 27 normal oral tissues, with 16 paired samples from the contralateral site of the patient and 14 unpaired samples from different oral cancer patients, for whole genome expression using the high-throughput Illumina Sentrix Human Ref-8 v2 Expression BeadChip array. The cDNA microarray analysis identified 425 differentially expressed genes with a >1.5-fold change in expression in the oral cancer tissues as compared to normal tissues in the oral cancer patients. Overexpression of 255 genes and downregulation of 170 genes (p TNFSF13B, TMPRSS11A); signal transduction (FOLR2, MME, HTR3B); invasion and metastasis (SPP1, TNFAIP6, EPHB6); differentiation (CLEC4A, ELF5); angiogenesis (CXCL1); apoptosis (GLIPR1, WISP1, DAPL1); immune responses (CD300A, IFIT2, TREM2); and metabolism (NNMT, ALDH3A1). In addition, several of these genes have been reported as differentially expressed in human cancers, including oral cancer. Our data identify genes differentially expressed in oral cancer tissues that, after validation in larger and more varied population samples, may serve as prognostic and therapeutic biomarkers in oral cancers.
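    The >1.5-fold selection described above is a two-criterion filter on fold change and significance. A minimal sketch, assuming linear tumor/normal expression ratios and per-gene p-values are already computed; the p-value cutoff used in the study is not recoverable from this text, so `alpha` is a placeholder, and the function and gene names in the example are hypothetical.

```python
def split_by_fold_change(fold_changes, p_values, min_fold=1.5, alpha=0.05):
    """Split genes into up- and down-regulated lists.

    fold_changes: gene -> tumor/normal expression ratio (linear scale).
    p_values:     gene -> p-value from a differential-expression test.
    alpha is a placeholder threshold, not the study's cutoff.
    """
    up, down = [], []
    for gene, fc in fold_changes.items():
        if p_values[gene] >= alpha:
            continue  # not significant at the placeholder threshold
        if fc > min_fold:
            up.append(gene)
        elif fc < 1 / min_fold:  # >1.5-fold lower in tumor
            down.append(gene)
    return sorted(up), sorted(down)


if __name__ == "__main__":
    fold = {"gene_a": 4.0, "gene_b": 0.3, "gene_c": 1.1, "gene_d": 2.0}
    pvals = {"gene_a": 0.001, "gene_b": 0.01, "gene_c": 0.5, "gene_d": 0.2}
    up, down = split_by_fold_change(fold, pvals)
    print("up:", up, "down:", down)
```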

  2. A novel strategy for clustering major depression individuals using whole-genome sequencing variant data.

    Science.gov (United States)

    Yu, Chenglong; Baune, Bernhard T; Licinio, Julio; Wong, Ma-Li

    2017-03-13

    Major depressive disorder (MDD) is highly prevalent, resulting in an exceedingly high disease burden. The identification of genetic risk factors could advance prevention and therapeutics. Current approaches examine genotyping data to identify specific variations between cases and controls. Compared to genotyping, whole-genome sequencing (WGS) allows for the detection of private mutations. In this proof-of-concept study, we establish a conceptually novel computational approach that clusters subjects based on the entirety of their WGS data. Those clusters predicted MDD diagnosis. This strategy yielded encouraging results: depressed Mexican-American participants clustered closer together, whereas ethnically matched controls clustered away from MDD patients. This implies that, within the same ancestry, the WGS data of an individual can be used to check whether this individual is within or closer to MDD subjects or to controls. We propose a novel strategy to apply WGS data to clinical medicine by facilitating diagnosis through genetic clustering. Further studies using our method should examine larger WGS datasets in other ethnic groups.
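    The final step, checking whether an individual lies closer to MDD subjects or to controls, can be illustrated with a nearest-group rule on genotype distances. This is a minimal sketch, not the authors' clustering algorithm: it assumes genotypes coded as alternate-allele counts (0, 1, 2) at a shared set of sites, subjects from the same ancestry, and mean absolute allele difference as the distance; the function names and toy data are hypothetical.

```python
from statistics import mean


def allele_distance(a, b):
    """Mean per-site difference in alternate-allele counts (0, 1 or 2 per site)."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same sites")
    return mean(abs(x - y) for x, y in zip(a, b))


def nearer_group(subject, cases, controls):
    """Return 'case' or 'control', whichever group is closer on average (ties -> 'control')."""
    d_case = mean(allele_distance(subject, c) for c in cases)
    d_control = mean(allele_distance(subject, c) for c in controls)
    return "case" if d_case < d_control else "control"


if __name__ == "__main__":
    # Toy genotypes at four sites (hypothetical data).
    cases = [[2, 2, 0, 1], [2, 1, 0, 1]]
    controls = [[0, 0, 2, 1], [0, 1, 2, 0]]
    print(nearer_group([2, 2, 0, 0], cases, controls))
```

    With the toy data the subject is classified as 'case'. A real analysis would use many thousands of sites and a proper clustering method rather than two group means.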

  3. Whole genome sequencing and complete genetic analysis reveals novel pathways to glycopeptide resistance in Staphylococcus aureus.

    Directory of Open Access Journals (Sweden)

    Adriana Renzoni

    The precise mechanisms leading to the emergence of low-level glycopeptide resistance in Staphylococcus aureus are poorly understood. In this study, we used whole genome deep sequencing to detect differences between two isogenic strains: a parental strain and a stable derivative selected stepwise for survival on 4 µg/ml teicoplanin, but which grows at higher drug concentrations (MIC 8 µg/ml). We uncovered only three single nucleotide changes in the selected strain. Nonsense mutations occurred in stp1, encoding a serine/threonine phosphatase, and in yjbH, encoding a post-transcriptional negative regulator of the redox/thiol stress sensor and global transcriptional regulator, Spx. A missense mutation (G45R) occurred in the histidine kinase sensor of cell wall stress, VraS. Using genetic methods, all single mutants, all pairwise combinations, and a fully reconstructed triple mutant were evaluated for their contribution to low-level glycopeptide resistance. We found a synergistic cooperation between dual phospho-signalling systems and a subtle contribution from YjbH, suggesting the activation of oxidative stress defences via Spx. To our knowledge, this is the first genetic demonstration of multiple sensor and stress pathways contributing simultaneously to glycopeptide resistance development. The multifactorial nature of glycopeptide resistance in this strain suggests a complex reprogramming of cell physiology to survive in the face of drug challenge.
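    The reconstruction strategy, testing every single mutation, every pair and the full triple, covers all 2^3 - 1 = 7 non-empty combinations of the three mutations. A minimal sketch that enumerates such a strain panel; the mutation labels come from the abstract, while the function name and label format are hypothetical.

```python
from itertools import combinations

MUTATIONS = ("stp1", "yjbH", "vraS(G45R)")


def mutant_panel(mutations):
    """All non-empty combinations: singles, then pairs, up to the full set."""
    return [
        combo
        for r in range(1, len(mutations) + 1)
        for combo in combinations(mutations, r)
    ]


if __name__ == "__main__":
    for strain in mutant_panel(MUTATIONS):
        print(" + ".join(strain))
```

    Comparing each combination with the parent and with its subsets is what lets synergy (a pair doing more than the sum of its singles) be distinguished from purely additive effects.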

  4. Exonic remnants of whole-genome duplication reveal cis-regulatory function of coding exons.

    Science.gov (United States)

    Dong, Xianjun; Navratilova, Pavla; Fredman, David; Drivenes, Øyvind; Becker, Thomas S; Lenhard, Boris

    2010-03-01

    Using a comparative genomics approach to reconstruct the fate of genomic regulatory blocks (GRBs) and identify exonic remnants that have survived the disappearance of their host genes after whole-genome duplication (WGD) in teleosts, we discover a set of 38 candidate cis-regulatory coding exons (RCEs) with predicted target genes. These elements demonstrate evolutionary separation of overlapping protein-coding and regulatory information after WGD in teleosts. We present evidence that the corresponding mammalian exons are still under both coding and non-coding selection pressure, are more conserved than other protein coding exons in the host gene and several control sets, and share key characteristics with highly conserved non-coding elements in the same regions. Their dual function is corroborated by existing experimental data. Additionally, we show examples of human exon remnants stemming from the vertebrate 2R WGD. Our findings suggest that long-range cis-regulatory inputs for developmental genes are not limited to non-coding regions, but can also overlap the coding sequence of unrelated genes. Thus, exonic regulatory elements in GRBs might be functionally equivalent to those in non-coding regions, calling for a re-evaluation of the sequence space in which to look for long-range regulatory elements and experimentally test their activity.

  5. Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank; Platt, Darren

    2006-02-06

    The DOE Joint Genome Institute has sequenced over 50 eukaryotic genomes, ranging in size from 15 Mb to 1.6 Gb, over a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contains bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, they are sometimes symbionts, and so can be of biological interest. Therefore, it is desirable to assemble the bonus organisms along with the main genome. This transforms the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which most WGS assemblers handle poorly. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present. For example, one JGI project contained at least two different prokaryotic contaminants, plus a 145 kb plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms, but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing for the removal of, cloning artifacts such as E. coli and spurious vector inclusions.
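    Because bonus organisms are usually present at a different sequence depth from the target genome (and, as an additional assumption here, often at a different GC content), a first-pass screen can flag contigs that are outliers on either axis. The sketch assumes per-contig GC fraction and mean depth are already computed and that most contigs come from the main genome, so the medians describe it; the thresholds and function name are hypothetical, and this is not the JGI procedure, which the abstract does not detail.

```python
from statistics import median


def flag_bonus_contigs(contigs, gc_tol=0.08, depth_fold=3.0):
    """Flag contigs that look unlike the main genome.

    contigs: list of (name, gc_fraction, mean_depth) with mean_depth > 0.
    A contig is flagged if its GC deviates from the median GC by more than
    gc_tol, or its depth is more than depth_fold above or below the median depth.
    """
    med_gc = median(gc for _, gc, _ in contigs)
    med_depth = median(depth for _, _, depth in contigs)
    flagged = []
    for name, gc, depth in contigs:
        ratio = depth / med_depth
        if abs(gc - med_gc) > gc_tol or ratio > depth_fold or ratio < 1 / depth_fold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical contigs: three from the target genome, one high-GC, high-depth outlier.
    toy = [("c1", 0.40, 30), ("c2", 0.41, 28), ("c3", 0.39, 33), ("c4", 0.66, 210)]
    print(flag_bonus_contigs(toy))
```

    Flagged contigs would then be partitioned out and assembled separately, in the spirit of the iterative subassemblies described above; organelle contigs would typically be caught the same way, through their higher copy number.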

  6. Whole-genome sequencing of a laboratory-evolved yeast strain

    Directory of Open Access Journals (Sweden)

    Dunham Maitreya J

    2010-02-01

    Background: Experimental evolution of microbial populations provides a unique opportunity to study evolutionary adaptation in response to controlled selective pressures. However, until recently it has been difficult to identify the precise genetic changes underlying adaptation at a genome-wide scale. New DNA sequencing technologies now allow the genomes of parental and evolved strains of microorganisms to be rapidly determined. Results: We sequenced >93.5% of the genome of a laboratory-evolved strain of the yeast Saccharomyces cerevisiae and its ancestor at >28× depth. Both single nucleotide polymorphisms and copy number amplifications were found, with specific gains over array-based methodologies previously used to analyze these genomes. Applying a segmentation algorithm to quantify structural changes, we determined the approximate genomic boundaries of a 5× gene amplification. These boundaries guided the recovery of breakpoint sequences, which provide insights into the nature of a complex genomic rearrangement. Conclusions: This study suggests that whole-genome sequencing can provide a rapid approach to uncover the genetic basis of evolutionary adaptations, with further applications in the study of laboratory selections and mutagenesis screens. In addition, we show how single-end, short-read sequencing data can provide detailed information about structural rearrangements, and generate predictions about the genomic features and processes that underlie genome plasticity.
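    A 5× amplification shows up as a run of genomic windows where the evolved strain's read depth is several times the ancestor's. The sketch below finds such runs with a simple threshold-and-run scan on per-window depth ratios; it is a stand-in for, not a reproduction of, the segmentation algorithm used in the study (which the abstract does not name), and the thresholds and function name are assumptions.

```python
def amplified_segments(ratios, min_ratio=2.0, min_windows=3):
    """Find runs of consecutive windows with elevated read depth.

    ratios: per-window depth ratio, evolved / ancestor, each already normalized
    to the genome-wide median so that 1.0 means unchanged copy number.
    Returns (first_window, end_window_exclusive, mean_ratio) for each run of at
    least min_windows windows with ratio >= min_ratio.
    """
    segments = []
    start = None
    for i, r in enumerate(list(ratios) + [0.0]):  # sentinel closes a trailing run
        if r >= min_ratio and start is None:
            start = i
        elif r < min_ratio and start is not None:
            if i - start >= min_windows:
                run = ratios[start:i]
                segments.append((start, i, sum(run) / len(run)))
            start = None
    return segments


if __name__ == "__main__":
    # Hypothetical window ratios: a four-window ~5x run and a two-window blip.
    ratios = [1.0, 1.1, 0.9, 5.2, 4.8, 5.1, 5.0, 1.0, 2.5, 2.6, 1.0]
    print(amplified_segments(ratios))
```

    Multiplying window indices by the window size converts each run into approximate genomic boundaries, which is the kind of estimate that can then guide recovery of breakpoint sequences.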

  7. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    Science.gov (United States)

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads from various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation short reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using the coverage depth obtained by aligning the reads to a catalogue of reference sequences that is built into the pipeline and contains prevalent microbial genomes and genes of the human gut microbiota. The resulting metagenomic composition vectors are processed by a statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org), allowing their comparative analysis together with user samples. MALINA is freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL and Python, and all major browsers are supported.

  8. Use of bacterial whole-genome sequencing to investigate local persistence and spread in bovine tuberculosis

    Directory of Open Access Journals (Sweden)

    Hannah Trewby

    2016-03-01

    Mycobacterium bovis is the causal agent of bovine tuberculosis, one of the most important diseases currently facing the UK cattle industry. Here, we use high-density whole genome sequencing (WGS) in a defined sub-population of M. bovis in 145 cattle across 66 herd breakdowns to gain insights into local spread and persistence. We show that despite low divergence among isolates, WGS can in principle expose contributions of under-sampled host populations to M. bovis transmission. However, we demonstrate that in our data such a signal is due to molecular type switching, which had not previously been documented for M. bovis. Isolates from farms with a known history of direct cattle movement between them did not show a statistical signal of higher genetic similarity. Despite an overall signal of genetic isolation by distance, genetic distances also showed no apparent relationship with spatial distance among affected farms over distances <5 km. Using simulations, we find that even over the brief evolutionary timescale covered by our data, Bayesian phylogeographic approaches are feasible. Applying such approaches showed that M. bovis dispersal in this system is heterogeneous but slow overall, averaging 2 km/year. These results confirm that widespread application of WGS to M. bovis will bring novel and important insights into the dynamics of M. bovis spread and persistence, but that the current questions most pertinent to control will be best addressed using approaches that more directly integrate WGS with additional epidemiological data.

  9. Whole genome sequencing reveals a de novo SHANK3 mutation in familial autism spectrum disorder.

    Directory of Open Access Journals (Sweden)

    Sergio I Nemirovsky

    Clinical genomics promises to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD in whom we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. We identified a family segregating ASD of unidentified cause in three siblings. We performed WGS in the three probands, used a state-of-the-art comprehensive bioinformatic analysis pipeline, and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. The three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy, with a negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). We report an infrequent form of familial ASD in which WGS proved useful in the clinic. The SHANK3 mutation we identified underscores the gene's relevance in Autism Spectrum Disorder.

  10. Whole-genome analyses resolve early branches in the tree of life of modern birds

    Science.gov (United States)

    Jarvis, Erich D.; Mirarab, Siavash; Aberer, Andre J.; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y. W.; Faircloth, Brant C.; Nabholz, Benoit; Howard, Jason T.; Suh, Alexander; Weber, Claudia C.; da Fonseca, Rute R.; Li, Jianwen; Zhang, Fang; Li, Hui; Zhou, Long; Narula, Nitish; Liu, Liang; Ganapathy, Ganesh; Boussau, Bastien; Bayzid, Md. Shamsuzzoha; Zavidovych, Volodymyr; Subramanian, Sankar; Gabaldón, Toni; Capella-Gutiérrez, Salvador; Huerta-Cepas, Jaime; Rekepalli, Bhanu; Munch, Kasper; Schierup, Mikkel; Lindow, Bent; Warren, Wesley C.; Ray, David; Green, Richard E.; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Li, Shengbin; Li, Ning; Huang, Yinhua; Derryberry, Elizabeth P.; Bertelsen, Mads Frost; Sheldon, Frederick H.; Brumfield, Robb T.; Mello, Claudio V.; Lovell, Peter V.; Wirthlin, Morgan; Schneider, Maria Paula Cruz; Prosdocimi, Francisco; Samaniego, José Alfredo; Velazquez, Amhed Missael Vargas; Alfaro-Núñez, Alonzo; Campos, Paula F.; Petersen, Bent; Sicheritz-Ponten, Thomas; Pas, An; Bailey, Tom; Scofield, Paul; Bunce, Michael; Lambert, David M.; Zhou, Qi; Perelman, Polina; Driskell, Amy C.; Shapiro, Beth; Xiong, Zijun; Zeng, Yongli; Liu, Shiping; Li, Zhenyu; Liu, Binghang; Wu, Kui; Xiao, Jin; Yinqi, Xiong; Zheng, Qiuemei; Zhang, Yong; Yang, Huanming; Wang, Jian; Smeds, Linnea; Rheindt, Frank E.; Braun, Michael; Fjeldsa, Jon; Orlando, Ludovic; Barker, F. Keith; Jønsson, Knud Andreas; Johnson, Warren; Koepfli, Klaus-Peter; O’Brien, Stephen; Haussler, David; Ryder, Oliver A.; Rahbek, Carsten; Willerslev, Eske; Graves, Gary R.; Glenn, Travis C.; McCormack, John; Burt, Dave; Ellegren, Hans; Alström, Per; Edwards, Scott V.; Stamatakis, Alexandros; Mindell, David P.; Cracraft, Joel; Braun, Edward L.; Warnow, Tandy; Jun, Wang; Gilbert, M. Thomas P.; Zhang, Guojie

    2015-01-01

    To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago. PMID:25504713

  11. Whole-genome transcriptional analysis of heavy metal stresses in Caulobacter crescentus

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Ping; Brodie, Eoin L.; Suzuki, Yohey; McAdams, Harley H.; Andersen, Gary L.

    2005-09-21

    The bacterium Caulobacter crescentus and related stalked bacterial species are known for their distinctive ability to live in low-nutrient environments, a characteristic of most heavy-metal-contaminated sites. Caulobacter crescentus is a model organism for studying cell cycle regulation, with well-developed genetics. We have identified the pathways responding to heavy metal toxicity in C. crescentus to provide insights for possible application of Caulobacter to environmental restoration. We exposed C. crescentus cells to four heavy metals (chromium, cadmium, selenium and uranium) and analyzed genome-wide transcriptional activities post exposure using an Affymetrix GeneChip microarray. C. crescentus showed surprisingly high tolerance to uranium, a possible mechanism for which may be the formation of extracellular calcium-uranium-phosphate precipitates. The principal response to these metals was protection against oxidative stress (up-regulation of manganese-dependent superoxide dismutase, sodA). Glutathione S-transferase, thioredoxin, glutaredoxins and DNA repair enzymes responded most strongly to cadmium and chromate. The cadmium and chromium stress response also focused on reducing the intracellular metal concentration, with multiple efflux pumps employed to remove cadmium while a sulfate transporter was down-regulated to reduce non-specific uptake of chromium. Membrane proteins were also up-regulated in response to most of the metals tested. A two-component signal transduction system involved in the uranium response was identified. Several differentially regulated transcripts from regions not previously known to encode proteins were identified, demonstrating the advantage of evaluating the transcriptome using whole-genome microarrays.

  12. Homoeologous chromosomes of Xenopus laevis are highly conserved after whole-genome duplication.

    Science.gov (United States)

    Uno, Y; Nishida, C; Takagi, C; Ueno, N; Matsuda, Y

    2013-11-01

    It has been suggested that whole-genome duplication (WGD) occurred twice during vertebrate evolution, around 450 and 500 million years ago, contributing to an increase in the genomic and phenotypic complexity of vertebrates. However, little is known about the evolution of homoeologous chromosomes after WGD because many duplicate genes have since been lost. Xenopus laevis (2n=36) and Xenopus (Silurana) tropicalis (2n=20) are good animal models for studying genomic and chromosomal reorganization after WGD, because X. laevis is an allotetraploid species that resulted from WGD after the interspecific hybridization of diploid species closely related to X. tropicalis. We constructed a comparative cytogenetic map of X. laevis using 60 complementary DNA clones that covered the entire chromosomal regions of 10 pairs of X. tropicalis chromosomes. We consequently identified all nine homoeologous chromosome groups of X. laevis. Hybridization signals on two pairs of X. laevis homoeologous chromosomes were detected for 50 of 60 (83%) genes, and genetic linkage is highly conserved between X. tropicalis and X. laevis chromosomes, except for one fusion and one inversion, and also between X. laevis homoeologous chromosomes, except for two inversions. These results indicate that the loss of duplicated genes and inter- and/or intrachromosomal rearrangements occurred much less frequently in this lineage, suggesting that these events were not essential for diploidization of the allotetraploid genome in X. laevis after WGD.

  13. Whole-Genome Sequencing of Native Sheep Provides Insights into Rapid Adaptations to Extreme Environments

    Science.gov (United States)

    Yang, Ji; Li, Wen-Rong; Lv, Feng-Hua; He, San-Gang; Tian, Shi-Lin; Peng, Wei-Feng; Sun, Ya-Wei; Zhao, Yong-Xin; Tu, Xiao-Long; Zhang, Min; Xie, Xing-Long; Wang, Yu-Tao; Li, Jin-Quan; Liu, Yong-Gang; Shen, Zhi-Qiang; Wang, Feng; Liu, Guang-Jian; Lu, Hong-Feng; Kantanen, Juha; Han, Jian-Lin; Li, Meng-Hua; Liu, Ming-Jun

    2016-01-01

    Global climate change has a significant effect on extreme environments and a profound influence on species survival. However, little is known of the genome-wide pattern of livestock adaptations to extreme environments over a short time frame following domestication. Sheep (Ovis aries) have become well adapted to a diverse range of agroecological zones, including certain extreme environments (e.g., plateaus and deserts), during their post-domestication (approximately 8–9 kya) migration and differentiation. Here, we generated whole-genome sequences from 77 native sheep, with an average effective sequencing depth of ∼5× for 75 samples and ∼42× for 2 samples. Comparative genomic analyses among sheep in contrasting environments, that is, plateau (>4,000 m above sea level) versus lowland (1500 m) versus low-altitude region (600 mm), and arid zone (400 mm), detected a novel set of candidate genes as well as pathways and GO categories that are putatively associated with hypoxia responses at high altitudes and water reabsorption in arid environments. In addition, candidate genes and GO terms functionally related to energy metabolism and body size variations were identified. This study offers novel insights into rapid genomic adaptations to extreme environments in sheep and other animals, and provides a valuable resource for future research on livestock breeding in response to climate change. PMID:27401233

  14. Prospective Whole-Genome Sequencing Enhances National Surveillance of Listeria monocytogenes.

    Science.gov (United States)

    Kwong, Jason C; Mercoulia, Karolina; Tomita, Takehiro; Easton, Marion; Li, Hua Y; Bulach, Dieter M; Stinear, Timothy P; Seemann, Torsten; Howden, Benjamin P

    2016-02-01

    Whole-genome sequencing (WGS) has emerged as a powerful tool for comparing bacterial isolates in outbreak detection and investigation. Here we demonstrate that WGS performed prospectively for national epidemiologic surveillance of Listeria monocytogenes has the capacity to be superior to our current approaches using pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable-number tandem-repeat analysis (MLVA), binary typing, and serotyping. Initially 423 L. monocytogenes isolates underwent WGS, and comparisons uncovered a diverse genetic population structure derived from three distinct lineages. MLST, binary typing, and serotyping results inferred in silico from the WGS data were highly concordant (>99%) with laboratory typing performed in parallel. However, WGS was able to identify distinct nested clusters within groups of isolates that were otherwise indistinguishable using our current typing methods. Routine WGS was then used for prospective epidemiologic surveillance on a further 97 L. monocytogenes isolates over a 12-month period, which provided a greater level of discrimination than that of conventional typing for inferring linkage to point source outbreaks. A risk-based alert system based on WGS similarity was used to inform epidemiologists required to act on the data. Our experience shows that WGS can be adopted for prospective L. monocytogenes surveillance and investigated for other pathogens relevant to public health. Copyright © 2016 Kwong et al.

  15. The present and future of de novo whole-genome assembly.

    Science.gov (United States)

    Sohn, Jang-Il; Nam, Jin-Wu

    2018-01-01

    With the advent of next-generation sequencing (NGS) technology, various de novo assembly algorithms based on the de Bruijn graph have been developed to construct chromosome-level sequences. However, numerous technical and computational challenges in de novo assembly remain, although many ideas and heuristics have been proposed to tackle them in both experimental and computational settings. In this review, we categorize de novo assemblers on the basis of the type of de Bruijn graph (Hamiltonian and Eulerian) and discuss the challenges of de novo assembly for short NGS reads regarding computational complexity and assembly ambiguity. We then discuss how the limitations of short reads can be overcome by using single-molecule sequencing platforms that generate long reads of up to several kilobases. In fact, long-read assembly has caused a paradigm shift in whole-genome assembly in terms of algorithms and supporting steps. We also summarize (i) hybrid assemblies using both short and long reads and (ii) overlap-based assemblies for long reads, and discuss their challenges and future prospects. This review provides guidelines for determining the optimal approach for a given input data type, computational budget or genome. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
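    To make the Eulerian flavour of de Bruijn assembly concrete, here is a minimal Python sketch, not taken from the review: nodes are (k-1)-mers, each distinct k-mer in the reads becomes an edge, and the contig is read off an Eulerian path found with Hierholzer's algorithm. It assumes error-free reads from one strand of a single linear sequence with no repeated k-mers; real assemblers must also cope with sequencing errors, reverse complements, repeats and uneven coverage. The function names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to its (k-1)-mer suffixes, one edge per distinct k-mer."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: visit every edge exactly once."""
    if not graph:
        return []
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # A linear sequence starts at the node with one more outgoing than incoming edge.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), next(iter(graph)))
    adj = {u: list(vs) for u, vs in graph.items()}  # copy so edges can be consumed
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the contig: first node, then the last base of every following node."""
    path = eulerian_path(build_de_bruijn(reads, k))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])
```

    For example, assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4) spells "ATGGCGTGCA". Collapsing k-mers into a set keeps the sketch short but discards k-mer multiplicity, which is exactly the information needed to resolve the repeats that make short-read assembly ambiguous.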

  16. A whole-genome assembly of the domestic cow, Bos taurus

    Science.gov (United States)

    Zimin, Aleksey V; Delcher, Arthur L; Florea, Liliana; Kelley, David R; Schatz, Michael C; Puiu, Daniela; Hanrahan, Finnian; Pertea, Geo; Van Tassell, Curtis P; Sonstegard, Tad S; Marçais, Guillaume; Roberts, Michael; Subramanian, Poorani; Yorke, James A; Salzberg, Steven L

    2009-01-01

    Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions. Conclusions: By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome. PMID:19393038

  17. Impacts of Whole-Genome Triplication on MIRNA Evolution in Brassica rapa.

    Science.gov (United States)

    Sun, Chao; Wu, Jian; Liang, Jianli; Schnable, James C; Yang, Wencai; Cheng, Feng; Wang, Xiaowu

    2015-11-01

    MicroRNAs (miRNAs) are a class of short non-coding, endogenous RNAs that play essential roles in eukaryotes. Although the influence of whole-genome triplication (WGT) on protein-coding genes has been well documented in Brassica rapa, little is known about its impact on MIRNAs. In this study, by generating a comprehensive annotation of 680 MIRNAs for B. rapa, we analyzed the evolutionary characteristics of these MIRNAs from several angles. First, while MIRNAs and genes show similar patterns of biased distribution among the subgenomes of B. rapa, we found that MIRNAs are much more over-retained than genes following fractionation after WGT. Second, multiple-copy MIRNAs show significantly higher sequence conservation than single-copy MIRNAs, which is the opposite of the pattern seen for genes. This indicates increased purifying selection acting on these highly retained multiple-copy MIRNAs and suggests their greater functional importance relative to singleton MIRNAs. Furthermore, we found extensive divergence between pairs of miRNAs and their target genes following the WGT in B. rapa. In summary, our study provides a valuable resource for exploring MIRNAs in B. rapa and highlights the impact of WGT on MIRNA evolution. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. A bioinformatic approach to understanding antibiotic resistance in intracellular bacteria through whole genome analysis.

    Science.gov (United States)

    Biswas, Silpak; Raoult, Didier; Rolain, Jean-Marc

    2008-09-01

    Intracellular bacteria survive within eukaryotic host cells and are difficult to kill with certain antibiotics. As a result, antibiotic resistance in intracellular bacteria is becoming commonplace in healthcare institutions. Owing to the lack of methods available for transforming these bacteria, we evaluated the mechanisms of resistance using molecular methods and in silico genome analysis. The objective of this review was to understand the molecular mechanisms of antibiotic resistance through in silico comparisons of the genomes of obligate and facultative intracellular bacteria. The available data on in vitro mutants reported for intracellular bacteria were also reviewed. These genomic data were analysed to find natural mutations in known target genes involved in antibiotic resistance and to look for the presence or absence of different resistance determinants. Our analysis revealed the presence of tetracycline resistance protein (Tet) in Bartonella quintana, Francisella tularensis and Brucella ovis; moreover, most of the Francisella strains possessed the blaA gene, AmpG protein and metallo-beta-lactamase family protein. The presence or absence of folP (dihydropteroate synthase) and folA (dihydrofolate reductase) genes in the genome could explain natural resistance to co-trimoxazole. Finally, multiple genes encoding different efflux pumps were studied. This in silico approach was an effective method for understanding the mechanisms of antibiotic resistance in intracellular bacteria. The whole genome sequence analysis will help to predict several important phenotypic characteristics, in particular resistance to different antibiotics. In the future, stable mutants should be obtained through transformation methods in order to demonstrate experimentally the determinants of resistance in intracellular bacteria.

  19. Isolation and whole genome analysis of endospore-forming bacteria from heroin.

    Science.gov (United States)

    Kalinowski, Jörn; Ahrens, Björn; Al-Dilaimi, Arwa; Winkler, Anika; Wibberg, Daniel; Schleenbecker, Uwe; Rückert, Christian; Wölfel, Roman; Grass, Gregor

    2018-01-01

    Infections caused by endospore-forming bacteria have been associated with severe illness and death among persons who inject drugs. Analysis of the bacteria residing in heroin has thus been biased towards species that affect human health. Similarly, exploration of the bacterial diversity of seized street-market heroin found correlations with the skin microflora of recreational heroin users, in that various Staphylococcus spp. and typical environmental endospore formers, including Bacillus cereus, other Bacilli outside the B. cereus sensu lato group and diverse Clostridia, were identified. In this work, 82 samples of non-street-market ("wholesale") heroin from the German Federal Criminal Police Office's heroin analysis program, seized between 2009 and 2014, were analyzed for contaminating bacteria. Having had no contact with the end user, and with little contamination introduced by final processing, adulteration and cutting, this heroin likely harbors the original microbiota from the drug's source or trafficking route. We found this drug to be only sparsely populated with retrievable heterotrophic, aerobic bacteria. In total, 68 isolates were retrieved from 49 of the 82 samples analyzed (60% culture positive). All isolates were endospore-forming, Gram-positive Bacilli; non-endospore-formers and Gram-negative bacteria were completely absent. The three most predominant species were Bacillus clausii, Bacillus (para)licheniformis, and Terribacillus saccharophilus. Whole genome sequencing of these 68 isolates was performed using Illumina technology. Sequence data sets were assembled and annotated using an automated bioinformatics pipeline. Average nucleotide identity (ANI) values were calculated for all draft genomes, and all nearly identical genomes (ANI>99.5%) were compared to the forensic data of the seized drug, showing positive correlations that strongly warrant further research on this subject. Copyright © 2017 Elsevier B.V. All rights reserved.
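    The comparison of nearly identical genomes (ANI>99.5%) implies a grouping step. The minimal Python sketch below assumes the pairwise ANI values, in percent, were already computed by an external tool, and groups genomes by single linkage with a union-find structure. The function name, the isolate labels and the single-linkage choice are illustrative assumptions, not the authors' pipeline.

```python
def group_near_identical(ani, threshold=99.5):
    """Group genomes connected by pairwise ANI above `threshold` (single linkage).

    ani: dict mapping (genome_a, genome_b) -> ANI in percent (precomputed).
    Returns a sorted list of sorted member lists.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for (a, b), value in ani.items():
        find(a)  # register every genome, even those that never pass the cutoff
        find(b)
        if value > threshold:
            union(a, b)

    groups = {}
    for genome in list(parent):
        groups.setdefault(find(genome), []).append(genome)
    return sorted(sorted(members) for members in groups.values())
```

    With hypothetical values {("iso1", "iso2"): 99.8, ("iso2", "iso3"): 99.6, ("iso1", "iso4"): 97.0}, iso1, iso2 and iso3 form one group and iso4 stays alone. Single linkage can chain genomes whose direct ANI is below the cutoff; requiring every pair in a group to pass (complete linkage) is the stricter alternative.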

  20. Whole-genome DNA methylation characteristics in pediatric precursor B cell acute lymphoblastic leukemia (BCP ALL).

    Directory of Open Access Journals (Sweden)

    Radosław Chaber

    In addition to genetic alterations, epigenetic abnormalities have been shown to underlie the pathogenesis of acute lymphoblastic leukemia (ALL), the most common pediatric cancer. The purpose of this study was to characterize the whole-genome DNA methylation profile in children with precursor B-cell ALL (BCP ALL) and to compare this profile with the methylation observed in normal bone marrow samples. Additional efforts were made to correlate the observed methylation patterns with selected clinical features. We assessed DNA methylation in bone marrow samples obtained from 38 children with BCP ALL at the time of diagnosis, along with 4 samples of normal bone marrow cells as controls, using the Infinium MethylationEPIC BeadChip Array. Patients were diagnosed and stratified into prognosis groups according to the BFM ALL IC 2009 protocol. The analysis of differentially methylated sites across the genome, as well as of promoter methylation profiles, allowed clear separation of the leukemic and control samples into two clusters. Of the promoter-associated differentially methylated sites, 86.6% were hypermethylated in BCP ALL. Seven sites were found to correlate with the BFM ALL IC 2009 high-risk group. Among these, one was located within the gene body of the MBP gene and another within the promoter region of the PSMF1 gene. Differentially methylated sites were also significantly associated with the subsets of patients with ETV6-RUNX1 fusion and hyperdiploidy. The analyzed translocations and the resulting changes in the genes' sequence context do not affect methylation, and methylation seems not to be a mechanism regulating the expression of the resulting fusion genes.

  1. Whole-genome resequencing uncovers molecular signatures of natural and sexual selection in wild bighorn sheep.

    Science.gov (United States)

    Kardos, Marty; Luikart, Gordon; Bunch, Rowan; Dewey, Sarah; Edwards, William; McWilliam, Sean; Stephenson, John; Allendorf, Fred W; Hogg, John T; Kijas, James

    2015-11-01

    The identification of genes influencing fitness is central to our understanding of the genetic basis of adaptation and how it shapes phenotypic variation in wild populations. Here, we used whole-genome resequencing of wild Rocky Mountain bighorn sheep (Ovis canadensis) to >50-fold coverage to identify 2.8 million single nucleotide polymorphisms (SNPs) and genomic regions bearing signatures of directional selection (i.e. selective sweeps). A comparison of SNP diversity between the X chromosome and the autosomes indicated that bighorn males had a dramatically reduced long-term effective population size compared to females. This probably reflects a long history of intense sexual selection mediated by male-male competition for mates. Selective sweep scans based on heterozygosity and nucleotide diversity revealed evidence for a selective sweep shared across multiple populations at RXFP2, a gene that strongly affects horn size in domestic ungulates. The massive horns carried by bighorn rams appear to have evolved in part via strong positive selection at RXFP2. We identified evidence for selection within individual populations at genes affecting early body growth and cellular response to hypoxia; however, these must be interpreted more cautiously as genetic drift is strong within local populations and may have caused false positives. These results represent a rare example of strong genomic signatures of selection identified at genes with known function in wild populations of a nonmodel species. Our results also showcase the value of reference genome assemblies from agricultural or model species for studies of the genomic basis of adaptation in closely related wild taxa. © 2015 John Wiley & Sons Ltd.
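    As a rough illustration of a heterozygosity-based sweep scan, the sketch below averages expected heterozygosity, 2p(1-p), over fixed genomic windows and flags windows whose Z-score falls below a cutoff. The window size, step and cutoff are hypothetical parameters, and the study's exact statistics are not reproduced here. As the abstract cautions, low-diversity windows can also arise from drift, so flagged windows are candidates rather than confirmed sweeps.

```python
from statistics import mean, pstdev


def window_heterozygosity(positions, freqs, window, step):
    """Mean expected heterozygosity 2p(1-p) per window.

    positions: SNP coordinates on one chromosome; freqs: allele frequency p
    of each SNP, in the same order. Returns (start, end, mean_H) for every
    window [start, end) that contains at least one SNP.
    """
    results = []
    if not positions:
        return results
    last = max(positions)
    start = 0
    while start <= last:
        end = start + window
        hs = [2 * p * (1 - p) for pos, p in zip(positions, freqs) if start <= pos < end]
        if hs:
            results.append((start, end, mean(hs)))
        start += step
    return results


def flag_low_windows(results, z_cut=2.0):
    """Return (start, end) of windows whose heterozygosity Z-score is below -z_cut."""
    values = [h for _, _, h in results]
    if len(values) < 2:
        return []
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []
    return [(s, e) for s, e, h in results if (h - mu) / sd < -z_cut]
```

    Z-transforming window means against the genome-wide distribution is a common way to rank windows. An empirical percentile cutoff, such as the lowest 0.5% of windows, is an equally simple alternative when window scores are far from normal.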

  2. Whole Genome Association Studies of Residual Feed Intake and Related Traits in the Pig.

    Science.gov (United States)

    Onteru, Suneel K; Gorbach, Danielle M; Young, Jennifer M; Garrick, Dorian J; Dekkers, Jack C M; Rothschild, Max F

    2013-01-01

    Residual feed intake (RFI), a measure of feed efficiency, is the difference between observed feed intake and the expected feed requirement predicted from growth and maintenance. Pigs with low RFI have reduced feed costs without compromising their growth. Identification of genes or genetic markers associated with RFI will be useful for marker-assisted selection at an early age of animals with improved feed efficiency. Whole genome association studies (WGAS) for RFI, average daily feed intake (ADFI), average daily gain (ADG), back fat (BF) and loin muscle area (LMA) were performed on 1,400 pigs from the divergently selected ISU-RFI lines, using the Illumina PorcineSNP60 BeadChip. Various statistical methods were applied to find SNPs and genomic regions associated with the traits, including a Bayesian approach using GenSel software, and frequentist approaches such as allele frequency differences between lines, single SNP and haplotype analyses using PLINK software. Single SNP and haplotype analyses showed no significant associations (except for LMA) after genomic control and FDR. Bayesian analyses found at least 2 associations for each trait at a false positive probability of 0.5. At generation 8, the RFI selection lines mainly differed in allele frequencies for SNPs near (energy homeostasis (e.g., MC4R, PGM1, GPR81) and muscle growth related genes (e.g., TGFB1) with ADG, and of fat metabolism genes (e.g., ACOXL, AEBP1) with BF. Specifically, a very highly significantly associated QTL for LMA on SSC7 with skeletal myogenesis genes (e.g., KLHL31) was identified for subsequent fine mapping. Important genomic regions associated with RFI related traits were identified for future validation studies prior to their incorporation in marker-assisted selection programs.
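    Because RFI is defined as observed feed intake minus the intake expected from growth and maintenance, it is commonly computed as the residual of a linear regression. The minimal numpy sketch below assumes that ADFI is regressed on an intercept plus covariates such as ADG and back fat. The covariate set actually used for the ISU-RFI lines is not specified in this abstract, and the function name is hypothetical.

```python
import numpy as np


def residual_feed_intake(adfi, predictors):
    """RFI = observed ADFI minus ADFI predicted by ordinary least squares.

    adfi: shape (n,) observed average daily feed intake.
    predictors: shape (n, p) covariates for growth and maintenance
    (assumed here, e.g. ADG and back fat). An intercept is added.
    """
    y = np.asarray(adfi, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    expected = X @ coef  # feed requirement predicted from growth and maintenance
    return y - expected  # negative RFI = eats less than predicted (more efficient)
```

    Because the model includes an intercept, the RFI values sum to zero across the animals analysed. Low-RFI pigs are therefore efficient relative to this particular population and covariate set, not in an absolute sense.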

  3. Whole genomic analysis of G2P[4] human Rotaviruses in Mymensingh, north-central Bangladesh

    Directory of Open Access Journals (Sweden)

    Satoru Aida

    2016-09-01

    Rotavirus A (RVA) is a dominant causative agent of acute gastroenteritis in children worldwide. G2P[4] is one of the most common genotypes among human rotavirus (HRV) strains and has been persistently prevalent in South Asia, including Bangladesh. In the present study, whole genome sequences of a total of 16 G2P[4] HRV strains (8 strains each in 2010 and 2013) detected in Mymensingh, north-central Bangladesh, were determined. These strains had a typical DS-1-like genotype constellation. Most gene segments from the DS-1 genogroup exhibited high sequence identities to each other (>98%), while slight diversity was observed for the VP1, VP3, and NSP4 genes. By phylogenetic analysis, individual RNA segments were classified into one (V) or two to three lineages (V–VI or V–VII). In terms of the lineages (sublineages) of the 11 gene segments, the 16 Bangladeshi strains could be further classified into four clades (A–D) containing 8 lineage constellations, revealing three clades (A–C) with three lineage constellations in 2010, and a single clade (D) with four constellations in 2013. This demonstrates the co-existence of multiple G2P[4] HRV strains with different lineage constellations, and a change in clades over the study period. Although amino acids in the antigenic regions of VP7 and VP4 were mostly identical to those of global G2P[4] strains after 2000, VP4 of the clade D RVAs in 2013 had alanine and proline at positions 88 and 114, respectively, which are novel substitutions compared with recent global G2P[4] strains. The replacement of lineage constellations, associated with unique amino acid changes in the antigenic region of VP4, suggests ongoing genetic evolution that is giving rise to new G2P[4] rotavirus strains in Bangladesh.

  4. Analysis of the differences in whole-genome expression related to asthma and obesity.

    Science.gov (United States)

    Gruchała-Niedoszytko, Marta; Niedoszytko, Marek; Sanjabi, Bahram; van der Vlies, Pieter; Niedoszytko, Piotr; Jassem, Ewa; Małgorzewicz, Sylwia

    2015-01-01

    Concomitant obesity significantly impairs asthma control. Obese asthmatics show more severe symptoms and an increased use of medications. The primary aim of the study was to identify genes that are differentially expressed in the peripheral blood of asthmatic patients with obesity, asthmatic patients with normal body mass, and obese patients without asthma. Secondly, we investigated whether the analysis of gene expression in peripheral blood may be helpful in the differential diagnosis of obese patients who present with symptoms similar to asthma. The study group included 15 patients with asthma (9 obese and 6 normal-weight patients), while the control group comprised 13 obese patients in whom asthma was excluded. The analysis of whole-genome expression was performed on RNA samples isolated from peripheral blood. The comparison of gene expression profiles between asthmatic patients with obesity and those with normal body mass revealed a significant difference in 6 genes. The comparison of expression between controls and normal-weight patients with asthma showed a significant difference in 23 genes. The analysis of differentially expressed genes revealed a group of transcripts that may be related to increased body mass (PI3, LOC100008589, RPS6KA3, LOC441763, IFIT1, and LOC100133565). Based on the gene expression results, a prediction model was constructed that correctly classified 92% of obese controls and 89% of obese asthmatic patients, for an overall accuracy of 90.9%. The results of our study showed significant differences in gene expression between obese asthmatic patients and asthmatic patients with normal body mass, as well as between obese patients without asthma and asthmatic patients with normal body mass.
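
The overall accuracy is pooled over all classified subjects rather than averaged over the two groups. A worked check, assuming the model was evaluated on the 13 obese controls and 9 obese asthmatics described above; the per-group counts (12/13 and 8/9) are inferred because they reproduce the reported 92%, 89% and 90.9%, and are not stated in the abstract:

```python
def pooled_accuracy(per_class):
    """Overall accuracy from (correct, total) pairs, one pair per class."""
    correct = sum(c for c, _ in per_class)
    total = sum(n for _, n in per_class)
    return correct / total


obese_controls = (12, 13)    # inferred count: 12/13 = 92.3%
obese_asthmatics = (8, 9)    # inferred count: 8/9 = 88.9%
groups = [obese_controls, obese_asthmatics]

pooled = pooled_accuracy(groups)                        # 20/22
balanced = sum(c / n for c, n in groups) / len(groups)  # unweighted mean of per-class rates
print(f"pooled accuracy:   {100 * pooled:.1f}%")    # 90.9%
print(f"balanced accuracy: {100 * balanced:.1f}%")  # 90.6%
```

With unequal group sizes the two summaries differ slightly; the reported 90.9% matches the pooled figure.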

  5. Whole-genome sequencing overcomes pseudogene homology to diagnose autosomal dominant polycystic kidney disease.

    Science.gov (United States)

    Mallawaarachchi, Amali C; Hort, Yvonne; Cowley, Mark J; McCabe, Mark J; Minoche, André; Dinger, Marcel E; Shine, John; Furlong, Timothy J

    2016-11-01

    Autosomal dominant polycystic kidney disease (ADPKD) is the most common monogenic kidney disorder and is due to disease-causing variants in PKD1 or PKD2. A strong genotype-phenotype correlation exists, although diagnostic sequencing is not part of routine clinical practice. This is because PKD1 bears 97.7% sequence similarity with six pseudogenes, which must be overcome with laborious and error-prone long-range PCR and Sanger sequencing. We hypothesised that whole-genome sequencing (WGS) would be able to overcome the problem of this sequence homology because of its 150 bp paired-end reads and its avoidance of the capture bias that arises from targeted sequencing. We prospectively recruited a cohort of 28 unique pedigrees with an ADPKD phenotype. Standard DNA extraction, library preparation and WGS were performed using the Illumina HiSeq X, and variants were classified following standard guidelines. A molecular diagnosis was made in 24 patients (86%), with 100% variant confirmation by the current gold standard of long-range PCR and Sanger sequencing. We demonstrated unique alignment of sequencing reads over the pseudogene-homologous region. In addition to identifying function-affecting single-nucleotide variants and indels, we identified single- and multi-exon deletions affecting PKD1 and PKD2, which would have been challenging to identify using exome sequencing. We report the first use of WGS to diagnose ADPKD. This method overcomes pseudogene homology, provides uniform coverage, detects all variant types in a single test and is less labour-intensive than current techniques. This technique is translatable to a diagnostic setting, allows clinicians to make better-informed management decisions and has implications for other disease groups that are challenged by regions of confounding sequence homology.
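
One way to see why longer, paired reads help is a back-of-envelope calculation. If a read falls in a region sharing a fraction s of its sequence with a pseudogene, and the differing bases are assumed to be scattered independently and uniformly (a simplification; real differences cluster), the chance that L sequenced bases contain no base distinguishing gene from pseudogene is s^L. A minimal sketch using the 97.7% similarity quoted above:

```python
def p_no_distinguishing_base(similarity, bases):
    """P(none of `bases` positions differs from the paralog) = similarity ** bases,
    under the simplifying assumption that differences are independent and uniform."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be a fraction between 0 and 1")
    if bases < 0:
        raise ValueError("bases must be non-negative")
    return similarity ** bases


for bases, label in [(100, "100 bp single read"),
                     (150, "150 bp single read"),
                     (300, "150 bp read pair (300 bases, no overlap)")]:
    p = p_no_distinguishing_base(0.977, bases)
    print(f"{label}: {p:.4f}")
```

Under these assumptions roughly 3% of single 150 bp reads would carry no distinguishing base, falling to about 0.1% when both mates of a non-overlapping pair can be used.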

  6. Flexible positions, managed hopes: the promissory bioeconomy of a whole genome sequencing cancer study.

    Science.gov (United States)

    Haase, Rachel; Michie, Marsha; Skinner, Debra

    2015-04-01

    Genomic research has rapidly expanded its scope and ambition over the past decade, promoted by both public and private sectors as having the potential to revolutionize clinical medicine. This promissory bioeconomy of genomic research and technology is generated by, and in turn generates, the hopes and expectations shared by investors, researchers and clinicians, patients, and the general public alike. Examinations of such bioeconomies have often focused on the public discourse, media representations, and capital investments that fuel these "regimes of hope," but also crucial are the more intimate contexts of small-scale medical research, and the private hopes, dreams, and disappointments of those involved. Here we examine one local site of production in a university-based clinical research project that sought to identify novel cancer predisposition genes through whole genome sequencing in individuals at high risk for cancer. In-depth interviews with 24 adults who donated samples to the study revealed an ability to shift flexibly between positioning themselves as research participants on the one hand and as patients or family members of patients on the other. Similarly, interviews with members of the research team highlighted the dual nature of their positions as researchers and as clinicians. For both parties, this dual positioning shaped their investment in the project and how they valued its possible outcomes. In their narratives, all parties shifted between these different relational positions as they managed hopes and expectations for the research project. We suggest that this flexibility facilitated study implementation and participation in the face of potential and probable disappointment on one or more fronts, and acted as a key element in the resilience of this local promissory bioeconomy. We conclude that these multiple dimensions of relationality and positionality are inherent and essential in the creation of any complex economy, "bio" or otherwise.

  7. CNV discovery for milk composition traits in dairy cattle using whole genome resequencing.

    Science.gov (United States)

    Gao, Yahui; Jiang, Jianping; Yang, Shaohua; Hou, Yali; Liu, George E; Zhang, Shengli; Zhang, Qin; Sun, Dongxiao

    2017-03-29

    Copy number variations (CNVs) are important and widely distributed in the genome. CNV detection opens a new avenue for exploring genes associated with complex traits in humans, animals and plants. Herein, we present a genome-wide assessment of CNVs that are potentially associated with milk composition traits in dairy cattle. In this study, CNVs were detected based on whole genome re-sequencing data of eight Holstein bulls from four half- and/or full-sib families, with extremely high and low estimated breeding values (EBVs) of milk protein percentage and fat percentage. Coverage depth per individual ranged from 8.2× to 11.9×. Using CNVnator, we identified a total of 14,821 CNVs, including 5025 duplications and 9796 deletions. Among them, 487 differential CNV regions (CNVRs) comprising ~8.23 Mb of the cattle genome were observed between the high and low groups. Annotation of these differential CNVRs was performed based on the cattle genome reference assembly (UMD3.1), and a total of 235 functional genes were found within the CNVRs. By Gene Ontology and KEGG pathway analyses, we found that these genes were significantly enriched for specific biological functions related to protein and lipid metabolism, the insulin/IGF pathway-protein kinase B signaling cascade, the prolactin signaling pathway and AMPK signaling pathways. These genes included INS, IGF2, FOXO3, TH, SCD5, GALNT18, GALNT16, ART3, SNCA and WNT7A, implying their potential association with milk protein and fat traits. In addition, 95 CNVRs overlapped with 75 known QTLs that are associated with milk protein and fat traits of dairy cattle (Cattle QTLdb). In conclusion, based on NGS of 8 Holstein bulls with extremely high and low EBVs for milk protein percentage and fat percentage, we identified a total of 14,821 CNVs, 487 differential CNVRs between groups, and 10 promising candidate genes for milk protein and fat traits.
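
CNV regions (CNVRs) are commonly formed by merging CNV calls that overlap across individuals, and QTL overlap is an interval-intersection test. A minimal sketch of both steps, assuming closed (chrom, start, end) intervals; the merge rule and the toy coordinates are illustrative assumptions, not necessarily the study's exact procedure:

```python
from collections import defaultdict


def merge_into_cnvrs(cnvs):
    """Merge CNV calls (chrom, start, end) that share at least one base into CNVRs."""
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))
    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:   # overlaps the current region
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions


def overlaps(a, b):
    """True if two closed intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


# Hypothetical calls pooled from several bulls, and hypothetical QTL intervals
calls = [("14", 1_000, 5_000), ("14", 4_500, 9_000), ("14", 20_000, 22_000), ("6", 300, 900)]
qtls = [("14", 8_000, 12_000), ("6", 1_000, 2_000)]
cnvrs = merge_into_cnvrs(calls)
with_qtl = [r for r in cnvrs if any(overlaps(r, q) for q in qtls)]
print(cnvrs)
print(with_qtl)
```

Chromosome names are sorted as strings here, which only affects output order, not which calls are merged.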

  8. Whole genome analysis of Leptospira licerasiae provides insight into leptospiral evolution and pathogenicity.

    Directory of Open Access Journals (Sweden)

    Jessica N Ricaldi

    The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism, which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010(T) and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae consisting of a 6-gene cluster that is unexpectedly short compared with L. interrogans, in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are also present, suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for
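
The core-genome and infectious-specific counts are set operations over gene families that have already been clustered into orthologous groups across genomes. A minimal sketch, assuming that orthology clustering has been done upstream; the genome labels and family IDs below are hypothetical:

```python
def core_and_group_specific(families, group):
    """families: genome name -> set of orthologous gene-family IDs.
    group: names of genomes in the group of interest (e.g. infectious species).

    Returns (core, group_specific):
      core           = families present in every genome
      group_specific = families present in every group genome and in no other genome
    """
    core = set.intersection(*families.values())
    in_group = [families[g] for g in group]
    outside = [s for g, s in families.items() if g not in group]
    shared_by_group = set.intersection(*in_group)
    seen_outside = set().union(*outside)
    return core, shared_by_group - seen_outside


# Hypothetical gene-family content for four genomes
families = {
    "L_interrogans": {"f1", "f2", "f3", "p1", "p2"},
    "L_kirschneri": {"f1", "f2", "f3", "p1", "p2", "x1"},
    "L_licerasiae": {"f1", "f2", "f3", "p1", "y1"},
    "L_biflexa": {"f1", "f2", "f3", "s1"},
}
infectious = {"L_interrogans", "L_kirschneri", "L_licerasiae"}
core, infectious_only = core_and_group_specific(families, infectious)
print(sorted(core), sorted(infectious_only))
```

Requiring presence in every infectious genome is the strict definition; relaxing it (e.g. present in most infectious genomes) would change the counts.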

  9. Quantification of trace-level DNA by real-time whole genome amplification.

    Directory of Open Access Journals (Sweden)

    Min-Jung Kang

    Quantification of trace amounts of DNA is a challenge in analytical applications where the concentration of a target DNA is very low or only limited amounts of sample are available for analysis. PCR-based methods, including real-time PCR, are highly sensitive and widely used for quantification of low-level DNA samples. However, ordinary PCR methods require at least one copy of a specific gene sequence for amplification and may not work for a sub-genomic amount of DNA. We suggest a real-time whole genome amplification method adopting degenerate oligonucleotide primed PCR (DOP-PCR) for quantification of sub-genomic amounts of DNA. This approach enabled quantification of sub-picogram amounts of DNA independently of their sequences. When the method was applied to human placental DNA whose amount had been accurately determined by inductively coupled plasma-optical emission spectroscopy (ICP-OES), accurate and stable quantification was obtained for DNA samples ranging from 80 fg to 8 ng. In blind tests of laboratory-prepared DNA samples, measurement accuracies of 7.4%, -2.1%, and -13.9% with analytical precisions around 15% were achieved for 400-pg, 4-pg, and 400-fg DNA samples, respectively. A similar quantification capability was also observed for DNA from other sources (calf, E. coli, and lambda phage). Therefore, when provided with an appropriate standard DNA, the suggested real-time DOP-PCR method can be used as a universal method for quantification of trace amounts of DNA.
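
Real-time quantification of this kind typically rests on a standard curve: the threshold cycle (Ct) is fitted as a linear function of log10 input amount for standards of known quantity, and an unknown is read off by inverting the fit. A minimal sketch of that calibration step only (it does not model DOP-PCR itself), with hypothetical Ct values; the relative-error helper shows how signed figures such as 7.4% or -13.9% are expressed:

```python
import math


def fit_standard_curve(amounts, cts):
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(ct, slope, intercept):
    """Invert the standard curve: estimated input amount for an observed Ct."""
    return 10 ** ((ct - intercept) / slope)


def relative_error_percent(measured, true):
    """Signed measurement error relative to the known amount, in percent."""
    return 100.0 * (measured - true) / true


# Hypothetical standards spanning 80 fg to 8 ng (amounts in pg) and their Ct values
standards_pg = [0.08, 0.8, 8.0, 80.0, 800.0, 8000.0]
standard_cts = [33.9, 30.6, 27.2, 23.9, 20.6, 17.3]
slope, intercept = fit_standard_curve(standards_pg, standard_cts)
estimate = quantify(28.2, slope, intercept)   # unknown sample with Ct 28.2
print(f"slope={slope:.2f}, estimate={estimate:.2f} pg")
print(f"error vs. a known 4 pg: {relative_error_percent(estimate, 4.0):+.1f}%")
```

The fit is only trustworthy inside the calibrated range, which is why the 80 fg to 8 ng span of the standards matters.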

  10. Identification of Escherichia coli and Shigella Species from Whole-Genome Sequences.

    Science.gov (United States)

    Chattaway, Marie A; Schaefer, Ulf; Tewolde, Rediat; Dallman, Timothy J; Jenkins, Claire

    2017-02-01

    Escherichia coli and Shigella species are closely related and genetically constitute the same species. Differentiating between these two pathogens and accurately identifying the four species of Shigella are therefore challenging. The organism-specific bioinformatics whole-genome sequencing (WGS) typing pipelines at Public Health England depend on the initial identification of the bacterial species by use of a kmer-based approach. Of the 1,982 Escherichia coli and Shigella sp. isolates analyzed in this study, 1,957 (98.7%) had concordant results by both traditional biochemistry and serology (TB&S) and the kmer identification (ID) derived from the WGS data. Of the 25 mismatches identified, 10 were enteroinvasive E. coli isolates that were misidentified as Shigella flexneri or S. boydii by the kmer ID, and 8 were S. flexneri isolates misidentified by TB&S as S. boydii due to nonfunctional S. flexneri O antigen biosynthesis genes. Analysis of the population structure based on multilocus sequence typing (MLST) data derived from the WGS data showed that the remaining discrepant results belonged to clonal complex 288 (CC288), comprising both S. boydii and S. dysenteriae strains. Mismatches between the TB&S and kmer ID results were explained by the close phylogenetic relationship between the two species and were resolved with reference to the MLST data. Shigella can be differentiated from E. coli and accurately identified to the species level by use of kmer comparisons and MLST. Analysis of the WGS data provided explanations for the discordant results between TB&S and WGS data, revealed the true phylogenetic relationships between different species of Shigella, and identified emerging pathoadapted lineages. © Crown copyright 2017.
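
The abstract does not describe how the kmer ID is computed. As a generic illustration of the idea only (not Public Health England's pipeline), a query can be assigned to the reference with which it shares the largest fraction of its own k-mers. The sketch uses toy sequences and a tiny k; real tools work on whole genomes with much larger k and usually also count reverse-complement (canonical) k-mers, which this sketch omits:

```python
def kmer_set(seq, k):
    """All distinct length-k substrings of `seq` (case-insensitive, forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identify(query, references, k):
    """Return (best_label, score), where score is the fraction of the query's
    k-mers that also occur in that reference."""
    q = kmer_set(query, k)
    if not q:
        raise ValueError("query is shorter than k")
    scores = {label: len(q & kmer_set(seq, k)) / len(q) for label, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy "reference genomes" and a query read
references = {
    "species_A": "ACGTACGTTTGACCAGGTACCATG",
    "species_B": "GGGCCCAAATTTGGGCCCAAATTT",
}
print(identify("ACGTTTGACCAG", references, k=5))
```

When two references score nearly equally, as can happen for very closely related taxa, a k-mer score alone cannot settle the call, which is consistent with the abstract's use of MLST to resolve mismatches.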

  11. Utility of Whole-Genome Sequencing in Characterizing Acinetobacter Epidemiology and Analyzing Hospital Outbreaks.

    Science.gov (United States)

    Fitzpatrick, Margaret A; Ozer, Egon A; Hauser, Alan R

    2016-03-01

    Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence typing (MLST), and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
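
Threshold-based typing of this kind can be sketched as pairwise core-SNP distances followed by linking isolates whose distance falls at or below a cut-off. The sketch assumes a pre-built core-genome SNP alignment and uses single-linkage grouping (a common choice, not necessarily the study's); whether the study's cut-offs were applied as "at or below" or "below" is not stated. With integer SNP counts, a threshold of 2.5 behaves like "at most 2 SNPs". Isolate names and sequences are hypothetical:

```python
def snp_distance(a, b):
    """Count positions where two aligned core-genome SNP strings differ,
    skipping positions where either isolate has a missing call ('N' or '-')."""
    if len(a) != len(b):
        raise ValueError("alignments must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x not in "N-" and y not in "N-")


def cluster_by_threshold(isolates, threshold):
    """Single-linkage clusters: two isolates are linked when their SNP distance is
    <= threshold. Implemented with union-find; returns sorted lists of names."""
    names = list(isolates)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if snp_distance(isolates[p], isolates[q]) <= threshold:
                parent[find(p)] = find(q)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical core-SNP alignment (variable sites only) for four isolates
isolates = {
    "pt01": "ACGTACGTAC",
    "pt02": "ACGTACGTAT",
    "pt03": "ACGTACGTTT",
    "pt04": "TTGAACCTAC",
}
print(cluster_by_threshold(isolates, threshold=2.5))
```

Single linkage chains isolates together (pt01 and pt03 join via pt02 even if their own distance were above the cut-off), which suits transmission chains but can merge distinct groups when sampling is dense.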

  12. Extremely low-coverage whole genome sequencing in South Asians captures population genomics information.

    Science.gov (United States)

    Rustagi, Navin; Zhou, Anbo; Watkins, W Scott; Gedvilaite, Erika; Wang, Shuoguo; Ramesh, Naveen; Muzny, Donna; Gibbs, Richard A; Jorde, Lynn B; Yu, Fuli; Xing, Jinchuan

    2017-05-22

    The cost of whole genome sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high coverage of the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery at a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.
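
At such low depth many sites are simply not sequenced in a given individual, which is why calls are pooled across samples and imputation matters. A sketch of the per-site coverage distribution at ~1.6x mean depth under the standard Lander-Waterman (Poisson) approximation, assuming reads land uniformly at random (real coverage is less even):

```python
import math


def p_depth(mean_depth, k):
    """Poisson probability that a site is covered by exactly k reads."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)


def p_depth_at_least(mean_depth, k):
    """Probability that a site is covered by k or more reads."""
    return 1.0 - sum(p_depth(mean_depth, i) for i in range(k))


mean = 1.6  # approximate per-individual mean depth reported above
print(f"P(0 reads)   = {p_depth(mean, 0):.3f}")
print(f"P(>=1 read)  = {p_depth_at_least(mean, 1):.3f}")
print(f"P(>=2 reads) = {p_depth_at_least(mean, 2):.3f}")
```

Since e^-1.6 ≈ 0.20, about one site in five has no reads at all in a given individual, and only about half of sites reach the two or more reads needed even to have a chance of observing both alleles of a heterozygote.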

  13. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants.

    Science.gov (United States)

    Belkadi, Aziz; Bolze, Alexandre; Itan, Yuval; Cobat, Aurélie; Vincent, Quentin B; Antipenko, Alexander; Shang, Lei; Boisson, Bertrand; Casanova, Jean-Laurent; Abel, Laurent

    2015-04-28

    We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs.
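
The shared versus platform-exclusive counts come from intersecting the two call sets. A minimal sketch, assuming each variant is keyed by (chromosome, position, reference allele, alternate allele); exact-key matching like this undercounts shared indels whenever the two pipelines represent the same indel differently, so normalising (for example, left-aligning) variants first is a common precaution. The variant keys below are hypothetical:

```python
def compare_call_sets(wes_calls, wgs_calls):
    """Split variant keys (chrom, pos, ref, alt) into shared and platform-exclusive sets."""
    wes, wgs = set(wes_calls), set(wgs_calls)
    return {"shared": wes & wgs, "wes_only": wes - wgs, "wgs_only": wgs - wes}


wes = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 75, "T", "TA")]
wgs = [("1", 100, "A", "G"), ("2", 50, "G", "A"), ("2", 75, "T", "TA")]
result = compare_call_sets(wes, wgs)
for name, calls in result.items():
    print(name, len(calls))
```

Exclusive calls are only candidates: as the study's Sanger validation shows, a large share of platform-exclusive calls can be false positives.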

  14. Insight into Shiga toxin genes encoded by Escherichia coli O157 from whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Philip M. Ashton

    2015-02-01

    The ability of Shiga toxin-producing Escherichia coli (STEC) to cause severe illness in humans is determined by multiple host factors and bacterial characteristics, including Shiga toxin (Stx) subtype. Given the link between the Stx2a subtype and disease severity, we sought to identify the stx subtypes present in whole genome sequences (WGS) of 444 isolates of STEC O157. Difficulties in assembling the stx genes in some strains were overcome by using two complementary bioinformatics methods: mapping and de novo assembly. We compared the WGS analysis with the results obtained using a PCR approach and investigated the diversity within and between the subtypes. All strains of STEC O157 in this study had stx1a, stx2a or stx2c, or a combination of these three genes. There was over 99% (442/444) concordance between PCR and WGS. When common source strains were excluded, 236/349 strains of STEC O157 had multiple copies of different Stx subtypes and 54 had multiple copies of the same Stx subtype. Of those strains harbouring multiple copies of the same Stx subtype, 33 had variants between the alleles while 21 had identical copies. Strains harbouring Stx2a only were most commonly found to have multiple alleles of the same subtype (42%). Both the PCR and WGS approaches to stx subtyping provided a good level of sensitivity and specificity. In addition, the WGS data also showed that a significant proportion of strains associated with clinical disease in England harboured multiple alleles of the same Stx subtype.

  15. A Comparison of Whole Genome Sequencing to Multigene Panel Testing in Hypertrophic Cardiomyopathy Patients.

    Science.gov (United States)

    Cirino, Allison L; Lakdawala, Neal K; McDonough, Barbara; Conner, Lauren; Adler, Dale; Weinfeld, Mark; O'Gara, Patrick; Rehm, Heidi L; Machini, Kalotina; Lebo, Matthew; Blout, Carrie; Green, Robert C; MacRae, Calum A; Seidman, Christine E; Ho, Carolyn Y

    2017-10-01

    As DNA sequencing costs decline, genetic testing options have expanded. Whole exome sequencing and whole genome sequencing (WGS) are entering clinical use, posing questions about their incremental value compared with disease-specific multigene panels that have been the cornerstone of genetic testing. Forty-one patients with hypertrophic cardiomyopathy who had undergone targeted hypertrophic cardiomyopathy genetic testing (either multigene panel or familial variant test) were recruited into the MedSeq Project, a clinical trial of WGS. Results from panel genetic testing and WGS were compared. In 20 of 41 participants, panel genetic testing identified variants classified as pathogenic, likely pathogenic, or uncertain significance. WGS identified 19 of these 20 variants, but the variant detection algorithm missed a pathogenic 18 bp duplication in myosin binding protein C (MYBPC3) because of low coverage. In 3 individuals, WGS identified variants in genes implicated in cardiomyopathy but not included in prior panel testing: a pathogenic protein tyrosine phosphatase, non-receptor type 11 (PTPN11) variant and variants of uncertain significance in integrin-linked kinase (ILK) and filamin-C (FLNC). WGS also identified 84 secondary findings (mean=2 per person, range=0-6), which mostly defined carrier status for recessive conditions. WGS detected nearly all variants identified on panel testing, provided 1 new diagnostic finding, and allowed interrogation of posited disease genes. Several variants of uncertain clinical use and numerous secondary genetic findings were also identified. Whereas panel testing and WGS provided similar diagnostic yield, WGS offers the advantage of reanalysis over time to incorporate advances in knowledge, but requires expertise in genomic interpretation to appropriately incorporate WGS into clinical care. URL: https://clinicaltrials.gov. Unique identifier: NCT01736566. © 2017 American Heart Association, Inc.

  16. Whole genome sequences are required to fully resolve the linkage disequilibrium structure of human populations.

    Science.gov (United States)

    Pengelly, Reuben J; Tapper, William; Gibson, Jane; Knut, Marcin; Tearle, Rick; Collins, Andrew; Ennis, Sarah

    2015-09-03

    An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution. We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure. WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.
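
The abstract does not say which LD measure underlies the maps compared (LD maps can be built on more elaborate models than pairwise statistics). As background, the standard pairwise measures between two biallelic SNPs are D = p_AB - p_A * p_B, its normalised form |D'| = |D| / D_max, and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). A minimal sketch computing them from phased haplotypes; the two-letter allele coding is a hypothetical convenience:

```python
def pairwise_ld(haplotypes):
    """D, |D'| and r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a 2-character string: the first character is the allele at
    SNP 1 ('A' or 'a'), the second the allele at SNP 2 ('B' or 'b').
    """
    haps = list(haplotypes)
    n = len(haps)
    p_a = sum(h[0] == "A" for h in haps) / n
    p_b = sum(h[1] == "B" for h in haps) / n
    p_ab = sum(h == "AB" for h in haps) / n
    var_product = p_a * (1 - p_a) * p_b * (1 - p_b)
    if var_product == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, d * d / var_product


print(pairwise_ld(["AB", "AB", "Ab", "ab"]))
```

|D'| reaches 1 whenever one of the four haplotypes is absent, even when r² is low (as in the example, where aB never occurs), which is one reason r² is usually preferred when asking how well one marker tags another.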

  17. When bins blur: Patient perspectives on categories of results from clinical whole genome sequencing.

    Science.gov (United States)

    Jamal, Leila; Robinson, Jill O; Christensen, Kurt D; Blumenthal-Barby, Jennifer; Slashinski, Melody J; Perry, Denise Lautenbach; Vassy, Jason L; Wycliff, Julia; Green, Robert C; McGuire, Amy L

    2017-01-01

    Clinical genome and exome sequencing (CGES) is being used in an expanding range of clinical settings. Most approaches to offering patients choices about learning CGES results classify results according to expert definitions of clinical actionability. Little is known about how patients conceptualize different categories of CGES results. The MedSeq Project is a randomized controlled trial studying the use of whole-genome sequencing (WGS) in primary care and cardiology. We surveyed 202 patient-participants about different kinds of WGS results and conducted qualitative interviews with 49 of these participants. Interview data were analyzed both inductively and deductively using thematic content analysis. Participants demonstrated high levels of study understanding and genetic literacy. A small majority of participants wanted to learn all of their WGS results (n = 123, 61%). Qualitative data provided a deeper understanding of participants' perspectives about different types of WGS results. Participants did not have the same views about which WGS results would be actionable or upsetting to learn. They conceptualized variants of uncertain significance (VUS) in a variety of different ways. Many participants expressed optimism that the uncertainty associated with VUS results could be reduced over time. Proposals to determine which WGS/CGES results to disclose by soliciting patient preferences may fail to appreciate the complex ways patients think about disease and the information WGS/CGES can produce. Our findings challenge prevailing methods of facilitating patient choice and assessing the benefits and harms related to the return of WGS/CGES results, which mostly rely on expert definitions of clinical utility to categorize the kinds of results patients can learn.

  18. Parents perspectives on whole genome sequencing for their children: qualified enthusiasm?

    Science.gov (United States)

    Anderson, J A; Meyn, M S; Shuman, C; Zlotnik Shaul, R; Mantella, L E; Szego, M J; Bowdin, S; Monfared, N; Hayeems, R Z

    2017-08-01

    To better understand the consequences of returning whole genome sequencing (WGS) results in paediatrics and facilitate its evidence-based clinical implementation, we studied parents' experiences with WGS and their preferences for the return of adult-onset secondary variants (SVs): medically actionable genomic variants that are unrelated to their child's current medical condition and predict adult-onset disease. We conducted qualitative interviews with parents whose children were undergoing WGS as part of the SickKids Genome Clinic, a research project that studies the impact of clinical WGS on patients, families, and the healthcare system. Interviews probed parents' experience with and motivation for WGS as well as their preferences related to SVs. Interviews were analysed thematically. Of 83 invited, 23 parents from 18 families participated. These parents supported WGS as a diagnostic test, perceiving clear intrinsic and instrumental value. However, many parents were ambivalent about receiving SVs, conveying a sense of self-imposed obligation to take on the 'weight' of knowing their child's SVs, however unpleasant. Some parents chose to learn about adult-onset SVs for their child but not for themselves. Despite general enthusiasm for WGS as a diagnostic test, many parents felt a duty to learn adult-onset SVs. Analogous to 'inflicted insight', we call this phenomenon 'inflicted ought'. Importantly, not all parents of children undergoing WGS view the best interests of their child in relational terms, thereby challenging an underlying justification for current ACMG guidelines for reporting incidental secondary findings from whole exome sequencing and WGS. Published by the BMJ Publishing Group Limited.

  19. Whole-Genome Sequencing of Bacterial Pathogens: the Future of Nosocomial Outbreak Analysis.

    Science.gov (United States)

    Quainoo, Scott; Coolen, Jordy P M; van Hijum, Sacha A F T; Huynen, Martijn A; Melchers, Willem J G; van Schaik, Willem; Wertheim, Heiman F L

    2017-10-01

    Outbreaks of multidrug-resistant bacteria present a frequent threat to vulnerable patient populations in hospitals around the world. Intensive care unit (ICU) patients are particularly susceptible to nosocomial infections due to indwelling devices such as intravascular catheters, drains, and intratracheal tubes for mechanical ventilation. The increased vulnerability of infected ICU patients demonstrates the importance of having effective outbreak management protocols in place. Tracing pathogen transmission with genotyping methods is an important tool for outbreak management. Recently, whole-genome sequencing (WGS) of pathogens has become more accessible and affordable as a tool for genotyping. Analysis of the entire pathogen genome via WGS could provide unprecedented resolution in discriminating even highly related lineages of bacteria and revolutionize outbreak analysis in hospitals. Nevertheless, clinicians have long been hesitant to implement WGS in outbreak analyses due to the expensive and cumbersome nature of early sequencing platforms. Recent improvements in sequencing technologies and analysis tools have rapidly increased the output and analysis speed as well as reduced the overall costs of WGS. In this review, we assess the feasibility of WGS technologies and bioinformatics analysis tools for nosocomial outbreak analyses and provide a comparison to conventional outbreak analysis workflows. Moreover, we review advantages and limitations of sequencing technologies and analysis tools and present a real-world example of the implementation of WGS for antimicrobial resistance analysis. We aimed to provide health care professionals with a guide to WGS outbreak analysis that highlights its benefits for hospitals and assists in the transition from conventional to WGS-based outbreak analysis. Copyright © 2017 American Society for Microbiology.

  20. Microbiota present in cystic fibrosis lungs as revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Philippe M Hauser

    Determination of the precise composition and variation of microbiota in cystic fibrosis lungs is crucial since chronic inflammation due to microorganisms leads to lung damage and ultimately, death. However, this constitutes a major technical challenge. Culturing of microorganisms does not provide a complete representation of a microbiota, even when using culturomics (high-throughput culture). So far, only PCR-based metagenomics have been investigated. However, these methods are biased towards certain microbial groups, and suffer from uncertain quantification of the different microbial domains. We have explored whole genome sequencing (WGS) using the Illumina high-throughput technology applied directly to DNA extracted from sputa obtained from two cystic fibrosis patients. To detect all microorganism groups, we used four procedures for DNA extraction, each with a different lysis protocol. We avoided biases due to whole DNA amplification thanks to the high efficiency of current Illumina technology. Phylogenomic classification of the reads by three different methods produced similar results. Our results suggest that WGS provides, in a single analysis, a better qualitative and quantitative assessment of microbiota compositions than cultures and PCRs. WGS identified a high quantity of Haemophilus spp. (patient 1) or Staphylococcus spp. plus Streptococcus spp. (patient 2) together with low amounts of anaerobic (Veillonella, Prevotella, Fusobacterium) and aerobic bacteria (Gemella, Moraxella, Granulicatella). WGS suggested that fungi represented a very low proportion of the microbiota; they were nonetheless detected by cultures and PCRs because of the selectivity of those methods. The future increase of read lengths and decrease in cost should ensure the usefulness of WGS for the characterisation of microbiota.

  1. Comprehensive whole genome sequence analyses yields novel genetic and structural insights for Intellectual Disability.

    Science.gov (United States)

    Zahir, Farah R; Mwenifumbo, Jill C; Chun, Hye-Jung E; Lim, Emilia L; Van Karnebeek, Clara D M; Couse, Madeline; Mungall, Karen L; Lee, Leora; Makela, Nancy; Armstrong, Linlea; Boerkoel, Cornelius F; Langlois, Sylvie L; McGillivray, Barbara M; Jones, Steven J M; Friedman, Jan M; Marra, Marco A

    2017-05-24

    Intellectual Disability (ID) is among the most common global disorders, yet etiology is unknown in ~30% of patients despite clinical assessment. Whole genome sequencing (WGS) is able to interrogate the entire genome, offering the potential to diagnose idiopathic patients. We conducted WGS on eight children with idiopathic ID and brain structural defects, and their normal parents, and carried out extensive data analyses using standard and discovery approaches. We verified de novo pathogenic single nucleotide variants (SNV) in ARID1B c.1595delG and PHF6 c.820C > T, potentially causative de novo two base indels in SQSTM1 c.115_116delinsTA and UPF1 c.1576_1577delinsA, and de novo SNVs in CACNB3 c.1289G > A, and SPRY4 c.508T > A, of uncertain significance. We report results from a large secondary control study of 2081 exomes probing the pathogenicity of the above genes. We analyzed structural variation by four different algorithms including de novo genome assembly. We confirmed a likely contributory 165 kb de novo heterozygous 1q43 microdeletion missed by clinical microarray. The de novo assembly unmasked hidden genome instability that was missed by standard re-alignment based algorithms. We also interrogated regulatory sequence variation for known and hypothesized ID genes and present useful strategies for analysing non-coding variation in WGS data. This study provides an extensive analysis of WGS in the context of ID, providing genetic and structural insights into ID and yielding diagnoses.

  2. Genomic analysis of the basal lineage fungus Rhizopus oryzae reveals a whole-genome duplication.

    Directory of Open Access Journals (Sweden)

    Li-Jun Ma

    2009-07-01

    Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called "zygomycetes," R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99-880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin-proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14α-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel and R. oryzae-specific diagnostic and therapeutic treatments.

  3. Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data.

    Science.gov (United States)

    Nater, Alexander; Burri, Reto; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2015-11-01

    Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post
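    The windowed analysis described above can be sketched as follows, assuming a gene tree topology has already been inferred for each window by an external phylogenetic tool (not shown). The function names and the topology strings for taxa A-D are hypothetical. The sketch tiles a chromosome into nonoverlapping 10-kb windows, tallies how often each topology occurs, and counts topology changes between adjacent windows as a crude indicator of how quickly local genealogies shift along the chromosome.

```python
from collections import Counter
from typing import List, Tuple


def nonoverlapping_windows(chrom_length: int, size: int = 10_000) -> List[Tuple[int, int]]:
    """Half-open [start, end) windows tiling the chromosome; the last may be shorter."""
    return [(start, min(start + size, chrom_length))
            for start in range(0, chrom_length, size)]


def summarize_topologies(window_topologies: List[str]) -> Tuple[Counter, int]:
    """Count each topology and the number of changes between adjacent windows."""
    counts = Counter(window_topologies)
    switches = sum(1 for left, right in zip(window_topologies, window_topologies[1:])
                   if left != right)
    return counts, switches


if __name__ == "__main__":
    windows = nonoverlapping_windows(35_000)
    # Hypothetical topologies for taxa A-D, one per window (inferred elsewhere).
    topologies = ["((A,B),(C,D))", "((A,C),(B,D))", "((A,B),(C,D))", "((A,B),(C,D))"]
    counts, switches = summarize_topologies(topologies)
    print(windows)
    print(counts.most_common(1), switches)
```

    In practice the per-window topologies would come from a tree-building program run on each window's alignment; only the bookkeeping is shown here.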

  4. Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases.

    Directory of Open Access Journals (Sweden)

    Mariane M A Stefani

    2017-06-01

    Since leprosy is both treated and controlled by multidrug therapy (MDT) it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin. DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico. In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable. This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission.

  5. Practical Approaches for Whole-Genome Sequence Analysis of Heart- and Blood-Related Traits.

    Science.gov (United States)

    Morrison, Alanna C; Huang, Zhuoyi; Yu, Bing; Metcalf, Ginger; Liu, Xiaoming; Ballantyne, Christie; Coresh, Josef; Yu, Fuli; Muzny, Donna; Feofanova, Elena; Rustagi, Navin; Gibbs, Richard; Boerwinkle, Eric

    2017-02-02

    Whole-genome sequencing (WGS) allows for a comprehensive view of the sequence of the human genome. We present and apply integrated methodologic steps for interrogating WGS data to characterize the genetic architecture of 10 heart- and blood-related traits in a sample of 1,860 African Americans. In order to evaluate the contribution of regulatory and non-protein coding regions of the genome, we conducted aggregate tests of rare variation across the entire genomic landscape using a sliding window, complemented by an annotation-based assessment of the genome using predefined regulatory elements and within the first intron of all genes. These tests were performed treating all variants equally as well as with individual variants weighted by a measure of predicted functional consequence. Significant findings were assessed in 1,705 individuals of European ancestry. After these steps, we identified and replicated components of the genomic landscape significantly associated with heart- and blood-related traits. For two traits, lipoprotein(a) levels and neutrophil count, aggregate tests of low-frequency and rare variation were significantly associated across multiple motifs. For a third trait, cardiac troponin T, investigation of regulatory domains identified a locus on chromosome 9. These practical approaches for WGS analysis led to the identification of informative genomic regions and also showed that defined non-coding regions, such as first introns of genes and regulatory domains, are associated with important risk factor phenotypes. This study illustrates the tractable nature of WGS data and outlines an approach for characterizing the genetic architecture of complex traits. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
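    The idea of aggregating rare variation within a sliding window, optionally weighting each variant by predicted functional consequence, can be illustrated with a simple weighted burden score. This is a minimal sketch under stated assumptions: the window size, step, weights, toy genotypes, and function name are all hypothetical, and the aggregate test statistics actually used in the study are not specified here and may differ.

```python
from typing import List, Tuple

# (position, weight, minor-allele count per individual). The weight could encode a
# predicted functional consequence, or be 1.0 to treat all variants equally.
Variant = Tuple[int, float, List[int]]
WindowResult = Tuple[int, int, int, List[float]]


def sliding_window_burdens(variants: List[Variant], n_individuals: int,
                           window: int, step: int, region_end: int) -> List[WindowResult]:
    """(start, end, n_variants, per-individual weighted burden) for each window."""
    results: List[WindowResult] = []
    for start in range(0, region_end, step):
        end = min(start + window, region_end)
        scores = [0.0] * n_individuals
        n_variants = 0
        for pos, weight, genotypes in variants:
            if start <= pos < end:
                n_variants += 1
                for i, count in enumerate(genotypes):
                    scores[i] += weight * count
        results.append((start, end, n_variants, scores))
    return results


if __name__ == "__main__":
    toy = [(100, 1.0, [0, 1, 2]), (4_500, 0.5, [1, 0, 0]), (7_000, 2.0, [0, 0, 1])]
    for row in sliding_window_burdens(toy, 3, window=5_000, step=2_500, region_end=10_000):
        print(row)
```

    Each window's per-individual scores would then be tested for association with a trait (for example by regression); that step is omitted here.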

  6. The American cranberry: first insights into the whole genome of a species adapted to bog habitat.

    Science.gov (United States)

    Polashock, James; Zelzion, Ehud; Fajardo, Diego; Zalapa, Juan; Georgi, Laura; Bhattacharya, Debashish; Vorsa, Nicholi

    2014-06-13

    The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely-cultivated fruit crops native to North America; the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). In terms of taxonomy, cranberries are in the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Although next-generation sequencing technology is advancing whole-genome sequencing, one major obstacle to the successful assembly of complex diploid (and higher ploidy) organisms from short-read sequence data is heterozygosity. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize the issue of heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived by five generations of selfing from the cultivar Ben Lear. The genome size of V. macrocarpon has been estimated to be about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The 36,364 predicted genes represent 17.7% of the assembled genome. Of the predicted genes, 30,090 were assigned to candidate genes based on homology. Genes supported by transcriptome data totaled 13,170 (36%). Shotgun sequencing of the cranberry genome, with an average sequencing coverage of 20X, allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection to further study important biochemical pathways and cellular processes and to use for marker development for breeding and the study of horticultural characteristics, such as disease resistance.
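    The scaffold N50 quoted above (4,237 bp) follows the standard definition: the largest length L such that scaffolds of length at least L together contain at least half of the assembled bases. A minimal sketch of that definition, using made-up lengths rather than the cranberry assembly:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest L such that sequences of length >= L cover at least half the assembly."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:  # reached half of the total assembled length
            return length
    raise ValueError("n50() requires at least one sequence length")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

    Here the total is 54; the three longest sequences (10 + 9 + 8 = 27) reach half of it, so N50 is 8.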

  7. Whole genome sequencing of an ethnic Pathan (Pakhtun) from the north-west of Pakistan.

    Science.gov (United States)

    Ilyas, Muhammad; Kim, Jong-Soo; Cooper, Jesse; Shin, Young-Ah; Kim, Hak-Min; Cho, Yun Sung; Hwang, Seungwoo; Kim, Hyunho; Moon, Jaewoo; Chung, Oksung; Jun, JeHoon; Rastogi, Achal; Song, Sanghoon; Ko, Junsu; Manica, Andrea; Rahman, Ziaur; Husnain, Tayyab; Bhak, Jong

    2015-03-12

    Pakistan covers a key geographic area in human history, as both part of the Indus River region, one of the cradles of civilization, and a link between Western Eurasia and Eastern Asia. This region is inhabited by a number of distinct ethnic groups, the largest being the Punjabi, Pathan (Pakhtuns), Sindhi, and Baloch. We sequenced the first genome of an ethnic Pathan male to 29.7-fold coverage using the Illumina HiSeq2000 platform. A total of 3.8 million single nucleotide variations (SNVs) and 0.5 million small indels were identified by comparing with the human reference genome. Among the SNVs, 129,441 were novel, and 10,315 nonsynonymous SNVs were found in 5,344 genes. SNVs were annotated for health consequences and high risk diseases, as well as possible influences on drug efficacy. We confirmed that the Pathan genome presented here is representative of this ethnic group by comparing it to a panel of Central Asians from the HGDP-CEPH panels typed for ~650k SNPs. The mtDNA (H2) and Y haplogroup (L1) of this individual were also typical of his geographic region of origin. Finally, we reconstructed the demographic history using PSMC, which highlights a recent increase in effective population size compatible with admixture between European and Asian lineages expected in this geographic region. We present a whole-genome sequence and analyses of an ethnic Pathan from the north-west province of Pakistan. It is a useful resource to understand genetic variation and human migration across the whole Asian continent.

  8. Whole genome evaluation of horizontal transfers in the pathogenic fungus Aspergillus fumigatus

    Directory of Open Access Journals (Sweden)

    Deschavanne Patrick

    2010-03-01

    Background Numerous cases of horizontal transfers (HTs) have been described for eukaryote genomes, but in contrast to prokaryote genomes, no whole genome evaluation of HTs has been carried out. This is mainly due to a lack of parametric methods specially designed to take the intrinsic heterogeneity of eukaryote genomes into account. We applied a simple and tested method based on local variations of genomic signatures to analyze the genome of the pathogenic fungus Aspergillus fumigatus. Results We detected 189 atypical regions containing 214 genes, accounting for about 1 Mb of DNA sequences. However, the fraction of atypical DNA detected was smaller than the average amount detected under the same conditions in prokaryote genomes (3.1% vs 5.6%). About one third of these regions contained no annotated genes, a proportion far greater than in prokaryote genomes. When we analyzed the origin of these HTs by comparing their signatures to a custom database of species signatures, three groups of donor species emerged: bacteria (40%), fungi (25%), and viruses (22%). Although inter-domain exchanges were confirmed, we found very few exchanges between eukaryotic kingdoms. Conclusions We demonstrated that HTs are not negligible in eukaryote genomes, though they are less extensive than in prokaryote genomes; under our stringent conditions, the measured amount is a lower bound. The biological mechanisms underlying these transfers, as well as the biological functions of the transferred genes, remain to be elucidated.
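    One common way to formalize a "genomic signature" is Karlin's dinucleotide relative abundance, rho_XY = f_XY / (f_X f_Y), compared between a sequence window and a reference by the mean absolute difference over the 16 dinucleotides. The abstract does not say which signature the authors used (it may be based on longer oligonucleotides), so treat this as an illustrative stand-in. It is a single-strand simplification, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def relative_abundance(seq: str) -> Dict[str, float]:
    """Single-strand rho_XY = f_XY / (f_X * f_Y); non-ACGT characters are skipped."""
    seq = seq.upper()
    mono = {b: seq.count(b) for b in BASES}
    di = {d: 0 for d in DINUCLEOTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in di:  # both positions must be A/C/G/T
            di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("need at least one valid dinucleotide")
    rho = {}
    for d, count in di.items():
        f_x, f_y = mono[d[0]] / n_mono, mono[d[1]] / n_mono
        rho[d] = (count / n_di) / (f_x * f_y) if f_x and f_y else 0.0
    return rho


def delta_star(window: str, reference: str) -> float:
    """Mean absolute difference over the 16 dinucleotides (0 = identical signature)."""
    rw, rr = relative_abundance(window), relative_abundance(reference)
    return sum(abs(rw[d] - rr[d]) for d in DINUCLEOTIDES) / len(DINUCLEOTIDES)


if __name__ == "__main__":
    print(delta_star("ACGT" * 50, "AAAACCCCGGGGTTTT" * 10))
```

    Scanning a genome would compute this distance for successive windows against the whole-genome signature and flag windows above some cutoff as atypical; the cutoff is a choice left open here.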

  9. A bivariate whole-genome linkage scan suggests several shared genomic regions for obesity and osteoporosis.

    Science.gov (United States)

    Tang, Zi-Hui; Xiao, Peng; Lei, Shu-Feng; Deng, Fei-Yan; Zhao, Lan-Juan; Deng, Hong-Yi; Tan, Li-Jun; Shen, Hui; Xiong, Dong-Hai; Recker, Robert R; Deng, Hong-Wen

    2007-07-01

    A genome-wide bivariate analysis was conducted for body fat mass (BFM) and bone mineral density (BMD) in a large Caucasian sample. We found some quantitative trait loci shared by BFM and BMD in the total sample and the gender-specific subgroups, and identified quantitative trait loci with potential pleiotropy. BFM and BMD, as the respective measures for obesity and osteoporosis, are phenotypically and genetically correlated. However, specific genomic regions accounting for their genetic correlation are unknown. To systematically identify the shared genomic regions for BFM and BMD, we performed a bivariate whole-genome linkage scan in 4498 Caucasian individuals from 451 families for BFM and BMD at the hip, spine, and wrist. Linkage analyses were performed in the total sample and in the male and female subgroups. In the entire sample, suggestive linkages were detected at 7p22-p21 (LOD 2.69) for BFM and spine BMD, 6q27 (LOD 2.30) for BFM and hip BMD, and 11q13 (LOD 2.64) for BFM and wrist BMD. Male-specific suggestive linkages were found at 13q12 (LOD 3.23) for BFM and spine BMD and at 7q21 (LOD 2.59) for BFM and hip BMD. Female-specific suggestive LOD scores were 3.32 at 15q13 for BFM and spine BMD and 3.15 at 6p25-24 for BFM and wrist BMD. Several shared genomic regions for BFM and BMD were identified here. Our data may benefit further positional and functional studies, aimed at eventually uncovering the complex mechanism underlying the shared genetic determination of obesity and osteoporosis.

  10. Insights into Reston virus spillovers and adaption from virus whole genome sequences.

    Directory of Open Access Journals (Sweden)

    César G Albariño

    Reston virus (family Filoviridae) is unique among the viruses of the Ebolavirus genus in that it is considered non-pathogenic in humans, in contrast to the other members which are highly virulent. The virus has, however, been associated with several outbreaks of highly lethal hemorrhagic fever in non-human primates (NHPs), specifically cynomolgus monkeys (Macaca fascicularis) originating in the Philippines. In addition, Reston virus has been isolated from domestic pigs in the Philippines. To better understand virus spillover events and potential adaptation to new hosts, the whole genome sequences of representative Reston virus isolates were obtained using a next generation sequencing (NGS) approach and comparative genomic analysis and virus fitness analyses were performed. Nine virus genome sequences were completed for novel and previously described isolates obtained from a variety of hosts including a human case, non-human primates and pigs. Results of phylogenetic analysis of the sequence differences are consistent with multiple independent introductions of RESTV from a still unknown natural reservoir into non-human primates and swine farming operations. No consistent virus genetic markers were found specific for viruses associated with primate or pig infections, but similar to what had been seen with some Ebola viruses detected in the large Western Africa outbreak in 2014-2016, a truncated version of VP30 was identified in a subgroup of Reston viruses obtained from an outbreak in pigs in 2008-2009. Finally, the genetic comparison of two closely related viruses, one isolated from a human case and one from an NHP, showed amino acid differences in the viral polymerase and detectable differences were found in competitive growth assays on human and NHP cell lines.

  11. Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases

    Science.gov (United States)

    Bührer-Sékula, Samira; Benjak, Andrej; Loiseau, Chloé; Singh, Pushpendra; Pontes, Maria A. A.; Gonçalves, Heitor S.; Hungria, Emerith M.; Busso, Philippe; Piton, Jérémie; Silveira, Maria I. S.; Cruz, Rossilene; Schetinni, Antônio; Costa, Maurício B.; Virmond, Marcos C. L.; Diorio, Suzana M.; Dias-Baptista, Ida M. F.; Rosa, Patricia S.; Matsuoka, Masanori; Penna, Maria L. F.; Cole, Stewart T.; Penna, Gerson O.

    2017-01-01

    Background Since leprosy is both treated and controlled by multidrug therapy (MDT) it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin. Methodology DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico. Principal findings In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable. Conclusions/Significance This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission. PMID:28617800

  12. Mosquito-borne Inkoo virus in northern Sweden - isolation and whole genome sequencing.

    Science.gov (United States)

    Lwande, Olivia Wesula; Bucht, Göran; Ahlm, Clas; Ahlm, Kristoffer; Näslund, Jonas; Evander, Magnus

    2017-03-23

    Inkoo virus (INKV) is a lesser-known mosquito-borne virus belonging to Bunyaviridae, genus Orthobunyavirus, California serogroup. Studies indicate that INKV infection is mainly asymptomatic, but can cause mild encephalitis in humans. In northern Europe, the sero-prevalence against INKV is high, 41% in Sweden and 51% in Finland. Previously, INKV RNA has been detected in adult Aedes (Ae.) communis, Ae. hexodontus and Ae. punctor mosquitoes and Ae. communis larvae, but there are still gaps in knowledge regarding mosquito vectors and genetic diversity. Therefore, we aimed to determine the occurrence of INKV in its mosquito vector and characterize the isolates. About 125,000 mosquitoes were collected during mosquito-borne virus surveillance in northern Sweden during the summer period of 2015. Of these, 10,000 mosquitoes were processed for virus isolation and detection using cell culture and RT-PCR. Virus isolates were further characterized by whole genome sequencing. Genetic typing of mosquito species was conducted by cytochrome oxidase subunit I (COI) gene amplification and sequencing (genetic barcoding). Several Ae. communis mosquitoes were found positive for INKV RNA and two isolates were obtained. The first complete sequences of the small (S), medium (M), and large (L) segments of INKV in Sweden were obtained. Phylogenetic analysis showed that the INKV genome was most closely related to other INKV isolates from Sweden and Finland. Of the three INKV genome segments, the INKV M segment had the highest frequency of non-synonymous mutations. The overall G/C-content of INKV genes was low for the N/NSs genes (43.8-45.5%), polyprotein (Gn/Gc/NSm) gene (35.6%) and the RNA polymerase gene (33.8%). This may be because INKV in most instances uses A or T in the third codon position. INKV is frequently circulating in northern Sweden and Ae. communis is the key vector. The high mutation rate of the INKV M segment may have consequences on virulence.
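    The two composition measures mentioned above, overall G/C content and base usage at the third codon position, can be computed for any in-frame coding sequence with a sketch like the following. The toy sequence is made up and is not INKV data; the function names are hypothetical.

```python
from typing import Dict


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def third_position_composition(cds: str) -> Dict[str, float]:
    """Percentage of each base at the third position of complete codons (frame 0)."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not thirds:
        raise ValueError("need at least one complete codon")
    return {b: 100.0 * thirds.count(b) / len(thirds) for b in "ACGT"}


if __name__ == "__main__":
    toy_cds = "ATGAAATTTGCA"  # made-up sequence, not INKV
    print(gc_content(toy_cds))                  # 25.0
    print(third_position_composition(toy_cds))  # {'A': 50.0, 'C': 0.0, 'G': 25.0, 'T': 25.0}
```

    A high combined A + T percentage at the third position alongside low overall G/C content is the pattern the abstract describes.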

  13. Whole genome sequencing reveals local transmission patterns of Mycobacterium bovis in sympatric cattle and badger populations.

    Directory of Open Access Journals (Sweden)

    Roman Biek

    Whole genome sequencing (WGS) technology holds great promise as a tool for the forensic epidemiology of bacterial pathogens. It is likely to be particularly useful for studying the transmission dynamics of an observed epidemic involving a largely unsampled 'reservoir' host, as for bovine tuberculosis (bTB) in British and Irish cattle and badgers. bTB is caused by Mycobacterium bovis, a member of the M. tuberculosis complex that also includes the aetiological agent for human TB. In this study, we identified a spatio-temporally linked group of 26 cattle and 4 badgers infected with the same Variable Number Tandem Repeat (VNTR) type of M. bovis. Single-nucleotide polymorphisms (SNPs) between sequences identified differences that were consistent with bacterial lineages being persistent on or near farms for several years, despite multiple clear whole herd tests in the interim. Comparing WGS data to mathematical models showed good correlations between genetic divergence and spatial distance, but poor correspondence to the network of cattle movements or within-herd contacts. Badger isolates showed between zero and four SNP differences from the nearest cattle isolate, providing evidence for recent transmissions between the two hosts. This is the first direct genetic evidence of M. bovis persistence on farms over multiple outbreaks with a continued, ongoing interaction with local badgers. However, despite unprecedented resolution, directionality of transmission cannot be inferred at this stage. Despite the often notoriously long timescales between time of infection and time of sampling for TB, our results suggest that WGS data alone can provide insights into TB epidemiology even where detailed contact data are not available, and that more extensive sampling and analysis will allow for quantification of the extent and direction of transmission between cattle and badgers.
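    Counting SNP differences between isolates, as in the badger-to-nearest-cattle comparison above, reduces to a Hamming distance over aligned sequences. A minimal sketch, assuming the sequences are already aligned to a common reference and that N and '-' mark uncalled sites; real pipelines also filter repetitive regions and low-quality calls. The isolate names and sequences are invented.

```python
from typing import Dict, Tuple

UNCALLED = set("N-")


def snp_distance(a: str, b: str) -> int:
    """Number of aligned sites where both sequences have a call and the calls differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x not in UNCALLED and y not in UNCALLED and x != y)


def nearest_isolate(query: str, others: Dict[str, str]) -> Tuple[str, int]:
    """Name and SNP distance of the closest isolate in `others` (first one on ties)."""
    return min(((name, snp_distance(query, seq)) for name, seq in others.items()),
               key=lambda item: item[1])


if __name__ == "__main__":
    cattle = {"cow1": "ACGTACGTAC", "cow2": "ACGTTCGTAC"}  # invented sequences
    badger = "ACGTTCGAAC"
    print(nearest_isolate(badger, cattle))  # ('cow2', 1)
```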

  14. Genomic Analysis of the Basal Lineage Fungus Rhizopus oryzae Reveals a Whole-Genome Duplication

    Science.gov (United States)

    Ma, Li-Jun; Ibrahim, Ashraf S.; Skory, Christopher; Grabherr, Manfred G.; Burger, Gertraud; Butler, Margi; Elias, Marek; Idnurm, Alexander; Lang, B. Franz; Sone, Teruo; Abe, Ayumi; Calvo, Sarah E.; Corrochano, Luis M.; Engels, Reinhard; Fu, Jianmin; Hansberg, Wilhelm; Kim, Jung-Mi; Kodira, Chinnappa D.; Koehrsen, Michael J.; Liu, Bo; Miranda-Saavedra, Diego; O'Leary, Sinead; Ortiz-Castellanos, Lucila; Poulter, Russell; Rodriguez-Romero, Julio; Ruiz-Herrera, José; Shen, Yao-Qing; Zeng, Qiandong; Galagan, James; Birren, Bruce W.

    2009-01-01

    Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called “zygomycetes,” R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99–880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin–proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14α-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel and R. oryzae-specific diagnostic and therapeutic treatments. PMID:19578406

  15. Discovery of Gene Sources for Economic Traits in Hanwoo by Whole-genome Resequencing

    Directory of Open Access Journals (Sweden)

    Younhee Shin

    2016-09-01

    Hanwoo, a Korean native cattle (Bos taurus coreana), has great economic value due to its high meat quality. The breed also has genetic variations that are associated with production traits such as health, disease resistance, reproduction, and growth, as well as carcass quality. In this study, next generation sequencing technologies and the availability of an appropriate reference genome were applied to discover a large number of single nucleotide polymorphisms (SNPs) in ten Hanwoo bulls. Analysis of whole-genome resequencing generated a total of 26.5 Gb data, of which 594,716,859 and 592,990,750 reads covered 98.73% and 93.79% of the bovine reference genomes of UMD 3.1 and Btau 4.6.1, respectively. In total, 2,473,884 and 2,402,997 putative SNPs were discovered, of which 1,095,922 (44.3%) and 982,674 (40.9%) were novel SNPs against UMD3.1 and Btau 4.6.1, respectively. Among the SNPs, the 46,301 (UMD 3.1) and 28,613 (Btau 4.6.1) SNPs identified as Hanwoo-specific were located in functional genes that may be involved in the mechanisms of milk production, tenderness, juiciness, marbling of Hanwoo beef and yellow hair. Most of the Hanwoo-specific SNPs were identified in the promoter region, suggesting that the SNPs influence differential expression of the regulated genes relative to the relevant traits. In particular, the non-synonymous SNPs (nsSNPs) found in CORIN, which is a negative regulator of Agouti, might be a causal variant for the yellow hair of Hanwoo. Our results provide abundant sources of genetic variation for characterizing Hanwoo genetics and for subsequent breeding.
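    As a quick consistency check on the reported percentages, the novel fraction for each reference assembly is simply novel SNPs divided by putative SNPs, using the counts stated above:

```python
# Putative and novel SNP counts as reported above, per reference assembly.
counts = {
    "UMD 3.1": (2_473_884, 1_095_922),
    "Btau 4.6.1": (2_402_997, 982_674),
}


def novel_percentage(total: int, novel: int) -> float:
    """Percentage of putative SNPs that are novel."""
    return 100.0 * novel / total


for assembly, (total, novel) in counts.items():
    print(f"{assembly}: {novel_percentage(total, novel):.1f}% novel")
# UMD 3.1: 44.3% novel
# Btau 4.6.1: 40.9% novel
```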

  16. Recent advances in understanding the roles of whole genome duplications in evolution.

    Science.gov (United States)

    MacKintosh, Carol; Ferrier, David E K

    2017-01-01

    Ancient whole-genome duplications (WGDs), or paleopolyploidy events, are key to solving Darwin's 'abominable mystery' of how flowering plants evolved and radiated into a rich variety of species. The vertebrates also emerged from their invertebrate ancestors via two WGDs, and the genomes of diverse gymnosperm trees, unicellular eukaryotes, invertebrates, fishes, amphibians and even a rodent carry evidence of lineage-specific WGDs. Modern polyploidy is common in eukaryotes, and it can be induced, enabling the mechanisms of polyploidy, and short-term assessments of its costs and benefits, to be studied experimentally. However, ancient WGDs can be reconstructed only by comparative genomics. These studies are difficult because the DNA duplicates have been through tens or hundreds of millions of years of gene losses, mutations, and chromosomal rearrangements that culminate in the resolution of the polyploid genomes back into diploid ones (rediploidisation). Intriguing asymmetries in patterns of post-WGD gene loss and retention between duplicated sets of chromosomes have been discovered recently, and elaborations of signal transduction systems are lasting legacies from several WGDs. The data imply that simpler signalling pathways in the pre-WGD ancestors were converted via WGDs into multi-stranded parallelised networks. Genetic and biochemical studies in plants, yeasts and vertebrates suggest a paradigm in which different combinations of sister paralogues in the post-WGD regulatory networks are co-regulated under different conditions. In principle, such networks can respond to a wide array of environmental, sensory and hormonal stimuli and integrate them to generate phenotypic variety in cell types and behaviours. Patterns are also being discerned in how the post-WGD signalling networks are reconfigured in human cancers and neurological conditions. It is fascinating to unpick how ancient genomic events impact on complexity, variety and disease in modern life.

  17. Inference of gorilla demographic and selective history from whole-genome sequence data.

    Science.gov (United States)

    McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Great Ape Genome Project; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F

    2015-03-01

    Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Whole genome wide expression profiles on germination of Verticillium dahliae microsclerotia.

    Directory of Open Access Journals (Sweden)

    Dongfang Hu

    Verticillium dahliae is a fungal pathogen causing Verticillium wilt on a range of economically important crops. Microsclerotia are its main survival and dormancy structures and serve as the primary inoculum on many hosts. Studies were conducted to determine the effects of temperature (5 to 50°C), pH (2 to 12), and nutrient regimes on microsclerotium germination. The optimal conditions for microsclerotium germination were 20°C and pH 8.0, whereas nutrient regimes had no significant effect on germination. Genome-wide expression profiles during microsclerotium germination were characterized using Illumina sequencing technology. Approximately 7.4 million 21-nt cDNA tags were sequenced from cDNA libraries derived from germinated and non-germinated microsclerotia. About 3.9% and 2.3% of the unique tags were up-regulated and down-regulated at least five-fold, respectively, in germinated compared with non-germinated microsclerotia. A total of 1654 differentially expressed genes were identified. Genes likely to play important roles in microsclerotium germination include those encoding a G-protein coupled receptor, lipase/esterase, cyclopentanone 1,2-monooxygenase, H(+)/hexose cotransporter 1, a fungal Zn(2)-Cys(6) binuclear cluster domain, thymus-specific serine protease, glucan 1,3-beta-glucosidase, and alcohol dehydrogenase. These genes were mainly up-regulated or down-regulated only in germinated microsclerotia, compared with non-germinated microsclerotia. The differential expression was confirmed by qRT-PCR analysis of 20 genes randomly selected from the 40 most differentially expressed genes.

  19. Functional diversification of vitamin D receptor paralogs in teleost fish after a whole genome duplication event.

    Science.gov (United States)

    Kollitz, Erin M; Hawkins, Mary Beth; Whitfield, G Kerr; Kullman, Seth W

    2014-12-01

    The diversity and success of teleost fishes (Actinopterygii) have been attributed to three successive rounds of whole-genome duplication (WGD). WGDs provide a source of raw genetic material for evolutionary forces to act upon, resulting in the divergence of genes with altered or novel functions. The retention of multiple gene pairs (paralogs) in teleosts provides a unique opportunity to study how genes diversify and evolve after a WGD. This study examines the hypothesis that vitamin D receptor (VDR) paralogs (VDRα and VDRβ) from two distantly related teleost orders have undergone functional divergence subsequent to the teleost-specific WGD. VDRα and VDRβ paralogs were cloned from the Japanese medaka (Beloniformes) and the zebrafish (Cypriniformes). Initial transactivation studies using 1α,25-dihydroxyvitamin D3 revealed that although VDRα and VDRβ maintain similar ligand potency, the maximum efficacy of VDRβ was significantly attenuated compared with VDRα in both species. Subsequent analyses revealed that VDRα and VDRβ maintain highly similar ligand affinities; however, VDRα demonstrated preferential DNA binding compared with VDRβ. Protein-protein interactions between the VDR paralogs and essential nuclear receptor coactivators were investigated using transactivation and mammalian two-hybrid assays. Our results imply that functional differences between VDRα and VDRβ arose early in teleost evolution, because they are conserved between distantly related species. Our results further suggest that the observed differences may be associated with differential protein-protein interactions between the VDR paralogs and coactivators. We speculate that the observed functional differences are due to subtle ligand-induced conformational differences between the two paralogs, leading to divergent downstream functions.

  20. Whole-Genome Saliva and Blood DNA Methylation Profiling in Individuals with a Respiratory Allergy.

    Directory of Open Access Journals (Sweden)

    Sabine A S Langie

    The etiology of respiratory allergies (RA) can be partly explained by DNA methylation changes caused by adverse environmental and lifestyle factors experienced early in life. Longitudinal, prospective studies can help unravel the epigenetic mechanisms involved in disease development. High compliance rates can be expected in such studies when data are collected using non-invasive and convenient procedures. Saliva is an attractive biofluid for analyzing changes in DNA methylation patterns. In a pilot study, we investigated differential methylation in saliva of RA cases (n = 5) compared with healthy controls (n = 5) using the Illumina Methylation 450K BeadChip platform. We evaluated the results against those obtained in mononuclear blood cells from the same individuals. Methylation patterns from saliva and mononuclear blood cells were clearly distinguishable (PAdj0.2), though the methylation status of about 96% of the cg-sites was comparable between peripheral blood mononuclear cells and saliva. When comparing RA cases with healthy controls, the numbers of differentially methylated sites (DMS) in saliva and blood were 485 and 437 (P0.1), respectively, of which 216 were in common. The methylation levels of these sites were significantly correlated between blood and saliva. The absolute levels of methylation in blood and saliva were confirmed for 3 selected DMS in the PM20D1, STK32C, and FGFR2 genes using pyrosequencing analysis. The differential methylation could be confirmed in saliva only for the DMS in the PM20D1 and STK32C genes. We show that saliva can be used for genome-wide methylation analysis and that it is possible to identify DMS when comparing RA cases and healthy controls. The results were replicated in blood cells of the same individuals and confirmed by pyrosequencing analysis. This study provides proof-of-concept for the applicability of saliva-based whole-genome methylation analysis in the field of respiratory allergy.

  1. Whole-genome sequencing reveals complex mechanisms of intrinsic resistance to BRAF inhibition.

    Science.gov (United States)

    Turajlic, S; Furney, S J; Stamp, G; Rana, S; Ricken, G; Oduko, Y; Saturno, G; Springer, C; Hayes, A; Gore, M; Larkin, J; Marais, R

    2014-05-01

    BRAF is mutated in ∼42% of human melanomas (COSMIC. http://www.sanger.ac.uk/genetics/CGP/cosmic/) and pharmacological BRAF inhibitors such as vemurafenib and dabrafenib achieve dramatic responses in patients whose tumours harbour BRAF(V600) mutations. Objective responses occur in ∼50% of patients and disease stabilisation in a further ∼30%, but ∼20% of patients present primary or innate resistance and do not respond. Here, we investigated the underlying cause of treatment failure in a patient with BRAF mutant melanoma who presented primary resistance. We carried out whole-genome sequencing and single nucleotide polymorphism (SNP) array analysis of five metastatic tumours from the patient. We validated mechanisms of resistance in a cell line derived from the patient's tumour. We observed that the majority of the single-nucleotide variants identified were shared across all tumour sites, but also saw site-specific copy-number alterations in discrete cell populations at different sites. We found that two ubiquitous mutations mediated resistance to BRAF inhibition in these tumours. A mutation in GNAQ sustained mitogen-activated protein kinase (MAPK) signalling, whereas a mutation in PTEN activated the PI3K/AKT pathway. Inhibition of both pathways synergised to block the growth of the cells. Our analyses show that the five metastases arose from a common progenitor and acquired additional alterations after disease dissemination. We demonstrate that a distinct combination of mutations mediated primary resistance to BRAF inhibition in this patient. These mutations were present in all five tumours and in a tumour sample taken before BRAF inhibitor treatment was administered. Inhibition of both pathways was required to block tumour cell growth, suggesting that combined targeting of these pathways could have been a valid therapeutic approach for this patient.

  2. Quality control metrics improve repeatability and reproducibility of single-nucleotide variants derived from whole-genome sequencing.

    Science.gov (United States)

    Zhang, W; Soika, V; Meehan, J; Su, Z; Ge, W; Ng, H W; Perkins, R; Simonyan, V; Tong, W; Hong, H

    2015-08-01

    Although many quality control (QC) methods have been developed to improve the quality of single-nucleotide variants (SNVs) in SNV-calling, QC methods for use subsequent to single-nucleotide polymorphism-calling have not been reported. We developed five QC metrics to improve the quality of SNVs using the whole-genome-sequencing data of a monozygotic twin pair from the Korean Personal Genome Project. The QC metrics improved both repeatability between the monozygotic twin pair and reproducibility between SNV-calling pipelines. We demonstrated that the QC metrics improve the reproducibility of SNVs derived not only from whole-genome-sequencing data but also from whole-exome-sequencing data. The QC metrics are calculated based on the reference genome used in the alignment, without accessing the raw and intermediate data or knowing the SNV-calling details. Therefore, the QC metrics can be easily adopted in downstream association analysis.

  3. The use of mycobacterial interspersed repetitive unit typing and whole genome sequencing to inform tuberculosis prevention and control activities.

    Science.gov (United States)

    Gilbert, Gwendolyn L; Sintchenko, Vitali

    2013-07-01

    Molecular strain typing of Mycobacterium tuberculosis has been possible for only about 20 years; it has significantly improved our understanding of the evolution and epidemiology of Mycobacterium tuberculosis and tuberculosis disease. Mycobacterial interspersed repetitive unit typing, based on 24 variable number tandem repeat loci, is highly discriminatory, relatively easy to perform and interpret, and is currently the most widely used molecular typing system for tuberculosis surveillance. Nevertheless, clusters identified by mycobacterial interspersed repetitive unit typing sometimes cannot be confirmed or adequately defined by contact tracing, and additional methods are needed. Recently, whole genome sequencing has been used to identify single nucleotide polymorphisms and other mutations between genotypically indistinguishable isolates from the same cluster, to trace transmission pathways more accurately. Rapidly increasing speed and quality, together with reduced costs, will soon make large-scale whole genome sequencing, combined with sophisticated bioinformatics tools, feasible for epidemiological surveillance of tuberculosis.

  4. Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea.

    Science.gov (United States)

    Han, Joon-Hee; Chon, Jae-Kyung; Ahn, Jong-Hwa; Choi, Ik-Young; Lee, Yong-Hwan; Kim, Kyoung Su

    2016-06-01

    Colletotrichum acutatum is a destructive fungal pathogen which causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from the KC05 strain was used for the whole genome sequencing using a PacBio sequencer and the MiSeq system. The KC05 genome was determined to be 52,190,760 bp in size with a G + C content of 51.73% in 27 scaffolds and to contain 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of the KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.

  5. Investigations on Genetic Architecture of Hairy Loci in Dairy Cattle by Using Single and Whole Genome Regression Approaches

    Directory of Open Access Journals (Sweden)

    B. Karacaören

    2016-07-01

    Development of body hair is an important physiological and cellular process that leads to better adaptation of dairy cattle to tropical environments. Various studies have suggested a major gene and, more recently, associated genes for the hairy locus in dairy cattle. The main aims of this study were to (i) employ a variant of the discordant sib pair model, in which half sibs from the same sires are randomly sampled using their affection status, (ii) use various single marker regression approaches, and (iii) use whole genome regression approaches to dissect the genetic architecture of the hairy gene in cattle. Both whole genome and single marker regression approaches detected strong genomic signals from chromosome 23. Although there is a major gene effect on the hairy phenotype sourced from chromosome 23, the whole genome regression approach also suggested a polygenic component related to other parts of the genome. Such a result could not be obtained with any of the single marker approaches.

  6. Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction network

    OpenAIRE

    Iossifov, Ivan; Zheng, Tian; Baron, Miron; Gilliam, T. Conrad; Rzhetsky, Andrey

    2008-01-01

    Common hereditary neurodevelopmental disorders such as autism, bipolar disorder, and schizophrenia are most likely both genetically multifactorial and heterogeneous. Because of these characteristics, traditional methods for genetic analysis fail when applied to such diseases. To address the problem, we propose a novel probabilistic framework that combines the standard genetic linkage formalism with whole-genome molecular-interaction data to predict pathways or networks of interacting genes that...

  7. Identification and Whole Genome Sequencing of the First Case of Kosakonia radicincitans Causing a Human Bloodstream Infection

    OpenAIRE

    Bhatti, Micah D.; Kalia, Awdhesh; Sahasrabhojane, Pranoti; Kim, Jiwoong; Greenberg, David E.; Shelburne, Samuel A.

    2017-01-01

    The taxonomy of Enterobacter species is rapidly changing. Herein we report a bloodstream infection isolate originally identified as Enterobacter cloacae by Vitek2 methodology that we found to be Kosakonia radicincitans using genetic means. Comparative whole genome sequencing of our isolate and other published Kosakonia genomes revealed these organisms lack the AmpC β-lactamase present on the chromosome of Enterobacter sp. A fimbriae operon primarily found in Escherichia coli O157:H7 isolates ...

  8. Whole genome sequencing of Halomonas sp. SUBG004 isolated from Little Rann of Kutch, a desert of India.

    Science.gov (United States)

    Patel, Jigna H; Thaker, Vrinda S

    2015-12-01

    A salt-tolerant strain, designated SUBG004, was isolated from the Little Rann of Kutch, a desert of India. The organism is a Gram-negative, facultatively anaerobic, rod-shaped bacterium. Chemotaxonomic and phylogenetic properties were consistent with its classification in the genus Halomonas. Here we report the whole genome sequence of Halomonas sp. SUBG004, deposited in DDBJ/EMBL/GenBank under accession number JPEU0100000, which provides insights into salt stress adaptation through betaine synthesis.

  9. Whole-genome sequence of Sunxiuqinia dokdonensis DH1(T), isolated from deep sub-seafloor sediment in Dokdo Island.

    Science.gov (United States)

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-09-01

    Sunxiuqinia dokdonensis DH1(T) was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea. Its whole genome was sequenced on the HiSeq platform and annotated using RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000.

  10. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions

    OpenAIRE

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of next generation sequencing technology has enabled fast and low-cost detection of genetic variants across the entire human genome, making whole-genome sequencing an increasingly common approach in the study of disease-causing genetic variants. Nevertheless, there is still no repository that collects predictions of the functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role ...

  11. Whole Genome Sequencing of Danish Staphylococcus argenteus Reveals a Genetically Diverse Collection with Clear Separation from Staphylococcus aureus

    OpenAIRE

    Hansen, Thomas A.; Bartels, Mette D.; Hogh, Silje V.; Dons, Lone E.; Pedersen, Michael; Jensen, Thoger G.; Kemp, Michael; Skov, Marianne N.; Gumpert, Heidi; Worning, Peder; Westh, Henrik

    2017-01-01

    Staphylococcus argenteus (S. argenteus) is a newly identified Staphylococcus species that has been misidentified as Staphylococcus aureus (S. aureus) and is clinically relevant. We identified 25 S. argenteus genomes in our collection of whole genome sequenced S. aureus. These genomes were compared to publicly available genomes and a phylogeny revealed seven clusters corresponding to seven clonal complexes. The genome of S. argenteus was found to be different from the genome of S. aureus and a...

  12. New Sequence Types of Vibrio parahaemolyticus Isolated from a Malaysian Aquaculture Pond, as Revealed by Whole-Genome Sequencing.

    Science.gov (United States)

    Foo, Soon Man; Eng, Wilhelm Wei Han; Lee, Yin Peng; Gui, Kimberly; Gan, Han Ming

    2017-05-11

    The acquisition of Photorhabdus insect-related (Pir) toxin-like genes in Vibrio parahaemolyticus has been linked to hepatopancreatic necrosis disease in shrimp. We report the whole-genome sequences of genetically virulent and avirulent V. parahaemolyticus isolated from a Malaysian aquaculture pond and show that they represent previously unreported sequence types of V. parahaemolyticus. Copyright © 2017 Foo et al.

  13. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for comprehensive downstream analysis. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, whereas a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less of the space to flexibly represent phased or unphased genotypes with multiple alleles, and was over 1000 times faster at calculating genotypic correlation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
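    The abstract names bit-block encoded genotypes but does not describe their layout. Purely as an illustration of the general idea of bit-packing genotypes, here is a minimal sketch assuming biallelic, unphased calls stored as 2-bit codes, four per byte. The code table and the `pack`/`unpack` helpers are hypothetical and are not KGGSeq's actual format, which per the abstract also handles phased and multi-allelic genotypes.

```python
# Hypothetical 2-bit codes for biallelic, unphased genotypes.
# Illustrative assumption only, NOT KGGSeq's actual bit-block format.
CODES = {"0/0": 0b00, "0/1": 0b01, "1/1": 0b10, "./.": 0b11}
DECODE = {code: gt for gt, code in CODES.items()}


def pack(genotypes):
    """Pack genotype strings into a bytearray, four 2-bit genotypes per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, gt in enumerate(genotypes):
        # Genotype i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= CODES[gt] << (2 * (i % 4))
    return packed


def unpack(packed, n):
    """Recover the first n genotype strings from a packed bytearray."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


genotypes = ["0/0", "0/1", "1/1", "./.", "0/1"]
block = pack(genotypes)
print(len(block))                                  # 2 bytes for 5 genotypes
print(unpack(block, len(genotypes)) == genotypes)  # True
```

    Two bits per call is far smaller than a text genotype such as `0/1` plus a delimiter, which illustrates why bit-level encodings save space; the 1.5% figure reported above depends on KGGSeq's real format and on what it was compared against.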

  14. Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease.

    Science.gov (United States)

    Carss, Keren J; Arno, Gavin; Erwood, Marie; Stephens, Jonathan; Sanchis-Juan, Alba; Hull, Sarah; Megy, Karyn; Grozeva, Detelina; Dewhurst, Eleanor; Malka, Samantha; Plagnol, Vincent; Penkett, Christopher; Stirrups, Kathleen; Rizzo, Roberta; Wright, Genevieve; Josifova, Dragana; Bitner-Glindzicz, Maria; Scott, Richard H; Clement, Emma; Allen, Louise; Armstrong, Ruth; Brady, Angela F; Carmichael, Jenny; Chitre, Manali; Henderson, Robert H H; Hurst, Jane; MacLaren, Robert E; Murphy, Elaine; Paterson, Joan; Rosser, Elisabeth; Thompson, Dorothy A; Wakeling, Emma; Ouwehand, Willem H; Michaelides, Michel; Moore, Anthony T; Webster, Andrew R; Raymond, F Lucy

    2017-01-05

    Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants; variants in GC-rich regions, which have significantly improved coverage compared with whole-exome sequencing; and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. Copyright © 2017. Published by Elsevier Inc.

  15. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants

    Science.gov (United States)

    2014-01-01

    Background: Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers relatively short reads. Results: We here provide a proof of principle that whole HIV-1 genomes can be reliably reconstructed from short reads, and use this to study the selection of immune escape mutations at the level of whole genome haplotypes. Using realistically simulated HIV-1 populations, we demonstrate that reconstruction of complete genome haplotypes is feasible with high fidelity. We do not reconstruct all genetically distinct genomes, but each reconstructed haplotype represents one or more of the quasispecies in the HIV-1 population. We then reconstruct 30 whole genome haplotypes from published short sequence reads sampled longitudinally from a single HIV-1 infected patient. We confirm the reliability of the reconstruction by validating our predicted haplotype genes with single genome amplification sequences, and by comparing haplotype frequencies with observed epitope escape frequencies. Conclusions: Phylogenetic analysis shows that the HIV-1 population undergoes selection-driven evolution, with successive replacement of the viral population by novel dominant strains. We demonstrate that immune escape mutants evolve in a dependent manner, with various mutations hitchhiking along with others. As a consequence of this clonal interference, selection coefficients have to be estimated for complete haplotypes and not for individual immune escapes. PMID:24996694

  16. High Resolution Typing by Whole Genome Mapping Enables Discrimination of LA-MRSA (CC398) Strains and Identification of Transmission Events.

    Directory of Open Access Journals (Sweden)

    Thijs Bosch

    After its emergence in 2003, a livestock-associated (LA-) MRSA clade (CC398) has caused an impressive increase in the number of isolates submitted for the Dutch national MRSA surveillance and now comprises 40% of all isolates. The currently used molecular typing techniques have limited discriminatory power for this MRSA clade, which hampers studies on its origin and transmission routes. Recently, a new molecular analysis technique named whole genome mapping was introduced. This method creates high-resolution, ordered whole genome restriction maps that may have potential for strain typing. In this study, we assessed and validated the capability of whole genome mapping to differentiate LA-MRSA isolates. Multiple validation experiments showed that whole genome mapping produced highly reproducible results. Assessment of the technique on two well-documented MRSA outbreaks showed that whole genome mapping was able to confirm one outbreak, but revealed major differences between the maps of the second, indicating that not all isolates belonged to this outbreak. Whole genome mapping of epidemiologically unlinked LA-MRSA isolates provided much higher discriminatory power than spa-typing or MLVA. In contrast, maps created from LA-MRSA isolates obtained during a proven LA-MRSA outbreak were nearly indistinguishable, showing that transmission of LA-MRSA can be detected by whole genome mapping. Finally, whole genome maps of LA-MRSA isolates originating from two unrelated veterinarians and their household members showed that veterinarians may carry and transmit different LA-MRSA strains at the same time. No such conclusions could be drawn based on spa-typing and MLVA. Although PFGE seems to be suitable for molecular typing of LA-MRSA, WGM provides much higher discriminatory power. Furthermore, whole genome mapping can provide a comparison with other maps within 2 days after the bacterial culture is received, making it suitable to investigate transmission

  17. Operational amplifiers

    CERN Document Server

    Dostal, Jiri

    1993-01-01

    This book provides the reader with the practical knowledge necessary to select and use operational amplifier devices. It presents an extensive treatment of applications and a practically oriented, unified theory of operational circuits.

  18. Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study.

    Science.gov (United States)

    Harris, Simon R; Cartwright, Edward J P; Török, M Estée; Holden, Matthew T G; Brown, Nicholas M; Ogilvy-Stuart, Amanda L; Ellington, Matthew J; Quail, Michael A; Bentley, Stephen D; Parkhill, Julian; Peacock, Sharon J

    2013-02-01

    The emergence of meticillin-resistant Staphylococcus aureus (MRSA) that can persist in the community and replace existing hospital-adapted lineages of MRSA means that it is necessary to understand transmission dynamics in terms of hospitals and the community as one entity. We assessed the use of whole-genome sequencing to enhance detection of MRSA transmission between these settings. We studied a putative MRSA outbreak on a special care baby unit (SCBU) at a National Health Service Foundation Trust in Cambridge, UK. We used whole-genome sequencing to validate and expand findings from an infection-control team who assessed the outbreak through conventional analysis of epidemiological data and antibiogram profiles. We sequenced isolates from all colonised patients in the SCBU, and sequenced MRSA isolates from patients in the hospital or community with the same antibiotic susceptibility profile as the outbreak strain. The hospital infection-control team identified 12 infants colonised with MRSA in a 6 month period in 2011, who were suspected of being linked, but a persistent outbreak could not be confirmed with conventional methods. With whole-genome sequencing, we identified 26 related cases of MRSA carriage, and showed transmission occurred within the SCBU, between mothers on a postnatal ward, and in the community. The outbreak MRSA type was a new sequence type (ST) 2371, which is closely related to ST22, but contains genes encoding Panton-Valentine leucocidin. Whole-genome sequencing data were used to propose and confirm that MRSA carriage by a staff member had allowed the outbreak to persist during periods without known infection on the SCBU and after a deep clean. Whole-genome sequencing holds great promise for rapid, accurate, and comprehensive identification of bacterial transmission pathways in hospital and community settings, with concomitant reductions in infections, morbidity, and costs. UK Clinical Research Collaboration Translational Infection Research

  19. Different responsiveness to a high-fat/cholesterol diet in two inbred mice and underlying genetic factors: a whole genome microarray analysis

    Directory of Open Access Journals (Sweden)

    Jin Gang

    2009-10-01

    Full Text Available Abstract Background To investigate the different responses of C57BL/6J (B6) and DBA/2J (D2) inbred mice to a high-fat/cholesterol diet and to uncover the underlying genetic factors. Methods B6 and D2 mice were fed a high-fat/cholesterol diet for a series of time points. Serum and bile lipid profiles, bile acid yields, hepatic apoptosis, gallstones and atherosclerosis formation were measured. Furthermore, a whole genome microarray was performed to screen the hepatic gene expression profile. Quantitative real-time PCR, western blot and TUNEL assay were conducted to validate the microarray data. Results After being fed the high-fat/cholesterol diet, serum and bile total cholesterol, serum cholesterol esters, HDL cholesterol and non-HDL cholesterol levels were altered in B6 but not significantly changed in D2; meanwhile, biliary bile acid was decreased in B6 but increased in D2. At the same time, hepatic apoptosis, gallstones and atherosclerotic lesions occurred in B6 but not in D2. The hepatic microarray analysis revealed distinctly different gene expression patterns between B6 and D2 mice. Their functional pathway groups included lipid metabolism, oxidative stress, immune/inflammation response and apoptosis. Quantitative real-time PCR, TUNEL assay and western blot results were consistent with the microarray analysis. Conclusion Different gene expression patterns between B6 and D2 mice might provide a genetic basis for their distinctive responses to a high-fat/cholesterol diet, and give us an opportunity to identify novel pharmaceutical targets in related diseases in the future.

  20. Whole Genome Sequencing Based Characterization of Extensively Drug-Resistant Mycobacterium tuberculosis Isolates from Pakistan

    KAUST Repository

    Ali, Asho

    2015-02-26

    Improved molecular diagnostic methods for detecting drug resistance in Mycobacterium tuberculosis (MTB) strains are required. Resistance to first- and second-line anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular genes. However, these SNPs can vary between MTB lineages, so local data are required to describe different strain populations. We used whole genome sequencing (WGS) to characterize 37 extensively drug-resistant (XDR) MTB isolates from Pakistan and investigated 40 genes associated with drug resistance. Rifampicin resistance was attributable to SNPs in the rpoB hot-spot region. Isoniazid resistance was most commonly associated with the katG codon 315 mutation (92%), followed by inhA S94A (8%); however, one strain did not have SNPs in katG, inhA or oxyR-ahpC. All strains were pyrazinamide resistant, but only 43% had pncA SNPs. Ethambutol-resistant strains predominantly had embB codon 306 mutations (62%), but additional SNPs at embB codons 406, 378 and 328 were also present. Fluoroquinolone resistance was associated with gyrA codons 91-94 in 81% of strains; four strains had only gyrB mutations, while others did not have SNPs in either gyrA or gyrB. Streptomycin-resistant strains had mutations in ribosome-related genes: rpsL codon 43 (42%), the rrs 500 region (16%), and gidB (34%), while six strains did not have mutations in any of these genes. Amikacin/kanamycin/capreomycin resistance was associated with SNPs in rrs at nt1401 (78%) and nt1484 (3%), except in seven (19%) strains. We estimate that if only the common hot-spot region targets of current commercial assays were used, the concordance between phenotypic and genotypic testing for these XDR strains would vary by drug: rifampicin (100%), isoniazid (92%), fluoroquinolones (81%), aminoglycosides (78%) and ethambutol (62%), while pncA sequencing would provide genotypic resistance in less than half the isolates. This work highlights the importance of expanded
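
For a given drug, concordance can be read as the share of isolates whose genotypic call (mutation in the targeted region or not) agrees with the phenotypic result. When every isolate in a panel is phenotypically resistant to a drug, this reduces to the fraction of isolates carrying a detected mutation. The sketch below shows that arithmetic under the assumption that calls are simple per-isolate booleans. The 34/3 split is hypothetical and was chosen only to show the calculation. It is not the study's per-drug data.

```python
def concordance(phenotype_resistant, genotype_resistant):
    """Percentage of isolates whose genotypic call matches the phenotype.

    Both arguments are equal-length sequences of booleans (True = resistant),
    one entry per isolate, for a single drug.
    """
    if len(phenotype_resistant) != len(genotype_resistant):
        raise ValueError("need one genotype call per phenotype result")
    if len(phenotype_resistant) == 0:
        raise ValueError("no isolates supplied")
    agree = sum(p == g for p, g in zip(phenotype_resistant, genotype_resistant))
    return 100.0 * agree / len(phenotype_resistant)


# Hypothetical panel: 37 phenotypically resistant isolates, of which 34 carry
# a mutation in the targeted hot-spot region and 3 do not.
pheno = [True] * 37
geno = [True] * 34 + [False] * 3
print(round(concordance(pheno, geno), 1))  # 91.9
```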

  1. Whole Genome Association Studies of Residual Feed Intake and Related Traits in the Pig.

    Directory of Open Access Journals (Sweden)

    Suneel K Onteru

    Full Text Available Residual feed intake (RFI), a measure of feed efficiency, is the difference between observed feed intake and the expected feed requirement predicted from growth and maintenance. Pigs with low RFI have reduced feed costs without compromising their growth. Identification of genes or genetic markers associated with RFI will be useful for marker-assisted selection of animals with improved feed efficiency at an early age. Whole genome association studies (WGAS) for RFI, average daily feed intake (ADFI), average daily gain (ADG), back fat (BF) and loin muscle area (LMA) were performed on 1,400 pigs from the divergently selected ISU-RFI lines, using the Illumina PorcineSNP60 BeadChip. Various statistical methods were applied to find SNPs and genomic regions associated with the traits, including a Bayesian approach using GenSel software, and frequentist approaches such as allele frequency differences between lines, and single-SNP and haplotype analyses using PLINK software. Single-SNP and haplotype analyses showed no significant associations (except for LMA) after genomic control and FDR correction. Bayesian analyses found at least 2 associations for each trait at a false positive probability of 0.5. At generation 8, the RFI selection lines mainly differed in allele frequencies for SNPs near (<0.05 Mb) genes that regulate insulin release and leptin functions. The Bayesian approach identified associations of genomic regions containing insulin release genes (e.g., GLP1R, CDKAL, SGMS1) with RFI and ADFI, of regions with energy homeostasis (e.g., MC4R, PGM1, GPR81) and muscle growth related genes (e.g., TGFB1) with ADG, and of fat metabolism genes (e.g., ACOXL, AEBP1) with BF. Specifically, a very highly significantly associated QTL for LMA on SSC7 with skeletal myogenesis genes (e.g., KLHL31) was identified for subsequent fine mapping. Important genomic regions associated with RFI related traits were identified for future validation studies prior to their incorporation in marker
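
RFI is defined above as observed intake minus the intake predicted from growth and maintenance. It is therefore the residual of a regression of feed intake on those terms. The sketch below assumes the simplest version of such a model: ordinary least squares with an intercept, average daily gain for growth, and metabolic body weight (body weight ** 0.75) for maintenance. The covariates actually used for the ISU-RFI lines are not given in this abstract and may differ. The animal data are hypothetical.

```python
import numpy as np


def residual_feed_intake(feed_intake, daily_gain, metabolic_weight):
    """Residual feed intake as the OLS residual of observed intake on a
    growth term (average daily gain) and a maintenance term (metabolic
    body weight). The model form is an illustrative assumption."""
    y = np.asarray(feed_intake, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                            # intercept
        np.asarray(daily_gain, dtype=float),        # growth term
        np.asarray(metabolic_weight, dtype=float),  # maintenance term
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares fit
    expected = X @ coef                              # predicted requirement
    return y - expected  # negative RFI: ate less than predicted (more efficient)


# Hypothetical pigs: ADFI (kg/d), ADG (kg/d) and mid-test weight (kg) ** 0.75
adfi = [2.1, 2.4, 2.0, 2.6, 2.3]
adg = [0.80, 0.90, 0.75, 0.95, 0.85]
mbw = [w ** 0.75 for w in (60.0, 66.0, 58.0, 70.0, 64.0)]
print(np.round(residual_feed_intake(adfi, adg, mbw), 3))
```

Because the model includes an intercept, the residuals sum to zero. RFI therefore ranks animals relative to the population mean rather than on an absolute scale.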

  2. TE-Tracker: systematic identification of transposition events through whole-genome resequencing.

    Science.gov (United States)

    Gilly, Arthur; Etcheverry, Mathilde; Madoui, Mohammed-Amin; Guy, Julie; Quadrana, Leandro; Alberti, Adriana; Martin, Antoine; Heitkam, Tony; Engelen, Stefan; Labadie, Karine; Le Pen, Jeremie; Wincker, Patrick; Colot, Vincent; Aury, Jean-Marc

    2014-11-19

    Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in the cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements. We present TE-Tracker, a general and accurate computational method for the de novo detection of germline TE mobilization from resequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker . 
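
The comparison above rests on two standard metrics: positive predictive value, PPV = TP / (TP + FP), and sensitivity, TP / (TP + FN). The sketch below computes both from a set of predicted events and a truth set. It simplifies matching by treating an event as a hypothetical (chromosome, position) tuple that must match exactly. Real benchmarks of TE or SV callers usually allow a positional tolerance, and that tolerance is not modelled here.

```python
def ppv_and_sensitivity(predicted, truth):
    """Compare predicted transposition events with a truth set.

    predicted, truth: sets of hashable event identifiers, e.g. hypothetical
    (chromosome, position) tuples; exact matching is assumed.
    PPV = TP / (TP + FP); sensitivity = TP / (TP + FN).
    """
    tp = len(predicted & truth)   # called and real
    fp = len(predicted - truth)   # called but not real
    fn = len(truth - predicted)   # real but missed
    ppv = tp / (tp + fp) if predicted else float("nan")
    sensitivity = tp / (tp + fn) if truth else float("nan")
    return ppv, sensitivity


calls = {("chr1", 100), ("chr1", 500), ("chr2", 42)}
known = {("chr1", 100), ("chr2", 42), ("chr3", 7), ("chr5", 9)}
print(ppv_and_sensitivity(calls, known))  # (0.6666666666666666, 0.5)
```

A specialised tool tends to score high on PPV, while a generic SV caller tends to score high on sensitivity. The two numbers together show the trade-off the abstract describes.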

  3. Reducing INDEL calling errors in whole genome and exome sequencing data.

    Science.gov (United States)

    Fang, Han; Wu, Yiyang; Narzisi, Giuseppe; O'Rawe, Jason A; Barrón, Laura T Jimenez; Rosenbaum, Julie; Ronemus, Michael; Iossifov, Ivan; Schatz, Michael C; Lyon, Gholson J

    2014-01-01

    INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, INDEL variant calling remains error-prone, with errors driven by library preparation, sequencing biases, and algorithm artifacts. We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on coverage and composition to rank high- and low-quality INDEL calls. We performed a large-scale validation experiment on 600 loci and found high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%). Simulation and experimental data show that assembly-based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment-based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identify 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance of INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identify 6.3-fold more low-quality INDELs. Furthermore, accurate detection of heterozygous INDELs with Scalpel requires 1.2-fold higher coverage than that of homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. Overall, we show that the accuracy of INDEL detection with WGS is much greater than with WES, even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project

  4. Whole-genome sequence analysis of the naturally competent Acinetobacter baumannii clinical isolate A118.

    Science.gov (United States)

    Traglia, German M; Chua, Katherina; Centrón, Daniela; Tolmasky, Marcelo E; Ramírez, María Soledad

    2014-08-26

    Recent studies have demonstrated a high genomic plasticity in Acinetobacter baumannii, which may explain its high capacity to acquire multiple antibiotic resistance determinants and to survive in the hospital environment. Acinetobacter baumannii strain A118 (Ab A118) was isolated in 1995 from a blood culture of an intensive care unit patient. As this strain showed some peculiar characteristics, such as being naturally competent and susceptible to numerous antibiotics, we performed whole-genome comparison (WGC) studies to gain insights into the nature and extent of the genomic differences. The Ab A118 genome is approximately 3,824 kb long with a 38.4% GC content and contains 3,520 coding sequences. WGC studies showed that the Ab A118 genome has 98% average nucleotide identity with that of A. baumannii ATCC 17978, and 96% average nucleotide identity with those of strains AYE and ACICU. At least 12 inversions, 275 insertions, and 626 deletions were identified when the Ab A118 genome was compared with those of strains ATCC 17978, AYE, and ACICU using MAUVE WGC. Multiple gene order arrangements were observed among the analyzed strains. MAUVE WGC analysis identified 19 conserved segments, known as locally colinear blocks. The numbers of single nucleotide polymorphisms found when comparing the Ab A118 genome with those of strains ATCC 17978, AYE, and ACICU were 43,784 (1.1496%), 44,130 (1.158%), and 43,914 (1.153%), respectively. Genes comEA, pilQ, pilD, pilF, comL, pilA, comEC, pilI, pilH, pilO, pilN, pilY1 (comC), pilE, pilR, and comM, potentially involved in natural competence, were found in the Ab A118 genome. In particular, unlike in most strains, where comM is interrupted by the insertion of a resistance island (AbaR), in strain Ab A118 it is uninterrupted. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  5. Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry.

    Science.gov (United States)

    Thareja, Gaurav; John, Sumi Elsa; Hebbar, Prashantha; Behbehani, Kazem; Thanaraj, Thangavel Alphonse; Alsmadi, Osama

    2015-02-18

    The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, helping to understand human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe, "tent-dwelling" Bedouin, and Persian, attributing their ancestry to different regions in the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and have been confirmed by genetic studies. In this work, we report the whole genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage. We document 3,573,824 SNPs, 404,090 insertions/deletions, and 11,138 structural variations. Of the reported SNPs and indels, 85,939 are novel. We identify 295 'loss-of-function' and 2,314 'deleterious' coding variants, some of which carry homozygous genotypes in the sequenced genome; the associated phenotypes include pharmacogenomic traits such as greater triglyceride-lowering ability with fenofibrate treatment, and a requirement for high warfarin dosage to elicit an anticoagulation response. 6,328 non-coding SNPs associate with 811 phenotype traits: in congruence with the medical history of the participant for Type 2 diabetes and β-Thalassemia, and of the participant's family for migraine, 72 (of 159 known) Type 2 diabetes, 3 (of 4) β-Thalassemia, and 76 (of 169) migraine variants are seen in the genome. Intergenome comparisons based on shared disease-causing variants position the sequenced genome between Asian and European genomes, in congruence with the geographical location of the region. On comparison, bead arrays perform better than sequencing platforms in correctly calling genotypes in low-coverage regions of the sequenced genome; however, a novel SNP or indel near the genotype-calling position can lead to false calls with bead arrays. We report, for the first time, reference
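
A figure such as ">37X coverage" is a mean sequencing depth. A standard first-order estimate of that depth is the Lander-Waterman relation C = N * L / G, where N is the number of reads, L the read length and G the genome size. The sketch below computes it in both directions. It assumes 100-bp reads and a genome size of about 3.1 Gb, which is an approximate human value. Neither number is taken from the study. The estimate also ignores duplicates, unmapped reads and uneven coverage.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean depth C = N * L / G (count each mate of a pair as one read)."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads of length L giving at least the target mean depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical run: 100-bp reads against a ~3.1-Gb genome
n = reads_needed(37, 100, 3_100_000_000)
print(n)                                     # 1147000000
print(mean_coverage(n, 100, 3_100_000_000))  # 37.0
```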

  6. Whole genome sequencing reveals genetic heterogeneity of G3P[8] rotaviruses circulating in Italy.

    Science.gov (United States)

    Medici, Maria Cristina; Tummolo, Fabio; Martella, Vito; Arcangeletti, Maria Cristina; De Conto, Flora; Chezzi, Carlo; Magrì, Alessandro; Fehér, Enikő; Marton, Szilvia; Calderaro, Adriana; Bányai, Krisztián

    2016-06-01

    After sporadic detection in the 1990s, G3P[8] rotaviruses emerged as a predominant genotype in recent years in many areas worldwide, including parts of Italy. The present study describes the molecular epidemiology and evolution of G3P[8] rotaviruses detected in Italian children with gastroenteritis during two survey periods (2004-2005 and 2008-2013). The whole genomes of selected G3P[8] strains were determined, and antigenic differences between these strains and rotavirus vaccine strains were analyzed. Among 819 rotaviruses (271 in 2004-2005 and 548 in 2008-2013) genotyped during the survey periods, the number of G3P[8] rotaviruses varied markedly over the years (0/83 in 2004, 30/188 in 2005, 0/96 in 2008, 6/88 in 2009, 4/97 in 2010, 0/83 in 2011, 9/82 in 2012, and 56/102 in 2013). The genotypes of the 11 gene segments of 15 selected strains were assigned to G3-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1; thus, all strains belonged to the Wa genogroup. Phylogenetic analysis of the Italian G3P[8] strains showed a peculiar picture of segregation, with a 2012 lineage for the VP1-VP3, NSP1, NSP2, NSP4 and NSP5 genes and a 2013 lineage for the VP6, NSP1 and NSP3 genes, with a 1.3-20.2% nucleotide difference from the oldest Italian G3P[8] strains. The genetic variability of the Italian G3P[8] strains, compared with rotavirus sequences available in GenBank, suggested a process of selection acting on a global scale rather than the emergence of local strains, as several lineages were already circulating globally. Compared with the vaccine strains, the Italian G3P[8] rotaviruses segregated in different lineages (5-5.3% and 7.2-11.4% nucleotide differences in VP7 and VP4, respectively), with some mismatches in the putative neutralizing epitopes of the VP7 and VP4 antigens. The accumulation of point mutations and amino acid differences between vaccine strains and currently circulating rotaviruses might, over the years, generate vaccine-resistant variants. Copyright © 2016 Elsevier B.V.
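
Percent nucleotide differences like those quoted above are often uncorrected p-distances: the number of mismatching sites divided by the number of sites compared in an alignment. The abstract does not state which distance measure was used, so model-corrected distances are also possible. The sketch below shows only the uncorrected version. It assumes the two sequences are already aligned to the same length and skips gap ("-") and ambiguous ("N") positions.

```python
def percent_nucleotide_difference(seq_a, seq_b):
    """Uncorrected p-distance, as a percentage, between two aligned sequences.

    Positions where either sequence has a gap '-' or an ambiguous 'N' are
    skipped and do not count towards the compared length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * diffs / compared


print(percent_nucleotide_difference("ACGTACGTAC", "ACGTTCGTAA"))  # 20.0
```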

  7. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored owing to great computational challenges and limited data availability. Because of variation in splicing, transcription start sites, polyadenylation sites, and post-transcriptional RNA editing across the entire gene, and in the transcription rates of cells, RNA-seq measurements show large expression variability and collectively create the observed position-level read count curves. A single number measuring gene expression, as widely used in microarray-based gene expression analysis, is highly unlikely to account sufficiently for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) for epistasis analysis with RNA-seq data, with functional responses, in which the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, in which genotype profiles are viewed as a function of genomic position. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for interaction between all possible pairs of genes and uses all accessible information to collectively test interaction between all possible pairs of SNPs within two genomic regions. Through large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction

  8. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery

    Directory of Open Access Journals (Sweden)

    Stothard Paul

    2011-11-01

    Full Text Available Abstract Background One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the total SNPs found in Holstein, Black Angus, and in both animals, 81%, 81%, and 75%, respectively, are novel. In-depth annotations of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten

  9. Whole genome analysis of p38 SAPK-mediated gene expression upon stress

    Directory of Open Access Journals (Sweden)

    Lopez-Bigas Nuria

    2010-03-01

    Full Text Available Abstract Background Cells have the ability to respond and adapt to environmental changes through activation of stress-activated protein kinases (SAPKs). Although p38 SAPK signalling is known to participate in the regulation of gene expression, little is known about the molecular mechanisms used by this SAPK to regulate stress-responsive genes or about the overall set of genes regulated by p38 in response to different stimuli. Results Here, we report a whole genome expression analysis of mouse embryonic fibroblasts (MEFs) treated with three different p38 SAPK-activating stimuli, namely osmostress, the cytokine TNFα and the protein synthesis inhibitor anisomycin. We found that the activation kinetics of p38α SAPK in response to these insults differ and lead to a complex, stress-specific gene expression response with a restricted set of overlapping genes. In addition, we analysed the contribution of p38α, the major p38 family member present in MEFs, to the overall stress-induced transcriptional response by using both a chemical inhibitor (SB203580) and p38α-deficient (p38α-/-) MEFs. We show that p38 SAPK dependency ranged between 60% and 88% depending on the treatment, and that there is very good overlap between the inhibitor treatment and the knockout cells. Furthermore, we found that SAPK dependency varies with the time the cells are subjected to osmostress. Conclusions Our genome-wide transcriptional analysis shows a selective response to specific stimuli and a restricted common response of up to 20% of the stress up-regulated early genes that involves an important set of transcription factors, which might be critical for either cell adaptation or preparation for continuous extra-cellular changes. Interestingly, up to 85% of the up-regulated genes are under the transcriptional control of p38 SAPK. Thus, activation of p38 SAPK is critical to elicit the early gene expression program required for cell

  10. Structural and functional-annotation of an equine whole genome oligoarray

    Directory of Open Access Journals (Sweden)

    Chowdhary Bhanu

    2009-10-01

    Full Text Available Abstract Background The horse genome has been sequenced, allowing equine researchers to use high-throughput functional genomics platforms such as microarrays, next-generation sequencing for gene expression, and proteomics. However, for researchers to derive value from these functional genomics datasets, they must be able to model the data in biologically relevant ways; doing so requires that the equine genome be more fully annotated. There are two interrelated types of genomic annotation: structural and functional. Structural annotation delineates and demarcates the genomic elements (such as genes, promoters, and regulatory elements). Functional annotation assigns function to structural elements. The Gene Ontology (GO) is the de facto standard for functional annotation, and is routinely used as a basis for modelling and hypothesis testing of large functional genomics datasets. Results An Equine Whole Genome Oligonucleotide (EWGO) array with 21,351 elements was developed at Texas A&M University. This 70-mer oligoarray was designed using the approximately 7× assembled and annotated sequence of the equine genome to be one of the most comprehensive arrays available for expressed equine sequences. To assist researchers in determining the biological meaning of data derived from this array, we have structurally annotated it by mapping the elements to multiple database accessions, including UniProtKB, Entrez Gene, NRPD (Non-Redundant Protein Database) and UniGene. We next provided GO functional annotations for the gene transcripts represented on this array. Overall, we GO-annotated 14,531 gene products (68.1% of the gene products represented on the EWGO array) with 57,912 annotations. GAQ (GO Annotation Quality) scores were calculated for this array both before and after we added GO annotation. The additional annotations improved the mean GAQ score 16-fold. These data are publicly available at AgBase http://www.agbase.msstate.edu/. Conclusion Providing

  11. Whole genome transcript profiling from fingerstick blood samples: a comparison and feasibility study

    Directory of Open Access Journals (Sweden)

    Williams Adam R

    2009-12-01

    Full Text Available Abstract Background Whole genome gene expression profiling has revolutionized research in the past decade, especially with the advent of microarrays. Recently, there have been significant improvements in whole blood RNA isolation techniques which, through stabilization of RNA at the time of sample collection, avoid bias and artifacts introduced during sample handling. Despite these improvements, current human whole blood RNA stabilization/isolation kits are limited by the requirement of a venous blood sample of at least 2.5 mL. While fingerstick blood collection has been used for many different assays, no kit has yet been developed to isolate high-quality RNA for gene expression studies from such small human samples. The clinical and field-testing advantages of obtaining reliable and reproducible gene expression data from a fingerstick are many: it is less invasive, saves time, is more mobile, and eliminates the need for a trained phlebotomist. Furthermore, this method could also be employed in small animal studies, e.g., in mice, where larger sample collections often require sacrificing the animal. In this study, we offer a rapid and simple method to extract sufficient amounts of high-quality total RNA from approximately 70 μL of whole blood collected via a fingerstick, using a modified protocol of the commercially available Qiagen PAXgene RNA Blood Kit. Results From two sets of fingerstick collections (about 70 μL of whole blood collected via finger lancet and capillary tube), we recovered an average of 252.6 ng total RNA with an average RNA integrity number (RIN) of 9.3. The post-amplification yield for 50 ng of total RNA averaged 7.0 μg cDNA. The cDNA hybridized to Affymetrix HG-U133 Plus 2.0 GeneChips had an average Present call rate of 52.5%. Both fingerstick collections were highly correlated, with r2 values ranging from 0.94 to 0.97. Similarly, both fingerstick collections were highly correlated with the venous collection, with r2 values ranging from 0.88 to 0
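
The r2 values above summarise how closely probe-level expression from one collection tracks another. A minimal sketch of that comparison squares the Pearson correlation between two equal-length expression vectors. Whether the study used log-scale signals or another normalisation is not stated here, so the input scale is an assumption. The toy vectors are hypothetical.

```python
def r_squared(x, y):
    """Square of the Pearson correlation between two equal-length vectors,
    e.g. per-probe expression signals from two blood collections."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return sxy * sxy / (sxx * syy)


print(r_squared([1, 2, 3], [1, 3, 2]))  # 0.25
```

A high r2 shows that the two profiles rank and scale together. It does not rule out a constant offset between collections, so agreement on absolute levels has to be checked separately.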

  12. Distinct functions of two olfactory marker protein genes derived from teleost-specific whole genome duplication.

    Science.gov (United States)

    Suzuki, Hikoyu; Nikaido, Masato; Hagino-Yamagishi, Kimiko; Okada, Norihiro

    2015-11-10

    Whole genome duplications (WGDs) have been proposed to have made a significant impact on vertebrate evolution. Two rounds of WGD (1R and 2R) occurred in the common ancestor of Gnathostomata and Cyclostomata, followed by a third-round WGD (3R) in a common ancestor of all modern teleosts. The 3R-derived paralogs, which have the potential to facilitate phenotypic diversification, are good models for understanding the evolution of genes after WGD. However, recent studies of 3R-derived paralogs have tended to be based on in silico analyses. Here we analyzed the paralogs encoding teleost olfactory marker protein (OMP), which has been shown to be specifically expressed in mature olfactory sensory neurons and is expected to be involved in olfactory transduction. Our genome database search identified two OMPs (OMP1 and OMP2) in teleosts, whereas only one was present in other vertebrates. Phylogenetic and synteny analyses suggested that OMP1 and OMP2 were derived from 3R. The two OMPs showed distinct expression patterns in zebrafish: OMP1 was expressed in the deep layer of the olfactory epithelium (OE), consistent with previous studies of mice and zebrafish, whereas OMP2 was sporadically expressed in the superficial layer. Interestingly, OMP2 was expressed in a very restricted region of the retina as well as in the OE. In addition, analysis of transcriptome data from the spotted gar, a non-teleost fish, revealed that a single OMP gene was expressed in the eyes. We found distinct expression patterns of zebrafish OMP1 and OMP2 at the tissue and cellular levels. These differences in expression patterns may be explained by the subfunctionalization model of molecular evolution. Namely, a single OMP gene is speculated to have been originally expressed in the OE and the eyes in the common ancestor of all Osteichthyes (bony fish including tetrapods). Then, the two OMP gene paralogs derived from the 3R-WGD reduced and specialized their expression patterns. This study provides a good example for analyzing a

  13. Draft Whole-Genome Sequence of the Alkaliphilic Alishewanella aestuarii Strain HH-ZS, Isolated from Historical Lime Kiln Waste-Contaminated Soil

    OpenAIRE

    Salah, Zohier B.; Rout, Simon P.; Humphreys, Paul

    2016-01-01

    Here, we present the whole-genome sequence of an environmental Gram-negative Alishewanella aestuarii strain (HH-ZS), isolated from the hyperalkaline contaminated soil of a historical lime kiln in Buxton, United Kingdom.

  14. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Science.gov (United States)

    Wilson, Mark R; Brown, Eric; Keys, Chris; Strain, Errol; Luo, Yan; Muruvanda, Tim; Grim, Christopher; Jean-Gilles Beaubrun, Junia; Jarvis, Karen; Ewing, Laura; Gopinath, Gopal; Hanes, Darcy; Allard, Marc W; Musser, Steven

    2016-01-01

    Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources, and 2 other closely related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shotgun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that several food, plant, and clinical isolates were very closely related, with only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future
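
Statements such as "only a few SNP differences between them" or "separated ... by more than sixteen thousand SNPs" come down to counting differing sites between isolates at shared SNP positions. The sketch below builds such a pairwise SNP distance table. It assumes each isolate is already represented as a string of called bases at the same ordered SNP positions, and it skips uncalled sites ("N"). It covers only the distance counting and is not the maximum likelihood phylogenetic inference the study used. The isolate names and bases are hypothetical.

```python
from itertools import combinations


def pairwise_snp_distances(alignment):
    """Count differing SNP sites for every pair of isolates.

    alignment: dict mapping isolate name -> string of called bases at the
    same ordered SNP positions (assumed input format). Sites where either
    isolate is uncalled ('N') are skipped.
    """
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("all isolates must cover the same SNP positions")
    distances = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(alignment.items()), 2):
        distances[(name_a, name_b)] = sum(
            a != b for a, b in zip(seq_a, seq_b) if a != "N" and b != "N"
        )
    return distances


snps = {"clinical_1": "ACGTA", "food_1": "ACGTT", "outgroup": "TTGCT"}
print(pairwise_snp_distances(snps))
# {('clinical_1', 'food_1'): 1, ('clinical_1', 'outgroup'): 4, ('food_1', 'outgroup'): 3}
```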

  15. Whole-Genome Sequencing and Concordance Between Antimicrobial Susceptibility Genotypes and Phenotypes of Bacterial Isolates Associated with Bovine Respiratory Disease

    Directory of Open Access Journals (Sweden)

    Joseph R. Owen

    2017-09-01

    Extended laboratory culture and antimicrobial susceptibility testing timelines hinder rapid species identification and susceptibility profiling of bacterial pathogens associated with bovine respiratory disease, the most prevalent cause of cattle mortality in the United States. Whole-genome sequencing offers a culture-independent alternative to current bacterial identification methods, but requires a library of bacterial reference genomes for comparison. To contribute new bacterial genome assemblies and evaluate genetic diversity and variation in antimicrobial resistance genotypes, whole-genome sequencing was performed on bovine respiratory disease–associated bacterial isolates (Histophilus somni, Mycoplasma bovis, Mannheimia haemolytica, and Pasteurella multocida) from dairy and beef cattle. One hundred genomically distinct assemblies were added to the NCBI database, doubling the available genomic sequences for these four species. Computer-based methods identified 11 predicted antimicrobial resistance genes in three species, with none detected in M. bovis. While computer-based analysis can identify antibiotic resistance genes within whole-genome sequences (genotype), it may not predict the actual antimicrobial resistance observed in a living organism (phenotype). Antimicrobial susceptibility testing on 64 H. somni, M. haemolytica, and P. multocida isolates showed an overall concordance rate of 72.7% (P < 0.001) between genotype and phenotypic resistance to the associated class of antimicrobials, indicating substantial discordance. Concordance rates varied greatly among different antimicrobial, antibiotic resistance gene, and bacterial species combinations. This suggests that antimicrobial susceptibility phenotypes are needed to complement genomically predicted antibiotic resistance gene genotypes to better understand how the presence of antibiotic resistance genes within a given bacterial species could potentially impact optimal bovine respiratory

  16. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Directory of Open Access Journals (Sweden)

    Mark R Wilson

    Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts
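    The idea of linking isolates by "only a few SNP differences" can be illustrated with a toy example. This is a minimal sketch, not the authors' pipeline (the abstract reports a maximum likelihood phylogeny): it counts pairwise SNP differences between aligned SNP strings and groups isolates by single linkage at a chosen SNP threshold. The isolate names, sequences, and threshold below are hypothetical.

```python
from itertools import combinations

# Characters treated as missing data; positions containing these are not counted as SNPs.
MISSING = {"-", "N"}


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned SNP strings differ, skipping missing calls."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in MISSING and b not in MISSING
    )


def pairwise_distances(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates (names sorted within each pair)."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


def cluster_by_threshold(alignment: dict[str, str], max_snps: int) -> list[list[str]]:
    """Single-linkage clusters: isolates are joined if any linking pair is <= max_snps apart."""
    parent = {name: name for name in alignment}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in pairwise_distances(alignment).items():
        if dist <= max_snps:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in alignment:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy alignment of concatenated SNP sites (not data from the study).
alignment = {
    "food_1": "ACGTACGTAC",
    "clinical_1": "ACGTACGTAT",
    "plant_1": "ACGTACGAAC",
    "outgroup": "TGCATGCATG",
}

if __name__ == "__main__":
    print(pairwise_distances(alignment))
    print(cluster_by_threshold(alignment, max_snps=2))
```

    With max_snps=2 the three hypothetical food, plant, and clinical isolates fall into one cluster and the outgroup stays separate. Single linkage is used here only because it is simple; a phylogenetic approach such as the one in the abstract also models the evolutionary relationships between clades rather than just raw distances.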

  17. Whole genome sequencing reveals mycobacterial microevolution among concurrent isolates from sputum and blood in HIV infected TB patients.

    Science.gov (United States)

    Ssengooba, Willy; de Jong, Bouke C; Joloba, Moses L; Cobelens, Frank G; Meehan, Conor J

    2016-08-05

    In the context of advanced immunosuppression, M. tuberculosis is known to cause detectable mycobacteremia. However, little is known about intra-patient mycobacterial microevolution and the direction of seeding between the sputum and blood compartments. From a diagnostic study of HIV-infected TB patients, 51 pairs of concurrent blood and sputum M. tuberculosis isolates from the same patient were available. In a previous analysis, we identified a subset with genotypic concordance, based on spoligotyping and 24-locus MIRU-VNTR. These paired isolates with identical genotypes were analyzed by whole genome sequencing and phylogenetic analysis. Of the 25 concordant pairs (49% of the 51 pairs), 15 (60%) remained viable for extraction of high-quality DNA for whole genome sequencing. Two patient pairs were excluded due to poor-quality sequence reads. The median CD4 cell count was 32 cells/mm³ (IQR 16-101), and ten (77%) patients were on ART. No drug resistance mutations were identified in any of the sequences analyzed. Three (23.1%) of 13 patients had SNPs separating paired isolates from the blood and sputum compartments, indicating microevolution. Using a phylogenetic approach to identify the ancestral compartment, in two (15%) patients the blood isolate was ancestral to the sputum isolate, in one (8%) it was the opposite, and ten (77%) of the pairs were identical. Among HIV-infected patients with poor cellular immunity, infection with multiple strains of M. tuberculosis was found in half of the patients. In those patients with identical strains, whole genome sequencing indicated that M. tuberculosis intra-patient microevolution does occur in a few patients, yet did not reveal a consistent direction of spread between sputum and blood. This suggests that these compartments are highly connected and potentially seed each other repeatedly.

  18. Genome-Wide Association Analyses Based on Whole-Genome Sequencing in Sardinia Provide Insights into Regulation of Hemoglobin Levels

    Science.gov (United States)

    Danjou, Fabrice; Zoledziewska, Magdalena; Sidore, Carlo; Steri, Maristella; Busonero, Fabio; Maschio, Andrea; Mulas, Antonella; Perseu, Lucia; Barella, Susanna; Porcu, Eleonora; Pistis, Giorgio; Pitzalis, Maristella; Pala, Mauro; Menzel, Stephan; Metrustry, Sarah; Spector, Timothy D.; Leoni, Lidia; Angius, Andrea; Uda, Manuela; Moi, Paolo; Thein, Swee Lay; Galanello, Renzo; Abecasis, Gonçalo R.; Schlessinger, David; Sanna, Serena; Cucca, Francesco

    2015-01-01

    We report GWAS results for the levels of A1, A2 and fetal hemoglobins, analyzed for the first time concurrently. Integrating high-density array genotyping and whole-genome sequencing in a large general population cohort from Sardinia, we detected 23 associations at 10 loci. Five are due to variants at previously undetected loci: MPHOSPH9, PLTP-PCIF1, FOG1, NFIX, and CCND3. Among those at known loci, 10 are new lead variants and 4 are novel independent signals. Half of all variants also showed pleiotropic associations with different hemoglobins, which further corroborated some of the detected associations and revealed features of coordinated hemoglobin species production. PMID:26366553

  19. Identification and Whole Genome Sequencing of the First Case of Kosakonia radicincitans Causing a Human Bloodstream Infection.

    Science.gov (United States)

    Bhatti, Micah D; Kalia, Awdhesh; Sahasrabhojane, Pranoti; Kim, Jiwoong; Greenberg, David E; Shelburne, Samuel A

    2017-01-01

    The taxonomy of Enterobacter species is rapidly changing. Herein we report a bloodstream infection isolate originally identified as Enterobacter cloacae by Vitek2 methodology that we found to be Kosakonia radicincitans using genetic means. Comparative whole genome sequencing of our isolate and other published Kosakonia genomes revealed that these organisms lack the AmpC β-lactamase present on the chromosome of Enterobacter sp. A fimbriae operon primarily found in Escherichia coli O157:H7 isolates was present in our organism and other available K. radicincitans genomes. This is the first report of a human infection caused by a Kosakonia species; these organisms are typically associated with plants.

  20. Whole-Genome Expression Analysis of Human Mesenchymal Stromal Cells Exposed to Ultrasmooth Tantalum vs. Titanium Oxide Surfaces

    DEFF Research Database (Denmark)

    Stiehler, C.; Bunger, C.; Overall, R. W.

    2013-01-01

    … to Ti surface. Key genes related to osteogenesis and cell adhesion were upregulated by MSCs exposed to Ta. We further identified differentially regulated candidate transcription factors, e.g., NRF2, EGR1, IRF-1, IRF-8, NF-Y, and p53 as well as relevant signaling pathways, e.g., p53 and mTOR, indicating … to titanium (Ti) surface. The aim of this study was to extend the previous investigation of biocompatibility by monitoring temporal gene expression of MSCs on topographically comparable smooth Ta and Ti surfaces using whole-genome gene expression analysis. Total RNA samples from telomerase-immortalized human...

  1. Generation of whole genome sequences of new Cryptosporidium hominis and Cryptosporidium parvum isolates directly from stool samples.

    Science.gov (United States)

    Hadfield, Stephen J; Pachebat, Justin A; Swain, Martin T; Robinson, Guy; Cameron, Simon Js; Alexander, Jenna; Hegarty, Matthew J; Elwin, Kristin; Chalmers, Rachel M

    2015-08-29

    Whole genome sequencing (WGS) of Cryptosporidium spp. has previously relied on propagation of the parasite in animals to generate enough oocysts from which to extract DNA of sufficient quantity and purity for analysis. We have developed and validated a method for preparation of genomic Cryptosporidium DNA suitable for WGS directly from human stool samples and used it to generate 10 high-quality whole Cryptosporidium genome assemblies. Our method uses a combination of salt flotation, immunomagnetic separation (IMS), and surface sterilisation of oocysts prior to DNA extraction, with subsequent use of the transposome-based Nextera XT kit to generate libraries for sequencing on Illumina platforms. IMS was found to be superior to caesium chloride density centrifugation for purification of oocysts from small volume stool samples and for reducing levels of contaminant DNA. The IMS-based method was used initially to sequence whole genomes of Cryptosporidium hominis gp60 subtype IbA10G2 and Cryptosporidium parvum gp60 subtype IIaA19G1R2 from small amounts of stool left over from diagnostic testing of clinical cases of cryptosporidiosis. The C. parvum isolate was sequenced to a mean depth of 51.8X with reads covering 100 % of the bases of the C. parvum Iowa II reference genome (Bioproject PRJNA 15586), while the C. hominis isolate was sequenced to a mean depth of 34.7X with reads covering 98 % of the bases of the C. hominis TU502 v1 reference genome (Bioproject PRJNA 15585). The method was then applied to a further 17 stools, successfully generating another eight new whole genome sequences, of which two were C. hominis (gp60 subtypes IbA10G2 and IaA14R3) and six C. parvum (gp60 subtypes IIaA15G2R1 from three samples, and one each of IIaA17G1R1, IIaA18G2R1, and IIdA22G1), demonstrating the utility of this method to sequence Cryptosporidium genomes directly from clinical samples. This development is especially important as it reduces the requirement to propagate

  2. Whole-genome pyrosequencing of an epidemic multidrug-resistant Acinetobacter baumannii strain belonging to the European clone II group

    DEFF Research Database (Denmark)

    Iacono, M.; Villa, L.; Fortini, D.

    2008-01-01

    The whole-genome sequence of an epidemic, multidrug-resistant Acinetobacter baumannii strain (strain ACICU) belonging to the European clone II group and carrying the plasmid-mediated bla(OXA-58) carbapenem resistance gene was determined. The A. baumannii ACICU genome was compared with the genomes … of A. baumannii ATCC 17978 and Acinetobacter baylyi ADP1, with the aim of identifying novel genes related to virulence and d