WorldWideScience

Sample records for whole-genome genotyping arrays

  1. Assessing the utility of whole-genome amplified serum DNA for array-based high throughput genotyping.

    Science.gov (United States)

    Bucasas, Kristine L; Pandya, Gagan A; Pradhan, Sonal; Fleischmann, Robert D; Peterson, Scott N; Belmont, John W

    2009-12-18

    Whole genome amplification (WGA) offers new possibilities for genome-wide association studies where limited DNA samples have been collected. This study provides a realistic and high-precision assessment of WGA DNA genotyping performance from 20-year-old archived serum samples using the Affymetrix Genome-Wide Human SNP Array 6.0 (SNP6.0) platform. Whole-genome amplified (WGA) DNA samples from 45 archived serum replicates and 5 fresh sera paired with non-amplified genomic DNA were genotyped in duplicate. All genotyped samples passed the imposed QC thresholds for quantity and quality. In general, WGA serum DNA samples produced low call rates (45.00 +/- 2.69%), although reproducibility for successfully called markers was favorable (concordance = 95.61 +/- 4.39%). Heterozygote dropouts explained the majority (>85% in technical replicates, 50% in paired genomic/serum samples) of discordant results. Genotyping performance on WGA serum DNA samples was improved by implementing the Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) algorithm, but at the cost of many samples that failed to pass its quality threshold. Poor genotype clustering was evident in the samples that failed the CRLMM confidence threshold. We conclude that while it is possible to extract genomic DNA and subsequently perform whole-genome amplification from archived serum samples, WGA serum DNA did not perform well and appeared unsuitable for high-resolution genotyping on these arrays.
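
    To make the performance metrics in this abstract concrete, the sketch below computes a per-sample call rate and replicate concordance, and counts how many discordant markers look like heterozygote dropout. It is a minimal illustration assuming a simple "AA"/"AB"/"BB"/None genotype coding; it is not the SNP6.0 or CRLMM pipeline, and the sample data are invented.

```python
from typing import Dict, Optional, Sequence

# Hypothetical coding: "AA", "AB", "BB" for called genotypes, None for a no-call.
Genotype = Optional[str]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of markers that received any genotype call."""
    return sum(g is not None for g in calls) / len(calls)


def compare_replicates(rep1: Sequence[Genotype], rep2: Sequence[Genotype]) -> Dict[str, float]:
    """Concordance over markers called in both replicates, and how many of the
    discordant markers look like heterozygote dropout (AB in one replicate,
    homozygous in the other)."""
    both = [(a, b) for a, b in zip(rep1, rep2) if a is not None and b is not None]
    discordant = [(a, b) for a, b in both if a != b]
    het_dropout = sum("AB" in pair for pair in discordant)
    return {
        "n_compared": len(both),
        "concordance": (len(both) - len(discordant)) / len(both) if both else float("nan"),
        "n_discordant": len(discordant),
        "het_dropout": het_dropout,
    }


# Invented replicate calls for six markers.
rep1 = ["AA", "AB", "AB", None, "BB", "AB"]
rep2 = ["AA", "AA", "AB", "AB", "AA", None]
print(call_rate(rep1), compare_replicates(rep1, rep2))
```

    Concordance is scored only over markers called in both replicates, which is why a sample can show low call rates yet favorable concordance, as reported above.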

  2. Scanning and Filling: Ultra-Dense SNP Genotyping Combining Genotyping-By-Sequencing, SNP Array and Whole-Genome Resequencing Data.

    Directory of Open Access Journals (Sweden)

    Davoud Torkamaneh

    Genotyping-by-sequencing (GBS) represents a highly cost-effective high-throughput genotyping approach. By nature, however, GBS tends to generate sizeable amounts of missing data, which need to be imputed for many downstream analyses. The extent to which such missing data can be tolerated in calling SNPs has not been explored widely. In this work, we first explore the use of imputation to fill in missing genotypes in GBS datasets. Importantly, we use whole genome resequencing data to assess the accuracy of the imputed data. Using a panel of 301 soybean accessions, we show that over 62,000 SNPs could be called when tolerating up to 80% missing data, a five-fold increase over the number called when tolerating up to 20% missing data. At all levels of missing data examined (between 20% and 80%), the resulting SNP datasets were of uniformly high accuracy (96-98%). We then used imputation to combine complementary SNP datasets derived from GBS and a SNP array (SoySNP50K). We thus produced an enhanced dataset of >100,000 SNPs, and the genotypes at the previously untyped loci were again imputed with a high level of accuracy (95%). Of the >4,000,000 SNPs identified through resequencing 23 accessions (among the 301 used in the GBS analysis), 1.4 million tag SNPs were used as a reference to impute this large set of SNPs on the entire panel of 301 accessions. These previously untyped loci could be imputed with around 90% accuracy. Finally, we used the 100K SNP dataset (GBS + SoySNP50K) to perform a GWAS on seed oil content within this collection of soybean accessions. Both the number of significant marker-trait associations and the peak significance levels were improved considerably using this enhanced catalog of SNPs relative to a smaller catalog resulting from GBS alone at ≤20% missing data. Our results demonstrate that imputation can be used to fill in both missing genotypes and untyped loci with very high accuracy and that this leads to more
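
    The missing-data filter and the accuracy check against resequencing can be sketched as follows. This is a hedged illustration: the genotype coding, the toy data and the mode-based imputer are all assumptions made for brevity, and the mode imputer merely stands in for the haplotype-aware imputation a real GBS study would use.

```python
from collections import Counter
from typing import Dict, List, Optional

# One row per SNP: an allele label per accession, None where GBS left the call missing.
Row = List[Optional[str]]


def missing_rate(row: Row) -> float:
    return sum(g is None for g in row) / len(row)


def filter_snps(matrix: Dict[str, Row], max_missing: float) -> Dict[str, Row]:
    """Keep SNPs whose fraction of missing genotypes is at most max_missing (e.g. 0.2 or 0.8)."""
    return {snp: row for snp, row in matrix.items() if missing_rate(row) <= max_missing}


def impute_mode(row: Row) -> List[str]:
    """Toy imputer: fill each gap with the most common observed genotype.
    Assumes at least one observed genotype; real pipelines use haplotype-aware imputation."""
    fill = Counter(g for g in row if g is not None).most_common(1)[0][0]
    return [fill if g is None else g for g in row]


def imputation_accuracy(original: Row, imputed: List[str], truth: List[str]) -> float:
    """Agreement with resequencing-derived genotypes, scored only where data were missing."""
    gaps = [i for i, g in enumerate(original) if g is None]
    return sum(imputed[i] == truth[i] for i in gaps) / len(gaps)


# Invented example: s1 is 20% missing, s2 is 60% missing.
matrix = {"s1": ["A", "A", None, "B", "A"], "s2": [None, None, None, "A", "B"]}
kept = filter_snps(matrix, max_missing=0.8)
acc = imputation_accuracy(matrix["s2"], impute_mode(matrix["s2"]), ["A", "B", "A", "A", "B"])
print(sorted(kept), acc)
```

    Raising `max_missing` admits more SNPs, and scoring only the originally missing cells keeps the accuracy estimate from being inflated by genotypes that were never imputed.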

  3. Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array

    Directory of Open Access Journals (Sweden)

    Antanaviciute Laima

    2012-05-01

    Background: A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high-throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Results: Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny and 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced, comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the ‘Golden Delicious’ genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the ‘Golden Delicious’ pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. Conclusions: We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. The application of the array will permit linkage maps to be developed for QTL analyses in a
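
    Several of the reported percentages and the final marker density can be re-derived from the counts in the abstract. The sketch assumes that density is total map length divided by all mapped markers (2,272 SNPs + 306 SSRs + the S-locus = 2,579), an interpretation consistent with the quoted 0.5 cM/marker; the gap-finding helper and its example positions are illustrative.

```python
from typing import List, Tuple


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as quoted in the abstract."""
    return round(100 * part / whole, 1)


def marker_density(map_length_cm: float, n_markers: int) -> float:
    """Average spacing in cM per marker (map length divided by marker count)."""
    return map_length_cm / n_markers


def large_gaps(positions_cm: List[float], threshold_cm: float = 10.0) -> List[Tuple[float, float]]:
    """Intervals between adjacent mapped markers that are wider than threshold_cm."""
    pos = sorted(positions_cm)
    return [(a, b) for a, b in zip(pos, pos[1:]) if b - a > threshold_cm]


print(percent(1823, 7867), percent(1007, 7867), percent(311, 2272))
print(round(marker_density(1282.2, 2272 + 306 + 1), 1))
print(large_gaps([0, 5, 18, 20, 31.5]))  # invented positions on one linkage group
```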

  4. Optimized design and assessment of whole genome tiling arrays.

    NARCIS (Netherlands)

    Graf, S.; Nielsen, F.G.G.; Kurtz, S.; Huynen, M.A.; Birney, E.; Stunnenberg, H.G.; Flicek, P.

    2007-01-01

    MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling

  5. Whole genome amplification and its impact on CGH array profiles

    Directory of Open Access Journals (Sweden)

    Meldrum Cliff

    2008-07-01

    Background: Some array comparative genomic hybridisation (array CGH) platforms require micrograms of DNA to generate reliable and reproducible data. For studies with limited amounts of genetic material, whole genome amplification (WGA) is an attractive method for generating sufficient quantities of genomic material from minuscule amounts of starting material. A range of WGA methods are available, and the multiple displacement amplification (MDA) approach has been shown to be highly accurate, although amplification bias has been reported. In the current study, WGA was used to amplify DNA extracted from whole blood. In total, six array CGH experiments were performed to investigate whether the use of whole genome amplified DNA (wgaDNA) produces reliable and reproducible results. Four experiments compared amplified DNA with unamplified DNA, and two compared unamplified DNA with unamplified DNA. Findings: All the experiments involving wgaDNA resulted in a high proportion of losses and gains of genomic material. Previously, amplification bias has been overcome by using amplified DNA as both the test and reference DNA. Our data suggest that this approach may not be effective, as the gains and losses introduced by WGA appear to be random and are not reproducible between different experiments using the same DNA. Conclusion: In light of these findings, the use of both amplified test and reference DNA on CGH arrays may not provide an accurate representation of copy number variation in the DNA.
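
    One way to quantify the reproducibility question raised here is to call gains and losses from log2(test/reference) ratios and measure how much two experiments on the same DNA overlap. In the sketch below the ±0.3 threshold, the probe-level (rather than segment-level) calling and the Jaccard overlap score are all assumptions; if WGA-induced gains and losses are random, as the authors report, overlap between replicate experiments should be low.

```python
import math
from typing import Dict, List


def log2_ratios(test: List[float], ref: List[float]) -> List[float]:
    """Per-probe log2(test/reference) intensity ratios."""
    return [math.log2(t / r) for t, r in zip(test, ref)]


def call_cnv(ratios: List[float], threshold: float = 0.3) -> Dict[int, str]:
    """Label a probe 'gain' or 'loss' when |log2 ratio| exceeds an assumed threshold."""
    calls: Dict[int, str] = {}
    for i, x in enumerate(ratios):
        if x > threshold:
            calls[i] = "gain"
        elif x < -threshold:
            calls[i] = "loss"
    return calls


def reproducibility(calls_a: Dict[int, str], calls_b: Dict[int, str]) -> float:
    """Jaccard overlap of (probe, call) pairs between two experiments."""
    a, b = set(calls_a.items()), set(calls_b.items())
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Invented intensities for four probes in two experiments on the same DNA.
ref = [100.0] * 4
exp1 = call_cnv(log2_ratios([200.0, 100.0, 50.0, 100.0], ref))
exp2 = call_cnv(log2_ratios([100.0, 100.0, 50.0, 210.0], ref))
print(exp1, exp2, reproducibility(exp1, exp2))
```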

  6. Development and validation of a 20K single nucleotide polymorphism (SNP) whole genome genotyping array for apple (Malus × domestica Borkh.)

    Directory of Open Access Journals (Sweden)

    Luca Bianco

    High-density SNP arrays for genome-wide assessment of allelic variation have made high-resolution genetic characterization of crop germplasm feasible. A medium-density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 kb. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs.
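
    The coverage rules (roughly one FP per 1 cM, plus bridging of physical intervals over 400 kb) can be sketched as a greedy pass over candidate SNPs. This is a simplified, hypothetical reading of the selection pipeline: it ignores paralog filtering, haploblock construction and chromosome-end enrichment, and the SNP identifiers and positions are invented.

```python
from typing import List, Tuple

# (snp_id, genetic position in cM, physical position in bp); all values hypothetical.
Snp = Tuple[str, float, int]


def pick_focal_points(snps: List[Snp], step_cm: float = 1.0) -> List[Snp]:
    """Greedy choice: keep a SNP only if it lies at least step_cm beyond the previous pick."""
    chosen: List[Snp] = []
    next_mark = 0.0
    for snp in sorted(snps, key=lambda s: s[1]):
        if snp[1] >= next_mark:
            chosen.append(snp)
            next_mark = snp[1] + step_cm
    return chosen


def physical_gaps(chosen: List[Snp], max_gap_bp: int = 400_000) -> List[Tuple[int, int]]:
    """Intervals between consecutive chosen SNPs (by bp) that still need a bridging FP."""
    bp = sorted(s[2] for s in chosen)
    return [(a, b) for a, b in zip(bp, bp[1:]) if b - a > max_gap_bp]


snps = [("a", 0.0, 0), ("b", 0.4, 150_000), ("c", 1.1, 300_000),
        ("d", 1.5, 900_000), ("e", 2.2, 1_000_000)]
fps = pick_focal_points(snps)
print([s[0] for s in fps], physical_gaps(fps))
```

    The two passes are kept separate because genetic and physical distances do not scale uniformly: a region that looks well covered in cM can still leave a large physical interval, which is what the bridging rule addresses.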

  7. Comparative analysis of copy number detection by whole-genome BAC and oligonucleotide array CGH

    Directory of Open Access Journals (Sweden)

    Bejjani Bassem A

    2010-06-01

    Background: Microarray-based comparative genomic hybridization (aCGH) is a powerful diagnostic tool for the detection of DNA copy number gains and losses associated with chromosome abnormalities, many of which are below the resolution of conventional chromosome analysis. It has been presumed that whole-genome oligonucleotide (oligo) arrays identify more clinically significant copy-number abnormalities than whole-genome bacterial artificial chromosome (BAC) arrays, yet this has not been systematically studied in a clinical diagnostic setting. Results: To determine the difference in detection rate between similarly designed BAC and oligo arrays, we developed whole-genome BAC and oligonucleotide microarrays and validated them in a side-by-side comparison of 466 consecutive clinical specimens submitted to our laboratory for aCGH. Of the 466 cases studied, 67 (14.3%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome BAC array, and 73 (15.6%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome oligo array. However, because both platforms identified copy number variants of unclear clinical significance, we designed a systematic method for the interpretation of copy number alterations and tested an additional 3,443 cases by BAC array and 3,096 cases by oligo array. Of those cases tested on the BAC array, 17.6% were found to have a copy-number abnormality of potential clinical significance, whereas the detection rate increased to 22.5% for the cases tested by oligo array. In addition, we validated the oligo array for detection of mosaicism and found that it could routinely detect mosaicism at levels of 30% and greater. Conclusions: Although BAC arrays have faster turnaround times, the increased detection rate of oligo arrays makes them attractive for clinical cytogenetic testing.
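
    The mosaicism limit follows from simple copy-number arithmetic. If a fraction m of cells carries a single-copy gain or loss at a diploid locus, the average copy number is 2 ± m, so the expected ratio to a diploid reference is 1 ± m/2. The sketch assumes a pure diploid reference and noise-free probes; at 30% mosaicism the expected shift is only about 0.2 log2 units, which illustrates why low-level mosaicism is difficult to detect.

```python
import math


def mosaic_log2_ratio(fraction: float, copy_change: int) -> float:
    """Expected log2(test/reference) at a diploid locus when `fraction` of cells carry
    `copy_change` copies relative to normal (+1 for a gain, -1 for a loss).
    Assumes a pure diploid reference and no probe noise."""
    copies = 2 * (1 - fraction) + (2 + copy_change) * fraction
    return math.log2(copies / 2)


for m in (0.3, 0.5, 1.0):
    print(m, round(mosaic_log2_ratio(m, +1), 3), round(mosaic_log2_ratio(m, -1), 3))
```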

  8. Spiked GBS: a unified, open platform for single marker genotyping and whole-genome profiling.

    Science.gov (United States)

    Rife, Trevor W; Wu, Shuangye; Bowden, Robert L; Poland, Jesse A

    2015-03-28

    In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of these objectives. We have developed spiked genotyping-by-sequencing (sGBS), which combines targeted amplicon sequencing with reduced representation genotyping-by-sequencing. To minimize the cost of targeted assays, we use a small percentage of the sequencing capacity available in runs of GBS libraries to "spike" in amplified targets of a priori alleles, tagged with a different set of unique barcodes. This open platform allows multiple, single-target loci to be assayed while simultaneously generating a whole-genome profile. This dual-genotyping approach allows different sets of samples to be evaluated for single markers or whole-genome profiling. Here, we report the application of sGBS on a winter wheat panel that was screened for converted KASP markers and newly-designed markers targeting known polymorphisms in the leaf rust resistance gene Lr34. The flexibility and low cost of sGBS will enable a range of applications across genetics research. Specifically in breeding applications, the sGBS approach will allow breeders to obtain a whole-genome profile of important individuals while simultaneously targeting specific genes for a range of selection strategies across the breeding program.
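
    Because spiked amplicons carry a different set of unique barcodes from the GBS libraries, reads from one sequencing run can be routed to either analysis by barcode. The sketch below shows that routing under stated assumptions: barcodes sit at the start of each read, no barcode is a prefix of another, and the barcode sequences and sample names are hypothetical.

```python
from typing import Dict, List, Tuple


def demultiplex(reads: List[str], gbs_barcodes: Dict[str, str],
                amplicon_barcodes: Dict[str, str]) -> Dict[Tuple[str, str], List[str]]:
    """Route each read by its 5' barcode into (library_type, sample) bins and trim the barcode.

    Assumes barcodes are read prefixes, unique across both sets, and that no barcode is a
    prefix of another. Reads without a match go to ("unassigned", "") untrimmed.
    """
    lookup = {bc: ("gbs", s) for bc, s in gbs_barcodes.items()}
    lookup.update({bc: ("amplicon", s) for bc, s in amplicon_barcodes.items()})
    bins: Dict[Tuple[str, str], List[str]] = {}
    for read in reads:
        key, insert = ("unassigned", ""), read
        for bc, target in lookup.items():
            if read.startswith(bc):
                key, insert = target, read[len(bc):]
                break
        bins.setdefault(key, []).append(insert)
    return bins


# Hypothetical barcodes: the same line appears in both the GBS and the spiked amplicon set.
gbs = {"ACGT": "line1", "TTAGC": "line2"}
amp = {"GGCA": "line1"}
bins = demultiplex(["ACGTAAAA", "GGCATTTT", "TTAGCCCC", "NNNNAAAA"], gbs, amp)
print(bins)
```

    Keeping the library type in the bin key is what lets one run feed both the single-marker calls and the whole-genome profile for the same individual.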

  9. Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications

    Directory of Open Access Journals (Sweden)

    Asadollahi Mohammad A

    2010-12-01

    genotype to phenotype correlations are manifested post-transcriptionally or post-translationally through protein concentration and/or function. Conclusions: With an intensifying need for microbial cell factories that produce a wide array of target compounds, whole genome high-throughput sequencing and annotation for SNP detection can help to reduce and define the metabolic landscape. This work demonstrates direct correlations between genotype and phenotype that provide clear metabolic engineering targets with a high probability of success. The genome sequence, annotation, and a SNP viewer of CEN.PK113-7D are deposited at http://www.sysbio.se/cenpk.

  10. Genomic variation by whole-genome SNP mapping arrays predicts time-to-event outcome in patients with chronic lymphocytic leukemia: a comparison of CLL and HapMap genotypes.

    Science.gov (United States)

    Schweighofer, Carmen D; Coombes, Kevin R; Majewski, Tadeusz; Barron, Lynn L; Lerner, Susan; Sargent, Rachel L; O'Brien, Susan; Ferrajoli, Alessandra; Wierda, William G; Czerniak, Bogdan A; Medeiros, L Jeffrey; Keating, Michael J; Abruzzo, Lynne V

    2013-03-01

    Genomic abnormalities, such as deletions in 11q22 or 17p13, are associated with poorer prognosis in patients with chronic lymphocytic leukemia (CLL). We hypothesized that unknown regions of copy number variation (CNV) affect clinical outcome and can be detected by array-based single-nucleotide polymorphism (SNP) genotyping. We compared SNP genotypes from 168 untreated patients with CLL with genotypes from 73 white HapMap controls. We identified 322 regions of recurrent CNV, 82 of which occurred significantly more often in CLL than in HapMap (CLL-specific CNV), including regions typically aberrant in CLL: deletions in 6q21, 11q22, 13q14, and 17p13 and trisomy 12. In univariate analyses, 35 of total and 11 of CLL-specific CNVs were associated with unfavorable time-to-event outcomes, including gains or losses in chromosomes 2p, 4p, 4q, 6p, 6q, 7q, 11p, 11q, and 17p. In multivariate analyses, six CNVs (ie, CLL-specific variations in 11p15.1-15.4 or 6q27) predicted time-to-treatment or overall survival independently of established markers of prognosis. Moreover, genotypic complexity (ie, the number of independent CNVs per patient) significantly predicted prognosis, with a median time-to-treatment of 64 months versus 23 months in patients with zero to one versus two or more CNVs, respectively (P = 3.3 × 10⁻⁸). In summary, a comparison of SNP genotypes from patients with CLL with HapMap controls allowed us to identify known and unknown recurrent CNVs and to determine regions and rates of CNV that predict poorer prognosis in patients with CLL. Copyright © 2013 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
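
    The genotypic-complexity comparison (zero to one versus two or more CNVs) can be sketched with a Kaplan-Meier estimate of median time-to-treatment, which, unlike a plain median, accounts for patients who have not yet been treated (censored). The abstract does not state the exact estimator, so treat this as an assumption; the patient records in the example are invented.

```python
from typing import List, Tuple

# (number of CNVs, months to treatment or last follow-up, treated?); hypothetical records.
Patient = Tuple[int, float, bool]


def km_median(times: List[float], events: List[bool]) -> float:
    """Kaplan-Meier median: first time the estimated survival drops to 0.5 or below.
    Returns inf when the median is not reached."""
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        deaths = sum(1 for tt, e in data if tt == t and e)
        if deaths:
            surv *= 1 - deaths / at_risk
            if surv <= 0.5:
                return t
        at_risk -= n_at_t  # events and censorings at time t leave the risk set afterwards
        i += n_at_t
    return float("inf")


def medians_by_complexity(patients: List[Patient], cutoff: int = 2) -> Tuple[float, float]:
    """KM medians for patients with < cutoff CNVs versus >= cutoff CNVs (both groups non-empty)."""
    low = [(t, e) for n, t, e in patients if n < cutoff]
    high = [(t, e) for n, t, e in patients if n >= cutoff]
    return (km_median([t for t, _ in low], [e for _, e in low]),
            km_median([t for t, _ in high], [e for _, e in high]))


patients = [(0, 10, True), (1, 20, True), (0, 30, True), (1, 40, True), (0, 50, True),
            (2, 5, True), (3, 8, False), (2, 12, True), (4, 15, True)]
print(medians_by_complexity(patients))
```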

  11. Refining QTL with high-density SNP genotyping and whole genome sequence in three cattle breeds

    DEFF Research Database (Denmark)

    Sahana, Goutam; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2012-01-01

    A genome-wide association study was carried out in the Nordic Holstein, Nordic Red and Jersey breeds for functional traits using the BovineHD Genotyping BeadChip (Illumina, San Diego, CA). The association analyses were carried out using both a linear mixed model approach and a Bayesian variable selection method. Principal components were used to account for population structure. The QTL segregating in all three breeds were selected, and a few of the most significant ones were followed in further analyses. The polymorphisms in the identified QTL regions were imputed using 90 whole genome sequences...
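
    Selecting QTL that segregate in all three breeds amounts to intersecting per-breed association signals. The sketch below does this with fixed-size windows on a single chromosome; the 1 Mb window and the positions are hypothetical, and real analyses would define QTL regions from confidence intervals or linkage disequilibrium rather than fixed bins.

```python
from typing import Dict, List, Set


def windows_hit(positions_bp: List[int], window_bp: int) -> Set[int]:
    """Map significant SNP positions (one chromosome) to fixed-size window indices."""
    return {p // window_bp for p in positions_bp}


def shared_qtl_windows(hits_by_breed: Dict[str, List[int]], window_bp: int = 1_000_000) -> List[int]:
    """Windows containing at least one significant SNP in every breed."""
    sets = [windows_hit(v, window_bp) for v in hits_by_breed.values()]
    return sorted(set.intersection(*sets)) if sets else []


# Invented significant positions per breed.
hits = {
    "Holstein": [1_200_000, 5_500_000],
    "Nordic Red": [1_900_000, 8_000_000],
    "Jersey": [1_050_000, 5_100_000],
}
print(shared_qtl_windows(hits))
```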

  12. Revisiting the genotyping scheme for varicella-zoster viruses based on whole-genome comparisons.

    Science.gov (United States)

    Jensen, Nancy J; Rivailler, Pierre; Tseng, Hung Fu; Quinlivan, Mark L; Radford, Kay; Folster, Jennifer; Harpaz, Rafael; LaRussa, Philip; Jacobsen, Steven; Scott Schmid, D

    2017-06-01

    We report whole-genome sequences (WGSs) for four varicella-zoster virus (VZV) samples from a shingles study conducted by Kaiser Permanente of Southern California. Comparative genomics and phylogenetic analysis of all published VZV WGSs revealed that strain KY037798 is in clade IX, which shall henceforth be designated clade 9. Previously published single nucleotide polymorphism (SNP)-based genotyping schemes fail to discriminate between clades 6 and VIII and employ positions that are not clade-specific. We provide an updated list of clade-specific positions that supersedes the list determined at the 2008 VZV nomenclature meeting. Finally, we propose a new targeted genotyping scheme that will discriminate the circulating VZV clades with at least a twofold redundancy. Genotyping strategies using a limited set of targeted SNPs will continue to provide an efficient 'first pass' method for VZV strain surveillance as vaccination programmes for varicella and zoster influence the dynamics of VZV transmission.
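
    A targeted scheme with at least a twofold redundancy can be read as requiring two or more clade-specific positions to agree before a clade is assigned. The sketch below implements that rule; the clade names, positions and bases are placeholders, not the updated list of clade-specific positions the authors propose.

```python
from typing import Dict, Optional


def assign_clade(sample: Dict[int, str], markers: Dict[str, Dict[int, str]],
                 min_hits: int = 2) -> Optional[str]:
    """Return the single clade whose clade-specific bases match at >= min_hits positions.

    `markers` maps clade -> {genome position: clade-specific base}. Returns None when no
    clade, or more than one clade, reaches the threshold.
    """
    passing = [clade for clade, sites in markers.items()
               if sum(sample.get(pos) == base for pos, base in sites.items()) >= min_hits]
    return passing[0] if len(passing) == 1 else None


# Placeholder positions and bases, for illustration only.
markers = {"1": {100: "A", 200: "G"}, "9": {300: "T", 400: "C", 500: "G"}}
full = {100: "C", 200: "A", 300: "T", 400: "C", 500: "A"}
partial = {300: "T"}
print(assign_clade(full, markers), assign_clade(partial, markers))
```

    Returning None rather than a best guess when only one position matches keeps a single sequencing error or homoplasy from producing a confident but wrong clade call.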

  13. Effects of DNA mass on multiple displacement whole genome amplification and genotyping performance

    Directory of Open Access Journals (Sweden)

    Haque Kashif A

    2005-09-01

    Background: Whole genome amplification (WGA) promises to eliminate practical limitations on molecular genetic analysis associated with genomic DNA (gDNA) quantity. We evaluated the performance of multiple displacement amplification (MDA) WGA using gDNA extracted from lymphoblastoid cell lines (N = 27) with a range of starting gDNA input of 1–200 ng into the WGA reaction. Yield and composition analysis of whole genome amplified DNA (wgaDNA) was performed using three DNA quantification methods (OD, PicoGreen® and RT-PCR). Two panels, of N = 15 STR (using the AmpFlSTR® Identifiler® panel) and N = 49 SNP (TaqMan®) genotyping assays, were performed on each gDNA and wgaDNA sample in duplicate. gDNA and wgaDNA masses of 1, 4 and 20 ng were used in the SNP assays to evaluate the effects of DNA mass on SNP genotyping assay performance. A total of N = 6,880 STR and N = 56,448 SNP genotype attempts provided adequate power to detect differences in STR and SNP genotyping performance between gDNA and wgaDNA, and among wgaDNA produced from a range of gDNA template inputs. Results: The proportion of double-stranded wgaDNA and human-specific PCR-amplifiable wgaDNA increased with increased gDNA input into the WGA reaction. Increased amounts of gDNA input into the WGA reaction improved wgaDNA genotyping performance. Genotype completion or genotype concordance rates of wgaDNA produced from all gDNA input levels were reduced compared to gDNA, although the reduction was not always statistically significant. Reduced wgaDNA genotyping performance was primarily due to the increased variance of allelic amplification, resulting in loss of heterozygosity or increased undetermined genotypes. MDA WGA produces wgaDNA from no-template control samples; such samples exhibited substantial false-positive genotyping rates. Conclusion: The amount of gDNA input into the MDA WGA reaction is a critical determinant of genotyping performance of wgaDNA. At least 10 ng of
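
    The mechanism described here, unequal amplification of the two alleles, can be illustrated with a toy model in which a true heterozygote's B allele is amplified more than its A allele, and genotypes are called from the B-allele fraction. The cut-offs and bias values are illustrative assumptions, not those of the TaqMan assays used in the study; the point is only that moderate bias yields undetermined calls and strong bias yields an apparent homozygote.

```python
from typing import Tuple


def call_genotype(signal_a: float, signal_b: float, min_total: float = 1.0) -> str:
    """Call a biallelic SNP from allele-specific signals via the B-allele fraction.
    Cut-offs (below 0.2, 0.35-0.65, above 0.8) are illustrative only."""
    total = signal_a + signal_b
    if total < min_total:
        return "undetermined"  # too little signal, e.g. a failed or empty reaction
    baf = signal_b / total
    if baf < 0.2:
        return "AA"
    if baf > 0.8:
        return "BB"
    if 0.35 <= baf <= 0.65:
        return "AB"
    return "undetermined"


def amplified_heterozygote(bias: float, template: float = 10.0) -> Tuple[float, float]:
    """Toy model: a true AB template whose B allele was amplified `bias`-fold more than A."""
    return template, template * bias


for bias in (1.0, 2.5, 5.0):
    print(bias, call_genotype(*amplified_heterozygote(bias)))
```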

  14. Whole-genome array CGH evaluation for replacing prenatal karyotyping in Hong Kong.

    Directory of Open Access Journals (Sweden)

    Anita S Y Kan

    OBJECTIVE: To evaluate the effectiveness of whole-genome array comparative genomic hybridization (aCGH) in prenatal diagnosis in Hong Kong. METHODS: Array CGH was performed on 220 samples recruited prospectively as the first-tier test study. In addition, 150 prenatal samples with abnormal fetal ultrasound findings found to have normal karyotypes were analyzed as a 'further-test' study using NimbleGen CGX-135K oligonucleotide arrays. RESULTS: Array CGH findings were concordant with conventional cytogenetic results with the exception of one case of triploidy. In the first-tier test study, aCGH detected clinically significant copy number variants (CNVs) in 20% (44/220) of samples, of which 21 were common aneuploidies and 23 had other chromosomal imbalances. In 3.2% (7/220) of samples, CNVs were detected by aCGH but not by conventional cytogenetics. In the 'further-test' study, the additional diagnostic yield of detecting chromosome imbalance was 6% (9/150). The overall detection rate for CNVs of unclear clinical significance was 2.7% (10/370), with 0.9% found to be de novo. Eleven loci of common CNVs were found in the local population. CONCLUSION: Whole-genome aCGH offered a higher-resolution diagnostic capacity than conventional karyotyping for prenatal diagnosis, either as a first-tier test or as a 'further-test' for pregnancies with fetal ultrasound anomalies. We propose replacing conventional cytogenetics with aCGH for all pregnancies undergoing invasive diagnostic procedures after excluding common aneuploidies and triploidies by quantitative fluorescent PCR. Conventional cytogenetics can be reserved for visualization of clinically significant CNVs.
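
    The diagnostic yields above are simple proportions (for example, 44/220 = 20%). Attaching a confidence interval shows how much uncertainty a cohort of this size carries. The Wilson score interval used below is one standard choice; the abstract does not report intervals, so this is an illustrative addition rather than the authors' analysis.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (approximately 95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Counts taken from the abstract.
for label, k, n in [("first-tier CNVs", 44, 220), ("aCGH-only CNVs", 7, 220),
                    ("further-test yield", 9, 150), ("unclear significance", 10, 370)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```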

  15. A strategy to improve phasing of whole-genome sequenced individuals through integration of familial information from dense genotype panels.

    Science.gov (United States)

    Faux, Pierre; Druet, Tom

    2017-05-16

    Haplotype reconstruction (phasing) is an essential step in many applications, including imputation and genomic selection. The best phasing methods rely on both familial and linkage disequilibrium (LD) information. With whole-genome sequence (WGS) data, relatively small samples of reference individuals are generally sequenced due to prohibitive sequencing costs, so only a limited amount of familial information is available. However, reference individuals have many relatives that have been genotyped (at lower density). The goal of our study was to improve phasing of WGS data by integrating familial information from haplotypes that were obtained from a larger genotyped dataset and to quantify its impact on imputation accuracy. Aligning a pre-phased WGS panel [~5 million single nucleotide polymorphisms (SNPs)], which is based on LD information only, to a 50k SNP array that is phased with both LD and familial information (called scaffold) resulted in correctly assigning parental origin for 99.62% of the WGS SNPs, their phase being determined unambiguously based on parental genotypes. Without using the 50k haplotypes as scaffold, that value dropped as expected to 50%. Correctly phased segments were on average longer after alignment to the genotype phase, while the number of switches decreased slightly. Most of the incorrectly assigned segments, and subsequent switches, were due to singleton errors. Imputation from the 50k SNP array to WGS data with improved phasing had a marginal impact on imputation accuracy (measured as r²): on average, 90.47% with traditional techniques versus 90.65% with pre-phasing integrating familial information. Differences were larger for SNPs located at chromosome ends and for rare variants. Using a denser WGS panel (~13 million SNPs) that was obtained with traditional variant filtering rules, we found similar results, although both phasing performance and imputation accuracy were lower. We present a phasing strategy for WGS data, which
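
    The core idea, using family-phased array haplotypes as a scaffold to decide which LD-phased WGS haplotype is paternal, can be sketched as flipping each phase block toward the better-matching orientation. This is a toy illustration, not the authors' algorithm: the block boundaries, the 0/1 allele coding and the example data are assumptions, and the switch counter simply counts changes in agreement between consecutive heterozygous sites.

```python
from typing import Dict, List, Tuple

Hap = List[int]  # alleles coded 0/1 at each WGS SNP (assumed coding)


def orient_blocks(h1: Hap, h2: Hap, blocks: List[Tuple[int, int]],
                  scaffold_paternal: Dict[int, int]) -> Tuple[Hap, Hap]:
    """Return (paternal, maternal) haplotypes, flipping each LD-phased block so that the
    paternal copy agrees best with the scaffold at SNPs shared with the array.

    blocks: half-open [start, end) index ranges (hypothetical input).
    scaffold_paternal: WGS SNP index -> paternal allele from the family-phased array.
    """
    pat, mat = list(h1), list(h2)
    for start, end in blocks:
        shared = [i for i in range(start, end) if i in scaffold_paternal]
        keep = sum(h1[i] == scaffold_paternal[i] for i in shared)
        flip = sum(h2[i] == scaffold_paternal[i] for i in shared)
        if flip > keep:
            pat[start:end], mat[start:end] = h2[start:end], h1[start:end]
    return pat, mat


def switch_count(phased: Hap, truth: Hap, het_sites: List[int]) -> int:
    """Changes in agreement with the true haplotype between consecutive heterozygous sites."""
    agree = [phased[i] == truth[i] for i in het_sites]
    return sum(a != b for a, b in zip(agree, agree[1:]))


h1, h2 = [0, 1, 0, 1, 1, 0], [1, 0, 1, 0, 0, 1]
truth = [0, 1, 0, 0, 0, 1]  # true paternal haplotype (invented)
pat, mat = orient_blocks(h1, h2, [(0, 3), (3, 6)], {1: 1, 4: 0})
print(pat, switch_count(h1, truth, list(range(6))), switch_count(pat, truth, list(range(6))))
```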

  16. Whole-Genome Sequencing and Concordance Between Antimicrobial Susceptibility Genotypes and Phenotypes of Bacterial Isolates Associated with Bovine Respiratory Disease

    Directory of Open Access Journals (Sweden)

    Joseph R. Owen

    2017-09-01

    Extended laboratory culture and antimicrobial susceptibility testing timelines hinder rapid species identification and susceptibility profiling of bacterial pathogens associated with bovine respiratory disease, the most prevalent cause of cattle mortality in the United States. Whole-genome sequencing offers a culture-independent alternative to current bacterial identification methods, but requires a library of bacterial reference genomes for comparison. To contribute new bacterial genome assemblies and evaluate genetic diversity and variation in antimicrobial resistance genotypes, whole-genome sequencing was performed on bovine respiratory disease–associated bacterial isolates (Histophilus somni, Mycoplasma bovis, Mannheimia haemolytica, and Pasteurella multocida) from dairy and beef cattle. One hundred genomically distinct assemblies were added to the NCBI database, doubling the available genomic sequences for these four species. Computer-based methods identified 11 predicted antimicrobial resistance genes in three species, with none being detected in M. bovis. While computer-based analysis can identify antibiotic resistance genes within whole-genome sequences (genotype), it may not predict the actual antimicrobial resistance observed in a living organism (phenotype). Antimicrobial susceptibility testing on 64 H. somni, M. haemolytica, and P. multocida isolates showed an overall concordance rate between genotype and phenotypic resistance to the associated class of antimicrobials of 72.7% (P < 0.001), indicating substantial discordance. Concordance rates varied greatly among different antimicrobial, antibiotic resistance gene, and bacterial species combinations. This suggests that antimicrobial susceptibility phenotypes are needed to complement genomically predicted antibiotic resistance gene genotypes to better understand how the presence of antibiotic resistance genes within a given bacterial species could potentially impact optimal bovine respiratory
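
    A genotype-phenotype concordance rate like the 72.7% reported here can be computed as the share of isolate-antimicrobial-class pairs where the presence or absence of a predicted resistance gene matches the observed phenotype. The sketch below uses that definition, which is an assumption about how the study scored concordance; isolate IDs and results are invented.

```python
from typing import Dict, List, Tuple

# (isolate, antimicrobial class, resistance gene predicted?, resistant phenotype?); invented.
Result = Tuple[str, str, bool, bool]


def concordance(results: List[Result]) -> float:
    """Share of isolate-class pairs where predicted and observed resistance agree."""
    return sum(pred == obs for _, _, pred, obs in results) / len(results)


def discordance_breakdown(results: List[Result]) -> Dict[str, int]:
    """Split disagreements into the two directions, which have different explanations."""
    return {
        "gene_but_susceptible": sum(p and not o for _, _, p, o in results),
        "no_gene_but_resistant": sum(o and not p for _, _, p, o in results),
    }


results = [("iso1", "tetracyclines", True, True), ("iso2", "tetracyclines", True, False),
           ("iso3", "macrolides", False, False), ("iso4", "macrolides", False, True)]
print(concordance(results), discordance_breakdown(results))
```

    Reporting the two discordance directions separately is useful because a gene without phenotypic resistance (e.g. an unexpressed gene) and resistance without a known gene (e.g. an uncatalogued mechanism) point to different follow-up work.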

  18. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Science.gov (United States)

    Antwerpen, Markus H; Prior, Karola; Mellmann, Alexander; Höppner, Sebastian; Splettstoesser, Wolf D; Harmsen, Dag

    2015-01-01

    The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.
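
    Gene-by-gene typing compares strains by the number of genes whose assigned allele differs. The sketch below computes that distance and groups strains by single linkage; the allele profiles and the clustering threshold are hypothetical, and genes missing in either strain are simply skipped, which is one common convention rather than necessarily the one used for MLST+.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set

Profile = Dict[str, Optional[int]]  # gene -> allele number, None if not called


def allele_distance(a: Profile, b: Profile) -> int:
    """Count genes with different allele numbers, ignoring genes missing in either strain."""
    return sum(1 for g in a.keys() & b.keys()
               if a[g] is not None and b[g] is not None and a[g] != b[g])


def single_linkage_clusters(profiles: Dict[str, Profile], max_diff: int) -> List[Set[str]]:
    """Group strains linked by chains of pairwise distances <= max_diff (union-find)."""
    parent = {s: s for s in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in combinations(profiles, 2):
        if allele_distance(profiles[x], profiles[y]) <= max_diff:
            parent[find(x)] = find(y)
    groups: Dict[str, Set[str]] = {}
    for s in profiles:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


# Hypothetical four-gene profiles.
profiles = {
    "s1": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},
    "s2": {"g1": 1, "g2": 1, "g3": 2, "g4": 1},
    "s3": {"g1": 2, "g2": 3, "g3": 3, "g4": 2},
    "s4": {"g1": 2, "g2": 3, "g3": 3, "g4": None},
}
print(single_linkage_clusters(profiles, max_diff=1))
```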

  19. TGS-TB: Total Genotyping Solution for Mycobacterium tuberculosis Using Short-Read Whole-Genome Sequencing

    Science.gov (United States)

    Sekizuka, Tsuyoshi; Yamashita, Akifumi; Murase, Yoshiro; Iwamoto, Tomotada; Mitarai, Satoshi; Kato, Seiya; Kuroda, Makoto

    2015-01-01

    Whole-genome sequencing (WGS) with next-generation DNA sequencing (NGS) is an increasingly accessible and affordable method for genotyping hundreds of Mycobacterium tuberculosis (Mtb) isolates, leading to more effective epidemiological studies involving single nucleotide variations (SNVs) in core genomic sequences based on molecular evolution. We developed an all-in-one web-based tool for genotyping Mtb, referred to as the Total Genotyping Solution for TB (TGS-TB), to facilitate multiple genotyping platforms using NGS for spoligotyping and the detection of phylogenies with core genomic SNVs, IS6110 insertion sites, and 43 customized loci for variable number of tandem repeats (VNTR) through a simple, user-friendly click interface. This methodology is implemented with a KvarQ script to predict MTBC lineages/sublineages and potential antimicrobial resistance. Seven Mtb isolates (JP01 to JP07) in this study showing the same VNTR profile were accurately discriminated through median-joining network analysis using SNVs unique to those isolates. An additional IS6110 insertion was detected in one of those isolates as supportive genetic information in addition to core genomic SNVs. The results of in silico analyses using TGS-TB are consistent with those obtained using conventional molecular genotyping methods, suggesting that NGS short reads could provide multiple genotypes to discriminate multiple strains of Mtb, although longer NGS reads (≥300-mer) will be required for full genotyping on the TGS-TB web site. Most available short reads (~100-mer) can be used to discriminate the isolates based on the core genome phylogeny. TGS-TB provides more accurate and discriminative strain typing for clinical and epidemiological investigations; NGS strain typing offers a total genotyping solution for Mtb outbreak and surveillance. TGS-TB web site: https://gph.niid.go.jp/tgs-tb/. PMID:26565975
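
    In silico spoligotyping reduces to checking which spacer sequences of the direct-repeat region are present in the reads. The sketch below reports a binary presence/absence string; the three short "spacers" are placeholders (a real run would use the standard 43-spacer set), and exact substring matching on either strand is a simplifying assumption that ignores sequencing errors and reads that only partially cover a spacer.

```python
from typing import Dict, List


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def spoligotype(reads: List[str], spacers: Dict[int, str], min_reads: int = 1) -> str:
    """Binary string ordered by spacer number: '1' if the spacer, on either strand,
    occurs in at least min_reads reads."""
    counts = {n: 0 for n in spacers}
    for read in reads:
        for n, sp in spacers.items():
            if sp in read or revcomp(sp) in read:
                counts[n] += 1
    return "".join("1" if counts[n] >= min_reads else "0" for n in sorted(spacers))


# Placeholder spacers and reads, for illustration only.
spacers = {1: "AAACCC", 2: "GATTCA", 3: "ACACAC"}
reads = ["TTAAACCCTT", "CCAAACCCGG", "GGTGTGTGAA"]
print(spoligotype(reads, spacers), spoligotype(reads, spacers, min_reads=2))
```

    Raising `min_reads` trades sensitivity for robustness against a single chimeric or erroneous read, a choice a real pipeline would tune to sequencing depth.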

  20. TGS-TB: Total Genotyping Solution for Mycobacterium tuberculosis Using Short-Read Whole-Genome Sequencing.

    Science.gov (United States)

    Sekizuka, Tsuyoshi; Yamashita, Akifumi; Murase, Yoshiro; Iwamoto, Tomotada; Mitarai, Satoshi; Kato, Seiya; Kuroda, Makoto

    2015-01-01

    Whole-genome sequencing (WGS) with next-generation DNA sequencing (NGS) is an increasingly accessible and affordable method for genotyping hundreds of Mycobacterium tuberculosis (Mtb) isolates, leading to more effective epidemiological studies involving single nucleotide variations (SNVs) in core genomic sequences based on molecular evolution. We developed an all-in-one web-based tool for genotyping Mtb, referred to as the Total Genotyping Solution for TB (TGS-TB), to facilitate multiple genotyping platforms using NGS for spoligotyping and the detection of phylogenies with core genomic SNVs, IS6110 insertion sites, and 43 customized loci for variable number tandem repeat (VNTR) through a user-friendly, simple click interface. This methodology is implemented with a KvarQ script to predict MTBC lineages/sublineages and potential antimicrobial resistance. Seven Mtb isolates (JP01 to JP07) in this study showing the same VNTR profile were accurately discriminated through median-joining network analysis using SNVs unique to those isolates. An additional IS6110 insertion was detected in one of those isolates as supportive genetic information in addition to core genomic SNVs. The results of in silico analyses using TGS-TB are consistent with those obtained using conventional molecular genotyping methods, suggesting that NGS short reads could provide multiple genotypes to discriminate multiple strains of Mtb, although longer NGS reads (≥ 300-mer) will be required for full genotyping on the TGS-TB web site. Most available short reads (~100-mer) can be utilized to discriminate the isolates based on the core genome phylogeny. TGS-TB provides a more accurate and discriminative strain typing for clinical and epidemiological investigations; NGS strain typing offers a total genotyping solution for Mtb outbreak and surveillance. TGS-TB web site: https://gph.niid.go.jp/tgs-tb/.

  1. Axiom turkey genotyping array

    Science.gov (United States)

    The Axiom® Turkey Genotyping Array interrogates 643,845 probesets on the array, covering 643,845 SNPs. The array development was led by Dr. Julie Long of the USDA-ARS Beltsville Agricultural Research Center under a public-private partnership with Hendrix Genetics, Aviagen, and Affymetrix. The Turk...

  2. R/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips.

    Science.gov (United States)

    Ritchie, Matthew E; Carvalho, Benilton S; Hetrick, Kurt N; Tavaré, Simon; Irizarry, Rafael A

    2009-10-01

    Illumina produces a number of microarray-based technologies for human genotyping. An Infinium BeadChip is a two-color platform that types between 10⁵ and 10⁶ single nucleotide polymorphisms (SNPs) per sample. Despite the platform's wide use, there is a shortage of open-source software to process the raw intensities from this platform into genotype calls. To this end, we have developed the R/Bioconductor package crlmm for analyzing BeadChip data. After careful preprocessing, our software applies the CRLMM algorithm to produce genotype calls, confidence scores and other quality metrics at both the SNP and sample levels. We provide access to the raw summary-level intensity data, allowing users to develop their own methods for genotype calling or copy number analysis if they wish. The crlmm Bioconductor package is available from http://www.bioconductor.org. Data packages and documentation are available from http://rafalab.jhsph.edu/software.html.

  3. Examination of Mycobacterium avium subspecies paratuberculosis mixed genotype infections in dairy animals using a whole genome sequencing approach

    Directory of Open Access Journals (Sweden)

    Fraser W. Davidson

    2016-12-01

    Many pathogenic mycobacteria are known to cause severe disease in humans and animals. M. avium subspecies paratuberculosis (Map) is the causative agent of Johne’s disease, a chronic wasting disease affecting ruminants such as cattle and sheep that is responsible for significant economic losses in the dairy and beef industries. Due to the lack of treatment options or effective vaccines, mitigating losses can be difficult. In addition, the early stages of Map infection may occur in asymptomatic hosts that continue to shed viable bacteria in their faeces, leading to the infection of other healthy animals. Using multi-locus short sequence repeat (ML-SSR) analysis, we previously reported that individual Johne’s-positive dairy cattle from farms across the island of Newfoundland were infected by Map with multiple SSR-types simultaneously. The occurrence of multiple mixed genotype infections has the potential to change pathogen and disease dynamics as well as reduce the efficacy of treatments and vaccines. Therefore, we conducted whole genome sequencing (WGS) and single nucleotide polymorphism (SNP) analysis on a subset of these isolates for a more in-depth examination. We also implemented a PCR assay using two discriminatory SNPs and demonstrated a mixed infection by three genotypically diverse Map isolates in a single animal. In addition, results show that WGS and SNP analysis can provide a better understanding of the relationship between Map isolates from individual and different animals. In the future, such studies on the occurrence of mixed genotype infections could potentially lead to the identification of variable pathogenicity of different genotypes and allow for better tracking of Map isolates for epidemiological studies.

  4. Implementation of High Resolution Whole Genome Array CGH in the Prenatal Clinical Setting: Advantages, Challenges, and Review of the Literature

    Directory of Open Access Journals (Sweden)

    Paola Evangelidou

    2013-01-01

    Array Comparative Genomic Hybridization analysis is replacing postnatal chromosomal analysis in cases of intellectual disabilities, and it has been postulated that it might also become the first-tier test in prenatal diagnosis. In this study, array CGH was applied to 64 prenatal samples with whole genome oligonucleotide arrays (BlueGnome, Ltd.) on DNA extracted from chorionic villi, amniotic fluid, foetal blood, and skin samples. Results were confirmed with fluorescence in situ hybridization or real-time PCR. Fifty-three cases had a normal karyotype and abnormal ultrasound findings, and seven samples had balanced rearrangements, five of which also had ultrasound findings. The value of array CGH in the characterization of previously known aberrations in five samples is also presented. Seventeen out of 64 samples carried copy number alterations, giving a detection rate of 26.5%. Ten of these represent benign variants or variants of unknown significance, giving the method a diagnostic capacity of 10.9%. If karyotyping is performed, the additional diagnostic capacity of the method is 5.1% (3/59). This study indicates the ability of array CGH to identify chromosomal abnormalities that cannot be detected during routine prenatal cytogenetic analysis, thereby increasing the overall detection rate. In addition, a thorough review of the literature is presented.
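    The percentages in this abstract can be recomputed from the counts it reports. The short Python check below uses a helper, `rate`, that is our own illustrative name. The diagnostic capacity is 7/64, that is, 17 samples with copy number alterations minus the 10 benign or unknown-significance findings. The additional capacity over karyotyping is 3/59. The detection rate, 17/64, works out to 26.56%, which the abstract reports as 26.5%.

```python
def rate(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


abnormal = 17        # samples with copy number alterations
total = 64           # prenatal samples analysed
benign_or_vus = 10   # benign variants or variants of unknown significance

print(rate(abnormal, total))                  # 26.6 (detection rate)
print(rate(abnormal - benign_or_vus, total))  # 10.9 (diagnostic capacity)
print(rate(3, 59))                            # 5.1 (additional capacity over karyotype)
```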

  5. Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications

    DEFF Research Database (Denmark)

    Otero, José Manuel; Vongsangnak, Wanwipa; Asadollahi, Mohammadali

    2010-01-01

    BACKGROUND: The need for rapid and efficient microbial cell factory design and construction can be met through the enabling technology of metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, where selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood well enough to enable further metabolic engineering. RESULTS: In this work we ... that provides clear metabolic engineering targets with a high probability of success. The genome sequence, annotation, and a SNP viewer of CEN.PK113-7D are deposited at http://www.sysbio.se/cenpk.

  6. Whole-genome characterization of a Peruvian alpaca rotavirus isolate expressing a novel VP4 genotype.

    Science.gov (United States)

    Rojas, Miguel; Gonçalves, Jorge Luiz S; Dias, Helver G; Manchego, Alberto; Pezo, Danilo; Santos, Norma

    2016-11-30

    The SA44 isolate of Rotavirus A (RVA) was identified from a neonatal Peruvian alpaca presenting with diarrhea, and the full-length genome sequence of the isolate (designated RVA/Alpaca-tc/PER/SA44/2014/G3P[40]) was determined. Phylogenetic analyses showed that the isolate possessed the genotype constellation G3-P[40]-I8-R3-C3-M3-A9-N3-T3-E3-H6, which differs considerably from those of RVA strains isolated from other species of the order Artiodactyla. Overall, the genetic constellation of the SA44 strain was quite similar to those of RVA strains isolated from a bat in Asia (MSLH14 and MYAS33). Nonetheless, phylogenetic analyses of each genome segment identified a distinct combination of genes. Several sequences were closely related to corresponding gene sequences in RVA strains from other species, including human (VP1, VP2, NSP1, and NSP2), simian (VP3 and NSP5), bat (VP6 and NSP4), and equine (NSP3). The VP7 gene sequence was closely related to RVA strains from a Peruvian alpaca (K'ayra/3368-10; 99.0% nucleotide and 99.7% amino acid identity) and from humans (RCH272; 95% nucleotide and 99.0% amino acid identity). The nucleotide sequence of the VP4 gene was distantly related to other VP4 sequences and was designated as the reference strain for the new P[40] genotype. This unique genetic makeup suggests that the SA44 strain emerged from multiple reassortment events between bat-, equine-, and human-like RVA strains. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Whole-genome microarrays of fission yeast: characteristics, accuracy, reproducibility, and processing of array data

    Directory of Open Access Journals (Sweden)

    Chen Dongrong

    2003-07-01

    Background: The genome of the fission yeast Schizosaccharomyces pombe has recently been sequenced, setting the stage for the post-genomic era of this increasingly popular model organism. We have built fission yeast microarrays, optimised protocols to improve array performance, and carried out experiments to assess various characteristics of microarrays. Results: We designed PCR primers to amplify specific probes (180–500 bp) for all known and predicted fission yeast genes, which are printed in duplicate onto separate regions of glass slides together with control elements (~13,000 spots/slide). Fluorescence signal intensities depended on the size and intragenic position of the array elements, whereas the signal ratios were largely independent of element properties. Only the coding strand is covalently linked to the slides, and our array elements can discriminate transcriptional direction. The microarrays can distinguish sequences with up to 70% identity, above which cross-hybridisation contributes to the signal intensity. We tested the accuracy of signal ratios and measured how biological and technical factors affect the reproducibility of array data. Because the technical variability is lower, it is best to use samples prepared from independent biological experiments to obtain repeated measurements, with swapping of fluorochromes to prevent dye bias. We also developed a script that discards unreliable data and performs a normalization to correct spatial artefacts. Conclusions: This paper provides data for several microarray properties that are rarely measured. The results define critical parameters for microarray design and experiments and provide a framework to optimise and interpret array data. Our arrays give reproducible and accurate expression ratios with high sensitivity. The scripts for primer design and initial data processing as well as primer sequences and detailed protocols are available from our website.
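    The reason for swapping fluorochromes can be shown with log ratios. On a two-colour array each spot gives M = log2(red/green). As an illustrative assumption, suppose dye bias adds a constant d to M. The forward array then reads (true ratio + d), the dye-swapped array reads (-true ratio + d), and half their difference recovers the true log ratio. The Python below is a minimal sketch of that reasoning. It is not the authors' normalization script, and the intensities in the usage are hypothetical.

```python
import math


def log_ratio(red: float, green: float) -> float:
    """log2(red / green) for one spot."""
    return math.log2(red / green)


def dye_swap_average(forward: tuple[float, float], reverse: tuple[float, float]) -> float:
    """Combine a dye-swapped pair of spots into one test/reference log2 ratio.

    forward = (red, green) with the test sample labelled red;
    reverse = (red, green) with the test sample labelled green.
    Assumes dye bias is an additive constant on the log scale, so it has the
    same sign in both arrays and cancels in the half-difference.
    """
    return (log_ratio(*forward) - log_ratio(*reverse)) / 2


bias = 2 ** 0.5  # hypothetical: red channel reads ~1.41x brighter
# Test sample is truly 2x the reference (log2 ratio = 1)
print(dye_swap_average((2 * bias * 100, 100), (bias * 100, 200)))  # ≈ 1.0
```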

  8. A whole genome SNP genotyping by DNA microarray and candidate gene association study for kidney stone disease

    Science.gov (United States)

    2014-01-01

    Background: Kidney stone disease (KSD) is a complex disorder of unknown etiology in the majority of patients. Genetic and environmental factors may cause the disease. In the present study, we used DNA microarrays to genotype single nucleotide polymorphisms (SNPs) and performed candidate gene association analysis to determine genetic variations associated with the disease. Methods: A whole genome SNP genotyping by DNA microarray was initially conducted in 101 patients and 105 control subjects. A set of 104 candidate genes reported to be involved in KSD, gathered from public databases and candidate gene association study databases, was evaluated for variations associated with KSD. Results: Altogether 82 SNPs distributed within 22 candidate gene regions showed significant differences in SNP allele frequencies between the patient and control groups (P < …). The genes BGLAP, AHSG, CD44, and HAO1, encoding osteocalcin, fetuin-A, CD44-molecule and glycolate oxidase 1, respectively, were further assessed for their associations with the disease because they carried a high proportion of SNPs with statistical differences in allele frequencies between the patient and control groups within the gene. A total of 26 SNPs showed significant differences in allele frequencies between the patient and control groups, and haplotypes associated with disease risk were identified. The SNP rs759330, located 144 bp downstream of BGLAP at a predicted microRNA binding site in the 3′ UTR of PAQR6 (a gene encoding progestin and adipoQ receptor family member VI), was genotyped in 216 patients and 216 control subjects and found to have significant differences in its genotype and allele frequencies (P = 0.0007, OR 2.02 and P = 0.0001, OR 2.02, respectively). Conclusions: Our results suggest that these candidate genes are associated with KSD, and PAQR6 emerges as the most promising candidate because the associated SNP rs759330 is located in the miRNA binding site and may affect mRNA expression level
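    An allele-level odds ratio such as the OR of 2.02 reported for rs759330 comes from a 2×2 table of allele counts in patients and controls. The Python below is a minimal sketch of that computation together with a Pearson chi-square test (1 df, no continuity correction). The abstract does not state which test the authors used, and the counts in the usage line are hypothetical.

```python
import math


def allelic_test(case_alt: int, case_ref: int,
                 ctrl_alt: int, ctrl_ref: int) -> tuple[float, float, float]:
    """Allelic odds ratio, Pearson chi-square (1 df) and p-value for a 2x2 allele table.

    Assumes no cell is zero (otherwise the odds ratio is undefined).
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Pearson chi-square for a 2x2 table, without continuity correction
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / (
        (case_alt + case_ref) * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt) * (case_ref + ctrl_ref)
    )
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts (20 patients, 20 controls)
odds_ratio, chi2, p = allelic_test(30, 10, 10, 30)  # odds_ratio = 9.0, chi2 = 20.0
```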

  9. Whole genome sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis outbreak: a longitudinal molecular epidemiological study.

    Directory of Open Access Journals (Sweden)

    Andreas Roetzer

    BACKGROUND: Understanding Mycobacterium tuberculosis (Mtb) transmission is essential to guide efficient tuberculosis control strategies. Traditional strain typing lacks sufficient discriminatory power to resolve large outbreaks. Here, we tested the potential of using next-generation genome sequencing to identify outbreak-related transmission chains. METHODS AND FINDINGS: During long-term (1997 to 2010) prospective population-based molecular epidemiological surveillance comprising a total of 2,301 patients, we identified a large outbreak caused by an Mtb strain of the Haarlem lineage. The main performance outcome measure of the whole genome sequencing (WGS) analyses was the degree of correlation of the WGS analyses with contact tracing data and the spatio-temporal distribution of the outbreak cases. WGS analyses of the 86 isolates revealed 85 single nucleotide polymorphisms (SNPs), subdividing the outbreak into seven genome clusters (two to 24 isolates each), plus 36 unique SNP profiles. WGS results showed that the first outbreak isolates detected in 1997 were falsely clustered by classical genotyping. In 1998, one clone (termed "Hamburg clone") started expanding, apparently independently of differences in the social environment of early cases. Genome-based clustering patterns were in better accordance with contact tracing data and the geographical distribution of the cases than clustering patterns based on classical genotyping. A maximum of three SNPs were identified in eight confirmed human-to-human transmission chains, involving 31 patients. We estimated the Mtb genome evolutionary rate at 0.4 mutations per genome per year. This rate suggests that Mtb grows in its natural host with a doubling time of approximately 22 h (400 generations per year). Based on the genome variation discovered, emergence of the Hamburg clone was dated back to a period between 1993 and 1997, hence shortly before the discovery of the outbreak through epidemiological
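    Genome clusters of the kind described here are often formed by linking isolates whose pairwise SNP distance falls at or below a threshold. The Python below is a minimal single-linkage sketch using union-find. The threshold of 3 is borrowed from the observation that confirmed transmission chains differed by at most three SNPs; the abstract does not say the authors clustered this way or with this cutoff. Isolate names and distances are hypothetical.

```python
def snp_clusters(distances: dict[tuple[str, str], int],
                 isolates: list[str], threshold: int) -> list[set[str]]:
    """Single-linkage clusters: isolates joined if any chain of pairs is <= threshold SNPs apart."""
    parent = {i: i for i in isolates}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for i in isolates:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


# Hypothetical pairwise SNP distances between four isolates
dist = {("A", "B"): 1, ("B", "C"): 3, ("C", "D"): 10,
        ("A", "C"): 4, ("A", "D"): 11, ("B", "D"): 12}
clusters = snp_clusters(dist, ["A", "B", "C", "D"], threshold=3)
```

    Single linkage means A and C land in the same cluster through B even though they are four SNPs apart; a stricter rule (for example, complete linkage) would split them.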

  10. Whole Genome Sequencing

    Science.gov (United States)

    ... What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  11. Co-circulation of multiple subtypes of enterovirus A71 (EV-A71) genotype C, including novel recombinants characterised by use of whole genome sequencing (WGS), Denmark 2016.

    Science.gov (United States)

    Midgley, Sofie E; Nielsen, Astrid G; Trebbien, Ramona; Poulsen, Mille W; Andersen, Peter H; Fischer, Thea K

    2017-06-29

    In Europe, enterovirus A71 (EV-A71) has primarily been associated with sporadic cases of neurological disease. The recent emergence of new genotypes and larger outbreaks with severely ill patients demonstrates a potential for the spread of new, highly pathogenic EV-A71 strains. Detection and characterisation of these new emerging EV variants is challenging as standard EV assays may not be adequate, necessitating the use of whole genome analysis. This article is copyright of The Authors, 2017.

  12. Co-circulation of multiple subtypes of enterovirus A71 (EV-A71) genotype C, including novel recombinants characterised by use of whole genome sequencing (WGS), Denmark 2016

    DEFF Research Database (Denmark)

    Midgley, Sofie E; Nielsen, Astrid G; Trebbien, Ramona

    2017-01-01

    In Europe, enterovirus A71 (EV-A71) has primarily been associated with sporadic cases of neurological disease. The recent emergence of new genotypes and larger outbreaks with severely ill patients demonstrates a potential for the spread of new, highly pathogenic EV-A71 strains. Detection and characterisation of these new emerging EV variants is challenging as standard EV assays may not be adequate, necessitating the use of whole genome analysis.

  13. Copy number and loss of heterozygosity detected by SNP array of formalin-fixed tissues using whole-genome amplification.

    Directory of Open Access Journals (Sweden)

    Angela Stokes

    The requirement for large amounts of good-quality DNA for whole-genome applications prohibits their use for small, laser capture micro-dissected (LCM), and/or rare clinical samples, which are also often formalin-fixed and paraffin-embedded (FFPE). Whole-genome amplification of DNA from these samples could, potentially, overcome these limitations. However, little is known about the artefacts introduced by amplification of FFPE-derived DNA with regard to genotyping, and subsequent copy number and loss of heterozygosity (LOH) analyses. Using a ligation adaptor amplification method, we present data from a total of 22 Affymetrix SNP 6.0 experiments, using matched pairs of amplified and non-amplified DNA from 10 LCM FFPE normal and dysplastic oral epithelial tissues, and an internal method control. An average of 76.5% of SNPs were called in both matched amplified and non-amplified DNA samples, and concordance was a promising 82.4%. In paired analyses of copy number, LOH, and both combined, copy number changes were reduced in amplified DNA but were 99.5% concordant when detected, and amplifications were the changes most likely to be 'missed'. Only 30% of non-amplified LOH changes were identified in amplified pairs. When copy number and LOH are combined, ∼50% of gene changes detected in the unamplified DNA were also detected in the amplified DNA, and within these changes, 86.5% were concordant for both copy number and LOH status. However, amplification also introduces changes: ∼20% of changes in the amplified DNA are not detected in the non-amplified DNA. An integrative network biology approach revealed that changes in amplified DNA of dysplastic oral epithelium localize to topologically critical regions of the human protein-protein interaction network, suggesting their functional implication in the pathobiology of this disease. Taken together, our results support the use of amplification of FFPE-derived DNA, provided sufficient samples are used
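    The call-rate and concordance figures above compare genotype calls from matched amplified and non-amplified DNA. The Python below is a minimal sketch of how such metrics are typically computed. The "AA"/"AB"/"BB" encoding, `None` for a no-call, and the extra heterozygote-dropout share are illustrative assumptions rather than the study's definitions. The dropout share is included because allele dropout during amplification tends to turn heterozygous calls into homozygous ones.

```python
from typing import Optional

# A genotype call such as "AA", "AB" or "BB"; None marks a no-call
Genotype = Optional[str]


def compare_calls(reference: list[Genotype], amplified: list[Genotype]) -> dict[str, float]:
    """Summarise agreement between matched non-amplified and amplified calls."""
    if len(reference) != len(amplified):
        raise ValueError("call lists must cover the same SNPs")
    # Keep only SNPs called in both samples
    both = [(r, a) for r, a in zip(reference, amplified) if r is not None and a is not None]
    concordant = sum(r == a for r, a in both)
    discordant = len(both) - concordant
    # Heterozygous in the reference but homozygous after amplification
    het_dropouts = sum(r == "AB" and a in ("AA", "BB") for r, a in both)
    return {
        "co_call_rate": len(both) / len(reference),
        "concordance": concordant / len(both) if both else float("nan"),
        "het_dropout_share": het_dropouts / discordant if discordant else 0.0,
    }


# Hypothetical calls at eight SNPs
ref = ["AA", "AB", "AB", "BB", None, "AB", "AA", "BB"]
amp = ["AA", "AA", "AB", "BB", "AB", None, "AB", "BB"]
summary = compare_calls(ref, amp)  # co_call_rate 0.75, concordance 4/6, het_dropout_share 0.5
```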

  14. Developing a 670k genotyping array to tag ~2M SNPs across 24 horse breeds.

    Science.gov (United States)

    Schaefer, Robert J; Schubert, Mikkel; Bailey, Ernest; Bannasch, Danika L; Barrey, Eric; Bar-Gal, Gila Kahila; Brem, Gottfried; Brooks, Samantha A; Distl, Ottmar; Fries, Ruedi; Finno, Carrie J; Gerber, Vinzenz; Haase, Bianca; Jagannathan, Vidhya; Kalbfleisch, Ted; Leeb, Tosso; Lindgren, Gabriella; Lopes, Maria Susana; Mach, Núria; da Câmara Machado, Artur; MacLeod, James N; McCoy, Annette; Metzger, Julia; Penedo, Cecilia; Polani, Sagi; Rieder, Stefan; Tammen, Imke; Tetens, Jens; Thaller, Georg; Verini-Supplizi, Andrea; Wade, Claire M; Wallner, Barbara; Orlando, Ludovic; Mickelson, James R; McCue, Molly E

    2017-07-27

    To date, genome-scale analyses in the domestic horse have been limited by suboptimal single nucleotide polymorphism (SNP) density and uneven genomic coverage of the current SNP genotyping arrays. The recent availability of whole genome sequences has created the opportunity to develop a next generation, high-density equine SNP array. Using whole genome sequence from 153 individuals representing 24 distinct breeds collated by the equine genomics community, we cataloged over 23 million de novo discovered genetic variants. Leveraging genotype data from individuals with both whole genome sequence and genotypes from lower-density, legacy SNP arrays, we selected a subset of ~5 million high-quality, high-density array candidate SNPs based on breed representation and uniform spacing across the genome. Considering probe design recommendations from a commercial vendor (Affymetrix, now Thermo Fisher Scientific), a set of ~2 million SNPs was selected for a next-generation high-density SNP chip (MNEc2M). Genotype data were generated using the MNEc2M array from a cohort of 332 horses from 20 breeds, and a lower-density array, consisting of ~670 thousand SNPs (MNEc670k), was designed for genotype imputation. Here, we document the steps taken to design both the MNEc2M and MNEc670k arrays, report genomic and technical properties of these genotyping platforms, and demonstrate the imputation capabilities of these tools for the domestic horse.
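    One ingredient of the SNP selection described above is uniform spacing across the genome. The Python below is a minimal, hypothetical sketch of that idea for a single chromosome: a greedy left-to-right pass keeps a SNP only if it lies at least `min_gap` bp from the previously kept one. The actual design also weighed breed representation and vendor probe constraints, which are not modelled here. The positions and the gap value are hypothetical.

```python
def thin_by_spacing(positions: list[int], min_gap: int) -> list[int]:
    """Greedily keep SNPs so that consecutive kept positions are >= min_gap bp apart."""
    kept: list[int] = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept


# Hypothetical candidate SNP positions (bp) on one chromosome
print(thin_by_spacing([4000, 100, 150, 900, 1000, 2500, 2600], min_gap=1000))
# [100, 2500, 4000]
```

    A greedy pass is simple and fast but not optimal. It always keeps the leftmost candidate, so a design that also ranks SNPs by quality or breed informativeness would need to choose among nearby candidates instead.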

  15. Genotyping using whole-genome sequencing is a realistic alternative to surveillance based on phenotypic antimicrobial susceptibility testing

    DEFF Research Database (Denmark)

    Zankari, Ea; Hasman, Henrik; Kaas, Rolf Sommer

    2013-01-01

    Objectives: Antimicrobial susceptibility testing of bacterial isolates is essential for clinical diagnosis, to detect emerging problems and to guide empirical treatment. Current phenotypic procedures are sometimes associated with mistakes and may require further genetic testing. Whole-genome sequencing (WGS) may soon be within reach even for routine surveillance and clinical diagnostics. The aim of this study was to evaluate WGS as a routine tool for surveillance of antimicrobial resistance compared with current phenotypic procedures. Methods: Antimicrobial susceptibility tests were performed ... on 200 isolates originating from Danish pigs, covering four bacterial species. Genomic DNA was purified from all isolates and sequenced as paired-end reads on the Illumina platform. The web servers ResFinder and MLST (www.genomicepidemiology.org) were used to identify acquired antimicrobial resistance ...
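    Evaluating WGS against phenotypic testing comes down to tallying, for each isolate and antimicrobial, whether the genotype-based prediction agrees with the phenotype. The Python below is a minimal sketch. It assumes predictions have already been reduced to resistant/susceptible calls (for example, from detected acquired-resistance genes). It does not call ResFinder, and the isolates and drug labels are hypothetical.

```python
def amr_concordance(predicted: dict[tuple[str, str], bool],
                    phenotype: dict[tuple[str, str], bool]) -> dict[str, int]:
    """Tally agreement between WGS-predicted and phenotypic resistance.

    Keys are (isolate, antimicrobial); True means resistant.
    Only pairs present in both dictionaries are compared.
    """
    counts = {"agree": 0, "wgs_only": 0, "phenotype_only": 0}
    for key in predicted.keys() & phenotype.keys():
        pred, obs = predicted[key], phenotype[key]
        if pred == obs:
            counts["agree"] += 1
        elif pred:
            counts["wgs_only"] += 1        # gene detected, isolate tested susceptible
        else:
            counts["phenotype_only"] += 1  # isolate resistant, no known gene detected
    return counts


# Hypothetical predictions and phenotypes
predicted = {("i1", "tet"): True, ("i1", "amp"): False, ("i2", "tet"): True, ("i3", "amp"): False}
phenotype = {("i1", "tet"): True, ("i1", "amp"): False, ("i2", "tet"): False,
             ("i3", "amp"): True, ("i4", "tet"): True}
print(amr_concordance(predicted, phenotype))
# {'agree': 2, 'wgs_only': 1, 'phenotype_only': 1}
```

    Keeping the two kinds of disagreement separate matters. "wgs_only" cases may point to silent or low-level resistance genes, while "phenotype_only" cases may indicate mechanisms missing from the gene database or errors in the phenotypic test.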

  16. Whole genome expression array profiling highlights differences in mucosal defense genes in Barrett's esophagus and esophageal adenocarcinoma.

    Directory of Open Access Journals (Sweden)

    Derek J Nancarrow

    Esophageal adenocarcinoma (EAC) has become a major concern in Western countries due to rapid rises in incidence coupled with very poor survival rates. One of the key risk factors for the development of this cancer is the presence of Barrett's esophagus (BE), which is believed to form in response to repeated gastro-esophageal reflux. In this study we performed comparative, genome-wide expression profiling (using Illumina whole-genome BeadArrays) on total RNA extracted from esophageal biopsy tissues from individuals with EAC, BE (in the absence of EAC) and those with normal squamous epithelium. We combined these data with publicly accessible raw data from three similar studies to investigate key gene and ontology differences between these three tissue states. The results support the deduction that BE is a tissue with enhanced glycoprotein synthesis machinery (DPP4, ATP2A3, AGR2) designed to provide strong mucosal defenses aimed at resisting gastro-esophageal reflux. EAC exhibits the enhanced extracellular matrix remodeling (collagens, IGFBP7, PLAU) expected in an aggressive form of cancer, as well as evidence of reduced expression of genes associated with mucosal (MUC6, CA2, TFF1) and xenobiotic (AKR1C2, AKR1B10) defenses. When our results are compared to previous whole-genome expression profiling studies, keratin, mucin, annexin and trefoil factor gene groups are the most frequently represented differentially expressed gene families. Eleven genes identified here are also represented in at least three other profiling studies. We used these genes to discriminate between squamous epithelium, BE and EAC within the two largest cohorts using a support vector machine leave-one-out cross-validation (LOOCV) analysis. While this method was satisfactory for discriminating squamous epithelium and BE, it demonstrates the need for more detailed investigations into profiling changes between BE and EAC.
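    The classification step above pairs a support vector machine with leave-one-out cross-validation (LOOCV). The Python below sketches the LOOCV loop: each sample is held out once, the classifier is trained on the rest, and the held-out sample is then predicted. To keep the sketch dependency-free it swaps in a nearest-centroid classifier for the SVM, so it illustrates the validation scheme rather than the authors' model. The two-gene expression profiles and labels are hypothetical.

```python
import math


def nearest_centroid_predict(train: list[tuple[list[float], str]], x: list[float]) -> str:
    """Assign x to the class whose mean profile is closest (Euclidean).

    Stand-in classifier: the study used an SVM, not nearest centroid.
    """
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def loocv_accuracy(samples: list[tuple[list[float], str]]) -> float:
    """Leave-one-out: train on n-1 samples, test on the held-out one, repeat n times."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical two-gene profiles
samples = [([0.0, 0.0], "squamous"), ([0.2, 0.1], "squamous"), ([0.1, 0.2], "squamous"),
           ([5.0, 5.0], "BE"), ([5.2, 4.9], "BE"), ([4.9, 5.1], "BE")]
print(loocv_accuracy(samples))  # 1.0
```

    Because each held-out sample is excluded from training, a class with only one sample can never be predicted correctly. LOOCV therefore needs several samples per tissue state.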

  17. Whole genome transcription profiling of Anaplasma phagocytophilum in human and tick host cells by tiling array analysis

    Directory of Open Access Journals (Sweden)

    Chavez Adela

    2008-07-01

    Background: Anaplasma phagocytophilum (Ap) is an obligate intracellular bacterium and the agent of human granulocytic anaplasmosis, an emerging tick-borne disease. Ap alternately infects ticks and mammals and a variety of cell types within each. Insight into the biology behind such versatile cellular parasitism may be gained through the use of tiling microarrays to establish high-resolution, genome-wide transcription profiles of the organism as it infects cell lines representative of its life cycle (tick; ISE6) and pathogenesis (human; HL-60 and HMEC-1). Results: Detailed, host cell-specific transcriptional behavior was revealed. There was extensive differential Ap gene transcription between the tick (ISE6) and the human (HL-60 and HMEC-1) cell lines, with far fewer differentially transcribed genes between the human cell lines, and all disproportionately represented by membrane or surface proteins. There were Ap genes exclusively transcribed in each cell line, apparent human- and tick-specific operons and paralogs, and anti-sense transcripts that suggest novel expression regulation processes. Seven virB2 paralogs (of the bacterial type IV secretion system) showed human or tick cell-dependent transcription. Previously unrecognized genes and coding sequences were identified, as were the expressed p44/msp2 (major surface protein) paralogs (of 114 total), through elevated signal produced to the unique hypervariable region of each: 2/114 in HL-60, 3/114 in HMEC-1, and none in ISE6. Conclusion: Using these methods, whole genome transcription profiles can likely be generated for Ap, as well as other obligate intracellular organisms, in any host cells and for all stages of the cell infection process. Visual representation of comprehensive transcription data alongside an annotated map of the genome renders complex transcription into discernible patterns.

  18. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.)

    Directory of Open Access Journals (Sweden)

    Gu Minghong

    2010-11-01

    Background: Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs) and for the analysis of complex traits in plant molecular genetics. Results: In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar, as the recipient and Nipponbare, a japonica cultivar, as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, 2.37 times the size of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing at 0.13× genome coverage, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map revealed 117 new segments; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. Conclusions: The present results demonstrated that high

  19. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.).

    Science.gov (United States)

    Xu, Jianjun; Zhao, Qiang; Du, Peina; Xu, Chenwu; Wang, Baohe; Feng, Qi; Liu, Qiaoquan; Tang, Shuzhu; Gu, Minghong; Han, Bin; Liang, Guohua

    2010-11-24

    Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs) and for the analysis of complex traits in plant molecular genetics. In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar, as the recipient and Nipponbare, a japonica cultivar, as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, 2.37 times the size of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing at 0.13× genome coverage, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map revealed 117 new segments; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. The present results demonstrated that high throughput genotyped CSSLs combine the advantages of an ultrahigh
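    At 0.13× average depth, most SNP sites are covered by at most one read, so individual parental-allele calls are sparse and noisy. A common way to build a bin map from such data is to pool calls within fixed windows, take a majority vote per window, and then merge runs of donor windows into substituted segments. The abstract does not describe the authors' procedure, so this is an assumed approach. The Python below is a minimal sketch for one chromosome. The "R" (recipient, 9311) / "D" (donor, Nipponbare) encoding, the bin size and the read calls are hypothetical.

```python
from collections import Counter


def call_bins(snps: list[tuple[int, str]], bin_size: int, chrom_len: int) -> list[str]:
    """Majority parental call ('R' or 'D') per fixed-size bin; 'NA' if a bin has no calls."""
    n_bins = (chrom_len + bin_size - 1) // bin_size
    tallies = [Counter() for _ in range(n_bins)]
    for pos, allele in snps:  # positions are 0-based and < chrom_len
        tallies[pos // bin_size][allele] += 1
    return [t.most_common(1)[0][0] if t else "NA" for t in tallies]


def donor_segments(bins: list[str], bin_size: int) -> list[tuple[int, int]]:
    """Merge consecutive 'D' bins into (start, end) coordinates, end exclusive.

    An 'NA' bin interrupts a segment; real pipelines often impute such gaps.
    """
    segments, start = [], None
    for i, call in enumerate(bins + ["R"]):  # sentinel closes a trailing segment
        if call == "D" and start is None:
            start = i * bin_size
        elif call != "D" and start is not None:
            segments.append((start, i * bin_size))
            start = None
    return segments


# Hypothetical sparse read calls on a 1,000 bp toy chromosome
snps = [(10, "R"), (150, "R"), (220, "D"), (250, "D"), (260, "R"),
        (330, "D"), (480, "D"), (520, "R"), (900, "R")]
bins = call_bins(snps, bin_size=100, chrom_len=1000)
print(donor_segments(bins, bin_size=100))  # [(200, 500)]
```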

  20. Interpreting Whole-Genome Marker Data

    Science.gov (United States)

    Weir, Bruce S.

    2013-01-01

    The challenges of whole-genome data, when genotypes are available from hundreds of thousands of genetic markers, are explored for four topics in statistical genetics: Hardy-Weinberg testing, estimating linkage disequilibrium from unphased genotypic data, association mapping and characterizing population structure. PMID:24273615
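
    As a minimal illustration of the first of these topics, the classical chi-square test for Hardy-Weinberg equilibrium at a single biallelic marker can be sketched in Python. This is a textbook sketch, not the procedure from the paper (exact tests are often preferred when genotype counts are small), and the function name is hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test for Hardy-Weinberg equilibrium at one biallelic SNP.

    Returns (statistic, p-value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic markers) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


print(hwe_chi_square(25, 50, 25))  # counts exactly at HWE -> (0.0, 1.0)
print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit, tiny p-value
```

    With hundreds of thousands of markers, such per-SNP p-values are usually screened against a multiple-testing-aware threshold rather than 0.05.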

  1. Weighted Interaction SNP Hub (WISH) network method for building genetic networks for complex diseases and traits using whole genome genotype data.

    Science.gov (United States)

    Kogelman, Lisette J A; Kadarmideen, Haja N

    2014-01-01

    High-throughput genotype (HTG) data has been used primarily in genome-wide association (GWA) studies; however, GWA results explain only a limited part of the complete genetic variation of traits. In systems genetics, network approaches, e.g., the Weighted Gene Co-expression Network Analysis (WGCNA) method based on microarray gene expression data, have been shown to identify pathways and their underlying causal genes, helping to unravel the biological and genetic background of complex diseases and traits. The main objective of this study was to develop a scale-free weighted genetic interaction network method using whole genome HTG data in order to detect biologically relevant pathways and potential genetic biomarkers for complex diseases and traits. We developed the Weighted Interaction SNP Hub (WISH) network method, which uses HTG data to detect genome-wide interactions between single nucleotide polymorphisms (SNPs) and their relationship with complex traits. Data dimensionality was reduced by selecting SNPs based on their 1) degree of genome-wide significance and 2) degree of genetic variation in a population. Network construction was based on pairwise Pearson's correlation between SNP genotypes or on the epistatic interaction effect between SNP pairs. To identify modules, the Topological Overlap Measure (TOM) was calculated, reflecting the degree of overlap in shared neighbours between SNP pairs. Modules, clusters of highly interconnected SNPs, were defined using a tree-cutting algorithm on the SNP dendrogram created from the TOM dissimilarity (1-TOM). Modules were selected for functional annotation based on their association with the trait of interest, as defined by the Genome-wide Module Association Test (GMAT).
We successfully tested the established WISH network method using simulated and real SNP interaction data and GWA study results for carcass weight in a pig resource population; this resulted in detecting modules and key functional and biological pathways
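
    The correlation-to-TOM step described above can be sketched with NumPy, following the widely used WGCNA definition TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is the adjacency, l_ij the summed products of shared-neighbour adjacencies and k_i the connectivity. The soft-thresholding power `beta`, the use of absolute genotype correlations and the function name are assumptions for illustration, not details confirmed by the WISH abstract.

```python
import numpy as np


def topological_overlap(genotypes: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """TOM between SNPs from a samples x SNPs matrix of 0/1/2 allele counts.

    Assumes every SNP is polymorphic (a constant column gives NaN correlations).
    """
    corr = np.corrcoef(genotypes, rowvar=False)  # SNP x SNP Pearson correlation
    adj = np.abs(corr) ** beta                   # soft-thresholded adjacency (assumed)
    np.fill_diagonal(adj, 0.0)                   # no self-connections
    shared = adj @ adj                           # l_ij = sum over u of a_iu * a_uj
    k = adj.sum(axis=1)                          # connectivity of each SNP
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom


rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=(100, 6))            # simulated: 100 animals, 6 SNPs
dissimilarity = 1.0 - topological_overlap(g)     # input for hierarchical clustering
print(dissimilarity.round(3))
```

    The resulting 1-TOM matrix is what a hierarchical clustering and tree-cutting step would consume to define modules.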

  2. Whole Genome Selection

    Science.gov (United States)

    Whole genome selection (WGS) is an approach to using DNA markers that are distributed throughout the entire genome. Genes affecting most economically important traits are distributed throughout the genome, and relatively few have large effects, with many more genes with progressively sm...

  3. Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias.

    Science.gov (United States)

    Didion, John P; Yang, Hyuna; Sheppard, Keith; Fu, Chen-Ping; McMillan, Leonard; de Villena, Fernando Pardo-Manuel; Churchill, Gary A

    2012-01-19

    High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs) in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected pattern of clustering are often discarded, contributing to ascertainment bias and resulting in lost information - as much as 50% in a recent genome-wide association study in dogs. We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO) probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno) that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced highly concordant genotype calls when compared with other methods but it uniquely identified more than 786000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains. The problems of ascertainment bias and missing information due to genotyping errors are widely recognized as
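
    The clustering step described above can be illustrated with a deliberately simplified caller: it converts the two allele-probe intensities into a contrast and assigns each sample to the nearest of three fixed genotype centers, leaving outliers uncalled. Real calling algorithms fit cluster locations from the data, and this is not MouseDivGeno's method; the fixed centers, the `max_dist` cut-off and the function name are assumptions.

```python
import numpy as np

# Idealized cluster centers on the allele-contrast scale (an assumption):
# +1 = only the allele-A probe lights up, 0 = heterozygote, -1 = only allele B.
CENTERS = np.array([1.0, 0.0, -1.0])  # AA, AB, BB
NO_CALL = -1


def call_genotypes(intensity_a, intensity_b, max_dist=0.25):
    """Assign each sample at one SNP to the nearest genotype cluster.

    Samples farther than max_dist from every center are returned as NO_CALL,
    mimicking how non-conforming intensities end up discarded.
    """
    a = np.asarray(intensity_a, dtype=float)
    b = np.asarray(intensity_b, dtype=float)
    contrast = (a - b) / (a + b)                       # assumes a + b > 0
    dist = np.abs(contrast[:, None] - CENTERS[None, :])
    calls = dist.argmin(axis=1)                        # 0 = AA, 1 = AB, 2 = BB
    return np.where(dist.min(axis=1) <= max_dist, calls, NO_CALL)


print(call_genotypes([1000, 500, 50, 300], [50, 520, 1000, 100]))
```

    In this toy input the fourth sample sits halfway between the AA and AB clusters and receives no call; a reproducible off-cluster pattern of that kind is what the authors interpret as a VINO rather than noise.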

  4. Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias

    Directory of Open Access Journals (Sweden)

    Didion John P

    2012-01-01

    Background: High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs) in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected pattern of clustering are often discarded, contributing to ascertainment bias and resulting in lost information - as much as 50% in a recent genome-wide association study in dogs. Results: We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO) probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno) that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced highly concordant genotype calls when compared with other methods but it uniquely identified more than 786000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains. Conclusion: The problems of ascertainment bias and missing

  5. Genomic Variation by Whole-Genome SNP Mapping Arrays Predicts Time-to-Event Outcome in Patients with Chronic Lymphocytic Leukemia

    Science.gov (United States)

    Schweighofer, Carmen D.; Coombes, Kevin R.; Majewski, Tadeusz; Barron, Lynn L.; Lerner, Susan; Sargent, Rachel L.; O'Brien, Susan; Ferrajoli, Alessandra; Wierda, William G.; Czerniak, Bogdan A.; Medeiros, L. Jeffrey; Keating, Michael J.; Abruzzo, Lynne V.

    2013-01-01

    Genomic abnormalities, such as deletions in 11q22 or 17p13, are associated with poorer prognosis in patients with chronic lymphocytic leukemia (CLL). We hypothesized that unknown regions of copy number variation (CNV) affect clinical outcome and can be detected by array-based single-nucleotide polymorphism (SNP) genotyping. We compared SNP genotypes from 168 untreated patients with CLL with genotypes from 73 white HapMap controls. We identified 322 regions of recurrent CNV, 82 of which occurred significantly more often in CLL than in HapMap (CLL-specific CNV), including regions typically aberrant in CLL: deletions in 6q21, 11q22, 13q14, and 17p13 and trisomy 12. In univariate analyses, 35 of all recurrent CNVs and 11 of the CLL-specific CNVs were associated with unfavorable time-to-event outcomes, including gains or losses in chromosomes 2p, 4p, 4q, 6p, 6q, 7q, 11p, 11q, and 17p. In multivariate analyses, six CNVs (i.e., CLL-specific variations in 11p15.1-15.4 or 6q27) predicted time-to-treatment or overall survival independently of established markers of prognosis. Moreover, genotypic complexity (i.e., the number of independent CNVs per patient) significantly predicted prognosis, with a median time-to-treatment of 64 months in patients with zero to one CNVs versus 23 months in patients with two or more CNVs (P = 3.3 × 10^-8). In summary, comparing SNP genotypes from patients with CLL with HapMap controls allowed us to identify known and unknown recurrent CNVs and to determine regions and rates of CNV that predict poorer prognosis in patients with CLL. PMID:23273604

  6. Evaluation of whole genome amplified DNA to decrease material expenditure and increase quality

    Directory of Open Access Journals (Sweden)

    Marie Bækvad-Hansen

    2017-06-01

    Discussion: Whole genome amplified DNA samples from dried blood spots are well suited for array genotyping and produce robust and reliable genotype data. However, the amplification process introduces additional noise into the data, making detection of structural variants such as copy number variants difficult. In this study, we explored ways of optimizing the amplification protocol to reduce noise and increase data quality. We found that the amplification process was very robust: changes in amplification time or temperature did not alter the genotype calls or the quality of the array data. Adding replicates of each sample also led to only insignificant changes in the array data. Thus, the amount of noise introduced by the amplification process was consistent regardless of changes made to the amplification protocol. We also explored ways of decreasing material expenditure by reducing the spot size or the amplification reaction volume. These reductions did not affect the quality of the genotyping data.

  7. A joint cross-border investigation of a cluster of multidrug-resistant tuberculosis in Austria, Romania and Germany in 2014 using classic, genotyping and whole genome sequencing methods: lessons learnt.

    Science.gov (United States)

    Fiebig, Lena; Kohl, Thomas A; Popovici, Odette; Mühlenfeld, Margarita; Indra, Alexander; Homorodean, Daniela; Chiotan, Domnica; Richter, Elvira; Rüsch-Gerdes, Sabine; Schmidgruber, Beatrix; Beckert, Patrick; Hauer, Barbara; Niemann, Stefan; Allerberger, Franz; Haas, Walter

    2017-01-12

    Molecular surveillance of multidrug-resistant tuberculosis (MDR-TB) using 24-loci MIRU-VNTR in the European Union suggests the occurrence of international transmission. In early 2014, Austria detected a molecular MDR-TB cluster of five isolates. Links to Romania and Germany prompted the three countries to investigate possible cross-border MDR-TB transmission jointly. We searched genotyping databases, genotyped additional isolates from Romania, used whole genome sequencing (WGS) to infer putative transmission links, and investigated pairwise epidemiological links and patient mobility. Ten isolates from 10 patients shared the same 24-loci MIRU-VNTR pattern. Within this cluster, WGS defined two subgroups of four patients each. The first comprised an MDR-TB patient from Romania who had sought medical care in Austria and two patients from Austria. The second comprised patients, two of them epidemiologically linked, who lived in three different countries but had the same city of provenance in Romania. Our findings strongly suggested that the two cases in Austrian citizens resulted from a newly introduced MDR-TB strain, followed by domestic transmission. For the other cases, transmission probably occurred in the same city of provenance. To prevent further MDR-TB transmission, we need to ensure universal access to early and adequate therapy and collaborate closely in tuberculosis care beyond administrative borders. This article is copyright of The Authors, 2017.

  8. Whole Genome Amplification of Day 3 or Day 5 Human Embryos Biopsies Provides a Suitable DNA Template for PCR-Based Techniques for Genotyping, a Complement of Preimplantation Genetic Testing

    Directory of Open Access Journals (Sweden)

    Elizabeth Schaeffer

    2017-01-01

    Our objective was to determine whether whole genome amplification (WGA) provides suitable DNA for qPCR-based genotyping of human embryos. Single blastomeres (Day 3) or trophoblastic cells (Day 5) were isolated from 342 embryos for WGA. Comparative Genomic Hybridization determined embryo sex as well as Trisomy 18 or Trisomy 21. To determine the embryo's sex, qPCR melting curve analysis for SRY and DYS14 was used. Logistic regression indicated a 4.4%, 57.1%, or 98.8% probability of a male embryo when neither gene, SRY only, or both genes were detected, respectively (accuracy = 94.1%, kappa = 0.882, and p<0.001). Fluorescent Capillary Electrophoresis for the amelogenin genes (AMEL) was also used to determine sex. The AMELY peak height was higher, and the presence of this peak was highly predictive of male embryos (AUC = 0.93, accuracy = 81.7%, kappa = 0.974, and p<0.001). Trisomy 18 and Trisomy 21 were determined using the threshold cycle (Ct) difference for RPL17 and TTC3, respectively, which was significantly lower in the corresponding embryos. The Ct difference for TTC3 specifically determined Trisomy 21 (AUC = 0.89), and that for RPL17 determined Trisomy 18 (AUC = 0.94). Here, WGA provided adequate DNA for PCR-based techniques for preimplantation genotyping.

  9. Archived neonatal dried blood spot samples can be used for accurate whole genome and exome-targeted next-generation sequencing

    DEFF Research Database (Denmark)

    Hollegaard, Mads Vilhelm; Grauholm, Jonas; Nielsen, Ronni

    2013-01-01

    , for example, to examine the genetics of various disorders. We have previously demonstrated that DNA extracted from a fraction (2×3.2mm discs) of an archived DBSS can be whole genome amplified (wgaDNA) and used for accurate array genotyping. However, until now, it has been uncertain whether wgaDNA from DBSS...... can be used for accurate whole genome sequencing (WGS) and exome sequencing (WES). This study examined two individuals represented by three different types of samples each: whole-blood (reference samples), 3-year-old DBSS spotted with reference material (refDBSS), and 27- to 29-year-old archived...

  10. Genetic mapping using the Diversity Arrays Technology (DArT) : application and validation using the whole-genome sequences of Arabidopsis thaliana and the fungal wheat pathogen Mycosphaerella graminicola

    NARCIS (Netherlands)

    Wittenberg, A.H.J.

    2007-01-01

    Diversity Arrays Technology (DArT) is a microarray-based DNA marker technique for genome-wide discovery and genotyping of genetic variation. DArT allows simultaneous scoring of hundreds to thousands of restriction site-based polymorphisms between genotypes and does not require DNA sequence

  11. Pooled DNA genotyping on Affymetrix SNP genotyping arrays

    Directory of Open Access Journals (Sweden)

    Owen Michael J

    2006-02-01

    Background: Genotyping technology has advanced such that genome-wide association studies of complex diseases based upon dense marker maps are now technically feasible. However, the cost of such projects remains high. Pooled DNA genotyping offers the possibility of applying the same technologies at a fraction of the cost, and there is some evidence that certain ultra-high throughput platforms also perform with acceptable accuracy. However, this conclusion has so far been based upon published data concerning only a small number of SNPs. Results: In the current study, we prepared DNA pools from the parents and from the offspring of 30 parent-child trios that have been extensively genotyped by the HapMap project. We analysed the two pools with Affymetrix 10K Xba 142 2.0 arrays. The availability of the HapMap data allowed us to validate the performance of 6843 SNPs for which we had both complete individual and pooled genotyping data. Pooled analyses averaged over 5–6 microarrays gave highly reproducible results. Moreover, the accuracy of estimating differences in allele frequency between pools using this ultra-high throughput system was comparable with previous reports of pooling based upon lower throughput platforms, with an average error for the predicted allele frequency differences between the two pools of 1.37% and with 95% of SNPs showing an error of ... Conclusion: Genotyping thousands of SNPs with DNA pooling using Affymetrix microarrays produces highly accurate results and can be used for genome-wide association studies.
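
    The validation logic in this abstract (replicate-averaged pool estimates compared with allele frequencies computed from the individual HapMap genotypes) can be sketched as follows. How array intensities are turned into pool frequency estimates is not described here, so the sketch takes those estimates as given; the function names and input layout are assumptions.

```python
import numpy as np


def allele_frequency(genotypes):
    """Allele-A frequency per SNP from an individuals x SNPs matrix of 0/1/2 counts."""
    g = np.asarray(genotypes, dtype=float)
    return g.sum(axis=0) / (2 * g.shape[0])


def pooled_difference_error(pool1_arrays, pool2_arrays, geno1, geno2):
    """Error of pooled allele-frequency differences against individual genotypes.

    pool*_arrays: replicate arrays x SNPs matrices of estimated pool frequencies,
    averaged over the replicate arrays before the difference is taken.
    Returns (mean absolute error, 95th percentile of the absolute error).
    """
    est_diff = np.mean(pool1_arrays, axis=0) - np.mean(pool2_arrays, axis=0)
    true_diff = allele_frequency(geno1) - allele_frequency(geno2)
    abs_err = np.abs(est_diff - true_diff)
    return float(abs_err.mean()), float(np.percentile(abs_err, 95))


# Toy example (not study data): two replicate arrays per pool, two SNPs.
geno_parents = [[2, 0], [0, 1]]
geno_children = [[1, 1], [1, 1]]
print(pooled_difference_error([[0.55, 0.26], [0.55, 0.24]],
                              [[0.5, 0.5], [0.5, 0.5]],
                              geno_parents, geno_children))
```

    Averaging the replicate arrays before differencing is what makes the 5–6-array design reduce per-array noise in this sketch.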

  12. Whole-genome sequencing and analysis of the Malaysian cynomolgus macaque (Macaca fascicularis) genome.

    Science.gov (United States)

    Higashino, Atsunori; Sakate, Ryuichi; Kameoka, Yosuke; Takahashi, Ichiro; Hirata, Makoto; Tanuma, Reiko; Masui, Tohru; Yasutomi, Yasuhiro; Osada, Naoki

    2012-07-02

    The genetic background of the cynomolgus macaque (Macaca fascicularis) is made complex by the high genetic diversity, population structure, and gene introgression from the closely related rhesus macaque (Macaca mulatta). Herein we report the whole-genome sequence of a Malaysian cynomolgus macaque male with more than 40-fold coverage, which was determined using a resequencing method based on the Indian rhesus macaque genome. We identified approximately 9.7 million single nucleotide variants (SNVs) between the Malaysian cynomolgus and the Indian rhesus macaque genomes. Compared with humans, a smaller nonsynonymous/synonymous SNV ratio in the cynomolgus macaque suggests more effective removal of slightly deleterious mutations. Comparison of two cynomolgus (Malaysian and Vietnamese) and two rhesus (Indian and Chinese) macaque genomes, including previously published macaque genomes, suggests that Indochinese cynomolgus macaques have been more affected by gene introgression from rhesus macaques. We further identified 60 nonsynonymous SNVs that completely differentiated the cynomolgus and rhesus macaque genomes, and that could be important candidate variants for determining species-specific responses to drugs and pathogens. The demographic inference using the genome sequence data revealed that Malaysian cynomolgus macaques have experienced at least three population bottlenecks. This list of whole-genome SNVs will be useful for many future applications, such as an array-based genotyping system for macaque individuals. High-quality whole-genome sequencing of the cynomolgus macaque genome may aid studies on finding genetic differences that are responsible for phenotypic diversity in macaques and may help control genetic backgrounds among individuals.

  13. Analysis of phage Mu DNA transposition by whole-genome ...

    Indian Academy of Sciences (India)

    (Trilink Biotechnologies) were employed. Sample DNA (ChIP or processed Mu DNA) was amplified with Cy5-9mer primer, and reference DNA (Input or whole genome DNA) with Cy3-9mer primer. The samples were loaded on microarray slides and subjected to standard hybridization procedures (NimbleGen Arrays User's ...

  14. A SNP Genotyping Array for Hexaploid Oat

    Directory of Open Access Journals (Sweden)

    Nicholas A. Tinker

    2014-11-01

    Recognizing a need in cultivated hexaploid oat (Avena sativa L.) for a reliable set of reference single nucleotide polymorphisms (SNPs), we have developed a 6000 (6K) BeadChip design containing 257 Infinium I and 5486 Infinium II designs corresponding to 5743 SNPs. Of those, 4975 SNPs yielded successful assays after array manufacturing. These SNPs were discovered using a variety of bioinformatics pipelines in complementary DNA (cDNA) and genomic DNA originating from 20 or more diverse oat cultivars. The array was validated in 1100 samples from six recombinant inbred line (RIL) mapping populations and sets of diverse oat cultivars and breeding lines, and provided approximately 3500 discernible Mendelian polymorphisms. Here, we present an annotation of these SNPs, including methods of discovery, gene identification and orthology, population-genetic characteristics, and tentative positions on an oat consensus map. We also evaluate a new cluster-based method of calling SNPs. The SNP design sequences are made publicly available, and the full SNP genotyping platform is available for commercial purchase from an independent third party.

  15. Large SNP arrays for genotyping in crop plants

    Indian Academy of Sciences (India)

    For a number of important crop plants, SNP markers are now being used to design genotyping arrays containing thousands of markers spread over the entire genome and to analyse large numbers of samples. In this article, we discuss aspects that should be considered during the design of such large genotyping arrays and ...

  16. GENOME-WIDE ASSOCIATION ANALYSES BASED ON WHOLE-GENOME SEQUENCING IN SARDINIA PROVIDE INSIGHTS INTO REGULATION OF HEMOGLOBIN LEVELS

    Science.gov (United States)

    Danjou, Fabrice; Zoledziewska, Magdalena; Sidore, Carlo; Steri, Maristella; Busonero, Fabio; Maschio, Andrea; Mulas, Antonella; Perseu, Lucia; Barella, Susanna; Porcu, Eleonora; Pistis, Giorgio; Pitzalis, Maristella; Pala, Mauro; Menzel, Stephan; Metrustry, Sarah; Spector, Timothy D.; Leoni, Lidia; Angius, Andrea; Uda, Manuela; Moi, Paolo; Thein, Swee Lay; Galanello, Renzo; Abecasis, Gonçalo R.; Schlessinger, David; Sanna, Serena; Cucca, Francesco

    2015-01-01

    We report GWAS results for the levels of A1, A2 and fetal hemoglobins, analyzed for the first time concurrently. Integrating high-density array genotyping and whole-genome sequencing in a large general population cohort from Sardinia, we detected 23 associations at 10 loci. Five are due to variants at previously undetected loci: MPHOSPH9, PLTP-PCIF1, FOG1, NFIX, and CCND3. Among those at known loci, 10 are new lead variants and 4 are novel independent signals. Half of all variants also showed pleiotropic associations with different hemoglobins, which further corroborated some of the detected associations and revealed features of coordinated hemoglobin species production. PMID:26366553

  17. Specificity of the Linear Array HPV Genotyping Test for detecting human papillomavirus genotype 52 (HPV-52)

    OpenAIRE

    Kocjan, Boštjan; Poljak, Mario; Oštrbenk, Anja

    2015-01-01

    Introduction: HPV-52 is one of the most frequent human papillomavirus (HPV) genotypes causing significant cervical pathology. The most widely used HPV genotyping assay, the Roche Linear Array HPV Genotyping Test (Linear Array), is unable to identify HPV-52 status in samples containing HPV-33, HPV-35, and/or HPV-58. Methods: Linear Array HPV-52 analytical specificity was established by testing 100 specimens reactive with the Linear Array HPV-33/35/52/58 cross-reactive probe, but not with the...

  18. Whole genome amplification - Review of applications and advances

    Energy Technology Data Exchange (ETDEWEB)

    Hawkins, Trevor L.; Detter, J.C.; Richardson, Paul

    2001-11-15

    The concept of Whole Genome Amplification has arisen in the past few years as modifications to the polymerase chain reaction (PCR) have been adapted to replicate regions of genomes that are of biological interest. The applications are many: forensics, embryonic disease diagnosis, bioterrorism genome detection, 'immortalization' of clinical samples, microbial diversity, and genotyping. The key question is whether DNA can be replicated a genome at a time without bias or non-random distribution of the target. Several papers published in the last year and currently in preparation may lead to the conclusion that whole genome amplification is indeed possible, which would open up a new avenue in molecular biology.

  19. A Whole Genome Association Study on Meat Palatability in Hanwoo

    Directory of Open Access Journals (Sweden)

    K.-E. Hyeong

    2014-09-01

    A whole genome association (WGA) study was carried out to find quantitative trait loci (QTL) for sensory evaluation traits in Hanwoo. Carcass samples of 250 Hanwoo steers were collected from the National Agricultural Cooperative Livestock Research Institute, Ansung, Gyeonggi province, Korea, between 2011 and 2012 and genotyped with the Affymetrix Bovine Axiom Array 640K single nucleotide polymorphism (SNP) chip. Of the SNPs on the chip, a total of 322,160 were retained after quality control tests. After adjusting for the effects of age, slaughter year-season, and polygenic effects using a genomic relationship matrix, the corrected phenotypes for the sensory evaluation measurements were regressed on each SNP using a simple additive linear regression model. A total of 1,631 SNPs were detected for color, aroma, tenderness, juiciness and palatability at the 0.1% comparison-wise level. Among the significant SNPs, the best set of 52 SNP markers was chosen using a forward regression procedure at the 0.05 level, comprising sets of 8, 14, 11, 10, and 9 SNPs for the respective sensory evaluation traits. The sets of significant SNPs explained 18% to 31% of the phenotypic variance. Three SNPs were pleiotropic: AX-26703353 and AX-26742891, located at 101 and 110 Mb of BTA6, respectively, influenced tenderness, juiciness and palatability, while AX-18624743 at 3 Mb of BTA10 affected tenderness and palatability. Our results suggest that some QTL for sensory measures are segregating in a Hanwoo steer population. Additional WGA studies on fatty acid and nutritional components as well as the sensory panels are in progress to characterize the genetic architecture of meat quality and palatability in Hanwoo.
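
    The single-marker step described above (corrected phenotypes regressed on each SNP under additive coding, tested at the 0.1% comparison-wise level) might look like this with SciPy's `linregress`. The covariate and genomic-relationship correction and the forward selection of the 52-marker set are not reproduced; the function name and the simulated data are purely illustrative.

```python
import numpy as np
from scipy import stats


def single_snp_scan(corrected_phenotype, genotypes, alpha=0.001):
    """Regress a pre-corrected phenotype on each SNP in turn (additive 0/1/2 coding).

    Assumes covariates and polygenic effects were already removed from the
    phenotype and that every SNP is polymorphic (constant genotypes cannot be
    regressed). Returns indices of SNPs with p < alpha, plus slopes and p-values.
    """
    y = np.asarray(corrected_phenotype, dtype=float)
    geno = np.asarray(genotypes, dtype=float)
    n_snps = geno.shape[1]
    slopes = np.empty(n_snps)
    pvalues = np.empty(n_snps)
    for j in range(n_snps):
        fit = stats.linregress(geno[:, j], y)
        slopes[j] = fit.slope
        pvalues[j] = fit.pvalue
    return np.flatnonzero(pvalues < alpha), slopes, pvalues


# Simulated example (not data from the study): SNP 3 carries a true additive effect.
rng = np.random.default_rng(1)
geno = rng.binomial(2, 0.5, size=(250, 20))
pheno = 0.8 * geno[:, 3] + rng.normal(size=250)
hits, slopes, pvalues = single_snp_scan(pheno, geno)
print(hits, slopes[3].round(2))
```

    Because each SNP is tested separately, a comparison-wise threshold like 0.1% still lets some false positives through across 322,160 markers, which is one reason a subsequent multi-marker selection step is useful.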

  20. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected...... this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome sequence data alongside the 54k SNP set....

  1. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors, but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at the whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap-frozen tumor sample with a cancer cell fraction over 80%. We performed array comparative genomic hybridization using the Agilent platform, transcriptome sequencing using the HiSeq 2000 (Illumina), and whole genome sequencing using the Complete Genomics Inc. platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosomes 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation, t(11;X), was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinct from those of other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  2. Whole genome sequencing of clinical isolates of Giardia lamblia.

    Science.gov (United States)

    Hanevik, K; Bakken, R; Brattbakk, H R; Saghaug, C S; Langeland, N

    2015-02-01

    Clinical isolates from protozoan parasites such as Giardia lamblia are at present practically impossible to culture. By using simple cyst purification methods, we show that Giardia whole genome sequencing of clinical stool samples is possible. Immunomagnetic separation after sucrose gradient flotation gave superior results compared to sucrose gradient flotation alone. The method enables detailed analysis of a wide range of genes of interest for genotyping, virulence and drug resistance. Copyright © 2014 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  3. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain B31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  4. A SNP genotyping array for hexaploid oat

    Science.gov (United States)

    Recognizing a need in cultivated hexaploid oat (Avena sativa L.) for a reliable set of reference SNPs, we have developed a 6K BeadChip design containing 257 Infinium I and 5,486 Infinium II designs corresponding to 5,743 SNPs. Of those, 4,975 SNPs yielded successful assays after array manufacturing...

  5. Light whole genome sequence for SNP discovery across domestic cat breeds

    Directory of Open Access Journals (Sweden)

    Driscoll Carlos

    2010-06-01

    Background: The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues to human scourges (cancer, SARS, and AIDS, respectively). However, to realize this biomedical potential, a high-density single nucleotide polymorphism (SNP) map is required to accomplish disease and phenotype association discovery. Description: To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly, allowing the discovery of over three million SNPs. To reduce potential false-positive SNPs due to the low-coverage assembly, a low upper limit was placed on sequence coverage and a high lower limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (a female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese, and a male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964K common SNPs suitable for a domestic cat SNP genotyping array and an additional 900K SNPs detected between the African wildcat and domestic cat breeds. An empirical sample of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. Conclusions: These data provide a large collection of mapped feline SNPs across the cat genome that will allow the development of SNP genotyping platforms for mapping feline diseases.
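
    The two filters used to limit false-positive SNPs can be sketched as a simple predicate over candidate sites. The abstract does not give the actual cut-offs, so `max_depth` and `min_qual` are hypothetical values, as are the record layout and the example sites; excess coverage is commonly read as a sign of collapsed repeats in an assembly, which is one plausible motivation for the upper limit.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    chrom: str
    pos: int
    depth: int          # reads covering the site in the low-coverage assembly
    min_alt_qual: int   # lowest Phred quality among the discrepant bases


def filter_candidates(sites, max_depth=6, min_qual=40):
    """Keep sites whose coverage is at most max_depth and whose discrepant bases
    all have quality >= min_qual (both thresholds are hypothetical)."""
    return [s for s in sites if s.depth <= max_depth and s.min_alt_qual >= min_qual]


sites = [
    CandidateSite("A1", 1_000, depth=3, min_alt_qual=45),   # kept
    CandidateSite("A1", 2_000, depth=25, min_alt_qual=50),  # too deep: possible collapsed repeat
    CandidateSite("B2", 5_000, depth=4, min_alt_qual=15),   # low-quality base: possible sequencing error
]
print([s.pos for s in filter_candidates(sites)])  # [1000]
```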

  6. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average ... Wellcome Trust Center for Human Genetics, Oxford University, Oxford, UK; High Performance Computing Center, Hanoi University of Science and Technology, ...

  7. Whole genome sequences are required to fully resolve the linkage disequilibrium structure of human populations.

    Science.gov (United States)

    Pengelly, Reuben J; Tapper, William; Gibson, Jane; Knut, Marcin; Tearle, Rick; Collins, Andrew; Ennis, Sarah

    2015-09-03

    An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution. We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure. WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.
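LD maps like these are built from pairwise LD statistics between markers. Below is a minimal sketch of the standard r² measure for two biallelic SNPs, assuming phased haplotypes with alleles coded 0/1. It is not the authors' map-construction method, and the function name `ld_r2` is illustrative.

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs, from phased haplotypes coded 0/1 at each SNP."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                     # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # → 1.0 (perfect LD)
    print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # → 0.0 (independent SNPs)
```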

  8. Deep whole-genome sequencing of 100 southeast Asian Malays.

    Science.gov (United States)

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  9. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia.

    Directory of Open Access Journals (Sweden)

    Ulziijargal Gurjav

    Australia has a low tuberculosis incidence rate, with most cases occurring among recent immigrants. Given the suboptimal cluster resolution achieved with 24-locus mycobacterial interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings.
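Distinguishing isolates by SNP profile, as described above, amounts to computing pairwise SNP distances and grouping isolates that fall within a distance threshold. The following is a minimal sketch, assuming aligned SNP strings with `N` marking a no-call. The single-linkage rule and the `threshold` value are illustrative assumptions, not the study's clustering criteria.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned SNP profiles differ, skipping 'N' (no call)."""
    if len(a) != len(b):
        raise ValueError("profiles must be aligned to the same SNP positions")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def single_linkage_clusters(profiles: Dict[str, str], threshold: int) -> List[List[str]]:
    """Join isolates whose SNP distance is <= threshold, using union-find."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v in combinations(profiles, 2):
        if snp_distance(profiles[u], profiles[v]) <= threshold:
            parent[find(u)] = find(v)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    isolates = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "TTCAA"}
    print(single_linkage_clusters(isolates, threshold=2))  # → [['iso1', 'iso2'], ['iso3']]
```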

  10. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that do not use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions: These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between

  11. A binary search approach to whole-genome data analysis

    Science.gov (United States)

    Brodsky, Leonid; Kogan, Simon; BenJacob, Eshel; Nevo, Eviatar

    2010-01-01

    A sequence analysis-oriented binary search-like algorithm was transformed into a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments, enriched by up-regulated signals, with equal accuracy. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The “divide-and-conquer”-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity and specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets: Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation ChIP-on-chip datasets; Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The results demonstrate the power of the algorithm in identifying both short up-regulated fragments (such as exons and transcription factor binding sites) and long, even moderately up-regulated, zones at their precise genome margins. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies. PMID:20833816
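The idea of scoring fragments against a chromosome baseline and then searching the regions on either side of each detected fragment can be illustrated with a simpler stand-in: a maximum-scoring-segment search (Kadane's algorithm) applied repeatedly to the regions left and right of each hit. This is a hypothetical sketch of the divide-and-conquer principle, not the published algorithm. It omits the nested re-scoring step, and the function names and the `min_score` cutoff are assumptions.

```python
from typing import List, Tuple


def max_segment(scores: List[float], lo: int, hi: int) -> Tuple[float, int, int]:
    """Kadane's scan over scores[lo:hi]; returns (best_sum, start, end_exclusive)."""
    best, best_start, best_end = 0.0, lo, lo
    run, run_start = 0.0, lo
    for i in range(lo, hi):
        if run <= 0:
            run, run_start = 0.0, i  # a non-positive prefix never helps; restart here
        run += scores[i]
        if run > best:
            best, best_start, best_end = run, run_start, i + 1
    return best, best_start, best_end


def enriched_fragments(signal: List[float], baseline: float,
                       min_score: float = 0.0) -> List[Tuple[int, int, float]]:
    """Find non-overlapping fragments whose summed (signal - baseline) exceeds min_score."""
    scores = [s - baseline for s in signal]  # positive where the signal exceeds the baseline
    found = []
    stack = [(0, len(scores))]               # explicit stack instead of recursion
    while stack:
        lo, hi = stack.pop()
        if hi <= lo:
            continue
        best, start, end = max_segment(scores, lo, hi)
        if best <= min_score:
            continue
        found.append((start, end, best))
        stack.append((lo, start))  # search to the left of the fragment
        stack.append((end, hi))    # and to the right, so fragments never intersect
    return sorted(found)


if __name__ == "__main__":
    print(enriched_fragments([1, 1, 5, 5, 1, 1, 4, 1], baseline=2.0))
```

The explicit stack is a design choice that avoids Python's recursion limit on very long chromosomes. It visits the same sub-intervals a recursive version would.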

  12. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree, as well as of smaller clusters of families, identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.

  13. Robust SNP genotyping by multiplex PCR and arrayed primer extension

    Directory of Open Access Journals (Sweden)

    Podder Mohua

    2008-01-01

    Background: Arrayed primer extension (APEX) is a microarray-based rapid minisequencing methodology that may have utility in 'personalized medicine' applications that involve genetic diagnostics of single nucleotide polymorphisms (SNPs). However, to date there have been few reports that objectively evaluate the assay completion rate, call rate and accuracy of APEX. We have further developed robust assay design, chemistry and analysis methodologies, and have sought to determine how effective APEX is in comparison to leading 'gold-standard' genotyping platforms. Our methods have been tested against industry-leading technologies in two blinded experiments based on Coriell DNA samples and SNP genotype data from the International HapMap Project. Results: In the first experiment, we genotyped 50 SNPs across the entire 270 HapMap Coriell DNA sample set. For each Coriell sample, DNA template was amplified in a total of 7 multiplex PCRs prior to genotyping. We obtained good results for 41 of the SNPs, with 99.8% genotype concordance with HapMap data, at an automated call rate of 94.9% (not including the 9 failed SNPs). In the second experiment, involving modifications to the initial DNA amplification so that a single 50-plex PCR could be achieved, genotyping of the same 50 SNPs across each of 49 randomly chosen Coriell DNA samples allowed extremely robust 50-plex genotyping from as little as 5 ng of DNA, with 100% assay completion rate, 100% call rate and >99.9% accuracy. Conclusion: We have shown our methods to be effective for robust multiplex SNP genotyping using APEX, with 100% call rate and >99.9% accuracy. We believe that such methodology may be useful in future point-of-care clinical diagnostic applications where accuracy and call rate are both paramount.
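Call rate and concordance, the two headline metrics above, can be computed as in this minimal sketch. It assumes genotypes are stored as two-letter strings keyed by hypothetical (sample, SNP) pairs, with `None` for a no-call, and it treats `AG` and `GA` as the same genotype. As with the reported 94.9% figure, the result depends on which SNPs are included, for example whether failed assays are excluded beforehand.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # hypothetical (sample_id, snp_id) key


def normalize(genotype: str) -> str:
    """Order alleles so that 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype))


def call_rate_and_concordance(calls: Dict[Key, Optional[str]],
                              reference: Dict[Key, str]) -> Tuple[float, float]:
    """Call rate = called / attempted; concordance = matches / called genotypes present in the reference."""
    called = {k: g for k, g in calls.items() if g is not None}
    comparable = [k for k in called if k in reference]
    matches = sum(1 for k in comparable if normalize(called[k]) == normalize(reference[k]))
    call_rate = len(called) / len(calls) if calls else 0.0
    concordance = matches / len(comparable) if comparable else 0.0
    return call_rate, concordance


if __name__ == "__main__":
    calls = {("s1", "rs1"): "AG", ("s1", "rs2"): "CC", ("s1", "rs3"): None, ("s1", "rs4"): "TT"}
    ref = {("s1", "rs1"): "GA", ("s1", "rs2"): "CC", ("s1", "rs3"): "AA", ("s1", "rs4"): "TC"}
    print(call_rate_and_concordance(calls, ref))
```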

  14. A randomization test for controlling population stratification in whole-genome association studies.

    Science.gov (United States)

    Kimmel, Gad; Jordan, Michael I; Halperin, Eran; Shamir, Ron; Karp, Richard M

    2007-11-01

    Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experiments, our method achieves higher power and significantly better control over false-positive rates than do existing methods. In addition, it can be easily applied to whole-genome association studies.
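For orientation, a plain label-permutation test for a single SNP looks like the sketch below. This naive version shuffles case/control labels freely and therefore does not account for population stratification. The method described above differs precisely by conditioning on the genotype matrix. The statistic (difference in mean allele count) and the function names are illustrative choices.

```python
import random
from typing import Sequence


def association_stat(genotypes: Sequence[int], phenotypes: Sequence[int]) -> float:
    """Absolute difference in mean allele count (0/1/2) between cases (1) and controls (0)."""
    cases = [g for g, y in zip(genotypes, phenotypes) if y == 1]
    controls = [g for g, y in zip(genotypes, phenotypes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(genotypes: Sequence[int], phenotypes: Sequence[int],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Empirical p-value: share of label shuffles at least as extreme as the observed statistic."""
    rng = random.Random(seed)
    observed = association_stat(genotypes, phenotypes)
    labels = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any genotype-phenotype link; case/control counts stay fixed
        if association_stat(genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids reporting p = 0


if __name__ == "__main__":
    g = [2] * 20 + [0] * 20
    y = [1] * 20 + [0] * 20
    print(permutation_p_value(g, y, n_perm=999))
```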

  15. Germline and somatic variant identification using BGISEQ-500 and HiSeq X Ten whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Ann-Marie Patch

    Technological innovation and increased affordability have contributed to the widespread adoption of genome sequencing technologies in biomedical research. In particular, large cancer research consortia have embraced next generation sequencing and have used the technology to define the somatic mutation landscape of multiple cancer types. These studies have primarily utilised the Illumina HiSeq platforms. In this study we performed whole genome sequencing of three malignant pleural mesothelioma and matched normal samples using a new platform, the BGISEQ-500, and compared the results with those obtained with the Illumina HiSeq X Ten. Germline and somatic single nucleotide variants and small insertions or deletions were independently identified from data aligned to the human genome reference. The BGISEQ-500 and HiSeq X Ten platforms showed high concordance (>99%) between germline calls and genotypes from SNP arrays. The germline and somatic single nucleotide variants identified by both sequencing platforms were highly concordant (86% and 72%, respectively). These results indicate the potential applicability of the BGISEQ-500 platform for the identification of somatic and germline single nucleotide variants by whole genome sequencing. The BGISEQ-500 datasets described here represent the first publicly available cancer genome sequencing performed using this platform.

  16. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  17. Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle.

    Science.gov (United States)

    Larkin, Denis M; Daetwyler, Hans D; Hernandez, Alvaro G; Wright, Chris L; Hetrick, Lorie A; Boucek, Lisa; Bachman, Sharon L; Band, Mark R; Akraiko, Tatsiana V; Cohen-Zinder, Miri; Thimmapuram, Jyothi; Macleod, Iona M; Harkins, Timothy T; McCague, Jennifer E; Goddard, Michael E; Hayes, Ben J; Lewin, Harris A

    2012-05-15

    Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief ("Chief") and his son Walkway Chief Mark ("Mark"), each accounting for ∼7% of all current genomes. We aligned 20.5 Gbp (∼7.3× coverage) and 37.9 Gbp (∼13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ∼1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped in the sequence-based and BovineSNP50 SNPs showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes then were determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population then were identified by referencing Chief's DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor's alleles that have been subjected to artificial selection.

  18. Dirofilaria immitis JYD-34 isolate: whole genome analysis

    Directory of Open Access Journals (Sweden)

    Catherine Bourguinat

    2017-11-01

    Background: Macrocyclic lactone (ML) anthelmintics are used for chemoprophylaxis for heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms despite apparent compliance with recommended chemoprophylaxis using approved preventives have led to such cases being considered as suspected lack of efficacy (LOE). Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced susceptibility isolates or to known ML-susceptible isolates. Methods: In this study, the 90 Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools to the pooled whole genomes of four well-characterized susceptible D. immitis populations and one susceptible Missouri laboratory isolate, as well as to the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST), which allow the genetic structure of each population (isolate) to be compared at the level of single nucleotide polymorphisms (SNPs) across the genome, were calculated. Forty-one previously reported SNPs that appeared to differentiate between susceptible and LOE and confirmed reduced susceptibility isolates were also investigated in the JYD-34 isolate.
Results: The FST analysis, and the analysis of the 41 SNPs that appeared to differentiate reduced susceptibility from fully susceptible isolates, confirmed that the JYD-34 isolate has a genome similar to previously

  19. Dirofilaria immitis JYD-34 isolate: whole genome analysis.

    Science.gov (United States)

    Bourguinat, Catherine; Lefebvre, Francois; Sandoval, Johanna; Bondesen, Brenda; Moreno, Yovany; Prichard, Roger K

    2017-11-09

    Macrocyclic lactone (ML) anthelmintics are used for chemoprophylaxis for heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms despite apparent compliance with recommended chemoprophylaxis using approved preventives have led to such cases being considered as suspected lack of efficacy (LOE). Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced susceptibility isolates or to known ML-susceptible isolates. In this study, the 90 Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools to the pooled whole genomes of four well-characterized susceptible D. immitis populations and one susceptible Missouri laboratory isolate, as well as to the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST), which allow the genetic structure of each population (isolate) to be compared at the level of single nucleotide polymorphisms (SNPs) across the genome, were calculated. Forty-one previously reported SNPs that appeared to differentiate between susceptible and LOE and confirmed reduced susceptibility isolates were also investigated in the JYD-34 isolate.
The FST analysis, and the analysis of the 41 SNPs that appeared to differentiate reduced susceptibility from fully susceptible isolates, confirmed that the JYD-34 isolate has a genome similar to previously investigated LOE isolates, and isolates confirmed to
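The per-SNP FST comparison above can be illustrated with a minimal sketch of Nei's heterozygosity-based formulation, F_ST = (H_T − H_S) / H_T, for one biallelic SNP given allele frequencies in each population or pool. The abstract does not state which FST estimator the study used for pooled sequencing data, so this equal-weight version is an assumption and not the authors' method.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic SNP, equal population weights."""
    if len(freqs) < 2:
        raise ValueError("need allele frequencies from at least two populations")
    if any(not 0.0 <= p <= 1.0 for p in freqs):
        raise ValueError("frequencies must lie in [0, 1]")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                           # heterozygosity of the pooled total
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_biallelic([0.0, 1.0]))  # → 1.0 (alternative alleles fixed in each population)
    print(fst_biallelic([0.2, 0.8]))
```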

  20. HPV genotype-specific concordance between EuroArray HPV, Anyplex II HPV28 and Linear Array HPV Genotyping test in Australian cervical samples

    Directory of Open Access Journals (Sweden)

    Alyssa M. Cornall

    2017-12-01

    Purpose: To compare the human papillomavirus genotype-specific performance of two genotyping assays, Anyplex II HPV28 (Seegene) and EuroArray HPV (EuroImmun), with Linear Array HPV (Roche). Methods: DNA extracted from clinician-collected cervical brush specimens in PreservCyt medium (Hologic) from 403 women undergoing management for detected cytological abnormalities was tested on the three assays. Genotype-specific agreement was assessed by Cohen's kappa statistic and Fisher's z-test of significance between proportions. Results: Agreement between Linear Array and the other two assays was substantial to almost perfect (κ = 0.60–1.00) for most genotypes, and was almost perfect (κ = 0.81–0.98) for almost all high-risk genotypes. Overall, Linear Array detected most genotypes more frequently; however, this difference was statistically significant only for HPV51 (EuroArray; p = 0.0497), HPV52 (Anyplex II; p = 0.039) and HPV61 (Anyplex II; p = 0.047). EuroArray detected significantly more HPV26 (p = 0.002) and Anyplex II detected more HPV42 (p = 0.035) than Linear Array. Each assay performed differently for HPV68 detection: EuroArray and Linear Array were in moderate to substantial agreement with Anyplex II (κ = 0.46 and 0.62, respectively) but were in poor agreement with each other (κ = −0.01). Conclusions: EuroArray and Anyplex II had similar sensitivity to Linear Array for most high-risk genotypes, with slightly lower sensitivity for HPV51 or HPV52. Keywords: Human papillomavirus, Genotyping, Linear Array, Anyplex II, EuroArray, Cervix
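Genotype-specific agreement between two assays, as measured above, can be computed with Cohen's kappa on paired positive/negative calls for one HPV genotype. This is a minimal sketch. The function name and the choice to return NaN when kappa is undefined (both assays giving the same constant call) are assumptions.

```python
import math
from typing import Sequence


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two assays' positive/negative calls on the same specimens."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty call lists")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    pa = sum(a) / n  # positivity rate of assay A
    pb = sum(b) / n  # positivity rate of assay B
    expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance alone
    if expected == 1:
        return math.nan  # both assays gave one identical constant call; kappa is undefined
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    assay_1 = [True, True, False, False]
    assay_2 = [True, False, False, False]
    print(cohens_kappa(assay_1, assay_2))  # → 0.5
```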

  1. Benchmark Dataset for Whole Genome Sequence Compression.

    Science.gov (United States)

    C L, Biji; S Nair, Achuthsankar

    2017-01-01

    Research in DNA data compression lacks a standard dataset for testing compression tools specific to DNA. This paper argues that the current state of achievement in DNA compression cannot be benchmarked in the absence of such a scientifically compiled whole genome sequence dataset, and proposes a benchmark dataset built using a multistage sampling procedure. Considering the genome sequences of organisms available at the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. This paper reports the results of running three established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only through a comparison based on the scientifically compiled benchmark dataset. The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.
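Comparing compression tools on a benchmark dataset typically comes down to a per-sequence metric such as bits per base, where 2.0 corresponds to naive 2-bit packing of A/C/G/T. The minimal sketch below accepts any bytes-to-bytes compressor. Python's general-purpose `zlib.compress` is used only as a stand-in, since the three tools evaluated in the paper are not named here, and the toy sequence is not part of the dataset.

```python
import zlib
from typing import Callable, Dict

Compressor = Callable[[bytes], bytes]


def bits_per_base(seq: str, compress: Compressor) -> float:
    """Compressed size in bits divided by sequence length; 2.0 is the naive 2-bit baseline."""
    if not seq:
        raise ValueError("empty sequence")
    return 8 * len(compress(seq.encode("ascii"))) / len(seq)


def benchmark(dataset: Dict[str, str], compress: Compressor) -> Dict[str, float]:
    """Bits per base for every named sequence in the dataset."""
    return {name: bits_per_base(seq, compress) for name, seq in dataset.items()}


if __name__ == "__main__":
    toy = {"toy_repeat": "ACGT" * 1000}  # highly repetitive, so it compresses far below 2 bits/base
    print(benchmark(toy, zlib.compress))
```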

  2. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  3. Strategies and tools for whole genome alignments

    Energy Technology Data Exchange (ETDEWEB)

    Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas; Ishkhanov, Tigran; Ryaboy, Dmitriy; Rubin, Edward; Pachter, Lior; Dubchak, Inna

    2002-11-25

    The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with, whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.
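Exon coverage of the kind reported above (the share of known coding exons covered by alignments) can be sketched with simple interval arithmetic. The half-open interval convention, and the rule that an exon counts as covered when at least `min_fraction` of its bases fall inside aligned blocks, are assumptions for illustration. The paper's coverage criterion is not stated here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching aligned blocks so no base is counted twice."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def covered_bases(exon: Interval, blocks: List[Interval]) -> int:
    """Number of exon bases that fall inside the (already merged) blocks."""
    s, e = exon
    return sum(max(0, min(e, be) - max(s, bs)) for bs, be in blocks)


def exon_coverage(exons: List[Interval], aligned_blocks: List[Interval],
                  min_fraction: float = 0.5) -> float:
    """Fraction of exons with at least min_fraction of their bases aligned."""
    blocks = merge(aligned_blocks)
    covered = sum(1 for ex in exons
                  if covered_bases(ex, blocks) >= min_fraction * (ex[1] - ex[0]))
    return covered / len(exons) if exons else 0.0


if __name__ == "__main__":
    exons = [(0, 100), (200, 300), (400, 500)]
    blocks = [(0, 60), (50, 120), (210, 240)]
    print(exon_coverage(exons, blocks))
```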

  4. Whole Genome Sequencing and Newborn Screening.

    Science.gov (United States)

    Botkin, Jeffrey R; Rothwell, Erin

    2016-03-01

    Clinical applications of next generation sequencing are growing at a tremendous pace. Currently the largest application of genetic testing in medicine occurs with newborn screening through state-mandated public health programs, and there are suggestions that sequencing could become a standard component of newborn care within the next decade. As such, newborn screening may appear to be a logical starting point to explore whole genome and whole exome sequencing on a population level. Yet the use of a mandatory public health screening program raises a number of ethical, social and legal issues that create challenges for the use of sequencing technologies in this context. Additionally, we currently have limited understanding of genomic data and few strategies for managing it, supporting our conclusion that genome sequencing is not justified within population-based public health programs for newborn screening.

  5. Principles of Whole-Genome Amplification.

    Science.gov (United States)

    Czyz, Zbigniew Tadeusz; Kirsch, Stefan; Polzer, Bernhard

    2015-01-01

    Modern molecular biology relies on large amounts of high-quality genomic DNA. However, in a number of clinical or biological applications this requirement cannot be met, as starting material is either limited (e.g., preimplantation genetic diagnosis (PGD) or analysis of minimal residual cancer) or of insufficient quality (e.g., formalin-fixed paraffin-embedded tissue samples or forensics). As a consequence, genomic DNA has to be amplified to obtain sufficient material for analyzing these demanding samples with state-of-the-art molecular assays. This chapter summarizes available technologies for whole-genome amplification (WGA), spanning the last 25 years from the first developments to currently applied methods. We especially elaborate on research applications, as well as on the inherent advantages and limitations of various WGA technologies.

  6. Whole-Genome Sequencing for National Surveillance of Shigella flexneri

    Directory of Open Access Journals (Sweden)

    Marie A. Chattaway

    2017-09-01

    National surveillance of Shigella flexneri ensures the rapid detection of outbreaks to facilitate public health investigation and intervention strategies. In this study, we used whole-genome sequencing (WGS) to type S. flexneri in order to detect linked cases and support epidemiological investigations. We prospectively analyzed 330 isolates of S. flexneri received at the Gastrointestinal Bacteria Reference Unit at Public Health England between August 2015 and January 2016. Traditional phenotypic and WGS sub-typing methods were compared. PCR was carried out on isolates exhibiting phenotypic/genotypic discrepancies with respect to serotype. Phylogenetic relationships between isolates were analyzed by WGS using single nucleotide polymorphism (SNP) typing to facilitate cluster detection. For 306/330 (93%) isolates there was concordance between the serotype derived from the genome and phenotypic serology. Discrepant results between the phenotypic and genotypic tests were attributed to novel O-antigen synthesis/modification gene combinations or to indels in O-antigen synthesis/modification genes that rendered them dysfunctional. SNP typing identified 36 clusters of two or more isolates. WGS provided microbiological evidence of epidemiologically linked clusters and detected novel O-antigen synthesis/modification gene combinations associated with two outbreaks. WGS provided reliable and robust data for monitoring trends in the incidence of different serotypes over time. SNP typing can be used to facilitate outbreak investigations in real time, thereby informing surveillance strategies and providing opportunities for implementing timely public health interventions.
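
    The abstract reports SNP clusters "of two isolates or more" but does not say how clusters were formed. A minimal sketch, assuming single-linkage clustering on pairwise SNP distances between aligned SNP profiles, joins any two isolates within a chosen SNP threshold and reports groups of at least two. The threshold value, the profile strings and the function names are hypothetical illustrations, not the Public Health England pipeline.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned SNP profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP positions")
    return sum(1 for x, y in zip(a, b) if x != y)


def snp_clusters(profiles: dict[str, str], threshold: int, min_size: int = 2) -> list[set[str]]:
    """Single-linkage clusters: isolates are linked if within `threshold` SNPs.

    Uses a union-find structure so that chains of close isolates merge
    into one cluster even if the chain ends are far apart.
    """
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    # Report only clusters with at least `min_size` isolates (two by default).
    return [g for g in groups.values() if len(g) >= min_size]
```

For example, with profiles `{"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "TTGCA", "iso4": "ACGAT"}` and a threshold of 1 SNP, iso1 and iso4 are two SNPs apart but are chained through iso2, so single linkage places all three in one cluster and leaves iso3 as an unreported singleton. A complete-linkage rule would split such chains; which behaviour is preferable for outbreak detection is a design choice.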

  7. Whole genome sequencing for lung cancer.

    Science.gov (United States)

    Daniels, Marissa; Goh, Felicia; Wright, Casey M; Sriram, Krishna B; Relan, Vandana; Clarke, Belinda E; Duhig, Edwina E; Bowman, Rayleen V; Yang, Ian A; Fong, Kwun M

    2012-04-01

    Lung cancer is a leading cause of cancer related morbidity and mortality globally, and carries a dismal prognosis. Improved understanding of the biology of cancer is required to improve patient outcomes. Next-generation sequencing (NGS) is a powerful tool for whole genome characterisation, enabling comprehensive examination of somatic mutations that drive oncogenesis. Most NGS methods are based on polymerase chain reaction (PCR) amplification of platform-specific DNA fragment libraries, which are then sequenced. These techniques are well suited to high-throughput sequencing and are able to detect the full spectrum of genomic changes present in cancer. However, they require considerable investments in time, laboratory infrastructure, computational analysis and bioinformatic support. Next-generation sequencing has been applied to studies of the whole genome, exome, transcriptome and epigenome, and is changing the paradigm of lung cancer research and patient care. The results of this new technology will transform current knowledge of oncogenic pathways and provide molecular targets of use in the diagnosis and treatment of cancer. Somatic mutations in lung cancer have already been identified by NGS, and large scale genomic studies are underway. Personalised treatment strategies will improve care for those likely to benefit from available therapies, while sparing others the expense and morbidity of futile intervention. Organisational, computational and bioinformatic challenges of NGS are driving technological advances as well as raising ethical issues relating to informed consent and data release. Differentiation between driver and passenger mutations requires careful interpretation of sequencing data. Challenges in the interpretation of results arise from the types of specimens used for DNA extraction, sample processing techniques and tumour content. Tumour heterogeneity can reduce power to detect mutations implicated in oncogenesis. Next-generation sequencing will

  8. Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Plant Ramona N

    2006-08-01

    Background: Whole genome amplification is an increasingly common technique through which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis. Questions of amplification-induced error and template bias generated by these methods have previously been addressed through either small-scale (SNPs) or large-scale (CGH array, FISH) methodologies. Here we utilized whole genome sequencing to assess amplification-induced bias in both coding and non-coding regions of two bacterial genomes. DNA from Halobacterium species NRC-1 and Campylobacter jejuni was amplified by several common, commercially available protocols: multiple displacement amplification, primer extension pre-amplification and degenerate oligonucleotide primed PCR. The amplification-induced bias of each method was assessed by sequencing both genomes in their entirety using the 454 Sequencing System technology and comparing the results with those obtained from unamplified controls. Results: All amplification methodologies induced statistically significant bias relative to the unamplified control. For the Halobacterium species NRC-1 genome, assessed at 100-base resolution, the D-statistics from GenomiPhi-amplified material were 119 times greater than those from unamplified material, 164.0 times greater for Repli-G, 165.0 times greater for PEP-PCR and 252.0 times greater for DOP-PCR. For Campylobacter jejuni, also analyzed at 100-base resolution, the D-statistics from GenomiPhi-amplified material were 15 times greater than those from unamplified material, 19.8 times greater for Repli-G, 61.8 times greater for PEP-PCR and 220.5 times greater for DOP-PCR. Conclusion: Of the amplification methodologies examined in this paper, multiple displacement amplification generated the least bias and produced significantly higher yields of amplified DNA.
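
    The abstract does not define its D-statistic. Assuming it is a Kolmogorov–Smirnov-style statistic (the maximum gap between two normalized cumulative coverage curves), a minimal sketch bins read start positions into 100-base windows and compares an amplified sample against a control. The function names, the use of read start positions and the uniform test data are hypothetical; this is one plausible reading, not the authors' procedure.

```python
from collections import Counter


def binned_coverage(read_starts, genome_length, bin_size=100):
    """Count reads whose start position falls in each bin_size-base window."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = Counter(pos // bin_size for pos in read_starts)
    return [counts.get(i, 0) for i in range(n_bins)]


def ks_d_statistic(sample, control):
    """Maximum absolute gap between normalized cumulative coverage curves.

    Both inputs are per-bin read counts over the same windows; each is
    scaled to sum to 1 so that differing sequencing depth does not count
    as bias, only differing distribution along the genome.
    """
    if len(sample) != len(control):
        raise ValueError("coverage vectors must use the same bins")
    total_s, total_c = sum(sample), sum(control)
    cum_s = cum_c = 0.0
    d = 0.0
    for s, c in zip(sample, control):
        cum_s += s / total_s
        cum_c += c / total_c
        d = max(d, abs(cum_s - cum_c))
    return d
```

Under this reading, a sample whose reads all pile into the first of four windows, compared against an evenly covered control, gives D = 0.75, while identical coverage gives 0. A ratio such as "119 times greater" would then compare D for amplified material with D for unamplified material against the same reference, which is again an assumption about the study design.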

  9. Diversity and Evolution of Mycobacterium tuberculosis: Moving to Whole-Genome-Based Approaches

    Science.gov (United States)

    Niemann, Stefan; Supply, Philip

    2014-01-01

    Genotyping of clinical Mycobacterium tuberculosis complex (MTBC) strains has become a standard tool for epidemiological tracing and for investigating local and global strain population structure. Of special importance is the analysis of the expansion of multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains. Classical genotyping and, more recently, whole-genome sequencing have revealed that MTBC strains are more diverse than previously anticipated. Globally, several phylogenetic lineages can be distinguished, with markedly variable geographical distributions. Strains of particular (sub)lineages, such as Beijing, seem to be more virulent and associated with enhanced resistance levels and fitness, likely fueling their spread in certain world regions. The expected widespread adoption of whole-genome sequencing approaches should provide more comprehensive insights into the molecular and epidemiological mechanisms involved and lead to better diagnostic and therapeutic tools. PMID:25190252

  10. A Randomization Test for Controlling Population Stratification in Whole-Genome Association Studies

    OpenAIRE

    Kimmel, Gad; Jordan, Michael I.; Halperin, Eran; Shamir, Ron; Karp, Richard M.

    2007-01-01

    Population stratification can be a serious obstacle in the analysis of genomewide association studies. We propose a method for evaluating the significance of association scores in whole-genome cohorts with stratification. Our approach is a randomization test akin to a standard permutation test. It conditions on the genotype matrix and thus takes into account not only the population structure but also the complex linkage disequilibrium structure of the genome. As we show in simulation experime...
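
    The abstract describes the method as akin to a standard permutation test that conditions on the genotype matrix. A minimal sketch of that standard baseline (not the authors' stratification-aware test) keeps the genotype matrix fixed, shuffles the case/control labels, and compares the observed maximum association score across SNPs with its permutation distribution. Because whole rows of genotypes never move, linkage disequilibrium between SNPs is preserved. The covariance-based score, the function names and the (hits + 1)/(n_perm + 1) estimator are assumptions. Plain label shuffling does not by itself correct for population stratification, which is the part the paper addresses.

```python
import random


def trend_score(genotypes, phenotype):
    """Absolute covariance-style score between allele counts (0/1/2) and a 0/1 phenotype."""
    n = len(phenotype)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    return abs(sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype)))


def max_score(matrix, phenotype):
    """Maximum score over all SNPs; each row of `matrix` is one SNP across individuals."""
    return max(trend_score(snp, phenotype) for snp in matrix)


def permutation_pvalue(matrix, phenotype, n_perm=1000, seed=0):
    """Genome-wide p-value for the best SNP, conditioning on the genotype matrix."""
    rng = random.Random(seed)
    observed = max_score(matrix, phenotype)
    labels = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # genotypes stay fixed; only phenotype labels move
        if max_score(matrix, labels) >= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Taking the maximum over SNPs in every permutation yields a p-value that already accounts for testing many correlated SNPs. A stratification-aware variant would change which label rearrangements are considered plausible under the null.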

  11. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  12. Whole genome sequence analysis of Mycobacterium suricattae.

    Science.gov (United States)

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Small Sample Whole-Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Hara, C A; Nguyen, C P; Wheeler, E K; Sorensen, K J; Arroyo, E S; Vrankovich, G P; Christian, A T

    2005-09-20

    Many challenges arise when trying to amplify and analyze human samples collected in the field, owing to limited sample quantity and contamination of the starting material. Tests such as DNA fingerprinting and mitochondrial typing require a certain sample size and are carried out in large-volume reactions; in cases where insufficient sample is present, whole genome amplification (WGA) can be used. WGA allows very small quantities of DNA to be amplified in a way that enables subsequent DNA-based tests to be performed. A limiting step in WGA is sample preparation. To minimize the necessary sample size, we have developed two modifications of WGA: the first allows for an increase in amplified product from small, nanoscale, purified samples with the use of carrier DNA, while the second is a single-step method for cleaning and amplifying samples all in one column. Conventional DNA cleanup involves binding the DNA to silica, washing away impurities, and then releasing the DNA for subsequent testing. We have eliminated losses associated with incomplete sample release, thereby decreasing the required amount of starting template for DNA testing. Both techniques address the limitations of sample size by providing ample copies of genomic samples. Carrier DNA, included in our WGA reactions, can be used when amplifying samples with the standard purification method, or can be used in conjunction with our single-step DNA purification technique to potentially further decrease the amount of starting sample necessary for future forensic DNA-based assays.

  14. The Sequencing Bead Array (SBA), a Next-Generation Digital Suspension Array

    OpenAIRE

    Akhras, Michael S.; Pettersson, Erik; Diamond, Lisa; Unemo, Magnus; Okamoto, Jennifer; Davis, Ronald W.; Pourmand, Nader

    2013-01-01

    Here we describe the novel Sequencing Bead Array (SBA), a complete assay for molecular diagnostics and typing applications. SBA is a digital suspension array that uses Next-Generation Sequencing (NGS) to replace conventional optical readout platforms. The technology reduces the number of instruments required in a laboratory setting, since the same NGS instrument could be employed for everything from whole-genome and targeted sequencing to SBA broad-range biomarker detection and genotyping. As proof...

  15. SNP detection for massively parallel whole-genome resequencing.

    Science.gov (United States)

    Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong; Yang, Huanming; Wang, Jian; Kristiansen, Karsten; Wang, Jun

    2009-06-01

    Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly as compared to that obtained using a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.
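
    The method integrates base quality and other error sources into a single Bayesian quality score for consensus calling. A minimal sketch of the generic idea (not the authors' full model, which also accounts for alignment and platform-specific errors) computes the posterior of each of the ten diploid genotypes from base calls and Phred qualities, then reports the maximum a posteriori genotype with a Phred-scaled confidence. The prior, `snp_rate` and all function names are hypothetical. At a dbSNP site, one could raise the prior on the known alleles, loosely mirroring the dbSNP prior the abstract mentions for low depth.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
GENOTYPES = list(combinations_with_replacement(BASES, 2))  # 10 unordered diploid genotypes


def base_likelihood(base, allele, phred_q):
    """P(observed base | true allele), spreading the error evenly over the 3 other bases."""
    err = 10 ** (-phred_q / 10)
    return 1 - err if base == allele else err / 3


def simple_prior(ref, snp_rate=1e-3):
    """Hypothetical prior: ref/ref gets 1 - snp_rate; the other 9 genotypes share snp_rate."""
    other = snp_rate / (len(GENOTYPES) - 1)
    return {gt: (1 - snp_rate) if gt == (ref, ref) else other for gt in GENOTYPES}


def genotype_posteriors(reads, prior):
    """reads: list of (base, phred_quality). Returns P(genotype | reads) for all 10 genotypes."""
    log_post = {}
    for a, b in GENOTYPES:
        # Each read comes from either chromosome copy with probability 1/2.
        log_lik = sum(
            math.log(0.5 * base_likelihood(base, a, q) + 0.5 * base_likelihood(base, b, q))
            for base, q in reads
        )
        log_post[(a, b)] = log_lik + math.log(prior[(a, b)])
    top = max(log_post.values())  # subtract the maximum before exp() to avoid underflow
    weights = {gt: math.exp(v - top) for gt, v in log_post.items()}
    total = sum(weights.values())
    return {gt: w / total for gt, w in weights.items()}


def call_genotype(reads, prior):
    """Return the maximum a posteriori genotype and its Phred-scaled confidence."""
    post = genotype_posteriors(reads, prior)
    best = max(post, key=post.get)
    return best, -10 * math.log10(max(1 - post[best], 1e-300))
```

With six A and six G reads at Q30 and reference A, the likelihood of an A/G heterozygote overwhelms the prior favouring A/A, so the call is `("A", "G")` with high confidence.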

  16. Bridging the gap from prenatal karyotyping to whole-genome array comparative genomic hybridization in Hong Kong: survey on knowledge and acceptance of health-care providers and pregnant women.

    Science.gov (United States)

    Cheng, Hiu Yee Heidi; Kan, Anita Sik-Yau; Hui, Pui Wah; Lee, Chin Peng; Tang, Mary Hoi Yin

    2017-12-01

    The use of array comparative genomic hybridization (aCGH) has become increasingly widespread. The challenges in integrating this technology into prenatal diagnosis are interpreting results and communicating findings of unclear clinical significance. This study assesses the knowledge and acceptance of prenatal aCGH among obstetricians and pregnant women in Hong Kong. The aim is to identify needs and gaps before replacing karyotyping with aCGH. Questionnaires with aCGH information in the form of pamphlets were sent by post to obstetrics and gynecology doctors. For the pregnant women group, a video presentation, pamphlets on aCGH and a self-administered questionnaire were provided at the antenatal clinic. Doctors and pregnant women perceived aCGH similarly. Doctors not choosing aCGH were more concerned about the difficulty of counseling pregnant women about variants of unknown significance and adult-onset disease, whereas pregnant women not choosing aCGH were more concerned that the longer waiting time would increase their anxiety. Both doctors and patients perceive prenatal aCGH as a better test. Counseling support, training, and better understanding and communication of findings of unclear clinical significance are necessary to improve the doctor-patient experience.

  17. A variant Cri du Chat phenotype and autism spectrum disorder in a subject with de novo cryptic microdeletions involving 5p15.2 and 3p24.3-25 detected using whole genomic array CGH.

    Science.gov (United States)

    Harvard, C; Malenfant, P; Koochek, M; Creighton, S; Mickelson, E C R; Holden, J J A; Lewis, M E S; Rajcan-Separovic, E

    2005-04-01

    manifestations of CdCs. The clinical description of this proband and the characterization of his 5p deletion may help to further refine the phenotype-genotype associations in CdCs and autism spectrum disorder.

  18. Exome genotyping arrays to identify rare and low frequency variants associated with epithelial ovarian cancer risk

    DEFF Research Database (Denmark)

    Permuth, Jennifer B; Pirie, Ailith; Ann Chen, Y

    2016-01-01

    Rare and low frequency variants are not well covered in most germline genotyping arrays and are understudied in relation to epithelial ovarian cancer (EOC) risk. To address this gap, we used genotyping arrays targeting rarer protein-coding variation in 8,165 EOC cases and 11,619 controls from the...

  19. Survey of the Applications of NGS to Whole-Genome Sequencing and Expression Profiling

    Directory of Open Access Journals (Sweden)

    Jong-Sung Lim

    2012-03-01

    Recently, technologies for profiling DNA sequence variation and gene expression have been widely used in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of next-generation DNA sequencers (NGS; the Roche/454 and Illumina/Solexa systems), along with bioinformatic technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing data and mate-pair sequencing data are important for constructing de novo assemblies of novel genomes. Effective de novo assembly requires DNA sequence information from multiple NGS platforms, with at least 2× and 30× genome coverage from Roche/454 and Illumina/Solexa, respectively. Massive short-read data from the Illumina/Solexa system is sufficient for discovering DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data, which quantify the RNAs expressed from a whole-genome transcriptome in a given tissue sample, are useful for genome systems biology. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20× and 50× of the estimated transcriptome using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average of only 30× transcriptome coverage with Illumina/Solexa short reads is enough for expression quantification against reference expressed sequence tag sequences.
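
    The coverage targets above (2×, 30×, 50×) translate directly into sequencing effort. A minimal sketch, assuming the idealized Lander–Waterman model in which mean depth is c = N·L/G for N reads of length L on a genome of size G and reads land uniformly at random, computes the reads needed for a target depth and the Poisson-expected fraction of bases left uncovered, e^(-c). The genome size and read length in the example are hypothetical, and real libraries (especially amplified or GC-skewed ones) deviate from uniform placement.

```python
import math


def reads_needed(genome_size_bp, read_length_bp, target_depth):
    """Reads required so that mean depth N * L / G reaches target_depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


def expected_uncovered_fraction(mean_depth):
    """Poisson expectation for the fraction of bases covered by zero reads: e^(-c)."""
    return math.exp(-mean_depth)
```

For a hypothetical 100 Mb genome and 100-bp reads, `reads_needed(100_000_000, 100, 30)` gives 30,000,000 reads (each end of a read pair counted separately). At 2× depth, roughly 13.5% of bases are expected to receive no reads, which is one reason low-coverage long-read data is paired with deeper short-read data.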

  20. Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry.

    Science.gov (United States)

    Thareja, Gaurav; John, Sumi Elsa; Hebbar, Prashantha; Behbehani, Kazem; Thanaraj, Thangavel Alphonse; Alsmadi, Osama

    2015-02-18

    The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, helping to improve understanding of human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe, "tent-dwelling" Bedouin, and Persian, which attribute their ancestry to different regions of the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and have been confirmed by genetic studies. In this work, we report the whole-genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage. We document 3,573,824 SNPs, 404,090 insertions/deletions, and 11,138 structural variations. Of the reported SNPs and indels, 85,939 are novel. We identify 295 'loss-of-function' and 2,314 'deleterious' coding variants, some of which carry homozygous genotypes in the sequenced genome; the associated phenotypes include pharmacogenomic traits such as greater triglyceride-lowering ability with fenofibrate treatment and a requirement for high warfarin dosage to elicit an anticoagulation response. In total, 6,328 non-coding SNPs are associated with 811 phenotypic traits: consistent with the participant's medical history of Type 2 diabetes and β-thalassemia, and the family's history of migraine, 72 (of 159 known) Type 2 diabetes, 3 (of 4) β-thalassemia, and 76 (of 169) migraine variants are seen in the genome. Intergenome comparisons based on shared disease-causing variants position the sequenced genome between Asian and European genomes, in congruence with the geographical location of the region. On comparison, bead arrays perform better than sequencing platforms in correctly calling genotypes in low-coverage regions of the sequenced genome; however, a novel SNP or indel near the genotype-calling position can lead to false calls on bead arrays. We report, for the first time, reference

  1. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle

    NARCIS (Netherlands)

    Veerkamp, Roel F.; Bouwman, Aniek C.; Schrooten, Chris; Calus, Mario P.L.

    2016-01-01

    Background: Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a

  2. Whole genome sequence of a Turkish individual.

    Directory of Open Access Journals (Sweden)

    Haluk Dogan

    Although whole human genome sequencing can be done with readily available technical and financial resources, detailed analyses of the genomes of certain populations are still needed. Here we present, for the first time, the sequencing and analysis of a Turkish human genome. We sequenced to 35x coverage using paired-end reads; over 95% of sequencing reads mapped to the reference genome, covering more than 99% of the bases. The assembly of unmapped reads rendered 11,654 contigs, 2,168 of which did not reveal any homology to known sequences, resulting in ∼1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls, with 29,184 SNPs identified in coding regions, of which 106 were nonsense and 259 were categorized as having a high-impact effect. The homozygosity/heterozygosity ratio (1,415,123∶2,122,671, or 1∶1.5) and the transition/transversion ratio (2,383,204∶1,154,590, or 2.06∶1) were within expected limits. Of the identified SNPs, 480,396 were potentially novel, with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1∶1.09 insertion/deletion ratio), ranging from -52 bp to 34 bp in length and causing about 180 codon insertions/deletions and 246 frameshifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variations in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefit research in genetics and medicine and should be conducted on a larger scale.
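
    The transition/transversion and heterozygous/homozygous ratios quoted above are standard sanity checks on a SNP call set. A minimal sketch, assuming biallelic SNVs given as (ref, alt, genotype) triples with VCF-style genotype strings such as `0/1` or `1|1`, classifies A↔G and C↔T changes as transitions and everything else as transversions. The input format and function names are illustrative assumptions, not the authors' pipeline.

```python
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions; assumes ref != alt."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def summarize_snps(calls):
    """calls: iterable of (ref, alt, genotype) with genotypes like '0/1', '1/1' or '0|1'."""
    ts = tv = het = hom = 0
    for ref, alt, gt in calls:
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
        alleles = gt.replace("|", "/").split("/")
        if alleles[0] == alleles[1]:
            hom += 1
        else:
            het += 1
    return {
        "ts_tv": ts / tv if tv else float("inf"),
        "het_hom": het / hom if hom else float("inf"),
    }
```

Applied to the counts in the abstract, 2,383,204 / 1,154,590 rounds to 2.06 and 2,122,671 / 1,415,123 rounds to 1.50, matching the reported 2.06∶1 and 1∶1.5. Both pairs also sum to the 3,537,794 SNP calls.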

  3. Isolation and enrichment of Cryptosporidium DNA and verification of DNA purity for whole-genome sequencing.

    Science.gov (United States)

    Guo, Yaqiong; Li, Na; Lysén, Colleen; Frace, Michael; Tang, Kevin; Sammons, Scott; Roellig, Dawn M; Feng, Yaoyu; Xiao, Lihua

    2015-02-01

    Whole-genome sequencing of Cryptosporidium spp. is hampered by difficulties in obtaining sufficient, highly pure genomic DNA from clinical specimens. In this study, we developed procedures for the isolation and enrichment of Cryptosporidium genomic DNA from fecal specimens and verification of DNA purity for whole-genome sequencing. The isolation and enrichment of genomic DNA were achieved by a combination of three oocyst purification steps and whole-genome amplification (WGA) of DNA from purified oocysts. Quantitative PCR (qPCR) analysis of WGA products was used as an initial quality assessment of amplified genomic DNA. The purity of WGA products was assessed by Sanger sequencing of cloned products. Next-generation sequencing tools were used in final evaluations of genome coverage and of the extent of contamination. Altogether, 24 fecal specimens of Cryptosporidium parvum, C. hominis, C. andersoni, C. ubiquitum, C. tyzzeri, and Cryptosporidium chipmunk genotype I were processed with the procedures. As expected, WGA products with low (sequences in Sanger sequencing. The cloning-sequencing analysis, however, showed significant contamination in 5 WGA products (proportion of positive colonies derived from Cryptosporidium genomic DNA, ≤25%). Following this strategy, 20 WGA products from six Cryptosporidium species or genotypes with low (mostly sequencing, generating sequence data covering 94.5% to 99.7% of Cryptosporidium genomes, with mostly minor contamination from bacterial, fungal, and host DNA. These results suggest that the described strategy can be used effectively for the isolation and enrichment of Cryptosporidium DNA from fecal specimens for whole-genome sequencing. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  4. [Progress on whole genome sequencing in woody plants].

    Science.gov (United States)

    Shi, Ji-Sen; Wang, Zhan-Jun; Chen, Jin-Hui

    2012-02-01

    In recent years, the amount of whole-genome sequence data for plants has been increasing rapidly, and whole-genome sequencing has also been performed widely in woody plants. However, whole-genome sequencing of woody plants faces a set of obstacles, including large genomes, complex genome structure, limitations of assembly, annotation and functional analysis, and restricted research funding. Therefore, to improve the efficiency of whole-genome sequencing in woody plants, the progress and shortcomings of this field should be analyzed. We compared three generations of sequencing technology (i.e., Sanger sequencing, sequencing by synthesis, and single-molecule sequencing). We focused mainly on progress in whole-genome sequencing of four woody plants (Populus, grapevine, papaya, and apple), and also analyzed the application of the sequencing results. The future of whole-genome sequencing research in woody plants, encompassing material selection, construction of genetic and physical maps, selection of sequencing technology, bioinformatic analysis, and application of sequencing results, is discussed.

  5. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes either rely on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, which prevents the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  6. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak.

    Science.gov (United States)

    Gardy, Jennifer L; Johnston, James C; Ho Sui, Shannan J; Cook, Victoria J; Shah, Lena; Brodkin, Elizabeth; Rempel, Shirley; Moore, Richard; Zhao, Yongjun; Holt, Robert; Varhol, Richard; Birol, Inanc; Lem, Marcus; Sharma, Meenu K; Elwood, Kevin; Jones, Steven J M; Brinkman, Fiona S L; Brunham, Robert C; Tang, Patrick

    2011-02-24

    An outbreak of tuberculosis occurred over a 3-year period in a medium-size community in British Columbia, Canada. The results of mycobacterial interspersed repetitive unit-variable-number tandem-repeat (MIRU-VNTR) genotyping suggested the outbreak was clonal. Traditional contact tracing did not identify a source. We used whole-genome sequencing and social-network analysis in an effort to describe the outbreak dynamics at a higher resolution. We sequenced the complete genomes of 32 Mycobacterium tuberculosis outbreak isolates and 4 historical isolates (from the same region but sampled before the outbreak) with matching genotypes, using short-read sequencing. Epidemiologic and genomic data were overlaid on a social network constructed by means of interviews with patients to determine the origins and transmission dynamics of the outbreak. Whole-genome data revealed two genetically distinct lineages of M. tuberculosis with identical MIRU-VNTR genotypes, suggesting two concomitant outbreaks. Integration of social-network and phylogenetic analyses revealed several transmission events, including those involving "superspreaders." Both lineages descended from a common ancestor and had been detected in the community before the outbreak, suggesting a social, rather than genetic, trigger. Further epidemiologic investigation revealed that the onset of the outbreak coincided with a recorded increase in crack cocaine use in the community. Through integration of large-scale bacterial whole-genome sequencing and social-network analysis, we show that a socioenvironmental factor--most likely increased crack cocaine use--triggered the simultaneous expansion of two extant lineages of M. tuberculosis that was sustained by key members of a high-risk social network. Genotyping and contact tracing alone did not capture the true dynamics of the outbreak. (Funded by Genome British Columbia and others.).

  7. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for comprehensive downstream analysis. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less of the space to flexibly represent phased or unphased genotypes with multiple alleles, and it calculated genotypic correlations over 1000 times faster. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
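
    The abstract does not describe KGGSeq's bit-block layout, so the following is a hypothetical illustration of why bit-packed genotypes make correlation fast, not KGGSeq's format. Each variant's biallelic alt-allele counts (0/1/2) are stored as two bit planes: `has_alt` (count ≥ 1) and `two_alt` (count = 2). Each count then equals has_alt + two_alt per sample, so every sum needed for Pearson's r reduces to population counts of ANDed bit vectors. Python ints serve as arbitrary-length bitsets here, and missing genotypes and multi-allelic sites are assumed away.

```python
import math


def encode(genotypes):
    """Pack alt-allele counts (0/1/2) into two bit planes stored as Python ints."""
    has_alt = two_alt = 0
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2):
            raise ValueError("genotypes must be 0, 1 or 2 alt alleles")
        if g >= 1:
            has_alt |= 1 << i
        if g == 2:
            two_alt |= 1 << i
    return has_alt, two_alt, len(genotypes)


def popcount(x):
    """Number of set bits (int.bit_count() is faster on Python 3.10+)."""
    return bin(x).count("1")


def genotype_correlation(x, y):
    """Pearson r between two encoded variants, computed from bit counts only."""
    ax, bx, n = x
    ay, by, m = y
    if n != m:
        raise ValueError("variants must cover the same samples")
    # With g = A + B and B a subset of A: sum(g) = |A| + |B| and sum(g^2) = |A| + 3|B|.
    sx, sy = popcount(ax) + popcount(bx), popcount(ay) + popcount(by)
    sxx, syy = popcount(ax) + 3 * popcount(bx), popcount(ay) + 3 * popcount(by)
    # (Ax + Bx)(Ay + By) expands into four AND-and-count terms.
    sxy = popcount(ax & ay) + popcount(ax & by) + popcount(bx & ay) + popcount(bx & by)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den if den else float("nan")
```

Each 64-bit machine word in such a layout covers 64 samples at once, which is where the speed-up over per-sample arithmetic comes from. Production formats add a third plane or a mask for missing calls.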

  8. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    Energy Technology Data Exchange (ETDEWEB)

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are both important and challenging because of the extent of microbial evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time-consuming. A previously described microarray format containing whole-genomic DNA (the community genome array, or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The species relationships revealed by CGA hybridization in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.

  9. Whole genome analysis provides evidence for porcine-to-simian interspecies transmission of rotavirus-A.

    Science.gov (United States)

    Navarro, Ryan; Aung, Meiji Soe; Cruz, Katalina; Ketzis, Jennifer; Gallagher, Christa Ann; Beierschmitt, Amy; Malik, Yashpal Singh; Kobayashi, Nobumichi; Ghosh, Souvik

    2017-04-01

    We report here whole genome analysis of a porcine rotavirus-A (RVA) strain, RVA/Pig-wt/KNA/ET8B/2015/G5P[13], detected in a diarrheic piglet, and nearly whole genome (except for the VP4 gene) analysis of a simian RVA strain, RVA/Simian-wt/KNA/08979/2015/G5P[X], detected in a non-diarrheic African green monkey (AGM) on the island of St. Kitts, Caribbean region. Strain ET8B exhibited a G5-P[13]-I5-R1-C1-M1-A8-N1-T7-E1-H1 genotype constellation identical to that of the Brazilian porcine RVA G5P[13] strains RVA/Pig-wt/BRA/ROTA01/2013/G5P[13] and RVA/Pig-wt/BRA/ROTA07/2013/G5P[13], the only porcine G5P[13] RVAs whose whole genomes have been analyzed so far. Phylogenetically, all 11 gene segments of ET8B were closely related to those of porcine and porcine-like human RVAs within the respective genotypes. Although the porcine G5P[13] RVAs exhibited identical genotype constellations, ET8B did not appear to share common evolutionary pathways with the Brazilian porcine G5P[13] RVAs. Interestingly, the VP2, VP3, VP6, VP7, and NSP1-NSP5 genes of simian RVA strain 08979 were closely related to those of porcine and porcine-like human RVA strains, exhibiting 99%-100% nucleotide sequence identities to the cognate genes of the co-circulating porcine RVA strain ET8B. On the other hand, the VP1 gene of 08979 appeared to be genetically divergent from porcine and human RVAs within the R1 genotype, and its exact origin could not be ascertained. Taken together, these observations suggest that simian strain 08979 might have been derived from interspecies transmission of ET8B-like RVAs from pigs to AGMs. In St. Kitts, AGMs often stray from the wild into livestock farms, so the AGM may have acquired the infection from a pig farm on the island. To our knowledge, this is the first report of the detection of porcine-like RVAs in monkeys. Also, the present study is the first to report whole genomic analysis of a porcine RVA strain from the Caribbean.
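The eleven-part genotype constellation quoted above can be compared segment by segment. A minimal sketch, assuming the standard segment order of the Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx notation (VP7, VP4, VP6, VP1, VP2, VP3, NSP1 to NSP5); the helper names are hypothetical, and the identity function is a plain count over ungapped alignment columns, not the study's phylogenetic pipeline.

```python
# Segment order of the Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx rotavirus A constellation notation
SEGMENTS = ("VP7", "VP4", "VP6", "VP1", "VP2", "VP3",
            "NSP1", "NSP2", "NSP3", "NSP4", "NSP5")


def parse_constellation(text):
    """Map each gene segment to its genotype, e.g. 'A8' for NSP1."""
    parts = text.split("-")
    if len(parts) != len(SEGMENTS):
        raise ValueError(f"expected {len(SEGMENTS)} genotypes, got {len(parts)}")
    return dict(zip(SEGMENTS, parts))


def differing_segments(a, b):
    """Segments whose genotype differs between two constellations."""
    pa, pb = parse_constellation(a), parse_constellation(b)
    return [seg for seg in SEGMENTS if pa[seg] != pb[seg]]


def percent_identity(seq1, seq2):
    """Percent identity of two pre-aligned sequences, ignoring columns with a gap ('-')."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(seq1.upper(), seq2.upper()) if "-" not in (x, y)]
    if not columns:
        raise ValueError("no ungapped columns")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)
```

With the ET8B constellation, `parse_constellation(...)["NSP1"]` returns `"A8"`, and comparing it with itself returns an empty list, which is what "identical genotype constellations" means at this level of resolution.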

  10. Design And Performance Of 44,100 SNP Genotyping Array For Rice

    Science.gov (United States)

    To document genome-wide allelic variation within and between the different subpopulations of both O. sativa and O. rufipogon, we developed an Affymetrix custom genotyping array containing 44,100 SNPs well distributed across the 400 Mb rice genome. The SNPs on this array were selected from the MBML-in...
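On average, 44,100 SNPs across roughly 400 Mb works out to about one SNP per 9 kb (400,000,000 / 44,100 ≈ 9,070 bp). One simple way to check a claim of "well distributed" is to look for unusually large gaps between adjacent markers on each chromosome. The sketch below assumes marker positions grouped by chromosome; the function name is hypothetical and this is not the array's documented design procedure.

```python
def largest_gaps(positions_by_chrom):
    """Largest distance (bp) between adjacent SNPs on each chromosome; 0 if fewer than 2 SNPs."""
    summary = {}
    for chrom, positions in positions_by_chrom.items():
        pos = sorted(positions)
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        summary[chrom] = max(gaps) if gaps else 0
    return summary
```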

  11. Whole genome sequence typing to investigate the Apophysomyces outbreak following a tornado in Joplin, Missouri, 2011.

    Science.gov (United States)

    Etienne, Kizee A; Gillece, John; Hilsabeck, Remy; Schupp, Jim M; Colman, Rebecca; Lockhart, Shawn R; Gade, Lalitha; Thompson, Elizabeth H; Sutton, Deanna A; Neblett-Fanfair, Robyn; Park, Benjamin J; Turabelidze, George; Keim, Paul; Brandt, Mary E; Deak, Eszter; Engelthaler, David M

    2012-01-01

    Case reports of Apophysomyces spp. infections in immunocompetent hosts have resulted from traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011, a tornado struck Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group, indicating multiple genotypes from a single source or multiple sources. Although no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces.
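Grouping isolates into clusters of identical or closely related genotypes starts from pairwise SNP distances. The study used phylogenetic analysis; the sketch below swaps in a much simpler single-linkage grouping at a SNP-distance threshold, assuming each isolate is a string of calls at the same SNP positions with `N` for a missing call. Isolate names and the threshold are hypothetical.

```python
from itertools import combinations


def snp_distance(a, b):
    """Number of SNP positions that differ, skipping positions where either call is missing ('N')."""
    if len(a) != len(b):
        raise ValueError("SNP profiles must cover the same positions")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def cluster_isolates(profiles, max_dist=0):
    """Single-linkage groups of isolates whose pairwise SNP distance is <= max_dist."""
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if snp_distance(profiles[a], profiles[b]) <= max_dist:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

With `max_dist=0` only genotypically identical isolates are grouped; raising it merges closely related ones, which is the distinction the abstract draws between "related or identical" isolates.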

  12. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification.

    Science.gov (United States)

    Oyola, Samuel O; Ariani, Cristina V; Hamilton, William L; Kekre, Mihir; Amenga-Etego, Lucas N; Ghansah, Anita; Rutledge, Gavin G; Redmond, Seth; Manske, Magnus; Jyothi, Dushyanth; Jacob, Chris G; Otto, Thomas D; Rockett, Kirk; Newbold, Chris I; Berriman, Matthew; Kwiatkowski, Dominic P

    2016-12-20

    Translating genomic technologies into healthcare applications for the malaria parasite Plasmodium falciparum has been limited by the technical and logistical difficulties of obtaining high-quality clinical samples from the field. Sampling by dried blood spot (DBS) finger-pricks can be performed safely and efficiently with minimal resource and storage requirements compared with venous blood (VB). Here, the use of selective whole genome amplification (sWGA) to sequence the P. falciparum genome from clinical DBS samples was evaluated, and the results were compared with current methods that use leucodepleted VB. Parasite DNA with high (>95%) human DNA contamination was selectively amplified by Phi29 polymerase using short oligonucleotides (8-12-mers) as primers. These primers were selected on the basis of their differential binding frequency in the desired (P. falciparum) and contaminating (human) genomes. Using the sWGA method, clinical samples from 156 malaria patients were sequenced, including 120 paired samples for head-to-head comparison of DBS and leucodepleted VB. More than 18-fold enrichment of P. falciparum DNA was achieved from DBS extracts. The parasitaemia threshold to achieve >5× coverage for 50% of the genome was 0.03% (40 parasites per 200 white blood cells). Over 99% SNP concordance between VB and DBS samples was achieved after excluding missing calls. The sWGA methods described here provide a reliable and scalable way of generating P. falciparum genome sequence data from DBS samples. The current data indicate that it will be possible to obtain good-quality sequence on most, if not all, drug resistance loci from the majority of symptomatic malaria patients. This technique overcomes a major limiting factor in P. falciparum genome sequencing from field samples, and paves the way for large-scale epidemiological applications.
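The primer-selection idea (prefer short oligos that occur often in the parasite genome and rarely in the human genome) can be illustrated with k-mer counting. This is a minimal sketch, not the study's scoring scheme: it assumes the score is the ratio of per-position k-mer frequency in the target versus the background, adds a pseudocount of 1 to background counts, and ignores reverse complements and primer melting temperature. Function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Occurrences of every k-mer on one strand of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rank_primers(target, background, k=8):
    """Rank target k-mers by how much more often they occur in target than in background."""
    t_counts = kmer_counts(target, k)
    b_counts = kmer_counts(background, k)
    t_sites = max(len(target) - k + 1, 1)
    b_sites = max(len(background) - k + 1, 1)
    scored = []
    for kmer, n in t_counts.items():
        t_freq = n / t_sites
        b_freq = (b_counts[kmer] + 1) / b_sites  # pseudocount avoids division by zero
        scored.append((t_freq / b_freq, kmer))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best ratio first, ties alphabetical
    return [kmer for _, kmer in scored]
```

In practice the target and background would be the P. falciparum and human reference genomes, and k would range over the 8 to 12 nucleotides mentioned above.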

  13. Whole-genome single-nucleotide polymorphism (SNP) marker discovery and association analysis with the eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content in Larimichthys crocea

    Directory of Open Access Journals (Sweden)

    Shijun Xiao

    2016-12-01

    Whole-genome single-nucleotide polymorphism (SNP) markers are valuable genetic resources for association and conservation studies. Genome-wide SNP development in many teleost species is still challenging because of genome complexity and the cost of re-sequencing. Genotyping-by-sequencing (GBS) provides an efficient reduced-representation method to lower the cost of SNP detection; however, most recent GBS applications have been reported in plants. In this work, we applied an EcoRI-NlaIII based GBS protocol to the teleost large yellow croaker, an important commercial fish in China and East Asia, and report the first whole-genome SNP development for the species. A total of 69,845 high-quality SNP markers, evenly distributed along the genome, were detected in at least 80% of 500 individuals. Nearly 95% of randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. Association studies with muscle eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content identified 39 significant SNP markers, accounting for up to ∼63% of the genetic variance explained by all markers. Functional genes involved in the fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, the PPT2 gene, previously identified in an association study of plasma n-3 and n-6 polyunsaturated fatty acid levels in humans, was re-discovered in large yellow croaker. Our study verified that EcoRI-NlaIII based GBS can produce quality SNP markers in a cost-efficient manner in a teleost genome. The developed SNP markers and the EPA- and DHA-associated SNP loci provide invaluable resources for population structure, conservation genetics and genomic selection studies of large yellow croaker and other fish.
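The "detected in at least 80% of 500 individuals" criterion is a per-SNP call-rate filter. A minimal sketch, assuming genotypes are held as a mapping from SNP ID to a list of per-individual calls with `None` marking a missing call; the names are hypothetical and this is not the authors' pipeline.

```python
def call_rate(calls):
    """Fraction of individuals with a non-missing genotype (None = missing)."""
    return sum(c is not None for c in calls) / len(calls)


def filter_snps(genotypes, min_call_rate=0.8):
    """SNP IDs genotyped in at least min_call_rate of individuals, sorted."""
    return sorted(snp for snp, calls in genotypes.items() if call_rate(calls) >= min_call_rate)
```

A SNP called in 400 of 500 fish (rate 0.8) passes; one called in 399 does not.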

  14. Whole genome amplification: Use of advanced isothermal method

    African Journals Online (AJOL)

    Yomi

    2010-12-29

    Dec 29, 2010 ... Whole genome amplification: Use of advanced isothermal method. Sima Moghaddaszadeh Ahrabi, Safar Farajnia, Ghodratollah Rahimi-Mianji, Soheila Montazer Saheb ... Moreover, application of high-fidelity and highly processive DNA ... between I-PEP and MDA by using serial dilutions of.

  15. Whole genome amplification: Use of advanced isothermal method ...

    African Journals Online (AJOL)

    The laboratory method of amplifying genomic deoxyribonucleic acid (DNA) samples to generate a sufficient quantity of DNA for subsequent specific analyses is called whole genome amplification (WGA). This method is the only way to increase the input material available from a few cells or from samples with limited DNA content.

  16. Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny

    NARCIS (Netherlands)

    Herniou, E.A.; Luque, T.; Chen, X.; Vlak, J.M.; Winstanley, D.; Cory, J.S.; O'Reilly, D.R.

    2001-01-01

    Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of

  17. Comparative Copy Number Variation From Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2011-01-01

    Whole genome sequencing enables a high-resolution view of the human genome and unique insights into copy number variations at an unprecedented scale. Numerous tools and studies have already been introduced that provide confirmatory and new genomic variability data in individuals and across
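A common starting point for comparative copy number analysis from sequencing is the per-bin log2 ratio of read depth between a test and a reference genome after library-size normalisation. The sketch below is that generic read-depth idea, not the method of this entry; the pseudocount and the ±0.4 call thresholds are hypothetical, and real callers segment adjacent bins before calling.

```python
from math import log2


def log2_ratios(test_depth, ref_depth, pseudocount=0.5):
    """Per-bin log2(test/reference) read-depth ratio after library-size normalisation."""
    if len(test_depth) != len(ref_depth):
        raise ValueError("both samples need the same bins")
    t_total, r_total = sum(test_depth), sum(ref_depth)
    return [log2(((t + pseudocount) / t_total) / ((r + pseudocount) / r_total))
            for t, r in zip(test_depth, ref_depth)]


def call_bins(ratios, gain=0.4, loss=-0.4):
    """Label each bin using fixed, hypothetical thresholds."""
    return ["gain" if r > gain else "loss" if r < loss else "neutral" for r in ratios]
```

Normalising by total depth first matters: without it, a test library sequenced twice as deeply would appear to gain copies everywhere.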

  18. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  19. Whole genome homology-based identification of candidate genes ...

    African Journals Online (AJOL)

    ... and might play some important roles in drought tolerance in sesame. Our results provided genomic resources for further functional analysis and genetic engineering towards drought tolerance improvement in sesame. Keywords: Sesamum indicum, candidate genes, drought tolerance, orthologous gene, whole genome ...

  20. Clinical interpretation and implications of whole-genome sequencing.

    Science.gov (United States)

    Dewey, Frederick E; Grove, Megan E; Pan, Cuiping; Goldstein, Benjamin A; Bernstein, Jonathan A; Chaib, Hassan; Merker, Jason D; Goldfeder, Rachel L; Enns, Gregory M; David, Sean P; Pakdaman, Neda; Ormond, Kelly E; Caleshu, Colleen; Kingham, Kerry; Klein, Teri E; Whirl-Carrillo, Michelle; Sakamoto, Kenneth; Wheeler, Matthew T; Butte, Atul J; Ford, James M; Boxer, Linda; Ioannidis, John P A; Yeung, Alan C; Altman, Russ B; Assimes, Themistocles L; Snyder, Michael; Ashley, Euan A; Quertermous, Thomas

    2014-03-12

    Whole-genome sequencing (WGS) is increasingly applied in clinical medicine and is expected to uncover clinically significant findings regardless of sequencing indication. The objectives were to examine coverage and concordance of clinically relevant genetic variation provided by WGS technologies; to quantitate inherited disease risk and pharmacogenomic findings in WGS data and the resources required for their discovery and interpretation; and to evaluate clinical action prompted by WGS findings. This was an exploratory study of 12 adult participants recruited at Stanford University Medical Center who underwent WGS between November 2011 and March 2012. A multidisciplinary team reviewed all potentially reportable genetic findings. Five physicians proposed initial clinical follow-up based on the genetic findings. Main outcomes were genome coverage and sequencing platform concordance in different categories of genetic disease risk, person-hours spent curating candidate disease-risk variants, interpretation agreement between trained curators and disease genetics databases, burden of inherited disease risk and pharmacogenomic findings, and burden and interrater agreement of proposed clinical follow-up. Depending on the sequencing platform, 10% to 19% of inherited disease genes were not covered to accepted standards for single nucleotide variant discovery. Genotype concordance was high for previously described single nucleotide genetic variants (99%-100%) but low for small insertion/deletion variants (53%-59%). Curation of 90 to 127 genetic variants in each participant required a median of 54 minutes (range, 5-223 minutes) per genetic variant, resulted in moderate classification agreement between professionals (Gross κ, 0.52; 95% CI, 0.40-0.64), and reclassified 69% of genetic variants cataloged as disease causing in mutation databases to variants of uncertain or lesser significance. Two to 6 personal disease-risk findings were discovered in each participant, including 1 frameshift deletion in the BRCA1 gene implicated in
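Platform concordance split by variant class (SNVs versus small indels) is a straightforward tally over sites called by both platforms. A minimal sketch, assuming VCF-style genotype strings (`0/1`, or `0|1` when phased) and treating allele order as irrelevant; the function names and data layout are hypothetical, not the study's code.

```python
def normalise(gt):
    """Treat '0/1', '1/0' and phased '0|1' as the same unordered genotype."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def concordance_by_class(calls_a, calls_b, variant_class):
    """Fraction of matching genotypes per variant class, over sites called by both platforms."""
    totals, agree = {}, {}
    for vid, gt_a in calls_a.items():
        gt_b = calls_b.get(vid)
        if gt_a is None or gt_b is None:
            continue  # missing on either platform: not counted
        cls = variant_class[vid]
        totals[cls] = totals.get(cls, 0) + 1
        agree[cls] = agree.get(cls, 0) + (normalise(gt_a) == normalise(gt_b))
    return {cls: agree[cls] / totals[cls] for cls in totals}
```

Indel concordance is usually lower partly because the same indel can be represented differently by different callers; this sketch assumes variant IDs have already been normalised so that matching sites share an ID.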

  1. Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data.

    Science.gov (United States)

    Dewey, Frederick E; Grove, Megan E; Priest, James R; Waggott, Daryl; Batra, Prag; Miller, Clint L; Wheeler, Matthew; Zia, Amin; Pan, Cuiping; Karzcewski, Konrad J; Miyake, Christina; Whirl-Carrillo, Michelle; Klein, Teri E; Datta, Somalee; Altman, Russ B; Snyder, Michael; Quertermous, Thomas; Ashley, Euan A

    2015-10-01

    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.
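Using Mendelian inheritance for quality control in a father-mother-child trio amounts to flagging sites where the child's genotype cannot be formed from one paternal and one maternal allele. A minimal sketch for autosomal, diploid sites with VCF-style genotype strings; helper names are hypothetical and this is not the authors' framework.

```python
def parse_gt(gt):
    """'0/1' or '0|1' -> (0, 1); phase is ignored."""
    a, b = gt.replace("|", "/").split("/")
    return int(a), int(b)


def mendelian_consistent(child, father, mother):
    """True if the child's alleles can be split as one from each parent."""
    c1, c2 = parse_gt(child)
    f, m = parse_gt(father), parse_gt(mother)
    return (c1 in f and c2 in m) or (c2 in f and c1 in m)


def mendelian_error_sites(trio_calls):
    """IDs of sites whose (child, father, mother) genotypes violate Mendelian inheritance."""
    return [site for site, (c, f, m) in trio_calls.items()
            if not mendelian_consistent(c, f, m)]
```

A flagged site is either a genotyping error or, much more rarely, a true de novo variant, which is why such checks double as quality control.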

  2. Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data

    OpenAIRE

    Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N.; Romm, Jane M.; Doheny, Kimberly F.; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

    2012-01-01

    DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads al...
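The intuition behind combining sequence reads with array genotypes is that, at sites where the array calls the sample homozygous, reads carrying the other allele should be rare unless foreign DNA is present. The sketch below swaps in a deliberately simple method-of-moments estimator, not the likelihood method described in this paper: if a fraction α of DNA comes from a random member of the population, the unexpected allele should appear in roughly α × (population frequency of that allele) of reads. Sequencing error is ignored, so the estimate is biased upward on noisy data; the function name and tuple layout are hypothetical.

```python
def estimate_contamination(sites):
    """Method-of-moments contamination fraction from array-homozygous sites.

    Each site is (array_genotype, depth, alt_reads, alt_allele_freq), where
    array_genotype is 0 = hom-ref, 1 = het, 2 = hom-alt.
    """
    observed = expected = 0.0
    for genotype, depth, alt_reads, alt_freq in sites:
        if genotype == 0:    # alt reads are unexpected
            observed += alt_reads
            expected += depth * alt_freq
        elif genotype == 2:  # ref reads are unexpected
            observed += depth - alt_reads
            expected += depth * (1.0 - alt_freq)
        # heterozygous sites carry little signal here and are skipped
    if expected == 0:
        raise ValueError("no informative homozygous sites")
    return observed / expected
```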

  3. A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter

    Directory of Open Access Journals (Sweden)

    Samuel K. Sheppard

    2012-04-01

    Campylobacteriosis remains a major public health problem worldwide. Genetic analyses of Campylobacter isolates, and particularly molecular epidemiology, have been central to the study of this disease, especially the characterization of Campylobacter genotypes isolated from human infection, farm animals, and retail food. These studies have demonstrated that Campylobacter populations are highly structured, with distinct genotypes associated with particular wild or domestic animal sources, and that chicken meat is the most likely source of most human infection in countries such as the UK. The availability of multiple whole genome sequences from Campylobacter isolates presents the prospect of identifying the genes or allelic variants responsible for host association and increased human disease risk, but the diversity of Campylobacter genomes presents challenges for such analyses. We present a gene-by-gene approach for investigating the genetic basis of phenotypes in diverse bacteria such as Campylobacter, implemented with the BIGSdb software on the pubMLST.org/campylobacter website.
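In a gene-by-gene approach, each isolate is summarised as an allele number per locus, and isolates are compared by counting loci with different alleles. A minimal sketch, assuming profiles are plain dictionaries with `None` for a missing or incomplete locus; this does not use BIGSdb or pubMLST interfaces, and the allele numbers in the usage example are made up.

```python
def allelic_distance(profile_a, profile_b):
    """(loci with different allele IDs, loci compared), over loci called in both isolates."""
    shared = [locus for locus in profile_a
              if locus in profile_b
              and profile_a[locus] is not None
              and profile_b[locus] is not None]
    diffs = sum(profile_a[locus] != profile_b[locus] for locus in shared)
    return diffs, len(shared)
```

Returning the number of loci compared alongside the difference count makes it explicit when missing loci have shrunk the comparison, which matters for diverse genomes like Campylobacter where not every locus is present in every isolate.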

  4. Whole genome sequencing as the ultimate tool to diagnose tuberculosis.

    Science.gov (United States)

    van Soolingen, Dick; Jajou, Rana; Mulder, Arnout; de Neeling, Han

    2016-12-01

    In the past two decades, DNA techniques have been increasingly used in the laboratory diagnosis of tuberculosis (TB). The (sub)species of the Mycobacterium tuberculosis complex are usually identified using reverse line blot techniques. Resistance is predicted by detecting mutations in genes associated with resistance. Nevertheless, all cases are still subjected to cumbersome phenotypic resistance testing. Strain-characteristic DNA fingerprints, used to investigate the epidemiology of TB, are produced by 24-locus variable number tandem repeat (VNTR) typing. However, most of the molecular techniques used in the diagnosis of TB can eventually be replaced by whole genome sequencing (WGS). Many international TB reference laboratories are currently working on the introduction of WGS; however, standardization in the international context is lacking. The European Centre for Disease Prevention and Control in Stockholm, Sweden, organizes a yearly round of quality control for VNTR typing and, in 2015, for the first time also for WGS. In this first proficiency study, only three of eight international TB laboratories produced WGS results in line with those of the reference laboratory. The whole process of DNA isolation, purification, quantification, sequencing, and analysis/interpretation of data is still under development. In this presentation, many aspects that influence the quality and interpretation of WGS results will be covered. The turn-around time, analysis, and utility of WGS will be discussed. Moreover, experiences with WGS in the molecular epidemiology of TB in The Netherlands are detailed. It can be concluded that many difficulties still have to be overcome. At present, bacteria still have to be cultured to obtain sufficient quality and quantity of DNA for successful WGS. The quality of sequencing has improved significantly over the past 7 years, and the detection of mutations has, therefore, become more reliable.
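Predicting resistance "by the detection of mutations in genes associated with resistance" is, at its core, a lookup of called mutations against a catalogue. The catalogue below is illustrative only: a handful of well-known resistance-associated mutations, not a complete or clinically validated list, and the function name is hypothetical.

```python
# Illustrative catalogue only (gene, amino-acid change) -> drug; not for clinical use.
CATALOGUE = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("embB", "M306V"): "ethambutol",
    ("gyrA", "D94G"): "fluoroquinolones",
    ("rpsL", "K43R"): "streptomycin",
}


def predict_resistance(mutations):
    """Drugs predicted from (gene, amino-acid change) calls; uncatalogued mutations are ignored."""
    return sorted({CATALOGUE[m] for m in mutations if m in CATALOGUE})
```

Because uncatalogued mutations are silently ignored, a lookup like this can only predict resistance, never confirm susceptibility, which is one reason phenotypic testing is still performed.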

  5. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  6. Cgaln: fast and space-efficient whole-genome alignment

    Directory of Open Access Journals (Sweden)

    Nakato Ryuichiro

    2010-04-01

    Background: Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can handle sequences only a few Mb long at a time, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. Results: We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is a "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is a detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements it. We compared the performance of Cgaln with that of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB memory. Conclusions: Cgaln is not only fast and memory-efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and

  7. Cgaln: fast and space-efficient whole-genome alignment.

    Science.gov (United States)

    Nakato, Ryuichiro; Gotoh, Osamu

    2010-04-30

    Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can handle sequences only a few Mb long at a time, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is a "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is a detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements it. We compared the performance of Cgaln with that of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB memory. Cgaln is not only fast and memory-efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and will contribute to various fields of genome science.
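The idea of a coarse step that narrows down where detailed alignment is needed can be illustrated with a generic seed-and-chain sketch. This is not CGAT's block-level algorithm: it finds exact k-mer matches on the forward strand only, then keeps the longest chain of matches that is colinear in both sequences (a longest-increasing-subsequence computation), so unlike Cgaln it does not handle rearrangements. Function names are hypothetical.

```python
from bisect import bisect_left


def kmer_anchors(a, b, k):
    """All (i, j) pairs where a[i:i+k] == b[j:j+k] (exact k-mer seed matches)."""
    index = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    return [(i, j) for i in range(len(a) - k + 1) for j in index.get(a[i:i + k], ())]


def colinear_chain(anchors):
    """Longest chain of anchors increasing strictly in both coordinates (LIS, O(n log n))."""
    # Sorting by (i, -j) stops two anchors with the same i from both entering the chain.
    anchors = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails, tail_idx = [], []       # smallest end-j for each chain length, and its anchor
    prev = [None] * len(anchors)   # back-pointers for reconstruction
    for idx, (_, j) in enumerate(anchors):
        pos = bisect_left(tails, j)
        if pos == len(tails):
            tails.append(j)
            tail_idx.append(idx)
        else:
            tails[pos] = j
            tail_idx[pos] = idx
        prev[idx] = tail_idx[pos - 1] if pos > 0 else None
    chain, cur = [], (tail_idx[-1] if tail_idx else None)
    while cur is not None:
        chain.append(anchors[cur])
        cur = prev[cur]
    return chain[::-1]
```

The gaps between consecutive chained anchors are the "limited regions" in which a second, nucleotide-level alignment would then run.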

  8. Alignathon: a competitive assessment of whole-genome alignment methods

    OpenAIRE

    Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Harris, Robert S.; Fitzgerald, Stephen; Beal, Kathryn; Seledtsov, Igor; Molodtsov, Vladimir; Raney, Brian J.; Clawson, Hiram; Kim, Jaebum; Kemena, Carsten; Chang, Jia-Ming; Erb, Ionas; Poliakov, Alexander

    2014-01-01

    Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assess...

  9. Priors in whole-genome regression: the bayesian alphabet returns.

    Science.gov (United States)

    Gianola, Daniel

    2013-07-01

    Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term "Bayesian alphabet" denotes a growing number of letters of the alphabet used to name various Bayesian linear regressions that differ in the priors adopted while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data, where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various ways, but it is shown here that the prior is always influential, unless n ≫ p. This happens because parameters are not likelihood identified, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters ("tuning knobs") are assessed via properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
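To see concretely why the prior never washes out when p > n, take the simplest member of the alphabet, the Gaussian prior underlying ridge regression (RR-BLUP). The derivation below is a standard result added for illustration; the symbols (X, λ, σ², v) are chosen here and are not taken from the paper.

```latex
\begin{aligned}
&\mathbf y = \mathbf X\boldsymbol\beta + \mathbf e,\qquad
 \mathbf e \sim N(\mathbf 0, \sigma_e^2 \mathbf I_n),\qquad
 \boldsymbol\beta \sim N(\mathbf 0, \sigma_\beta^2 \mathbf I_p),\\[4pt]
&E(\boldsymbol\beta \mid \mathbf y)
 = \left(\mathbf X^\top\mathbf X + \lambda \mathbf I_p\right)^{-1}\mathbf X^\top\mathbf y,
 \qquad \lambda = \sigma_e^2/\sigma_\beta^2,\\[4pt]
&\text{if } p > n:\quad \operatorname{rank}(\mathbf X^\top\mathbf X) \le n < p
 \;\Rightarrow\; \exists\, \mathbf v \ne \mathbf 0:\ \mathbf X\mathbf v = \mathbf 0,\\[4pt]
&\left(\mathbf X^\top\mathbf X + \lambda\mathbf I_p\right)\mathbf v = \lambda\mathbf v
 \;\Rightarrow\;
 \mathbf v^\top E(\boldsymbol\beta \mid \mathbf y)
 = \tfrac{1}{\lambda}\,\mathbf v^\top\mathbf X^\top\mathbf y
 = \tfrac{1}{\lambda}(\mathbf X\mathbf v)^\top\mathbf y = 0 .
\end{aligned}
```

Along every direction v in the null space of X, the posterior mean stays at the prior mean (zero) whatever the data, because the likelihood is flat in that direction; in the remaining directions the amount of shrinkage is set by λ. This is the sense in which λ is a "tuning knob" best chosen by cross-validation rather than learned from the likelihood.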

  10. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Directory of Open Access Journals (Sweden)

    David Koslicki

    With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.
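The reconstruction idea (estimate all taxon proportions at once from the sample's k-mer profile, instead of classifying reads one by one) can be sketched as a non-negative least-squares fit of reference k-mer profiles to the sample profile. This is a simplified stand-in, not WGSQuikr's actual objective or solver: it adds one penalty row that nudges the proportions to sum to one, solves by plain projected gradient descent, and uses hypothetical names and parameter values (`lam`, `iters`).

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Relative frequency of every ACGT k-mer (k-mers containing other letters are dropped)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq.upper()[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def estimate_proportions(sample, references, lam=10.0, iters=10000):
    """Non-negative least squares by projected gradient descent.

    Minimises ||A x - b||^2 over x >= 0, where the columns of A are the reference
    k-mer profiles and one extra row lam * sum(x) = lam nudges x to sum to one.
    """
    cols = [list(ref) + [lam] for ref in references]
    b = list(sample) + [lam]
    n, m = len(cols), len(b)
    # ||A||_F^2 bounds the Lipschitz constant of the gradient of (1/2)||A x - b||^2,
    # so its reciprocal is a safe step size.
    step = 1.0 / sum(v * v for col in cols for v in col)
    x = [1.0 / n] * n
    for _ in range(iters):
        resid = [sum(cols[j][i] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(cols[j][i] * resid[i] for i in range(m)) for j in range(n)]
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n)]  # project onto x >= 0
    return x
```

For a sample whose profile is exactly 70% of one reference and 30% of another, the estimate converges to about `[0.7, 0.3]`. Solving one small system per sample, rather than one classification per read, is what makes this family of methods fast.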

  11. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Science.gov (United States)

    Koslicki, David; Foucart, Simon; Rosen, Gail

    2014-01-01


  12. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, S. [Univ. Wisc.-Madison]; Zhou, S. [Univ. Wisc.-Madison]; Place, M. [Univ. Wisc.-Madison]; Zhang, Y. [Univ. Wisc.-Madison]; Briska, A. [Univ. Wisc.-Madison]; Goldstein, S. [Univ. Wisc.-Madison]; Churas, C. [Univ. Wisc.-Madison]; Runnheim, R. [Univ. Wisc.-Madison]; Forrest, D. [Univ. Wisc.-Madison]; Lim, A. [Univ. Wisc.-Madison]; Lapidus, A. [Univ. Wisc.-Madison]; Han, C. S. [Univ. Wisc.-Madison]; Roberts, G. P. [Univ. Wisc.-Madison]; Schwartz, D. C. [Univ. Wisc.-Madison]

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  13. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest, Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz, David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source of hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (Xba I, Nhe I, and Hind III) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the Hind III map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.
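
    Aligning sequence contigs to an optical map requires a restriction map predicted from the sequence itself. A minimal in-silico digestion sketch follows, assuming a linear sequence (a circular chromosome would need its first and last fragments joined) and using the standard recognition sites XbaI T^CTAGA, NheI G^CTAGC and HindIII A^AGCTT. The function names are hypothetical and this is not the optical mapping software used in the study.

```python
import re

# Recognition sequences and top-strand cut offsets (XbaI T^CTAGA, NheI G^CTAGC, HindIII A^AGCTT).
# All three sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),
    "NheI": ("GCTAGC", 1),
    "HindIII": ("AAGCTT", 1),
}


def digest(seq, enzyme):
    """Return restriction fragment lengths (bp) for a linear sequence cut by one enzyme."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def fragments_kb(seq, enzyme):
    """Fragment sizes in kilobases, for comparison with map fragments reported in kb."""
    return [length / 1000 for length in digest(seq, enzyme)]
```

    Comparing the ordered fragment lists from each enzyme with the corresponding optical map is what allows an assembly error to show up as a mismatched fragment.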

  14. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    Energy Technology Data Exchange (ETDEWEB)

    Fröhlich, Eleonore, E-mail: eleonore.froehlich@medunigraz.at [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Meindl, Claudia; Wagner, Karin [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Leitinger, Gerd [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Institute for Cell Biology, Histology and Embryology, Medical University of Graz, Harrachgasse 21, 8010 Graz (Austria); Roblegg, Eva [Institute of Pharmaceutical Sciences, Department of Pharmaceutical Technology, Karl-Franzens-University of Graz, Universitätsplatz 1, 8010 Graz (Austria)

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications, provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization, using polystyrene particles as model particles and short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify different or additional targets of NP action compared with commonly used cell-based assays. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes.
    Highlights:
    • Regulated functions were screened using whole genome expression assays.
    • Polystyrene particles regulated more genes than short carbon nanotubes.
    • Protein coating of polystyrene particles did not change the regulation pattern.
    • Functions identified as regulated by microarray were confirmed by cell-based assays.

  15. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data.

    Science.gov (United States)

    Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N; Romm, Jane M; Doheny, Kimberly F; Abecasis, Gonçalo R; Boehnke, Michael; Kang, Hyun Min

    2012-11-02

    DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
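
    To give a feel for how array data alone can reveal contamination, here is a deliberately simplified method-of-moments sketch; it is not the authors' method. It assumes the array reports a B-allele frequency (BAF) per SNP, that population allele frequencies are known, and that the contaminating DNA comes from the same population. Under those assumptions, contamination at fraction alpha shifts the expected BAF at a sample-homozygous AA site to alpha * p, and the expected shortfall 1 - BAF at a BB site to alpha * (1 - p). The function name and input layout are hypothetical.

```python
def estimate_contamination(sites):
    """Least-squares (through the origin) estimate of the contaminating fraction alpha.

    sites: iterable of (genotype, baf, pop_freq) tuples, where genotype is the
    sample's called genotype as a B-allele count (0 = AA, 2 = BB), baf is the
    observed B-allele frequency and pop_freq is the population frequency of B.
    """
    num = 0.0
    den = 0.0
    for genotype, baf, p in sites:
        if genotype == 0:
            # Expected BAF at an AA site is alpha * p
            x, y = p, baf
        elif genotype == 2:
            # Expected 1 - BAF at a BB site is alpha * (1 - p)
            x, y = 1.0 - p, 1.0 - baf
        else:
            continue  # heterozygous sites are skipped in this simple model
        num += x * y
        den += x * x
    return num / den if den else 0.0
```

    A realistic estimator would also model probe intensities and genotyping error, which is why a likelihood-based approach is preferable in practice.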

  16. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array

    DEFF Research Database (Denmark)

    Eeles, Rosalind A; Olama, Ali Amin Al; Benlloch, Sara

    2013-01-01

    Prostate cancer is the most frequently diagnosed cancer in males in developed countries. To identify common prostate cancer susceptibility alleles, we genotyped 211,155 SNPs on a custom Illumina array (iCOGS) in blood DNA from 25,074 prostate cancer cases and 24,272 controls from the internationa...

  17. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence.

    Science.gov (United States)

    Dewey, Frederick E; Chen, Rong; Cordero, Sergio P; Ormond, Kelly E; Caleshu, Colleen; Karczewski, Konrad J; Whirl-Carrillo, Michelle; Wheeler, Matthew T; Dudley, Joel T; Byrnes, Jake K; Cornejo, Omar E; Knowles, Joshua W; Woon, Mark; Sangkuhl, Katrin; Gong, Li; Thorn, Caroline F; Hebert, Joan M; Capriotti, Emidio; David, Sean P; Pavlovic, Aleksandra; West, Anne; Thakuria, Joseph V; Ball, Madeleine P; Zaranek, Alexander W; Rehm, Heidi L; Church, George M; West, John S; Bustamante, Carlos D; Snyder, Michael; Altman, Russ B; Klein, Teri E; Butte, Atul J; Ashley, Euan A

    2011-09-01

    Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with a history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

  18. A high-density SNP genotyping array for rice biology and molecular breeding.

    Science.gov (United States)

    Chen, Haodong; Xie, Weibo; He, Hang; Yu, Huihui; Chen, Wei; Li, Jing; Yu, Renbo; Yao, Yue; Zhang, Wenhui; He, Yuqing; Tang, Xiaoyan; Zhou, Fasong; Deng, Xing Wang; Zhang, Qifa

    2014-03-01

    A high-density single nucleotide polymorphism (SNP) array is critically important for geneticists and molecular breeders. With the accumulation of huge amounts of genomic re-sequencing data and available technologies for accurate SNP detection, it is possible to design high-density and high-quality rice SNP arrays. Here we report the development of a high-density rice SNP array and its utility. SNP probes were designed by screening more than 10 000 000 SNP loci extracted from the re-sequencing data of 801 rice varieties and an array named RiceSNP50 was produced on the Illumina Infinium platform. The array contained 51 478 evenly distributed markers, 68% of which were within genic regions. Several hundred rice plants with parent/F1 relationships were used to generate a high-quality cluster file for accurate SNP calling. Application tests showed that this array had high genotyping accuracy, and could be used for different objectives. For example, a core collection of elite rice varieties was clustered with fine resolution. Genome-wide association studies (GWAS) analysis correctly identified a characterized QTL. Further, this array was successfully used for variety verification and trait introgression. As an accurate high-throughput genotyping tool, RiceSNP50 will play an important role in both functional genomics studies and molecular breeding.

  19. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits

    DEFF Research Database (Denmark)

    Voight, Benjamin F; Kang, Hyun Min; Ding, Jun

    2012-01-01

    Genome-wide association studies have identified hundreds of loci for type 2 diabetes, coronary artery disease and myocardial infarction, as well as for related traits such as body mass index, glucose and insulin levels, lipid levels, and blood pressure. These studies also have pointed to thousands...... of loci with promising but not yet compelling association evidence. To establish association at additional loci and to characterize the genome-wide significant loci by fine-mapping, we designed the "Metabochip," a custom genotyping array that assays nearly 200,000 SNP markers. Here, we describe...... dramatic cost efficiencies compared to designing single-trait follow-up reagents, and provides the opportunity to compare results across a range of related traits. The metabochip and similar custom genotyping arrays offer a powerful and cost-effective approach to follow-up large-scale genotyping...

  20. The use of mycobacterial interspersed repetitive unit typing and whole genome sequencing to inform tuberculosis prevention and control activities.

    Science.gov (United States)

    Gilbert, Gwendolyn L; Sintchenko, Vitali

    2013-07-01

    Molecular strain typing of Mycobacterium tuberculosis has been possible for only about 20 years; it has significantly improved our understanding of the evolution and epidemiology of Mycobacterium tuberculosis and tuberculosis disease. Mycobacterial interspersed repetitive unit typing, based on 24 variable number tandem repeat unit loci, is highly discriminatory, relatively easy to perform and interpret and is currently the most widely used molecular typing system for tuberculosis surveillance. Nevertheless, clusters identified by mycobacterial interspersed repetitive unit typing sometimes cannot be confirmed or adequately defined by contact tracing and additional methods are needed. Recently, whole genome sequencing has been used to identify single nucleotide polymorphisms and other mutations, between genotypically indistinguishable isolates from the same cluster, to more accurately trace transmission pathways. Rapidly increasing speed and quality and reduced costs will soon make large scale whole genome sequencing feasible, combined with the use of sophisticated bioinformatics tools, for epidemiological surveillance of tuberculosis.

  1. Linkage and whole genome sequencing identify a locus on 6q25-26 for formal thought disorder and implicate MEF2A regulation

    DEFF Research Database (Denmark)

    Thygesen, Johan Hilge; Zambach, Sine Katharina; Ingason, Andrés

    2015-01-01

    analysis of the implicated region using phased microsatellite and SNP genotypes. Whole genome sequencing (N=3) was used in the attempt to identify causative variants in the linkage region. Linkage analysis of formal thought disorder resulted in a single peak at chromosome 6(q26-q27) centred on marker D6S...... disorder index score (P = 4.9 × 10⁻⁵) and qualitatively severe forms of thought disturbances. Whole genome sequencing identified a novel nucleotide deletion (chr6:164377205 AG>A, hg18) predicted to disrupt the potential binding of the transcription factor MEF2A. The MEF2A binding site is located between...

  2. Dried reagents for multiplex genotyping by tag-array minisequencing to be used in microfluidic devices

    DEFF Research Database (Denmark)

    Ahlford, Annika; Kjeldsen, Bastian; Reimers, Jakob

    2010-01-01

    We present an optimized procedure for freeze-drying and storing reagents for multiplex PCR followed by genotyping using a tag-array minisequencing assay with four-color fluorescence detection, which is suitable for microfluidic assay formats. A test panel was established for five cancer mutations...... phosphatase were estimated to be 55 and 200 days, respectively. We conducted a systematic genotyping comparison using freeze-dried and liquid reagents. The accuracy of successful genotyping was 99.1% using freeze-dried reagents compared to liquid reagents. As a proof of concept, the genotyping protocol...... was carried out with freeze-dried reagents stored in reaction chambers fabricated by micromilling in a cyclic olefin copolymer substrate. The results reported in this study are a key step towards the development of an integrated microfluidic device for point-of-care DNA-based diagnostics....

  3. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages.

  4. Whole genome microarray analysis, from neonatal blood cards

    Directory of Open Access Journals (Sweden)

    Hogan Michael E

    2009-07-01

    Background: Neonatal blood, obtained from a heel stick and stored dry on paper cards, has been the standard for birth defects screening for 50 years. Such dried blood samples are used primarily for analysis of small-molecule analytes. More recently, the DNA complement of such dried blood cards has been used for targeted genetic testing, such as for single nucleotide polymorphisms in cystic fibrosis. Expansion of such testing to include polygenic traits, and perhaps whole genome scanning, has been discussed as a formal possibility. However, until now the amount of DNA that might be obtained from such dried blood cards has been limiting, due to inefficient DNA recovery technology. Results: A new technology is employed for efficient DNA release from a standard neonatal blood card. Using standard Guthrie cards, stored an average of ten years post-collection, about 1/40th of the air-dried neonatal blood specimen (two 3 mm punches) was processed to obtain DNA that was sufficient in mass and quality for direct use in microarray-based whole genome scanning. Using that same DNA release technology, it is also shown that approximately 1/250th of the original purified DNA (about 1 ng) could be subjected to whole genome amplification, thus yielding an additional microgram of amplified DNA product. That amplified DNA product was then used in microarray analysis and yielded statistical concordance of 99% or greater to the primary, unamplified DNA sample. Conclusion: Together, these data suggest that DNA obtained from less than 10% of a standard neonatal blood specimen, stored dry for several years on a Guthrie card, can support a program of genome-wide neonatal genetic testing.
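
    Concordance between amplified and unamplified DNA, as quoted above, is typically computed only over markers called in both samples, with call rate reported separately. A minimal sketch, assuming genotypes coded as alternate-allele counts (0, 1, 2) in dictionaries keyed by SNP identifier and `None` for a no-call; the names and encoding are hypothetical, not taken from the study.

```python
def call_rate(calls, no_call=None):
    """Fraction of SNPs with a genotype call (no-calls are stored as `no_call`)."""
    if not calls:
        return 0.0
    return sum(1 for g in calls.values() if g != no_call) / len(calls)


def concordance(calls_a, calls_b, no_call=None):
    """Genotype concordance over SNPs called in both sets; returns (fraction, n_compared).

    Genotypes are coded as alternate-allele counts (0, 1, 2) keyed by SNP ID.
    """
    compared = matches = 0
    for snp, genotype_a in calls_a.items():
        genotype_b = calls_b.get(snp, no_call)
        if genotype_a == no_call or genotype_b == no_call:
            continue  # concordance is only defined where both samples were called
        compared += 1
        matches += genotype_a == genotype_b
    return (matches / compared if compared else 0.0), compared
```

    Reporting the number of compared markers alongside the fraction matters: a high concordance over few successfully called markers says little about overall performance.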

  5. Comparison of genotyping using pooled DNA samples (allelotyping) and individual genotyping using the affymetrix genome-wide human SNP array 6.0

    NARCIS (Netherlands)

    A. Teumer (Alexander); F.D.J. Ernst (Florian); M.A. Wiechert; K. Uhr (Katharina); M.A. Nauck (Matthias); A. Petersmann (Astrid); H. Völzke (Henry); U. Völker (Uwe); G. Homuth (Georg)

    2013-01-01

    Background: Genome-wide association studies (GWAS) using array-based genotyping technology are widely used to identify genetic loci associated with complex diseases or other phenotypes. The costs of GWAS projects based on individual genotyping are still comparatively high and increase

  6. Whole-Genome Sequencing of a Canine Family Trio Reveals a FAM83G Variant Associated with Hereditary Footpad Hyperkeratosis

    Directory of Open Access Journals (Sweden)

    Shumaila Sayyab

    2016-03-01

    Over 250 Mendelian traits and disorders caused by rare alleles have been mapped in the canine genome. Although each disease is rare in the dog as a species, they are collectively common and have a major impact on canine health. With SNP-based genotyping arrays, genome-wide association studies (GWAS) have proven to be a powerful method to map the genomic region of interest when 10–20 cases and 10–20 controls are available. However, to identify the genetic variant in associated regions, fine-mapping and targeted resequencing are required. Here we present a new approach using whole-genome sequencing (WGS) of a family trio without prior GWAS. As a proof-of-concept, we chose an autosomal recessive disease known as hereditary footpad hyperkeratosis (HFH) in Kromfohrländer dogs. To our knowledge, this is the first time this family trio WGS-approach has been used successfully to identify a genetic variant that perfectly segregates with a canine disorder. The sequencing of three Kromfohrländer dogs from a family trio (an affected offspring and both its healthy parents) resulted in an average genome coverage of 9.2X per individual. After applying stringent filtering criteria for candidate causative coding variants, 527 single nucleotide variants (SNVs) and 15 indels were found to be homozygous in the affected offspring and heterozygous in the parents. Using the software packages ANNOVAR and SIFT to functionally annotate coding sequence differences and to predict their functional effect resulted in seven candidate variants located in six different genes. Of these, only FAM83G:c.155G>C (p.R52P) was found to be concordant in eight additional cases and 16 healthy Kromfohrländer dogs.
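
    The two filtering steps described above (homozygous in the affected offspring and heterozygous in both parents, then checking segregation in further dogs) can be sketched as follows, assuming genotypes coded as alternate-allele counts (0, 1, 2). The field names `offspring`, `sire` and `dam` and both function names are hypothetical; annotation with ANNOVAR and SIFT is not reproduced.

```python
def recessive_candidates(variants):
    """IDs of variants homozygous alternate in the affected offspring and
    heterozygous in both unaffected parents (genotypes coded 0, 1, 2)."""
    return [v["id"] for v in variants
            if v["offspring"] == 2 and v["sire"] == 1 and v["dam"] == 1]


def segregates(case_genotypes, control_genotypes):
    """True if every additional case is homozygous alternate and no healthy
    control is (heterozygous carriers are allowed among controls)."""
    return (all(g == 2 for g in case_genotypes)
            and all(g != 2 for g in control_genotypes))
```

    Allowing heterozygotes among healthy controls reflects the autosomal recessive model: unaffected carriers are expected.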

  7. Sequence variants from whole genome sequencing a large group of Icelanders.

    Science.gov (United States)

    Gudbjartsson, Daniel F; Sulem, Patrick; Helgason, Hannes; Gylfason, Arnaldur; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Kong, Augustine; Helgason, Agnar; Masson, Gisli; Magnusson, Olafur Th; Thorsteinsdottir, Unnur; Stefansson, Kari

    2015-01-01

    We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to a depth of at least 10X and by chip genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X, and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed by either dbSNP (build 137) or the Exome Sequencing Project (ESP) while only 60 and 20% of rare (DAFgenome, have been observed in the public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to published reports.
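
    The transition/transversion ratio mentioned above is a standard summary check on an SNP call set: transitions are A↔G or C↔T substitutions, and every other substitution between two different nucleotides is a transversion. A minimal sketch, assuming variants are given as (ref, alt) allele strings; the function name is hypothetical.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def ti_tv_ratio(variants):
    """Transition/transversion ratio over (ref, alt) pairs; non-SNV records are skipped."""
    transitions = transversions = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        # Only single-base substitutions between two different nucleotides count
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return transitions / transversions if transversions else float("inf")
```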

  8. Multiplex SNP analysis on whole genome amplified DNA from archived dried bloodspots, a validation study

    DEFF Research Database (Denmark)

    Tvedegaard, Kristine C.; Parner, Erik; Hooper, Craig W.

    on whole genome amplified (WGA) DNA from archived dried bloodspots. METHODS AND MATERIAL: The chemically synthesized new base pair (isoC and isoG) allows for the creation of new DNA strands that provide superior specificity and allow development of assays with greater sensitivity than conventional methods...... is a further development of allele specific primer extension (ASPE) for multiplex SNP analysis based on the Luminex 100 IS platform. It uses isobases (isoC and isoG) and the software MultiCode-PLx platform for data analysis and data handling. We validate the EraGen multicode system in two 6-plex assays used....... To validate the method, 900 WGA DNA samples were genotyped in duplicate. 10–20% of all samples were sequenced for use as a reference. Accuracy and repeatability were estimated for each SNP. Robustness was estimated as the rate of conclusive outcomes for each SNP. RESULTS: The accuracy ranged from 98...

  9. Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high incidence setting

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Soborg, B; Koch, A

    2016-01-01

    In East Greenland, a dramatic increase in tuberculosis (TB) incidence has been observed in recent years. Classical genotyping suggests a genetically similar Mycobacterium tuberculosis (Mtb) strain population as the cause; however, precise transmission patterns are unclear. We performed whole genome...... sequencing (WGS) of Mtb isolates from 98% of culture-positive TB cases over 21 years (n = 182), which revealed four genomic clusters of the Euro-American lineage (mainly sub-lineage 4.8 (n = 134)). The time to the most recent common ancestor of lineage 4.8 strains was found to be 100 years. This sub...... and the uniformity of circulating Mtb strains indicated that the majority of East Greenlandic TB cases originated from one or a few strains introduced within the last century. Thereby, the study shows the consequences of even short interruptions in TB control efforts in areas with previously high TB incidence...
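
    Genomic clusters such as those above are often defined by pairwise SNP distances between isolates. A minimal single-linkage sketch follows, assuming each isolate is represented by an equal-length string of concatenated SNP positions with 'N' for unknown bases. The threshold is a free parameter chosen here purely for illustration; the abstract does not state the clustering rule the authors used, and the function names are hypothetical.

```python
from itertools import combinations


def snp_distance(a, b):
    """Number of aligned positions at which two isolates differ, ignoring 'N'."""
    return sum(1 for x, y in zip(a, b) if x != y and x != "N" and y != "N")


def cluster_isolates(alignment, threshold):
    """Single-linkage clusters: isolates are joined if their SNP distance <= threshold.

    alignment: dict of isolate ID -> concatenated SNP string (same length for all).
    Returns a sorted list of sorted clusters (lists of isolate IDs).
    """
    parent = {iso: iso for iso in alignment}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for iso in alignment:
        groups.setdefault(find(iso), []).append(iso)
    return sorted(sorted(g) for g in groups.values())
```

    Single linkage means two isolates can share a cluster through a chain of close neighbours even if they are far apart themselves, which suits transmission chains but should be kept in mind when interpreting cluster sizes.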

  10. Whole-Genome Sequencing: Automated, Indexed Library Preparation.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing an indexed Illumina DNA library. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Double-stranded DNA (dsDNA) will fragment when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  11. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using either...... a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model, results showed increases in accuracy of up to two percentage points for production traits in both Holstein and Jersey animals by including the extra variants in the analysis, and an extra 1.5 percentage points...

  12. Estimating telomere length from whole genome sequence data.

    Science.gov (United States)

    Ding, Zhihao; Mangino, Massimo; Aviv, Abraham; Spector, Tim; Durbin, Richard

    2014-05-01

    Telomeres play a key role in replicative ageing and undergo age-dependent attrition in vivo. Here, we report a novel method, TelSeq, to measure average telomere length from whole genome or exome shotgun sequence data. In 260 leukocyte samples, we show that TelSeq results correlate with Southern blot measurements of the mean length of terminal restriction fragments (mTRFs) and display age-dependent attrition comparably well as mTRFs. © The Author(s) 2014. Published by Oxford University Press.
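
    Read-based telomere estimation starts from counting reads that look telomeric. The sketch below shows only that counting step, assuming a read is telomeric if it contains at least k copies of the vertebrate telomeric hexamer TTAGGG (or its reverse complement CCCTAA); k = 7 is an assumption here, and the normalisation that turns this abundance into a length in kb is not reproduced. Function names are hypothetical and this is not the TelSeq source.

```python
TELOMERE_REPEAT = "TTAGGG"
REVERSE_COMPLEMENT = "CCCTAA"


def is_telomeric(read, k=7):
    """True if the read holds at least k copies of the telomeric hexamer on either strand."""
    read = read.upper()
    # The hexamer cannot overlap itself, so str.count gives the full occurrence count
    return max(read.count(TELOMERE_REPEAT), read.count(REVERSE_COMPLEMENT)) >= k


def telomeric_read_abundance(reads, k=7):
    """Fraction of reads classified as telomeric; a length estimate would rescale this."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(is_telomeric(r, k) for r in reads) / len(reads)
```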

  13. Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    William P. Gilks

    2016-11-01

    As part of a study into the molecular genetics of sexually dimorphic complex traits, we used next-generation sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6), and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (accession number SRP058502). We used GATK HaplotypeCaller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200 bp). Additionally, we detected and genotyped 167 large structural variants (1–100 kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).

  14. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is currently becoming the method of choice for characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regard to accuracy, resolution and analysis speed in comparison to several other methods including serotyping, PCR, pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multivirulence-locus sequence typing (MVLST), which have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major sources of foodborne diseases for humans. Backward compatibility of WGS to former methods can be maintained by extraction of the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes capable of differentiating 12 serovars, and national reference laboratories still perform serotyping and PCR-based serogrouping as a first-level classification method for Listeria monocytogenes surveillance. Whole genome sequence-based core genome MLST analysis of an L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and could be grouped into the IIa, IIc, IVb or IIb clusters generated by minimum spanning tree (MST) and neighbor joining (NJ) tree data analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  15. Ultrahigh-density linkage map for cultivated cucumber (Cucumis sativus L.) using a single-nucleotide polymorphism genotyping array.

    Directory of Open Access Journals (Sweden)

    Mor Rubinstein

    Genotyping arrays are tools for high-throughput genotyping, which is beneficial in constructing saturated genetic maps and therefore for high-resolution mapping of complex traits. Since the report of the first cucumber genome draft, genetic maps have been constructed mainly based on simple-sequence repeats (SSRs) or on combinations of SSRs and sequence-related amplified polymorphism (SRAP). In this study, we developed the first cucumber genotyping array consisting of 32,864 single-nucleotide polymorphisms (SNPs). These markers cover the cucumber genome with a median interval of ~2 kb and have expected genotype calls in parents/F1 hybridizations as a training set. The training set was validated with Fluidigm technology and showed 96% concordance with the genotype calls in the parents/F1 hybridizations. Application of the genotyping array was illustrated by constructing a 598.7 cM genetic map comprising 11,156 SNPs, based on a '9930' × 'Gy14' recombinant inbred line (RIL) population. Marker collinearity between the genetic map and reference genomes of the two parents was estimated at R² = 0.97. We also used the array-derived genetic map to investigate chromosomal rearrangements, regional recombination rate, and specific regions with segregation distortions. Finally, 82% of the linkage-map bins were polymorphic in other cucumber variants, suggesting that the array can be applied for genotyping in other lines. The genotyping array presented here, together with the genotype calls of the parents/F1 hybridizations as a training set, should be a powerful tool in future studies with high-throughput cucumber genotyping. An ultrahigh-density linkage map constructed by this genotyping array on the RIL population may be invaluable for assembly improvement, and for mapping important cucumber QTLs.
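
    Collinearity between a genetic map and a reference assembly can be summarised as the squared Pearson correlation between each marker's map position (cM) and its physical position, computed within one chromosome. The abstract does not say exactly how its R² was obtained, so this definition is an assumption, and the function name is hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

    For example, `r_squared(genetic_cM, physical_bp)` for the markers of one chromosome gives 1.0 only when map order and spacing are perfectly linear in physical position; local rearrangements or recombination-rate variation lower it.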

  16. Arrayed primer extension in the "array of arrays" format: a rational approach for microarray-based SNP genotyping

    DEFF Research Database (Denmark)

    Klitø, Niels G F; Tan, Qihua; Nyegaard, Mette

    2007-01-01

    … Linkage disequilibrium (LD) results from the experimental data are used in a novel comparison with baseline data defined by the international HapMap SNP database. Comparison of the LD results reveals a strong linear correlation irrespective of the LD measure considered: R2 (D') = 0.73 and R2(r2) = 0.......54. In conclusion, our results show that this setup is robust enough to support high-throughput genotyping, and these observations indicate that the HapMap genotype resource is important for defining SNP panels aimed at gene mapping in local subpopulations from Europe…
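
    The two LD measures compared above (D' and r²) can be computed from haplotype frequencies at a pair of biallelic SNPs; the R² values in the abstract are correlations between such LD values in the experimental data and in HapMap. A minimal sketch, assuming phased haplotype counts are already available (unphased genotypes would first need haplotype estimation, for example by EM); the function name is illustrative:

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (|D'|, r^2) for two biallelic loci from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes")
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at locus 2
    D = n_AB / n - p_A * p_B         # coefficient of linkage disequilibrium
    # D' scales D by its largest possible magnitude given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return d_prime, r2


print(ld_measures(40, 10, 5, 45))
```

    D' equals 1 whenever at least one of the four haplotypes is absent, whereas r² equals 1 only when just two complementary haplotypes are present.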

  17. Concept and design of a genome-wide association genotyping array tailored for transplantation-specific studies

    DEFF Research Database (Denmark)

    Li, Yun R.; van Setten, Jessica; Verma, Shefali S.

    2015-01-01

    … compared to reference samples and to other genome-wide genotyping platforms. Conclusions: We have designed a comprehensive genome-wide genotyping tool that enables accurate association testing and imputation of ungenotyped SNPs, facilitating powerful and cost-effective large-scale genotyping of transplant … We designed a genome-wide genotyping tool based on the most recent human genomic reference datasets and included customized content for metabolic and pharmacological loci known or potentially relevant to transplantation. Methods: We describe here the design and implementation of a customized … genome-wide genotyping array, the 'TxArray', comprising approximately 782,000 markers with tailored content for deeper capture of variants across HLA, KIR, pharmacogenomic, and metabolic loci important in transplantation. To test concordance and genotyping quality, we genotyped 85 HapMap samples…

  18. Probe selection algorithm for oligonucleotide array-based medium-resolution genotyping.

    Science.gov (United States)

    Zhou, Y; Peng, S; Gao, H; Cheng, J

    2004-11-01

    Medium-resolution genotyping aims to distinguish subgroups rather than each individual element of a group. An oligonucleotide array provides an inexpensive, high-throughput method to identify differences in DNA sequence among individuals, which is fundamental to genotyping. Because the cost and difficulty of designing and fabricating an oligonucleotide array increase dramatically with the number of probes used, it is important to have a design with the minimum number of probes that meets the requirements of medium-resolution genotyping. We report the first algorithm for designing and selecting probes for oligonucleotide array-based medium-resolution typing. Its goal is to select a minimum number of probes from a large probe set with minimal loss of resolution, and it is based on entropy, conditional entropy and mutual information theory. The algorithm was tested on a human leukocyte antigen (HLA) sequence data set. Thirty probes were selected from 390 probes for HLA-A, and 60 probes were selected from 767 probes for HLA-B. Although the number of probes was reduced more than tenfold, distinguishability decreased only slightly: by 0.45 percentage points (from 99.90% to 99.45%) for HLA-A and by 0.27 percentage points (from 99.84% to 99.57%) for HLA-B. This is a satisfactory and practical result.
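
    The entropy-based selection described above can be illustrated with a greedy sketch. Here each probe is assumed to give a binary hybridization pattern across alleles, all alleles are assumed equally likely, and at each step the probe with the largest conditional entropy given the probes already chosen (equivalently, the largest gain in joint entropy) is added. This illustrates the idea rather than reproducing the published algorithm; the probe and allele names, and the pairwise definition of distinguishability, are assumptions:

```python
from collections import Counter
from math import log2


def partition_entropy(alleles, probes, selected):
    """Entropy (bits) of the allele classes induced by the selected probes,
    assuming all alleles are equally likely. Alleles with identical
    hybridization signatures fall into the same class."""
    signatures = Counter(tuple(probes[p][a] for p in selected) for a in alleles)
    n = len(alleles)
    return -sum((c / n) * log2(c / n) for c in signatures.values())


def greedy_probe_selection(probes, alleles, max_probes):
    """Greedily add the probe with the largest conditional entropy
    H(probe | selected), i.e. the largest gain in joint entropy."""
    selected = []
    current = partition_entropy(alleles, probes, selected)
    while len(selected) < max_probes:
        best, best_h = None, current
        for p in probes:
            if p in selected:
                continue
            h = partition_entropy(alleles, probes, selected + [p])
            if h > best_h + 1e-12:  # require a strict improvement
                best, best_h = p, h
        if best is None:  # no remaining probe separates any further alleles
            break
        selected.append(best)
        current = best_h
    return selected


def distinguishability(alleles, probes, selected):
    """Fraction of allele pairs that the selected probes tell apart
    (a hypothetical definition used only for this illustration)."""
    sig = {a: tuple(probes[p][a] for p in selected) for a in alleles}
    pairs = [(a, b) for i, a in enumerate(alleles) for b in alleles[i + 1:]]
    if not pairs:
        return 1.0
    return sum(sig[a] != sig[b] for a, b in pairs) / len(pairs)


# Hypothetical hybridization patterns: 1 = probe binds the allele.
alleles = ["A1", "A2", "A3", "A4"]
probes = {
    "p1": {"A1": 1, "A2": 1, "A3": 0, "A4": 0},
    "p2": {"A1": 1, "A2": 0, "A3": 1, "A4": 0},
    "p3": {"A1": 1, "A2": 1, "A3": 1, "A4": 0},
}
chosen = greedy_probe_selection(probes, alleles, max_probes=3)
print(chosen, distinguishability(alleles, probes, chosen))
```

    In this toy example the third probe adds no information once the first two separate all four alleles, so the greedy loop stops early; the same logic is what lets a real probe set shrink with little loss of resolution.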

  19. SNP Discovery and Development of a High-Density Genotyping Array for Sunflower

    Science.gov (United States)

    Bachlava, Eleni; Taylor, Christopher A.; Tang, Shunxue; Bowers, John E.; Mandel, Jennifer R.; Burke, John M.; Knapp, Steven J.

    2012-01-01

    Recent advances in next-generation DNA sequencing technologies have made possible the development of high-throughput SNP genotyping platforms that allow for the simultaneous interrogation of thousands of single-nucleotide polymorphisms (SNPs). Such resources have the potential to facilitate the rapid development of high-density genetic maps, and to enable genome-wide association studies as well as molecular breeding approaches in a variety of taxa. Herein, we describe the development of a SNP genotyping resource for use in sunflower (Helianthus annuus L.). This work involved the development of a reference transcriptome assembly for sunflower, the discovery of thousands of high-quality SNPs based on the generation and analysis of ca. 6 Gb of transcriptome re-sequencing data derived from multiple genotypes, the selection of 10,640 SNPs for inclusion in the genotyping array, and the use of the resulting array to screen a diverse panel of sunflower accessions as well as related wild species. The results of this work revealed a high frequency of polymorphic SNPs and a relatively high level of cross-species transferability. Indeed, more than 95% of successful SNP assays revealed polymorphism, and more than 90% of these assays could be successfully transferred to related wild species. Analysis of the polymorphism data revealed patterns of genetic differentiation that were largely congruent with the evolutionary history of sunflower, though the large number of markers allowed for finer resolution than has previously been possible. PMID:22238659

  20. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits.

    Directory of Open Access Journals (Sweden)

    Benjamin F Voight

    Genome-wide association studies have identified hundreds of loci for type 2 diabetes, coronary artery disease and myocardial infarction, as well as for related traits such as body mass index, glucose and insulin levels, lipid levels, and blood pressure. These studies have also pointed to thousands of loci with promising but not yet compelling association evidence. To establish association at additional loci and to characterize the genome-wide significant loci by fine-mapping, we designed the "Metabochip," a custom genotyping array that assays nearly 200,000 SNP markers. Here, we describe the Metabochip and its component SNP sets, evaluate its performance in capturing variation across the allele-frequency spectrum, describe solutions to methodological challenges commonly encountered in its analysis, and evaluate its performance as a platform for genotype imputation. The Metabochip achieves dramatic cost efficiencies compared to designing single-trait follow-up reagents, and provides the opportunity to compare results across a range of related traits. The Metabochip and similar custom genotyping arrays offer a powerful and cost-effective approach to following up large-scale genotyping and sequencing studies and to advancing our understanding of the genetic basis of complex human diseases and traits.

  1. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy.

    Science.gov (United States)

    Bouwman, Aniek C; Veerkamp, Roel F

    2014-10-03

    The aim of this study was to determine the consequences for imputation accuracy of splitting sequencing effort over multiple breeds, when imputing from a high-density SNP chip to whole-genome sequence. Such information would help, for instance, breeders of numerically smaller cattle breeds, as well as pig and chicken breeders, who have to choose wisely how to spread their sequencing effort over all the breeds or lines they evaluate. Sequence data from cattle breeds were used because a relatively large number of individuals from several breeds have been sequenced within the 1,000 Bull Genomes project. The advantage of whole-genome sequence data is that it carries the causal mutations, but the question is whether the causal variants can be imputed accurately. This study therefore focussed on the imputation accuracy of variants with low minor allele frequency (MAF) and of breed-specific variants. Imputation accuracy was assessed for chromosomes 1 and 29 as the correlation between observed and imputed genotypes. For chromosome 1, the average imputation accuracy was 0.70 with a reference population of 20 Holstein animals, and increased to 0.83 when the reference population was enlarged by including 3 other dairy breeds with 20 animals each. When the same number of Holstein animals was added instead, the accuracy improved to 0.88, while adding the 3 other breeds to a reference population of 80 Holstein animals improved the average imputation accuracy only marginally, to 0.89. For chromosome 29, the average imputation accuracy was lower. Some variants benefitted from the inclusion of other breeds in the reference population, which was initially determined by the MAF of the variant in each breed, but even Holstein-specific variants gained imputation accuracy from the multi-breed reference population. This study shows that splitting sequencing effort over multiple breeds and combining the reference populations is a good strategy for imputation from high-density SNP panels towards whole-genome sequence when reference

  2. Whole-Genome Sequencing: Automated, Nonindexed Library Preparation.

    Science.gov (United States)

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing a nonindexed Illumina DNA library and relies on the use of a CyBi-SELMA automated pipetting machine, the Covaris E210 shearing instrument, and the epMotion 5075. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Here, double-stranded DNA is fragmented when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing. © 2017 Cold Spring Harbor Laboratory Press.

  3. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has made genomics increasingly feasible in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  4. Whole-genome sequencing to control antimicrobial resistance

    Science.gov (United States)

    Köser, Claudio U.; Ellington, Matthew J.; Peacock, Sharon J.

    2014-01-01

    Following recent improvements in sequencing technologies, whole-genome sequencing (WGS) is positioned to become an essential tool in the control of antibiotic resistance, a major threat in modern healthcare. WGS has already found numerous applications in this area, ranging from the development of novel antibiotics and diagnostic tests through to antibiotic stewardship of currently available drugs via surveillance and the elucidation of the factors that allow the emergence and persistence of resistance. Numerous proof-of-principle studies have also highlighted the value of WGS as a tool for day-to-day infection control and, for some pathogens, as a primary diagnostic tool to detect antibiotic resistance. However, appropriate data analysis platforms will need to be developed before routine WGS can be introduced on a large scale. PMID:25096945

  5. Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.

    Science.gov (United States)

    Li, Qing; Hermanson, Peter J; Springer, Nathan M

    2018-01-01

    DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. It has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.
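
    The per-site quantity that WGBS measures follows from the bisulfite chemistry: unmethylated cytosines are converted to uracil and read as T, whereas methylated cytosines remain C. A minimal sketch of per-cytosine and region-level methylation levels computed from read counts; the function names and the minimum-depth cutoff of 3 reads are hypothetical choices, not values from the protocol:

```python
def site_methylation(c_reads, t_reads, min_depth=3):
    """Methylation level at one cytosine: C / (C + T).
    Returns None if coverage is below min_depth (hypothetical cutoff)."""
    depth = c_reads + t_reads
    if depth < min_depth:
        return None
    return c_reads / depth


def region_methylation(sites):
    """Weighted methylation level of a region: total C over total C + T.
    `sites` is a list of (c_reads, t_reads) tuples."""
    c_total = sum(c for c, _ in sites)
    depth_total = sum(c + t for c, t in sites)
    return c_total / depth_total if depth_total else None


# Hypothetical sites in one region: (methylated C reads, unmethylated T reads).
region = [(8, 2), (0, 10), (5, 5)]
print([site_methylation(c, t) for c, t in region], region_methylation(region))
```

    In plants such as maize and barley, methylation levels are usually summarized separately for the CG, CHG and CHH sequence contexts.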

  6. Whole genome sequencing: an efficient approach to ensuring food safety

    Science.gov (United States)

    Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.

    2017-09-01

    Whole genome sequencing is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. WGS can also improve understanding of the contamination and colonization routes of foodborne pathogens within the food production environment, and can enable efficient tracking of pathogens’ entry routes and distribution from farm to consumer. Tracking foodborne pathogens along the food processing-distribution-retail-consumer continuum is of the utmost importance for facilitating outbreak investigations and for rapid action in controlling and preventing foodborne outbreaks. Therefore, WGS will likely consolidate most of the numerous workflows used in public health laboratories to characterize foodborne pathogens into one efficient workflow.

  7. Whole genome sequencing in clinical and public health microbiology

    Science.gov (United States)

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has made genomics increasingly feasible in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  8. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    BACKGROUND: Genomics studies are being revolutionized by next-generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived specifically to address the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired-end spacing. Thousands of read datasets were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different assemblers, including Newbler, ABySS, and SOAPdenovo, and the resulting assemblies were evaluated with an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by aligning the assemblies to the original genome source of the reads. The results were presented on a website, which includes a data graphing tool, all created to help users rapidly compare the feasibility and effectiveness of different sequencing and assembly strategies before testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, along with conclusions about some specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly
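
    The abstract does not list the assembly statistics used, but N50 is one widely used statistic of assembly sequences: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. A minimal sketch (the function name is illustrative and is not part of Plantagora):

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

    N50 describes contiguity only; fidelity-type measures like those described above additionally require aligning the assembly back to the source genome.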

  9. Computational operon prediction in whole-genomes and metagenomes.

    Science.gov (United States)

    Zaidi, Syed Shujaat Ali; Zhang, Xuegong

    2017-07-01

    Microbial diversity in unique environmental settings enables abrupt responses, catalysed by altered gene regulation and by the formation of gene clusters called operons. Operons increase bacterial adaptability, which in turn increases bacterial survival. This review article presents the emergence of computational operon prediction methods for whole microbial genomes and metagenomes, and discusses their strengths and limitations. Most whole-genome operon prediction methods struggle to generalize to unrelated genomes. The applicability of universal whole-genome operon prediction methods to metagenomic data is an interesting yet less investigated question. We have evaluated the potential of various operon prediction features for genomic and metagenomic data. The outputs of most high-accuracy operon prediction methods have been compiled into databases. Despite their high predictive performance, the data in many databases are not completely consistent for similar species. We performed a correlation analysis between the computationally predicted operon databases and experimentally validated data for Escherichia coli, Bacillus subtilis and Mycobacterium tuberculosis. Operon predictions for most less-characterized microbes cannot be verified because experimentally validated operons are lacking. Generating validated information for other microbes would also test the reliability of operon databases for less-annotated microbes. Advances in sequencing technologies (such as longer sequencing reads and larger contigs) and the development of better analysis methods will help researchers overcome technological hurdles, further improve operon predictions, and make better use of operonic information. © The Author 2016. Published by Oxford University Press. All rights reserved.
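
    Intergenic distance is a classic feature in whole-genome operon prediction: adjacent genes on the same strand separated by a short gap are more likely to belong to the same operon. A minimal sketch of that single-feature rule, with a hypothetical 50 bp cutoff and hypothetical gene coordinates; many predictors combine this feature with others, so this is an illustration, not one of the reviewed methods:

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def predict_operons(genes, max_gap=50):
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap
    base pairs (overlapping genes have a negative gap and are grouped)."""
    operons, current = [], []
    for gene in sorted(genes, key=lambda g: g.start):
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1
            if gene.strand == prev.strand and gap <= max_gap:
                current.append(gene)
                continue
            operons.append([g.name for g in current])
        current = [gene]
    if current:
        operons.append([g.name for g in current])
    return operons


genes = [
    Gene("geneA", 1, 900, "+"),
    Gene("geneB", 920, 1800, "+"),
    Gene("geneC", 2500, 3100, "+"),
    Gene("geneD", 3120, 3900, "-"),
]
print(predict_operons(genes))
```

    geneC and geneD are not grouped even though they are close, because they lie on opposite strands; a fixed cutoff like this is exactly the kind of rule that generalizes poorly across unrelated genomes.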

  10. Whole genome amplification and sequencing of a Daphnia resting egg.

    Science.gov (United States)

    Lack, Justin B; Weider, Lawrence J; Jeyasingh, Punidan D

    2017-09-19

    Resting egg banks are unique windows that allow us to directly observe shifts in population genetics and phenotypes over time as natural populations evolve. Though a variety of planktonic organisms also produce resting stages, the keystone freshwater consumer Daphnia is a well-known model for paleogenetics and resurrection ecology. Nevertheless, paleogenomic investigations are limited, largely because resting eggs do not contain enough DNA for genomic sequencing. In fact, genomic studies even of extant populations include a laborious preparatory phase of batch-culturing dozens of individuals to generate sufficient genomic DNA. Here, we provide a protocol to generate whole genomes from single ephippial (resting) eggs and single daphniids. Whole genomes of single ephippial eggs and single adults were amplified using the Qiagen REPLI-g Single Cell kit, followed by library construction with the NEBNext Ultra DNA Library Prep Kit and Illumina sequencing. We compared the quality of the single-egg and single-individual amplified genomes to standard batch genomic DNA extraction without genome amplification. At a mean depth of 20×, coverage was essentially identical for the amplified single individual relative to the unamplified batch-extracted genome (>90% of the genome was covered and callable). Finally, although amplification resulted in a slight loss of heterozygosity in the amplified genomes, estimates were largely comparable, illustrating the utility and limitations of this approach for estimating population genetic parameters over long periods of time in natural populations of Daphnia and of other small species known to produce resting stages. © 2017 John Wiley & Sons Ltd.

  11. Whole genomic constellation of the first human G8 rotavirus strain detected in Japan.

    Science.gov (United States)

    Agbemabiese, Chantal Ama; Nakagomi, Toyoko; Doan, Yen Hai; Nakagomi, Osamu

    2015-10-01

    Human G8 Rotavirus A (RVA) strains are commonly detected in Africa but are rarely detected in Japan and elsewhere in the world. In this study, the whole genome sequence of the first human G8 RVA strain detected in Japan, designated AU109 and isolated from a child with acute gastroenteritis in 1994, was determined in order to understand how the strain was generated, including the host species origin of its genes. The genotype constellation of AU109 was G8-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2. Phylogenetic analyses of the 11 genome segments revealed that its VP7 and VP1 genes were closely related to those of a Hungarian human G8P[14] RVA strain, and these genes shared their most recent common ancestors in 1988 and 1982, respectively. AU109 possessed an NSP2 gene closely related to those of Chinese sheep and goat RVA strains. The remaining eight genome segments were closely related to Japanese human G2P[4] strains that circulated around 1985-1990. Bayesian evolutionary analyses revealed that the NSP2 gene of AU109 and those of the Chinese sheep and goat RVA strains diverged from a common ancestor around 1937. In conclusion, AU109 was generated through a genetic reassortment event in which Japanese DS-1-like G2P[4] strains circulating around 1985-1990 obtained the VP7, VP1 and NSP2 genes from unknown ruminant G8 RVA strains. These observations highlight the need for comprehensive examination of the whole genomes of RVA strains from less explored host species. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. Addictions biology: haplotype-based analysis for 130 candidate genes on a single array.

    Science.gov (United States)

    Hodgkinson, Colin A; Yuan, Qiaoping; Xu, Ke; Shen, Pei-Hong; Heinz, Elizabeth; Lobos, Elizabeth A; Binder, Elizabeth B; Cubells, Joe; Ehlers, Cindy L; Gelernter, Joel; Mann, John; Riley, Brien; Roy, Alec; Tabakoff, Boris; Todd, Richard D; Zhou, Zhifeng; Goldman, David

    2008-01-01

    To develop a panel of markers able to extract full haplotype information for candidate genes in alcoholism, other addictions and disorders of mood and anxiety. A total of 130 genes were haplotype tagged and genotyped in 7 case/control populations and 51 reference populations using Illumina GoldenGate SNP genotyping technology, determining haplotype coverage. We also constructed and determined the efficacy of a panel of 186 ancestry informative markers. An average of 1465 loci were genotyped at an average completion rate of 91.3%, with an average call rate of 98.3% and replication rate of 99.7%. Completion and call rates were lowered by the performance of two datasets, highlighting the importance of the DNA quality in high throughput assays. A comparison of haplotypes captured by the Addictions Array tagging SNPs and commercially available whole-genome arrays from Illumina and Affymetrix shows comparable performance of the tag SNPs to the best whole-genome array in all populations for which data are available. Arrays of haplotype-tagged candidate genes, such as this addictions-focused array, represent a cost-effective approach to generate high-quality SNP genotyping data useful for the haplotype-based analysis of panels of genes such as these 130 genes of interest to alcohol and addictions researchers. The inclusion of the 186 ancestry informative markers allows for the detection and correction for admixture and further enhances the utility of the array.
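
    The call rate and replication rate quoted above are simple proportions. A minimal sketch, assuming the usual definitions (call rate: fraction of attempted genotypes that received a call; replication rate: fraction of genotypes called in both duplicate runs that agree); the abstract does not define these terms, and the completion rate is omitted because its exact definition is not given. Genotype strings and function names are hypothetical:

```python
def normalize(genotype):
    """Order alleles so that 'GA' and 'AG' compare equal; None is a no-call."""
    return None if genotype is None else "".join(sorted(genotype))


def call_rate(calls):
    """Fraction of genotype attempts that produced a call."""
    return sum(c is not None for c in calls) / len(calls)


def replication_rate(run1, run2):
    """Concordance between duplicate runs over loci called in both."""
    pairs = [(normalize(a), normalize(b)) for a, b in zip(run1, run2)
             if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical duplicate genotyping runs of one sample at five loci.
run1 = ["AA", "AG", None, "GG", "CT"]
run2 = ["AA", "GA", "AA", "GG", "CC"]
print(call_rate(run1), replication_rate(run1, run2))
```

    Excluding no-calls from the replication rate keeps the two metrics separate: poor DNA quality lowers the call rate without necessarily lowering concordance among the calls that are made.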

  13. SNP detection for massively parallel whole-genome resequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong

    2009-01-01

    .25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered...

  14. Whole genome sequencing reveals mycobacterial microevolution among concurrent isolates from sputum and blood in HIV infected TB patients.

    Science.gov (United States)

    Ssengooba, Willy; de Jong, Bouke C; Joloba, Moses L; Cobelens, Frank G; Meehan, Conor J

    2016-08-05

    In the context of advanced immunosuppression, M. tuberculosis is known to cause detectable mycobacteremia. However, little is known about intra-patient mycobacterial microevolution and the direction of seeding between the sputum and blood compartments. From a diagnostic study of HIV-infected TB patients, 51 pairs of concurrent blood and sputum M. tuberculosis isolates from the same patient were available. In a previous analysis, we identified a subset with genotypic concordance, based on spoligotyping and 24-locus MIRU-VNTR. These paired isolates with identical genotypes were analyzed by whole genome sequencing and phylogenetic analysis. Of the 25 concordant pairs (49% of the 51 paired isolates), 15 (60%) remained viable for extraction of high-quality DNA for whole genome sequencing. Two patient pairs were excluded due to poor-quality sequence reads. The median CD4 cell count was 32 cells/mm³ (IQR 16-101), and ten (77%) patients were on ART. No drug resistance mutations were identified in any of the sequences analyzed. Three (23.1%) of 13 patients had SNPs separating paired isolates from the blood and sputum compartments, providing evidence of microevolution. Using a phylogenetic approach to identify the ancestral compartment, in two (15%) patients the blood isolate was ancestral to the sputum isolate, in one (8%) it was the opposite, and ten (77%) of the pairs were identical. Among HIV-infected patients with poor cellular immunity, infection with multiple strains of M. tuberculosis was found in half of the patients. In those patients with identical strains, whole genome sequencing indicated that M. tuberculosis intra-patient microevolution does occur in a few patients, yet did not reveal a consistent direction of spread between sputum and blood. This suggests that these compartments are highly connected and potentially seed each other repeatedly.

  15. A novel strategy for clustering major depression individuals using whole-genome sequencing variant data.

    Science.gov (United States)

    Yu, Chenglong; Baune, Bernhard T; Licinio, Julio; Wong, Ma-Li

    2017-03-13

    Major depressive disorder (MDD) is highly prevalent, resulting in an exceedingly high disease burden. The identification of genetic risk factors could lead to advances in prevention and therapeutics. Current approaches examine genotyping data to identify specific variations between cases and controls. Compared with genotyping, whole-genome sequencing (WGS) allows the detection of private mutations. In this proof-of-concept study, we establish a conceptually novel computational approach that clusters subjects based on the entirety of their WGS data. These clusters predicted MDD diagnosis. The strategy yielded encouraging results: depressed Mexican-American participants clustered closer together, whereas ethnically matched controls clustered away from MDD patients. This implies that, within the same ancestry, an individual's WGS data can be used to check whether that individual is closer to MDD subjects or to controls. We propose a novel strategy for applying WGS data to clinical medicine by facilitating diagnosis through genetic clustering. Further studies using our method should examine larger WGS datasets in other ethnic groups.

  16. seXY: a tool for sex inference from genotype arrays.

    Science.gov (United States)

    Qian, David C; Busam, Jonathan A; Xiao, Xiangjun; O'Mara, Tracy A; Eeles, Rosalind A; Schumacher, Frederick R; Phelan, Catherine M; Amos, Christopher I

    2017-02-15

    Checking concordance between reported sex and genotype-inferred sex is a crucial quality control measure in genome-wide association studies (GWAS). However, limited insights exist regarding the true accuracy of software that infers sex from genotype array data. We present seXY, a logistic regression model trained on both X chromosome heterozygosity and Y chromosome missingness, that consistently demonstrated >99.5% sex inference accuracy in cross-validation for 889 males and 5,361 females enrolled in prostate cancer and ovarian cancer GWAS. Compared to PLINK, one of the most popular tools for sex inference in GWAS, which assesses only X chromosome heterozygosity, seXY achieved marginally better male classification and 3% more accurate female classification. https://github.com/Christopher-Amos-Lab/seXY. Christopher.I.Amos@dartmouth.edu. Supplementary data are available at Bioinformatics online.
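
    The two features seXY combines can be illustrated with a plain logistic regression: males, being hemizygous for X, show little X heterozygosity, while in females the Y-chromosome probes have no target and appear as missing calls. The sketch below trains such a model by batch gradient descent on made-up feature values; it is not seXY's code, and its learning rate, epoch count, 0.5 decision threshold and training data are all hypothetical:

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def train_logistic(samples, labels, lr=0.5, epochs=3000):
    """Fit P(male) = sigmoid(w1*x_het + w2*y_missing + b) by batch gradient
    descent on the mean log-loss. samples: (x_het, y_missing) pairs;
    labels: 1 for male, 0 for female."""
    w1 = w2 = b = 0.0
    n = len(samples)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x_het, y_miss), label in zip(samples, labels):
            err = sigmoid(w1 * x_het + w2 * y_miss + b) - label
            g1 += err * x_het
            g2 += err * y_miss
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def infer_sex(model, x_het, y_missing):
    w1, w2, b = model
    return "male" if sigmoid(w1 * x_het + w2 * y_missing + b) >= 0.5 else "female"


# Hypothetical training data: (X heterozygosity, Y missingness).
samples = [(0.01, 0.04), (0.02, 0.06), (0.00, 0.03),   # males
           (0.31, 0.95), (0.28, 0.97), (0.35, 0.93)]   # females
labels = [1, 1, 1, 0, 0, 0]
model = train_logistic(samples, labels)
print(infer_sex(model, 0.02, 0.05), infer_sex(model, 0.30, 0.96))
```

    Using Y missingness alongside X heterozygosity gives the model a second, largely independent signal, which is the design difference from an X-heterozygosity-only check such as PLINK's.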

  17. Assessing the utility of whole genome amplified DNA for next-generation molecular ecology.

    Science.gov (United States)

    Blair, Christopher; Campbell, C Ryan; Yoder, Anne D

    2015-09-01

    DNA quantity can be a hindrance in ecological and evolutionary research programmes due to a range of factors, including the endangered status of target organisms, available tissue type, and the impact of field conditions on preservation methods. A potential solution to low-quantity DNA lies in whole genome amplification (WGA) techniques that can substantially increase DNA yield. To date, few studies have rigorously examined sequence bias that might result from WGA and next-generation sequencing of nonmodel taxa. To address this knowledge deficit, we use multiple displacement amplification (MDA) and double-digest RAD sequencing on the grey mouse lemur (Microcebus murinus) to quantify bias in genome coverage and SNP calls when compared to raw genomic DNA (gDNA). We focus our efforts on providing baseline estimates of potential bias by following the manufacturer's recommendations for starting DNA quantities (>100 ng). Our results strongly suggest that MDA enrichment does not introduce systematic bias to genome characterization. SNP calls between samples are highly congruent (>98%), both when genotyping de novo and when genotyping with a reference genome, when a minimum threshold of 20X stack depth is specified to call genotypes. Relative genome coverage is also similar between MDA and gDNA, and allelic dropout is not observed. SNP concordance varies with the coverage threshold, with 95% concordance reached at ~12X coverage when genotyping de novo and ~7X coverage when genotyping with the reference genome. These results suggest that MDA may be a suitable solution for next-generation molecular ecological studies when DNA quantity would otherwise be a limiting factor. © 2015 John Wiley & Sons Ltd.
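
    The dependence of SNP concordance on coverage reported above can be computed by restricting the comparison to sites that pass a depth threshold in both samples. A minimal sketch, assuming each sample's calls are stored as a mapping from site to (genotype, read depth); the data layout, function name and example values are hypothetical:

```python
def concordance_at_depth(calls_a, calls_b, min_depth):
    """Genotype concordance between two samples (e.g. MDA vs gDNA) over
    sites covered by at least min_depth reads in both.
    calls_*: dict mapping site -> (genotype, depth).
    Returns (concordance, number of sites compared)."""
    shared = [site for site in calls_a.keys() & calls_b.keys()
              if calls_a[site][1] >= min_depth and calls_b[site][1] >= min_depth]
    if not shared:
        return float("nan"), 0
    agree = sum(calls_a[s][0] == calls_b[s][0] for s in shared)
    return agree / len(shared), len(shared)


mda = {"chr1:100": ("AG", 25), "chr1:250": ("AA", 6), "chr2:40": ("CC", 14)}
gdna = {"chr1:100": ("AG", 30), "chr1:250": ("AG", 9), "chr2:40": ("CC", 12)}
for depth in (5, 12, 20):
    print(depth, concordance_at_depth(mda, gdna, depth))
```

    Raising the threshold typically increases concordance at the cost of fewer compared sites, which is the trade-off summarized by the coverage at which 95% concordance is reached.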

  18. Application of Whole-Genome Sequencing to an Unusual Outbreak of Invasive Group A Streptococcal Disease.

    Science.gov (United States)

    Galloway-Peña, Jessica; Clement, Meredith E; Sharma Kuinkel, Batu K; Ruffin, Felicia; Flores, Anthony R; Levinson, Howard; Shelburne, Samuel A; Moore, Zack; Fowler, Vance G

    2016-01-01

    Whole-genome analysis was applied to investigate atypical point-source transmission of 2 invasive group A streptococcal (GAS) infections. Isolates were serotype M4, ST39, and genetically indistinguishable. Comparison with MGAS10750 revealed nonsynonymous polymorphisms in ropB and increased speB transcription. This study demonstrates the usefulness of whole-genome analyses for GAS outbreaks.

  19. Accuracy of genomic prediction using imputed whole-genome sequence data in white layers

    NARCIS (Netherlands)

    Heidaritabar, M.; Calus, M.P.L.; Megens, H.J.; Vereijken, A.; Groenen, M.A.M.; Bastiaansen, J.W.M.

    2016-01-01

    There is an increasing interest in using whole-genome sequence data in genomic selection breeding programmes. Prediction of breeding values is expected to be more accurate when whole-genome sequence is used, because the causal mutations are assumed to be in the data. We performed genomic

  20. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  1. Accuracy of imputation to whole-genome sequence data in Holstein Friesian cattle

    NARCIS (Netherlands)

    Binsbergen, van R.; Bink, M.C.A.M.; Calus, M.P.L.; Eeuwijk, van F.A.; Hayes, B.J.; Hulsegge, B.; Veerkamp, R.F.

    2014-01-01

    Background: The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina

  2. Prospects of whole-genome sequence data in animal and plant breeding

    NARCIS (Netherlands)

    Binsbergen, van Rianne

    2017-01-01

    The rapid decrease in costs of DNA sequencing implies that whole-genome sequence data will be widely available in the coming few years. Whole-genome sequence data includes all base-pairs on the genome that show variation in the sequenced population. Consequently, it is assumed that the causal

  3. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  4. Signatures of selection in tilapia revealed by whole genome resequencing.

    Science.gov (United States)

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, signatures of both artificial and natural selection have been identified at the whole-genome level in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that LD block sizes in tilapia ranged from 10 to 100 kb. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and for accelerating genetic improvement of tilapia.

  5. Genomic V exons from whole genome shotgun data in reptiles.

    Science.gov (United States)

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan that is largely separate from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences are available in a freely accessible public repository (http://vgenerepertoire.org).

  6. Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.

    Science.gov (United States)

    Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao

    2017-04-01

    Chinese clearhead icefish, Protosalanx hyalocranius, is a representative icefish species of economic importance with a distinctive appearance. Because of its great economic value in China, the fish was introduced into Lake Dianchi and several other lakes from Lake Taihu half a century ago. Similar to the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as a transparent body and nearly scaleless skin. Here, we sequenced the whole genome of this surface-dwelling fish and generated a draft genome assembly, aiming to explore molecular mechanisms of biological interest. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with a scaffold N50 of 1.163 Mb. The genome completeness was estimated to be 98.39% using the CEGMA evaluation. Finally, we annotated 19 884 protein-coding genes and observed that repeat sequences account for 24.43% of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms for the cavefish-like characters.
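The scaffold N50 reported above is a standard contiguity statistic: sort sequences longest first and report the length at which the running total first reaches half of the assembly. A minimal sketch of that definition follows. The scaffold lengths in the usage line are made up for illustration; this is not the authors' assembly tooling.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the total.

    Computed by walking sequences from longest to shortest and returning the
    length at which the cumulative sum first reaches 50% of the total.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding at exactly 50%
            return length
    return 0  # empty input


print(n50([100, 80, 70, 50, 30, 20]))  # hypothetical lengths; total 350, half reached at 80
```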

  7. GPCR genes are preferentially retained after whole genome duplication.

    Directory of Open Access Journals (Sweden)

    Jenia Semyonov

    One of the most interesting questions in biology is whether certain pathways have been favored during evolution, and if so, what properties could cause such a preference. Due to the lack of experimental evidence, whether select gene families have been preferentially retained over time after duplication in metazoan organisms remains unclear. Here, by syntenic mapping of nonchemosensory G protein-coupled receptor genes (nGPCRs), which represent half the receptome for transmembrane signaling in vertebrate genomes, we found that, as opposed to the 8-15% retention rate for whole genome duplication (WGD)-derived gene duplicates in the entire genome of pufferfish, greater than 27.8% of WGD-derived nGPCRs that interact with a nonpeptide ligand were retained after WGD in the pufferfish Tetraodon nigroviridis. In addition, we show that concurrent duplication of cognate ligand genes by WGD could impose selection on nGPCRs that interact with a polypeptide ligand. Against a probability of less than 2.25% for parallel retention of a pair of WGD-derived ligands and a pair of cognate receptor duplicates, we found a retention of more than 8.9% for WGD-derived ligand-nGPCR pairs, threefold greater than one would surmise. These results demonstrate that gene retention is not uniform after WGD in vertebrates, and suggest Darwinian selection of GPCR-mediated intercellular communication in metazoan organisms.
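The 2.25% baseline above coincides with squaring the upper end of the genome-wide 8-15% retention rate. One reading, offered here as an assumption rather than the authors' stated derivation, is that a ligand duplicate pair and its cognate receptor duplicate pair are each retained independently with probability at most 15%:

```latex
P(\text{ligand pair and receptor pair both retained}) \le 0.15 \times 0.15 = 0.0225 = 2.25\%,
\qquad \frac{8.9\%}{2.25\%} \approx 3.96
```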

  8. Current Developments in Prokaryotic Single Cell Whole Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Goudeau, Danielle; Nath, Nandita; Ciobanu, Doina; Cheng, Jan-Fang; Malmstrom, Rex

    2014-03-14

    Our approach to prokaryotic single-cell whole genome amplification at the JGI continues to evolve. To increase both the quality and number of single-cell genomes produced, we explore all aspects of the process from cell sorting to sequencing. For example, we now use specialized reagents, acoustic liquid handling, and reduced reaction volumes to eliminate non-target DNA contamination in WGA reactions. More specifically, we use a cleaner commercial WGA kit from Qiagen that employs a UV decontamination procedure initially developed at the JGI, and we use the Labcyte Echo for tip-less liquid transfer to set up 2 µL reactions. Acoustic liquid handling also dramatically reduces reagent costs. In addition, we are exploring new cell lysis methods, including treatment with Proteinase K, lysozyme, and detergents, to complement standard alkaline lysis and allow more efficient disruption of a wider range of cells. Incomplete lysis represents a major hurdle for WGA on some environmental samples, especially rhizosphere, peatland, and other soils. Finding effective lysis strategies that are also compatible with WGA is challenging, and we are currently assessing the impact of various strategies on genome recovery.

  9. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  10. Whole genomes redefine the mutational landscape of pancreatic cancer

    Science.gov (United States)

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K.; Kassahn, Karin S.; Bailey, Peter; Johns, Amber L.; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C. J.; Robertson, Alan J.; Fadlullah, Muhammad Z. H.; Bruxner, Tim J. C.; Christ, Angelika N.; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J.; Fink, J. Lynn; Holmes, Oliver; Kazakoff, Stephen H.; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J.; Lee, Hong C.; Jones, Marc D.; Nagrial, Adnan M.; Humphris, Jeremy; Chantrill, Lorraine A.; Chin, Venessa; Steinmann, Angela M.; Mawson, Amanda; Humphrey, Emily S.; Colvin, Emily K.; Chou, Angela; Scarlett, Christopher J.; Pinho, Andreia V.; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S.; Kench, James G.; Pettitt, Jessica A.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B.; Graham, Janet S.; Niclou, Simone P.; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A.; Gill, Anthony J.; Eshleman, James R.; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A.; Pearson, John V.; Biankin, Andrew V.; Grimmond, Sean M.

    2015-01-01

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded. PMID:25719666

  11. Post-Fragmentation Whole Genome Amplification-Based Method

    Science.gov (United States)

    Benardini, James; LaDuc, Myron T.; Langmore, John

    2011-01-01

    This innovation is derived from a proprietary amplification scheme that is based upon random fragmentation of the genome into a series of short, overlapping templates. The resulting shorter DNA strands (fragmentation whole genome amplification-based technology provides a robust and accurate method of amplifying femtogram levels of starting material into microgram yields with no detectable allele bias. The amplified DNA also facilitates the preservation of samples (spacecraft samples) by amplifying scarce amounts of template DNA into microgram concentrations in just a few hours. Based on further optimization of this technology, this could be a feasible technology to use in sample preservation for potential future sample return missions. The research and technology development described here can be pivotal in dealing with backward/forward biological contamination from planetary missions. Such efforts rely heavily on an increasing understanding of the burden and diversity of microorganisms present on spacecraft surfaces throughout assembly and testing. The development and implementation of these technologies could significantly improve the comprehensiveness and resolving power of spacecraft-associated microbial population censuses, and are important to the continued evolution and advancement of planetary protection capabilities. Current molecular procedures for assaying spacecraft-associated microbial burden and diversity have inherent sample loss issues at practically every step, particularly nucleic acid extraction. In engineering a molecular means of amplifying nucleic acids directly from single cells in their native state within the sample matrix, this innovation has circumvented entirely the need for DNA extraction regimes in the sample processing scheme.

  12. Whole genome sequencing in support of wellness and health maintenance

    National Research Council Canada - National Science Library

    Patel, Chirag J; Sivadas, Ambily; Tabassum, Rubina; Preeprem, Thanawadee; Zhao, Jing; Arafat, Dalia; Chen, Rong; Morgan, Alexander A; Martin, Gregory S; Brigham, Kenneth L; Butte, Atul J; Gibson, Greg

    2013-01-01

    .... The present study illustrates novel approaches for integrating genotypic and clinical information for assessment of generalized health risks and to assist individuals in the promotion of wellness...

  13. Use of routinely collected amniotic fluid for whole-genome expression analysis of polygenic disorders.

    Science.gov (United States)

    Nagy, Gyula Richárd; Győrffy, Balázs; Galamb, Orsolya; Molnár, Béla; Nagy, Bálint; Papp, Zoltán

    2006-11-01

    Neural tube defects related to polygenic disorders are the second most common birth defects in the world, but no molecular biologic tests are available to analyze the genes involved in the pathomechanism of these disorders. We explored the use of routinely collected amniotic fluid to characterize the differential gene expression profiles of polygenic disorders. We used oligonucleotide microarrays to analyze amniotic fluid samples obtained from pregnant women carrying fetuses with neural tube defects diagnosed during ultrasound examination. The control samples were obtained from pregnant women who underwent routine genetic amniocentesis because of advanced maternal age (>35 years). We also investigated specific folate-related genes because maternal periconceptional folic acid supplementation has been found to have a protective effect with respect to neural tube defects. Fetal mRNA from amniocytes was successfully isolated, amplified, labeled, and hybridized to whole-genome transcript arrays. We detected differential gene expression profiles between cases and controls. Highlighted genes such as SLA, LST1, and BENE might be important in the development of neural tube defects. None of the specific folate-related genes were in the top 100 associated transcripts. This pilot study demonstrated that a routinely collected amount of amniotic fluid (as small as 6 mL) can provide sufficient RNA to successfully hybridize to expression arrays. Analysis of the differences in fetal gene expressions might help us decipher the complex genetic background of polygenic disorders.

  14. Evidence for an ancient whole genome duplication in the cycad lineage.

    Directory of Open Access Journals (Sweden)

    Danielle Roodt

    Contrary to the many whole genome duplication events recorded for angiosperms (flowering plants), whole genome duplications in gymnosperms (non-flowering seed plants) seem to be much rarer. Although ancient whole genome duplications have been reported for most gymnosperm lineages as well, some are still contested and need to be confirmed. For instance, data for ginkgo, and particularly for cycads, have so far remained inconclusive, likely due to the quality of the available data and flaws in the analysis. We extracted and sequenced RNA from both the cycad Encephalartos natalensis and Ginkgo biloba. This was followed by transcriptome assembly, after which these data were used to build paralog age distributions. Based on these distributions, we identified remnants of an ancient whole genome duplication in both cycads and ginkgo. The most parsimonious explanation would be that this whole genome duplication event was shared between both species and had occurred prior to their divergence, about 300 million years ago.

  15. Ancestry-informative markers for African Americans based on the Affymetrix Pan-African genotyping array

    Directory of Open Access Journals (Sweden)

    Xu Zhang

    2014-11-01

    Genetic admixture has been utilized as a tool for identifying loci associated with complex traits and diseases in recently admixed populations such as African Americans. In particular, admixture mapping is an efficient approach to identifying the genetic basis of complex diseases with substantial racial or ethnic disparities. Though current admixture mapping algorithms may utilize the entire panel of SNPs, providing ancestry-informative markers (AIMs) that can differentiate parental populations and estimate ancestry proportions in an admixed population may particularly benefit admixture mapping in studies with limited samples, help identify unsuitable individuals (e.g., by genotyping the most informative ancestry markers) before starting large genome-wide association studies (GWAS), or guide larger-scale targeted deep re-sequencing for determining specific disease-causing variants. Defining panels of AIMs based on commercial, high-throughput genotyping platforms will facilitate the use of these platforms for simultaneous admixture mapping of complex traits and diseases, in addition to conventional GWAS. Here, we describe AIMs for African Americans with genome-wide coverage, detected based on Shannon Information Content (SIC) or Fst and selected from ∼2.3 million single nucleotide polymorphisms (SNPs) covered by the Affymetrix Axiom Pan-African array, a newly developed genotyping platform optimized for individuals of African ancestry.
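The abstract selects AIMs by Shannon Information Content or Fst. The sketch below covers only the Fst route, using a simple heterozygosity-based (Nei-style) two-population estimator, F = (H_T - H_S) / H_T, with equal population weights. This estimator is an assumption and may differ from the one used in the study. The SNP IDs and allele frequencies in the test are hypothetical.

```python
from typing import List, Tuple


def fst_biallelic(p1: float, p2: float) -> float:
    """Heterozygosity-based F_ST for one biallelic SNP from two parental allele frequencies.

    H_S is the mean within-population expected heterozygosity; H_T is the expected
    heterozygosity of the equally weighted pooled population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def top_aims(freqs: List[Tuple[str, float, float]], k: int) -> List[str]:
    """Rank (snp_id, freq_pop1, freq_pop2) rows by F_ST and return the k most informative IDs."""
    ranked = sorted(freqs, key=lambda row: fst_biallelic(row[1], row[2]), reverse=True)
    return [snp_id for snp_id, _, _ in ranked[:k]]
```

A SNP with frequencies 0.9 and 0.1 in the two parental groups scores about 0.64, while one with equal frequencies scores 0, so ranking by this value favors markers that best distinguish the parental populations.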

  16. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.

    Science.gov (United States)

    Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S

    2015-06-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single-marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed, and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed), filtering for high pairwise linkage disequilibrium, and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When the QTL markers were included in the genomic BLUP model, reliability increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome
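The marker-selection step above includes "filtering for high pairwise linkage disequilibrium", but the abstract gives no threshold or procedure. The following is a generic greedy r² pruning sketch under stated assumptions: genotypes are dosage vectors (0/1/2), markers are visited in a caller-supplied priority order, and the cutoff `max_r2=0.9` is hypothetical.

```python
from typing import Dict, List


def r_squared(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: correlation undefined, treat as no LD
    return sxy * sxy / (sxx * syy)


def ld_prune(markers: Dict[str, List[float]], order: List[str], max_r2: float = 0.9) -> List[str]:
    """Keep markers in priority order, skipping any with r^2 > max_r2 to an already kept marker."""
    kept: List[str] = []
    for name in order:
        if all(r_squared(markers[name], markers[k]) <= max_r2 for k in kept):
            kept.append(name)
    return kept
```

Visiting markers in order of association strength means that, within a block of highly correlated markers, the most significant one survives and its near-duplicates are dropped.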

  17. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Next-generation sequencing technologies such as 454/Roche pyrosequencing and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges, such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycete) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. About 12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  18. Parent and Public Interest in Whole Genome Sequencing

    Science.gov (United States)

    Dodson, Daniel S.; Goldenberg, Aaron J.; Davis, Matthew M.; Singer, Dianne C.; Tarini, Beth A.

    2015-01-01

    Objective: To assess the baseline interest of the public in whole genome sequencing (WGS) for themselves, parents' interest in WGS for their youngest children, and factors associated with such interest. Methods: A random sample of adults from a probability-based nationally representative online panel was surveyed. All participants were provided basic information about WGS and then asked their interest in WGS for themselves. Those participants who self-identified as parents were asked about their interest in WGS for their children. The order in which parents were asked about their interest in WGS for themselves and their child was randomized. The relationship between parent/child characteristics and interest in WGS was examined. Results: Overall response rate was 62% (55% among parents). 58.6% of the total population (parents and non-parents) was interested in WGS for themselves. Similarly, 61.8% of parents were interested in WGS for themselves and 57.8% were interested in WGS for their youngest children. Of note, 84.7% of parents showed an identical interest level in WGS for themselves and their youngest children. Mothers as a whole, and parents whose youngest children had ≥2 health conditions, had significantly more interest in WGS for themselves and their youngest children, while those with conservative political ideologies had considerably less. Conclusions: While U.S. adults have varying interest levels in WGS, parents appear to have similar interests in genome testing for themselves and their youngest children. As WGS technology becomes available in the clinic and private market, clinicians should be prepared to discuss WGS risks and benefits with their patients. PMID:25765282

  19. Parent and public interest in whole-genome sequencing.

    Science.gov (United States)

    Dodson, Daniel S; Goldenberg, Aaron J; Davis, Matthew M; Singer, Dianne C; Tarini, Beth A

    2015-01-01

    The aim of this study was to assess the baseline interest of the public in whole-genome sequencing (WGS) for oneself, parents' interest in WGS for their youngest children, and factors associated with such interest. A random sample of adults from a probability-based nationally representative online panel was surveyed. All participants were provided basic information about WGS and then asked about their interest in WGS for themselves. Those participants who were parents were additionally asked about their interest in WGS for their children. The order in which parents were asked about their interest in WGS for themselves and for their child was randomized. The relationship between parent/child characteristics and interest in WGS was examined. The overall response rate was 62% (55% among parents). 58.6% of the total population (parents and nonparents) was interested in WGS for themselves. Similarly, 61.8% of the parents were interested in WGS for themselves and 57.8% were interested in WGS for their youngest children. Of note, 84.7% of the parents showed an identical interest level in WGS for themselves and their youngest children. Mothers as a group and parents whose youngest children had ≥2 health conditions had significantly more interest in WGS for themselves and their youngest children, while those with conservative political ideologies had considerably less. While US adults have varying interest levels in WGS, parents appear to have similar interests in genome testing for themselves and their youngest children. As WGS technology becomes available in the clinic and private market, clinicians should be prepared to discuss WGS risks and benefits with their patients. © 2015 S. Karger AG, Basel.

  20. Whole genome sequencing analysis of lung adenocarcinoma in Xuanwei, China.

    Science.gov (United States)

    Wang, Xiao; Li, Jing; Duan, Yong; Wu, Huifei; Xu, Qiuyue; Zhang, Yanliang

    2017-03-01

    The lung cancer mortality rate in Xuanwei city is among the highest in China, and adenocarcinoma is the major histological type. Lung cancer has been associated with exposure to indoor smoky coal emissions that contain high levels of polycyclic aromatic hydrocarbons; however, the pathogenesis of lung cancer has not yet been fully elucidated. We performed whole genome sequencing of lung adenocarcinoma and corresponding non-tumor tissue to explore the genomic features of Xuanwei lung cancer. We used the Molecule Annotation System to determine and plot alterations in genes and signaling pathways. A total of 3 428 060 and 3 416 989 single nucleotide variants were detected in tumor and normal genomes, respectively. After comparison of these two genomes, 977 high-confidence somatic single nucleotide variants were identified. We observed a remarkably high proportion of C·G→A·T transversions. HECTD4, RCBTB2, KLF15, and CACNA1C may be cancer-related genes. Nine copy number increases were detected on chromosome 5 and one on chromosome 7. Novel junctions were detected via clustered discordant paired ends, and 1955 structural variants were discovered. Among these, we found 44 novel chromosome structural variations. In addition, EGFR and CACNA1C in the mitogen-activated protein kinase signaling pathway were mutated or amplified in lung adenocarcinoma tumor tissue. We obtained a comprehensive view of somatic alterations of Xuanwei lung adenocarcinoma. These findings provide insight into the genomic landscape and may help to further understand the progression and development of Xuanwei lung adenocarcinoma. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.
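A statement such as "a high proportion of C·G→A·T transversions" comes from a mutation spectrum. By convention, each substitution is reported on the pyrimidine strand, so G>T and C>A fall into the same class. The sketch below shows that tabulation. It assumes somatic SNVs arrive as hypothetical (ref, alt) base pairs and does not reproduce the study's Molecule Annotation System workflow.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution onto the pyrimidine reference strand (e.g. G>T -> C>A)."""
    if ref in ("G", "A"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of SNVs in each of the six pyrimidine-collapsed substitution classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counts.items()}
```

In this scheme the C·G→A·T transversions reported above correspond to the `"C>A"` entry.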

  1. Whole genome analysis of epidemiologically closely related Staphylococcus aureus isolates.

    Directory of Open Access Journals (Sweden)

    Maarten Schijffelen

    The change of bacteria from colonizers to pathogens is accompanied by a drastic change in expression profiles. These changes may be due to environmental signals or to mutational changes. We therefore compared the whole genome sequences of four sets of S. aureus isolates. Three sets were from the same patients. The isolates of each pair (S1800/S1805, S2396/S2395, S2398/S2397; an isolate from colonization and an isolate from infection, respectively) were obtained within <30 days of each other, and the isolate from infection caused skin infections. The isolates were then compared for differences in gene content and SNPs. In addition, a set of isolates from a colonized pig and a farmer from the same farm at the same time (S0462 and S0460) were analyzed. The isolate pair S1800/S1805 showed a difference in a prophage, but prophages are easily lost or acquired. However, S1805 contained an integrative conjugative element not present in S1800. In addition, 92 SNPs were present in a variety of genes, and the isolates S1800 and S1805 were not considered a pair. Between S2395/S2396, two SNPs were present: one was in an intergenic region and one was a synonymous mutation in a putative membrane protein. Between S2397/S2398, only one synonymous mutation in a putative lipoprotein was found. The two farm isolates were very similar and showed 12 SNPs in genes that belong to a number of different functional categories. However, we cannot pinpoint any gene that explains the change from carrier status to infection. The data indicate that differences exist between the isolate from infection and the colonizing isolate for S2395/S2396 and S2397/S2398, as well as between isolates from different hosts, but S1800/S1805 are not clonal.
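    Counting SNPs between two isolates, as done for each pair above, amounts to counting mismatched positions in a whole-genome alignment. The sketch below assumes the two genomes are already aligned to equal length (for example, consensus sequences built against a common reference) and simply skips gaps and ambiguous bases; real pipelines also filter on read depth and mapping quality, which is omitted here.

```python
def count_snps(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        # Gaps ('-') and ambiguity codes such as 'N' are not counted as SNPs.
        if a in valid and b in valid and a != b
    )


if __name__ == "__main__":
    colonizing = "ACGTACGTAC"
    infecting = "ACGTACCTAN"
    print(count_snps(colonizing, infecting))  # → 1
```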

  2. Using whole genome sequencing to study American foulbrood epidemiology in honeybees

    Science.gov (United States)

    Ågren, Joakim; Schäfer, Marc Oliver

    2017-01-01

    American foulbrood (AFB), caused by Paenibacillus larvae, is a devastating disease in honeybees. In most countries, the disease is controlled through compulsory burning of symptomatic colonies causing major economic losses in apiculture. The pathogen is endemic to honeybees world-wide and is readily transmitted via the movement of hive equipment or bees. Molecular epidemiology of AFB currently largely relies on placing isolates in one of four ERIC-genotypes. However, a more powerful alternative is multi-locus sequence typing (MLST) using whole-genome sequencing (WGS), which allows for high-resolution studies of disease outbreaks. To evaluate WGS as a tool for AFB-epidemiology, we applied core genome MLST (cgMLST) on isolates from a recent outbreak of AFB in Sweden. The high resolution of the cgMLST allowed different bacterial clones involved in the disease outbreak to be identified and to trace the source of infection. The source was found to be a beekeeper who had sold bees to two other beekeepers, proving the epidemiological link between them. No such conclusion could have been made using conventional MLST or ERIC-typing. This is the first time that WGS has been used to study the epidemiology of AFB. The results show that the technique is very powerful for high-resolution tracing of AFB-outbreaks. PMID:29140998
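    The core genome MLST comparison can be illustrated by counting the loci at which two isolates carry different allele numbers. This is a minimal sketch, assuming each isolate is represented as a mapping from locus name to allele number, with `None` marking a locus that could not be called; the locus and isolate names below are hypothetical, and the cgMLST scheme used for P. larvae is not reproduced here.

```python
from typing import Optional

Profile = dict[str, Optional[int]]


def cgmlst_distance(profile_a: Profile, profile_b: Profile) -> int:
    """Number of shared, called loci whose allele numbers differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def distance_matrix(profiles: dict[str, Profile]) -> dict[tuple[str, str], int]:
    """Pairwise allelic distances for every unordered pair of isolates."""
    names = sorted(profiles)
    return {
        (x, y): cgmlst_distance(profiles[x], profiles[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


if __name__ == "__main__":
    isolates = {
        "isolate_1": {"locus_1": 1, "locus_2": 3, "locus_3": None},
        "isolate_2": {"locus_1": 1, "locus_2": 4, "locus_3": 7},
        "isolate_3": {"locus_1": 2, "locus_2": 4, "locus_3": 7},
    }
    for pair, dist in distance_matrix(isolates).items():
        print(pair, dist)
```

    Missing loci are excluded rather than counted as differences, so two isolates are only compared where both have a called allele.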

  3. Using whole genome sequencing to study American foulbrood epidemiology in honeybees.

    Directory of Open Access Journals (Sweden)

    Joakim Ågren

    American foulbrood (AFB), caused by Paenibacillus larvae, is a devastating disease in honeybees. In most countries, the disease is controlled through compulsory burning of symptomatic colonies causing major economic losses in apiculture. The pathogen is endemic to honeybees world-wide and is readily transmitted via the movement of hive equipment or bees. Molecular epidemiology of AFB currently largely relies on placing isolates in one of four ERIC-genotypes. However, a more powerful alternative is multi-locus sequence typing (MLST) using whole-genome sequencing (WGS), which allows for high-resolution studies of disease outbreaks. To evaluate WGS as a tool for AFB-epidemiology, we applied core genome MLST (cgMLST) on isolates from a recent outbreak of AFB in Sweden. The high resolution of the cgMLST allowed different bacterial clones involved in the disease outbreak to be identified and to trace the source of infection. The source was found to be a beekeeper who had sold bees to two other beekeepers, proving the epidemiological link between them. No such conclusion could have been made using conventional MLST or ERIC-typing. This is the first time that WGS has been used to study the epidemiology of AFB. The results show that the technique is very powerful for high-resolution tracing of AFB-outbreaks.

  4. Sensitive and specific KRAS somatic mutation analysis on whole-genome amplified DNA from archival tissues.

    Science.gov (United States)

    van Eijk, Ronald; van Puijenbroek, Marjo; Chhatta, Amiet R; Gupta, Nisha; Vossen, Rolf H A M; Lips, Esther H; Cleton-Jansen, Anne-Marie; Morreau, Hans; van Wezel, Tom

    2010-01-01

    Kirsten RAS (KRAS) is a small GTPase that plays a key role in Ras/mitogen-activated protein kinase signaling; somatic mutations in KRAS are frequently found in many cancers. The most common KRAS mutations result in a constitutively active protein. Accurate detection of KRAS mutations is pivotal to the molecular diagnosis of cancer and may guide proper treatment selection. Here, we describe a two-step KRAS mutation screening protocol that combines whole-genome amplification (WGA), high-resolution melting analysis (HRM) as a prescreen method for mutation carrying samples, and direct Sanger sequencing of DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, from which limited amounts of DNA are available. We developed target-specific primers, thereby avoiding amplification of homologous KRAS sequences. The addition of herring sperm DNA facilitated WGA in DNA samples isolated from as few as 100 cells. KRAS mutation screening using high-resolution melting analysis on wgaDNA from formalin-fixed, paraffin-embedded tissue is highly sensitive and specific; additionally, this method is feasible for screening of clinical specimens, as illustrated by our analysis of pancreatic cancers. Furthermore, PCR on wgaDNA does not introduce genotypic changes compared with unamplified genomic DNA. This method can, after validation, be applied to virtually any potentially mutated region in the genome.

  5. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    Directory of Open Access Journals (Sweden)

    Ruibang Luo

    2014-06-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates app development for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  6. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis.

    Directory of Open Access Journals (Sweden)

    Mingyu Gan

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our method to all 782 samples, we detected 47 mixed infections, 45 of which were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates.
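    The idea of flagging a mixed infection when a sample's SNVs support more than one evolutionary path can be sketched on a toy phylogeny. This is a simplified illustration, not the authors' method: it assumes each tree node is defined by a hypothetical set of lineage-specific SNVs, treats a node as supported only when all of its defining SNVs are present in the sample, and ignores sequencing depth and the homogeneous/heterogeneous SNV distinction used in the study.

```python
def ancestors(node: str, parent: dict[str, str]) -> set[str]:
    """All ancestors of a node, following child -> parent links up to the root."""
    found = set()
    while node in parent:
        node = parent[node]
        found.add(node)
    return found


def is_mixed(
    parent: dict[str, str],
    node_snvs: dict[str, set[str]],
    sample_snvs: set[str],
) -> bool:
    """True if the supported nodes cannot all lie on one root-to-tip path."""
    # A node is supported when every one of its defining SNVs is observed.
    supported = {n for n, snvs in node_snvs.items() if snvs and snvs <= sample_snvs}
    # Drop nodes that are ancestors of another supported node; what is left is
    # the deepest supported node on each distinct evolutionary path.
    inner = set()
    for n in supported:
        inner |= ancestors(n, parent)
    tips = supported - inner
    return len(tips) > 1


if __name__ == "__main__":
    # Hypothetical toy tree: root -> L1, root -> L2, L2 -> L2.1, L2 -> L2.2
    parent = {"L1": "root", "L2": "root", "L2.1": "L2", "L2.2": "L2"}
    node_snvs = {
        "L1": {"s1", "s2"},
        "L2": {"s3"},
        "L2.1": {"s4"},
        "L2.2": {"s5"},
    }
    print(is_mixed(parent, node_snvs, {"s3", "s4"}))              # → False
    print(is_mixed(parent, node_snvs, {"s1", "s2", "s3", "s4"}))  # → True
```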

  7. Care and cost consequences of pediatric whole genome sequencing compared to chromosome microarray.

    Science.gov (United States)

    Hayeems, Robin Z; Bhawra, Jasmin; Tsiplova, Kate; Meyn, M Stephen; Monfared, Nasim; Bowdin, Sarah; Stavropoulos, D James; Marshall, Christian R; Basran, Raveen; Shuman, Cheryl; Ito, Shinya; Cohn, Iris; Hum, Courtney; Girdea, Marta; Brudno, Michael; Cohn, Ronald D; Scherer, Stephen W; Ungar, Wendy J

    2017-11-20

    The clinical use of whole-genome sequencing (WGS) is expected to alter pediatric medical management. The study aimed to describe the type and cost of healthcare activities following pediatric WGS compared to chromosome microarray (CMA). Healthcare activities prompted by WGS and CMA were ascertained for 101 children with developmental delay over 1 year. Activities following receipt of non-diagnostic CMA were compared to WGS diagnostic and non-diagnostic results. Activities were costed in 2016 Canadian dollars (CDN). Ongoing care accounted for 88.6% of post-test activities. The mean number of lab tests was greater following CMA than WGS (0.55 vs. 0.09; p = 0.007). The mean number of specialist visits was greater following WGS than CMA (0.41 vs. 0; p = 0.016). WGS results (diagnostic vs. non-diagnostic) modified the effect of test type on mean number of activities (p WGS exceeded $557CDN for 10% of cases. In complex pediatric care, CMA prompted additional diagnostic investigations while WGS prompted tailored care guided by genotypic variants. Costs for prompted activities were low for the majority and constitute a small proportion of total test costs. Optimal use of WGS depends on robust evaluation of downstream care and cost consequences.

  8. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    Science.gov (United States)

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates app development for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  9. High Whole-Genome Sequence Diversity of Human Papillomavirus Type 18 Isolates

    Directory of Open Access Journals (Sweden)

    Pascal van der Weele

    2018-02-01

    Background: The most commonly found human papillomavirus (HPV) types in cervical cancer are HPV16 and HPV18. Genome variants of these types have been associated with differential carcinogenic potential. To date, only a handful of studies have described HPV18 whole genome sequencing results. Here we describe HPV18 variant diversity and conservation of persistent infections in a longitudinal retrospective cohort study. Methods: Cervical self-samples were obtained annually over four years and genotyped on the SPF10-DEIA-LiPA25 platform. Clearing and persistent HPV18-positive infections were selected, amplified in two overlapping fragments, and sequenced using 32 sequencing primers. Results: Complete viral genomes were obtained from 25 participants with persistent and 26 participants with clearing HPV18 infections, resulting in 52 unique HPV18 genomes. Sublineage A3 was predominant in this population. The consensus viral genome was completely conserved over time in persistent infections, with one exception, where different HPV18 variants were identified in follow-up samples. Conclusions: This study identified a diverse set of HPV18 variants. In persistent infections, the consensus viral genome is conserved. The identification of only one HPV18 infection with different major variants in follow-up implies that this is a potentially rare event. This dataset adds 52 HPV18 genome variants to GenBank, more than doubling the currently available HPV18 information resource, and all but one variant are unique additions.

  10. Structural and functional-annotation of an equine whole genome oligoarray

    Directory of Open Access Journals (Sweden)

    Chowdhary Bhanu

    2009-10-01

    Abstract Background The horse genome has been sequenced, allowing equine researchers to use high-throughput functional genomics platforms such as microarrays and next-generation sequencing for gene expression and proteomics. However, for researchers to derive value from these functional genomics datasets, they must be able to model these data in biologically relevant ways; doing so requires that the equine genome be more fully annotated. There are two interrelated types of genomic annotation: structural and functional. Structural annotation is delineating and demarcating the genomic elements (such as genes, promoters, and regulatory elements). Functional annotation is assigning function to structural elements. The Gene Ontology (GO) is the de facto standard for functional annotation and is routinely used as a basis for modelling and hypothesis testing of large functional genomics datasets. Results An Equine Whole Genome Oligonucleotide (EWGO) array with 21,351 elements was developed at Texas A&M University. This 70-mer oligoarray was designed using the approximately 7× assembled and annotated sequence of the equine genome to be one of the most comprehensive arrays available for expressed equine sequences. To assist researchers in determining the biological meaning of data derived from this array, we have structurally annotated it by mapping the elements to multiple database accessions, including UniProtKB, Entrez Gene, NRPD (Non-Redundant Protein Database) and UniGene. We next provided GO functional annotations for the gene transcripts represented on this array. Overall, we GO annotated 14,531 gene products (68.1% of the gene products represented on the EWGO array) with 57,912 annotations. GAQ (GO Annotation Quality) scores were calculated for this array both before and after we added GO annotation. The additional annotations improved the mean GAQ score 16-fold. These data are publicly available at AgBase http://www.agbase.msstate.edu/. Conclusion Providing

  11. Rapid Bacterial Whole-Genome Sequencing to Enhance Diagnostic and Public Health Microbiology

    Science.gov (United States)

    Reuter, Sandra; Ellington, Matthew J.; Cartwright, Edward J. P.; Köser, Claudio U.; Török, M. Estée; Gouliouris, Theodore; Harris, Simon R.; Brown, Nicholas M.; Holden, Matthew T. G.; Quail, Mike; Parkhill, Julian; Smith, Geoffrey P.; Bentley, Stephen D.; Peacock, Sharon J.

    2014-01-01

    IMPORTANCE The latest generation of benchtop DNA sequencing platforms can provide an accurate whole-genome sequence (WGS) for a broad range of bacteria in less than a day. These could be used to more effectively contain the spread of multidrug-resistant pathogens. OBJECTIVE To compare WGS with standard clinical microbiology practice for the investigation of nosocomial outbreaks caused by multidrug-resistant bacteria, the identification of genetic determinants of antimicrobial resistance, and typing of other clinically important pathogens. DESIGN, SETTING, AND PARTICIPANTS A laboratory-based study of hospital inpatients with a range of bacterial infections at Cambridge University Hospitals NHS Foundation Trust, a secondary and tertiary referral center in England, comparing WGS with standard diagnostic microbiology using stored bacterial isolates and clinical information. MAIN OUTCOMES AND MEASURES Specimens were taken and processed as part of routine clinical care, and cultured isolates stored and referred for additional reference laboratory testing as necessary. Isolates underwent DNA extraction and library preparation prior to sequencing on the Illumina MiSeq platform. Bioinformatic analyses were performed by persons blinded to the clinical, epidemiologic, and antimicrobial susceptibility data. RESULTS We investigated 2 putative nosocomial outbreaks, one caused by vancomycin-resistant Enterococcus faecium and the other by carbapenem-resistant Enterobacter cloacae; WGS accurately discriminated between outbreak and nonoutbreak isolates and was superior to conventional typing methods. We compared WGS with standard methods for the identification of the mechanism of carbapenem resistance in a range of gram-negative bacteria (Acinetobacter baumannii, E cloacae, Escherichia coli, and Klebsiella pneumoniae). This demonstrated concordance between phenotypic and genotypic results, and the ability to determine whether resistance was attributable to the presence of

  12. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    Science.gov (United States)

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially (P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  13. Mycobacterium tuberculosis Whole Genome Sequences From Southern India Suggest Novel Resistance Mechanisms and the Need for Region-Specific Diagnostics.

    Science.gov (United States)

    Manson, Abigail L; Abeel, Thomas; Galagan, James E; Sundaramurthi, Jagadish Chandrabose; Salazar, Alex; Gehrmann, Thies; Shanmugam, Siva Kumar; Palaniyandi, Kannan; Narayanan, Sujatha; Swaminathan, Soumya; Earl, Ashlee M

    2017-06-01

    India is home to 25% of all tuberculosis cases and the second highest number of multidrug resistant cases worldwide. However, little is known about the genetic diversity and resistance determinants of Indian Mycobacterium tuberculosis, particularly for the primary lineages found in India, lineages 1 and 3. We whole genome sequenced 223 randomly selected M. tuberculosis strains from 196 patients within the Tiruvallur and Madurai districts of Tamil Nadu in Southern India. Using comparative genomics, we examined genetic diversity, transmission patterns, and evolution of resistance. Genomic analyses revealed (i) prevalence of strains from lineages 1 and 3, (ii) recent transmission of strains among patients from the same treatment centers, (iii) emergence of drug resistance within patients over time, (iv) resistance gained in an order typical of strains from different lineages and geographies, (v) underperformance of known resistance-conferring mutations to explain phenotypic resistance in Indian strains relative to studies focused on other geographies, and (vi) the possibility that resistance arose through mutations not previously implicated in resistance, or through infections with multiple strains that confound genotype-based prediction of resistance. In addition to substantially expanding the genomic perspectives of lineages 1 and 3, sequencing and analysis of M. tuberculosis whole genomes from Southern India highlight challenges of infection control and rapid diagnosis of resistant tuberculosis using current technologies. Further studies are needed to fully explore the complement of diversity and resistance determinants within endemic M. tuberculosis populations.

  14. Application of Whole Genome Sequencing Technology in the Investigation of Genetic Causes of Fetal, Perinatal, and Early Infant Death.

    Science.gov (United States)

    Armes, Jane E; Williams, Mark; Price, Gareth; Wallis, Tristan; Gallagher, Renee; Matsika, Admire; Joy, Christopher; Galea, Melanie; Gardener, Glenn; Leach, Rick; Swagemakers, Sigrid Ma; Tearle, Rick; Stubbs, Andrew; Harraway, James; van der Spek, Peter J; Venter, Deon J

    2017-01-01

    Death in the fetal, perinatal, and early infant age-group has a multitude of causes, a proportion of which is presumed to be genetic. Defining a specific genetic aberration leading to the death is problematic at this young age, due to limited phenotype-genotype correlation inherent in the underdeveloped phenotype, the inability to assess certain phenotypic traits after death, and the problems of dealing with rare disorders. In this study, our aim was to increase the yield of identification of a defined genetic cause of an early death. Therefore, we employed whole genome sequencing and bioinformatic filtering techniques as a comprehensive, unbiased genetic investigation into 16 fetal, perinatal, and early infant deaths, which had undergone a full autopsy. A likely genetic cause was identified in two cases (in genes COL2A1 and RYR1) and a speculative genetic cause in a further six cases (in genes ARHGAP35, BBS7, CASZ1, CRIM1, DHCR7, HADHB, HAPLN3, HSPG2, MYO18B, and SRGAP2). This investigation indicates that whole genome sequencing is a significantly enabling technology when determining genetic causes of early death.

  15. Functional and evolutionary analysis of Korean bob-tailed native dog using whole-genome sequencing data.

    Science.gov (United States)

    Lee, Daehwan; Lim, Dajeong; Kwon, Daehong; Kim, Juyeon; Lee, Jongin; Sim, Mikang; Choi, Bong-Hwan; Choi, Seog-Gyu; Kim, Jaebum

    2017-12-11

    Rapid and cost-effective production of large-scale genome data through next-generation sequencing has enabled population-level studies of various organisms to identify their genotypic differences and phenotypic consequences. This approach is also used to study indigenous animals with historical and economic value, although they are less studied than model organisms. The objective of this study was to perform functional and evolutionary analysis of the Korean bob-tailed native dog Donggyeong, which has distinct tail and agility phenotypes, using whole-genome sequencing data and population and comparative genomics approaches. Based on the uniqueness of non-synonymous single nucleotide polymorphisms obtained from next-generation sequencing data, Donggyeong dog-specific genes/proteins and their functions were identified by comparison with 12 other dog breeds and six other related species. These proteins were further divided into subpopulation-specific ones with different tail lengths, and protein interaction-level signatures were investigated. Finally, the trajectory of shaping protein interactions of subpopulation-specific proteins during evolution was uncovered. This study expands our knowledge of Korean native dogs. Our results also provide a good example of using whole-genome sequencing data for population-level analysis in closely related species.

  16. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array

    Science.gov (United States)

    Eeles, Rosalind A; Olama, Ali Amin Al; Benlloch, Sara; Saunders, Edward J; Leongamornlert, Daniel A; Tymrakiewicz, Malgorzata; Ghoussaini, Maya; Luccarini, Craig; Dennis, Joe; Jugurnauth-Little, Sarah; Dadaev, Tokhir; Neal, David E; Hamdy, Freddie C; Donovan, Jenny L; Muir, Ken; Giles, Graham G; Severi, Gianluca; Wiklund, Fredrik; Gronberg, Henrik; Haiman, Christopher A; Schumacher, Fredrick; Henderson, Brian; Le Marchand, Loic; Lindstrom, Sara; Kraft, Peter; Hunter, David J; Gapstur, Susan; Chanock, Stephen J; Berndt, Sonja I; Albanes, Demetrius; Andriole, Gerald; Schleutker, Johanna; Weischer, Maren; Canzian, Federico; Riboli, Elio; Key, Tim J; Travis, Ruth; Campa, Daniele; Ingles, Sue A; John, Esther M; Hayes, Richard B; Pharoah, Paul DP; Pashayan, Nora; Khaw, Kay-Tee; Stanford, Janet; Ostrander, Elaine A; Signorello, Lisa B; Thibodeau, Stephen N; Schaid, Dan; Maier, Christiane; Vogel, Walther; Kibel, Adam S; Cybulski, Cezary; Lubinski, Jan; Cannon-Albright; Brenner, Hermann; Park, Jong Y; Kaneva, Radka; Batra, Jyotsna; Spurdle, Amanda B; Clements, Judith A; Teixeira, Manuel R; Dicks, Ed; Lee, Andrew; Dunning, Alison; Baynes, Caroline; Conroy, Don; Maranian, Melanie J; Ahmed, Shahana; Govindasami, Koveela; Guy, Michelle; Wilkinson, Rosemary A; Sawyer, Emma J; Morgan, Angela; Dearnaley, David P; Horwich, Alan; Huddart, Robert A; Khoo, Vincent S; Parker, Christopher C; Van As, Nicholas J; Woodhouse, J; Thompson, Alan; Dudderidge, Tim; Ogden, Chris; Cooper, Colin; Lophatananon, Artitaya; Cox, Angela; Southey, Melissa; Hopper, John L; English, Dallas R; Aly, Markus; Adolfsson, Jan; Xu, Jiangfeng; Zheng, Siqun; Yeager, Meredith; Kaaks, Rudolf; Diver, W Ryan; Gaudet, Mia M; Stern, Mariana; Corral, Roman; Joshi, Amit D; Shahabi, Ahva; Wahlfors, Tiina; Tammela, Teuvo J; Auvinen, Anssi; Virtamo, Jarmo; Klarskov, Peter; Nordestgaard, Børge G; Røder, Andreas; Nielsen, Sune F; Bojesen, Stig E; Siddiq, Afshan; FitzGerald, Liesel; Kolb, Suzanne; Kwon, Erika; Karyadi, Danielle; Blot, William J; Zheng, Wei; Cai, Qiuyin; McDonnell, Shannon K; Rinckleb, Antje; Drake, Bettina; Colditz, Graham; Wokolorczyk, Dominika; Stephenson, Robert A; Teerlink, Craig; Muller, Heiko; Rothenbacher, Dietrich; Sellers, Thomas A; Lin, Hui-Yi; Slavov, Chavdar; Mitev, Vanio; Lose, Felicity; Srinivasan, Srilakshmi; Maia, Sofia; Paulo, Paula; Lange, Ethan; Cooney, Kathleen A; Antoniou, Antonis; Vincent, Daniel; Bacot, François; Tessier; Kote-Jarai, Zsofia; Easton, Douglas F

    2013-01-01

    Prostate cancer is the most frequently diagnosed cancer in males in developed countries. To identify common prostate cancer susceptibility alleles, we genotyped 211,155 SNPs on a custom Illumina array (iCOGS) in blood DNA from 25,074 prostate cancer cases and 24,272 controls from the international PRACTICAL Consortium. Twenty-three new prostate cancer susceptibility loci were identified at genome-wide significance (P < 5 × 10⁻⁸). More than 70 prostate cancer susceptibility loci, explaining ~30% of the familial risk for this disease, have now been identified. On the basis of combined risks conferred by the new and previously known risk loci, the top 1% of the risk distribution has a 4.7-fold higher risk than the average of the population being profiled. These results will facilitate population risk stratification for clinical studies. PMID:23535732
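    The genome-wide significance threshold quoted above, P < 5 × 10⁻⁸, is conventionally motivated as a Bonferroni correction of α = 0.05 for roughly one million independent common-variant tests; the abstract itself does not state this derivation. The sketch below shows that arithmetic and a filter over a hypothetical table of SNP P values.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Conventional genome-wide threshold: 0.05 spread over ~1 million independent tests.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)  # approximately 5e-8


def genome_wide_hits(pvalues: dict[str, float], threshold: float = GENOME_WIDE) -> list[str]:
    """Return the IDs of SNPs whose P value falls below the threshold, sorted."""
    return sorted(snp for snp, p in pvalues.items() if p < threshold)


if __name__ == "__main__":
    # Hypothetical SNP identifiers and P values, for illustration only.
    results = {"snp_a": 1e-9, "snp_b": 1e-7, "snp_c": 4.9e-8}
    print(genome_wide_hits(results))  # → ['snp_a', 'snp_c']
```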

  17. Whole genome analysis of selected human and animal rotaviruses identified in Uganda from 2012 to 2014 reveals complex genome reassortment events between human, bovine, caprine and porcine strains.

    Science.gov (United States)

    Bwogi, Josephine; Jere, Khuzwayo C; Karamagi, Charles; Byarugaba, Denis K; Namuwulya, Prossy; Baliraine, Frederick N; Desselberger, Ulrich; Iturriza-Gomara, Miren

    2017-01-01

    Rotaviruses of species A (RVA) are a common cause of diarrhoea in children and the young of various other mammals and birds worldwide. To investigate possible interspecies transmission of RVAs, whole genomes of 18 human and 6 domestic animal RVA strains identified in Uganda between 2012 and 2014 were sequenced using the Illumina HiSeq platform. The backbone of the human RVA strains had either a Wa- or a DS-1-like genetic constellation. One human strain was a Wa-like mono-reassortant containing a DS-1-like VP2 gene of possible animal origin. All eleven genes of one bovine RVA strain were closely related to those of human RVAs. One caprine strain had a mixed genotype backbone, suggesting that it emerged from multiple reassortment events involving different host species. The porcine RVA strains had mixed genotype backbones with possible multiple reassortant events with strains of human and bovine origin. Overall, whole genome characterisation of rotaviruses found in domestic animals in Uganda strongly suggested the presence of human-to-animal RVA transmission, with concomitant circulation of multi-reassortant strains potentially derived from complex interspecies transmission events. However, whole genome data from the human RVA strains causing moderate and severe diarrhoea in under-fives in Uganda indicated that they were primarily transmitted from person to person.

  18. Whole genome comparative analysis of four Georgian grape cultivars.

    Science.gov (United States)

    Tabidze, V; Pipia, I; Gogniashvili, M; Kunelauri, N; Ujmajuridze, L; Pirtskhalava, M; Vishnepolsky, B; Hernandez, A G; Fields, C J; Beridze, Tengiz

    2017-12-01

    Grapevine is one of the most important fruit species in the world. Comparative genome sequencing of grape cultivars is very important for interpreting the grape genome and understanding its evolution. The genomes of four Georgian grape cultivars (Chkhaveri, Saperavi, Meskhetian green, and Rkatsiteli), belonging to different haplogroups, were resequenced. Shotgun genomic libraries of the cultivars were sequenced on an Illumina HiSeq. Pinot noir nuclear, mitochondrial, and chloroplast DNA were used as the reference. Chkhaveri mitochondrial DNA closely matches the reference Pinot noir mitochondrial DNA, differing by only 16 SNPs. The number of SNPs in mitochondrial DNA from Saperavi, Meskhetian green, and Rkatsiteli was 764, 702, and 822, respectively. Nuclear DNA differs from the reference by 1,800,675 nt in Chkhaveri, 1,063,063 nt in Meskhetian green, 2,174,995 nt in Saperavi, and 5,011,513 nt in Rkatsiteli. Thus, unlike its mitochondrial DNA, Pinot noir chromosomal DNA is closer to Meskhetian green than to the other cultivars. The substantial differences in the number of SNPs in mitochondrial and nuclear DNA of the Chkhaveri and Pinot noir cultivars are explained by backcrossing or introgression of their wild predecessors before or during domestication. Annotation of the chromosomal DNA of the Georgian cultivars with MEGANTE, a web-based annotation system, predicts 66,745 genes (Chkhaveri: 17,409; Saperavi: 17,021; Meskhetian green: 18,355; Rkatsiteli: 13,960). Among them, 106 predicted terpene synthase (TPS) genes and 43 TPS pseudogenes were found on chromosomes 12, 18 random (18R), and 19. Four novel TPS genes not present in the reference Pinot noir DNA were detected. Two of them, germacrene A synthase (chromosome 18R) and (-)-germacrene D synthase (chromosome 19), can be identified as putatively full-length proteins. This work is the first attempt at a comparative whole-genome analysis of different haplogroups.

  19. Whole genome expression profiling in chewing-tobacco-associated oral cancers: a pilot study.

    Science.gov (United States)

    Chakrabarti, Sanjukta; Multani, Shaleen; Dabholkar, Jyoti; Saranath, Dhananjaya

    2015-03-01

    This study was undertaken to identify differential biomarkers in chewing-tobacco-associated oral cancer tissues from patients of Indian ethnicity. The gene expression profile of oral cancer tissues was compared with that of clinically normal oral buccal mucosa. We examined 30 oral cancer tissues and 27 normal oral tissues, with 16 paired samples from the contralateral site of the same patient and 14 unpaired samples from different oral cancer patients, for whole genome expression using the high-throughput Illumina Sentrix Human Ref-8 v2 Expression BeadChip array. The cDNA microarray analysis identified 425 genes differentially expressed by >1.5-fold in oral cancer tissues compared with normal tissues from the oral cancer patients. Overexpression of 255 genes and downregulation of 170 genes (p … TNFSF13B, TMPRSS11A); signal transduction (FOLR2, MME, HTR3B); invasion and metastasis (SPP1, TNFAIP6, EPHB6); differentiation (CLEC4A, ELF5); angiogenesis (CXCL1); apoptosis (GLIPR1, WISP1, DAPL1); immune responses (CD300A, IFIT2, TREM2); and metabolism (NNMT; ALDH3A1). Several of these genes have also been reported as differentially expressed in human cancers, including oral cancer. Our data identify genes differentially expressed in oral cancer tissues that, after validation in larger and more varied population samples, may serve as prognostic and therapeutic biomarkers in oral cancers.
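    The >1.5-fold criterion above can be read as a symmetric cutoff on the log2 ratio of tumour to normal expression: a gene counts as up-regulated above 1.5-fold and as down-regulated below 1/1.5 (about 0.67-fold). The sketch below assumes linear-scale, already normalised mean expression values. The abstract does not describe the normalisation or the statistical test, and the function name is hypothetical.

```python
import math


def fold_change_call(tumour_mean: float, normal_mean: float, cutoff: float = 1.5) -> str:
    """Classify a gene as 'up', 'down' or 'unchanged' by a symmetric fold-change cutoff."""
    if tumour_mean <= 0 or normal_mean <= 0:
        raise ValueError("expression values must be positive")
    log_ratio = math.log2(tumour_mean / normal_mean)
    if log_ratio > math.log2(cutoff):
        return "up"
    if log_ratio < -math.log2(cutoff):
        return "down"
    return "unchanged"


# Hypothetical mean expression values for three genes (tumour, normal).
for tumour, normal in [(3.0, 1.0), (1.0, 2.0), (1.2, 1.0)]:
    print(tumour, normal, fold_change_call(tumour, normal))
```

    In practice a fold-change filter like this is usually combined with a P-value threshold, and the abstract refers to one.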

  20. Whole-genome sequencing of a laboratory-evolved yeast strain

    Directory of Open Access Journals (Sweden)

    Dunham Maitreya J

    2010-02-01

    Background Experimental evolution of microbial populations provides a unique opportunity to study evolutionary adaptation in response to controlled selective pressures. However, until recently it has been difficult to identify the precise genetic changes underlying adaptation at a genome-wide scale. New DNA sequencing technologies now allow the genome of parental and evolved strains of microorganisms to be rapidly determined. Results We sequenced >93.5% of the genome of a laboratory-evolved strain of the yeast Saccharomyces cerevisiae and its ancestor at >28× depth. Both single nucleotide polymorphisms and copy number amplifications were found, with specific gains over array-based methodologies previously used to analyze these genomes. Applying a segmentation algorithm to quantify structural changes, we determined the approximate genomic boundaries of a 5× gene amplification. These boundaries guided the recovery of breakpoint sequences, which provide insights into the nature of a complex genomic rearrangement. Conclusions This study suggests that whole-genome sequencing can provide a rapid approach to uncover the genetic basis of evolutionary adaptations, with further applications in the study of laboratory selections and mutagenesis screens. In addition, we show how single-end, short read sequencing data can provide detailed information about structural rearrangements, and generate predictions about the genomic features and processes that underlie genome plasticity.
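    The abstract does not name the segmentation algorithm used to bound the 5× amplification. As a rough illustration only, the sketch below uses a naive substitute that is neither the authors' method nor a standard segmentation algorithm such as circular binary segmentation: it compares evolved and ancestral read depth in fixed windows, normalises the ratio to its genome-wide median so that single-copy regions sit near 1, and reports runs of consecutive windows above a threshold with their mean ratio as a crude copy-number estimate. The window depths, the 1.5 threshold and the minimum run length are hypothetical parameters.

```python
from statistics import median


def amplified_runs(evolved, ancestor, min_ratio=1.5, min_windows=3):
    """Return (start, end, mean_ratio) for runs of windows with elevated depth ratio.

    `evolved` and `ancestor` are per-window read depths in the same window order;
    `end` is exclusive. Windows with zero ancestral depth get a ratio of 0.
    """
    ratios = [e / a if a > 0 else 0.0 for e, a in zip(evolved, ancestor)]
    baseline = median(ratios)
    if baseline == 0:
        raise ValueError("median depth ratio is zero; cannot normalise")
    scaled = [r / baseline for r in ratios]

    runs, start = [], None
    for i, r in enumerate(scaled + [0.0]):  # sentinel closes a run at the end
        if r >= min_ratio and start is None:
            start = i
        elif r < min_ratio and start is not None:
            if i - start >= min_windows:
                runs.append((start, i, sum(scaled[start:i]) / (i - start)))
            start = None
    return runs


# Hypothetical depths: windows 5-8 carry about five times the ancestral coverage.
evolved = [10] * 5 + [50] * 4 + [10] * 5
ancestor = [10] * 14
print(amplified_runs(evolved, ancestor))  # [(5, 9, 5.0)]
```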

  1. Whole genomic analysis of G2P[4] human Rotaviruses in Mymensingh, north-central Bangladesh

    Directory of Open Access Journals (Sweden)

    Satoru Aida

    2016-09-01

    Rotavirus A (RVA) is a dominant causative agent of acute gastroenteritis in children worldwide. G2P[4] is one of the most common genotypes among human rotavirus (HRV) strains and has been persistently prevalent in South Asia, including Bangladesh. In the present study, whole genome sequences of a total of 16 G2P[4] HRV strains (8 strains each from 2010 and 2013) detected in Mymensingh, north-central Bangladesh, were determined. These strains had the typical DS-1-like genotype constellation. Most gene segments of the DS-1 genogroup exhibited high sequence identities to each other (>98%), while slight diversity was observed in the VP1, VP3, and NSP4 genes. By phylogenetic analysis, individual RNA segments were classified into one (V) or two to three lineages (V–VI or V–VII). Based on the lineages (sublineages) of the 11 gene segments, the 16 Bangladeshi strains could be further classified into four clades (A–D) containing 8 lineage constellations, revealing the presence of three clades (A–C) with three lineage constellations in 2010 and a single clade (D) with four constellations in 2013. Therefore, the co-existence of multiple G2P[4] HRV strains with different lineage constellations, and a change in clades over the study period, were demonstrated. Although amino acids in the antigenic regions of VP7 and VP4 were mostly identical to those of global G2P[4] strains after 2000, VP4 of clade D RVAs in 2013 had alanine and proline at positions 88 and 114, respectively, which are novel substitutions compared with recent global G2P[4] strains. The replacement of lineage constellations, associated with unique amino acid changes in the antigenic region of VP4, suggests ongoing genetic evolution and the emergence of new G2P[4] rotavirus strains in Bangladesh.

  2. Whole-Genome Sequencing of Bacterial Pathogens: the Future of Nosocomial Outbreak Analysis.

    Science.gov (United States)

    Quainoo, Scott; Coolen, Jordy P M; van Hijum, Sacha A F T; Huynen, Martijn A; Melchers, Willem J G; van Schaik, Willem; Wertheim, Heiman F L

    2017-10-01

    Outbreaks of multidrug-resistant bacteria present a frequent threat to vulnerable patient populations in hospitals around the world. Intensive care unit (ICU) patients are particularly susceptible to nosocomial infections due to indwelling devices such as intravascular catheters, drains, and intratracheal tubes for mechanical ventilation. The increased vulnerability of infected ICU patients underscores the importance of having effective outbreak management protocols in place. Understanding the transmission of pathogens via genotyping methods is an important tool for outbreak management. Recently, whole-genome sequencing (WGS) of pathogens has become more accessible and affordable as a tool for genotyping. Analysis of the entire pathogen genome via WGS could provide unprecedented resolution in discriminating even highly related lineages of bacteria and revolutionize outbreak analysis in hospitals. Nevertheless, clinicians have long been hesitant to implement WGS in outbreak analyses due to the expensive and cumbersome nature of early sequencing platforms. Recent improvements in sequencing technologies and analysis tools have rapidly increased output and analysis speed and reduced the overall costs of WGS. In this review, we assess the feasibility of WGS technologies and bioinformatics analysis tools for nosocomial outbreak analyses and provide a comparison to conventional outbreak analysis workflows. Moreover, we review the advantages and limitations of sequencing technologies and analysis tools and present a real-world example of the implementation of WGS for antimicrobial resistance analysis. We aimed to provide health care professionals with a guide to WGS outbreak analysis that highlights its benefits for hospitals and assists in the transition from conventional to WGS-based outbreak analysis. Copyright © 2017 American Society for Microbiology.

  3. Whole genome association mapping by incompatibilities and local perfect phylogenies

    DEFF Research Database (Denmark)

    Mailund, Thomas; Besenbacher, Søren; Schierup, Mikkel Heide

    2006-01-01

    . Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets used for testing of other methods in order...... for this dataset the highest association score is about 60 kb from the CYP2D6 gene. Conclusions: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours....
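    The "incompatibilities" in the title are pairs of SNPs that cannot both be explained by a single perfect phylogeny. For biallelic sites under the infinite-sites assumption, the standard check is the four-gamete test: two sites are incompatible when all four allele combinations (00, 01, 10, 11) appear among the haplotypes, and a set of such sites fits one perfect phylogeny exactly when every pair is compatible. The sketch below applies that test and greedily grows an incompatibility-free interval around a focal SNP. It illustrates the idea only and is not Blossoc's actual algorithm; the function names and the haplotype encoding (strings of '0'/'1') are assumptions.

```python
def incompatible(site_a, site_b):
    """Four-gamete test: True if all four allele pairs occur at the two sites."""
    return len(set(zip(site_a, site_b))) == 4


def compatible_interval(haplotypes, focus):
    """Greedily extend [lo, hi] around `focus` while all site pairs stay compatible.

    `haplotypes` is a list of equal-length strings of '0'/'1', one per haplotype.
    The greedy left-then-right order is a simplification.
    """
    n_sites = len(haplotypes[0])

    def column(j):
        return [h[j] for h in haplotypes]

    def fits(j, lo, hi):
        return all(not incompatible(column(j), column(k)) for k in range(lo, hi + 1))

    lo = hi = focus
    while lo > 0 and fits(lo - 1, lo, hi):
        lo -= 1
    while hi < n_sites - 1 and fits(hi + 1, lo, hi):
        hi += 1
    return lo, hi


# Sites 1 and 2 show all four gametes, so they cannot share one local tree.
haps = ["0000", "0011", "1100", "1111"]
print(compatible_interval(haps, 0), compatible_interval(haps, 3))  # (0, 1) (2, 3)
```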

  4. TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence - TCGA

    Science.gov (United States)

    Carolyn Hutter, Ph.D., Program Director of NHGRI's Division of Genomic Medicine, discusses the expansion of TCGA's Pan-Cancer efforts to include the Pan-Cancer Analysis of Whole Genomes (PAWG) project.

  5. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria

    NARCIS (Netherlands)

    Ellington, M.J.; Ekelund, O.; Aarestrup, F.M.; Canton, R.; Doumith, M.; Giske, C.; Grundman, H.; Hasman, H.; Holden, M.T.G.; Hopkins, K.L.; Iredell, J.; Kahlmeter, G.; Köser, C.U.; MacGowan, A.; Mevius, D.; Mulvey, M.; Naas, T.; Peto, T.; Rolain, J.M.; Samuelsen; Woodford, N.

    2017-01-01

    Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility testing

  6. Towards a whole-genome sequence for rye (Secale cereale L.)

    National Research Council Canada - National Science Library

    Bauer, Eva; Schmutzer, Thomas; Barilar, Ivan; Mascher, Martin; Gundlach, Heidrun; Martis, Mihaela-Maria; Twardziok, Sven O; Hackauf, Bernd; Gordillo, Andres; Wilde, Peer; Schmidt, Malthe; Korzun, Viktor; Mayer, Klaus F. X; Schmid, Karl; Schoen, Chris-Carolin; Scholz, Uwe

    2017-01-01

    We report on a whole-genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe...

  7. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing

    National Research Council Canada - National Science Library

    Helman, Elena; Lawrence, Michael S; Stewart, Chip; Sougnez, Carrie; Getz, Gad; Meyerson, Matthew

    2014-01-01

    .... Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part...

  8. Identification of molecular phenotypic descriptors of breast capsular contracture formation using informatics analysis of the whole genome transcriptome.

    Science.gov (United States)

    Kyle, Daniel J T; Harvey, Alison G; Shih, Barbara; Tan, Kian T; Chaudhry, Iskander H; Bayat, Ardeshir

    2013-01-01

    Breast capsular contracture formation following silicone implant augmentation/reconstruction is a common complication that remains poorly understood. The aim of this study was to identify potential biomarkers implicated in breast capsular contracture formation by using, for the first time, whole genome arrays. Biopsy samples were taken from 18 patients (23 breast capsules) with Baker Grade I-II (control) and Baker Grade III-IV (contracted) capsules. Whole genome microarrays were performed, and six significantly dysregulated genes were selected for further validation with quantitative reverse transcriptase polymerase chain reaction and immunohistochemistry. Hematoxylin and eosin staining was also carried out to compare the histological characteristics of control and contracted samples. Microarray results showed that aggrecan, tissue inhibitor of metalloproteinase 4 (TIMP4), and tumor necrosis factor superfamily (ligand) member 11 were significantly down-regulated in contracted capsules, while matrix metallopeptidase 12, serum amyloid A 1, and interleukin 8 (IL8) were significantly up-regulated. The dysregulation of aggrecan, tumor necrosis factor superfamily (ligand) member 11, TIMP4, and IL8 was validated by quantitative reverse transcriptase polymerase chain reaction (p … contracture formation. IL8 and TIMP4 may serve as potential key diagnostic, therapeutic, and prognostic biomarkers in capsular contracture formation. © 2013 by the Wound Healing Society.

  9. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate...

  10. HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data

    OpenAIRE

    Nariai, Naoki; Kojima, Kaname; Saito, Sakae; Mimori, Takahiro; Sato, Yukuto; Kawai, Yosuke; Yamaguchi-Kabata, Yumi; Yasuda, Jun; Nagasaki, Masao

    2015-01-01

    Background Human leucocyte antigen (HLA) genes play an important role in determining the outcome of organ transplantation and are linked to many human diseases. Because of the diversity and polymorphisms of HLA loci, HLA typing at high resolution is challenging even with whole-genome sequencing data. Results We have developed a computational tool, HLA-VBSeq, to estimate the most probable HLA alleles at full (8-digit) resolution from whole-genome sequence data. HLA-VBSeq simultaneously optimiz...

  11. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    Directory of Open Access Journals (Sweden)

    Nedenia Bonvino Stafuzza

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). Approximately 4.3 billion reads in total, generated on an Illumina HiSeq 2000 sequencer, provided 10.7- to 16.4-fold genome coverage per animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. Variant annotation identified numerous non-synonymous SNVs and frameshift InDels that could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were over-represented in all four cattle breeds, while the ECM-receptor interaction pathway was over-represented in the Girolando and Guzerat breeds, the ABC transporters pathway only in the Holstein breed, and the metabolic pathways only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers, and their associated molecular mechanisms, that impact economically important traits in Gyr, Girolando, Guzerat and Holstein breeding programs.
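    A genotype concordance rate such as the 99.05% reported above is, in its simplest form, the fraction of variants called by both the array and sequencing whose genotypes agree. The sketch below assumes genotypes coded as alternate-allele counts (0, 1, 2), keyed by a variant identifier, with None for a missing call. The identifiers are made up, and the abstract does not describe the authors' exact comparison (for example, how strand or allele coding was harmonised).

```python
from typing import Optional


def genotype_concordance(array_calls: dict[str, Optional[int]],
                         seq_calls: dict[str, Optional[int]]) -> float:
    """Fraction of variants genotyped on both platforms with identical calls."""
    shared = [v for v in array_calls
              if v in seq_calls and array_calls[v] is not None and seq_calls[v] is not None]
    if not shared:
        raise ValueError("no variants genotyped on both platforms")
    agree = sum(array_calls[v] == seq_calls[v] for v in shared)
    return agree / len(shared)


# Hypothetical calls for one animal: three shared variants, two agree.
array = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 1}
seq = {"snp1": 0, "snp2": 1, "snp3": 1, "snp5": 2}
print(round(genotype_concordance(array, seq), 3))  # 0.667
```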

  12. Whole genome amplification and real-time PCR in forensic casework

    Directory of Open Access Journals (Sweden)

    Asili Paola

    2009-04-01

    Background WGA (Whole Genome Amplification) in forensic genetics can eliminate the technical limitations arising from low amounts of genomic DNA (gDNA). However, it has not been used to date because any amplification bias generated may complicate the interpretation of results. Our aim in this paper was to assess the applicability of MDA to forensic SNP genotyping by performing a comparative analysis of genomic and amplified DNA samples. A 26-SNP TaqMan panel specifically designed for low copy number (LCN) and/or severely degraded genomic DNA was typed on 100 genomic as well as amplified DNA samples. Results Aliquots containing 1, 0.1 and 0.01 ng each of 100 DNA samples were typed for the 26-SNP panel. Similar aliquots of the same DNA samples underwent multiple displacement amplification (MDA) before being typed for the same panel. Genomic DNA samples showed a 0% PCR failure rate for all three dilutions, whilst the PCR failure rate of the amplified DNA samples was 0% for the 1 ng and 0.1 ng dilutions and 0.077% for the 0.01 ng dilution. The genotyping results of both the amplified and genomic DNA samples were also compared with reference genotypes of the same samples obtained by direct sequencing. The genomic DNA samples showed genotype concordance rates of 100% for all three dilutions, while the concordance rates of the amplified DNA samples were 100% for the 1 ng and 0.1 ng dilutions and 99.923% for the 0.01 ng dilution. Moreover, ten artificially degraded DNA samples, which gave no results when analyzed by current forensic methods, were also amplified by MDA and genotyped with 100% concordance. Conclusion We investigated the suitability of MDA material for forensic SNP typing. Comparative analysis of amplified and genomic DNA samples showed that a large number of SNPs could be accurately typed starting from just 0.01 ng of template. We found that the MDA genotyping call and accuracy rates were only slightly lower than those for genomic DNA.

  13. Whole genome amplification and real-time PCR in forensic casework

    Science.gov (United States)

    Giardina, Emiliano; Pietrangeli, Ilenia; Martone, Claudia; Zampatti, Stefania; Marsala, Patrizio; Gabriele, Luciano; Ricci, Omero; Solla, Gianluca; Asili, Paola; Arcudi, Giovanni; Spinella, Aldo; Novelli, Giuseppe

    2009-01-01

    Background WGA (Whole Genome Amplification) in forensic genetics can eliminate the technical limitations arising from low amounts of genomic DNA (gDNA). However, it has not been used to date because any amplification bias generated may complicate the interpretation of results. Our aim in this paper was to assess the applicability of MDA to forensic SNP genotyping by performing a comparative analysis of genomic and amplified DNA samples. A 26-SNP TaqMan panel specifically designed for low copy number (LCN) and/or severely degraded genomic DNA was typed on 100 genomic as well as amplified DNA samples. Results Aliquots containing 1, 0.1 and 0.01 ng each of 100 DNA samples were typed for the 26-SNP panel. Similar aliquots of the same DNA samples underwent multiple displacement amplification (MDA) before being typed for the same panel. Genomic DNA samples showed a 0% PCR failure rate for all three dilutions, whilst the PCR failure rate of the amplified DNA samples was 0% for the 1 ng and 0.1 ng dilutions and 0.077% for the 0.01 ng dilution. The genotyping results of both the amplified and genomic DNA samples were also compared with reference genotypes of the same samples obtained by direct sequencing. The genomic DNA samples showed genotype concordance rates of 100% for all three dilutions, while the concordance rates of the amplified DNA samples were 100% for the 1 ng and 0.1 ng dilutions and 99.923% for the 0.01 ng dilution. Moreover, ten artificially degraded DNA samples, which gave no results when analyzed by current forensic methods, were also amplified by MDA and genotyped with 100% concordance. Conclusion We investigated the suitability of MDA material for forensic SNP typing. Comparative analysis of amplified and genomic DNA samples showed that a large number of SNPs could be accurately typed starting from just 0.01 ng of template. We found that the MDA genotyping call and accuracy rates were only slightly lower than those for genomic DNA. Indeed, when 10 pg of
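    The reported percentages line up with a simple count over the panel: 100 samples typed for 26 SNPs give 2,600 genotype calls per dilution, and the 0.077% figure at the 0.01 ng dilution is consistent with 2 of those 2,600 calls (2/2,600 ≈ 0.077%, and 100% − 0.077% ≈ 99.923%). The abstract does not state the denominator, so this reading is an inference; the check below only confirms the arithmetic.

```python
N_SAMPLES = 100  # DNA samples per dilution
N_SNPS = 26      # SNPs in the TaqMan panel
total_calls = N_SAMPLES * N_SNPS


def percent(count: int, total: int) -> float:
    """Express count as a percentage of total."""
    return 100 * count / total


inferred_count = 2  # inferred from the reported rates, not stated in the abstract
print(total_calls)                                             # 2600
print(f"{percent(inferred_count, total_calls):.3f}%")          # 0.077%
print(f"{100 - percent(inferred_count, total_calls):.3f}%")    # 99.923%
```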

  14. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

    DEFF Research Database (Denmark)

    Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang

    2015-01-01

    . Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10), a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication...... was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10−11; 98,742 cases and 409,511 controls). Using an En1(cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified...

  15. Genetic-linkage mapping of complex hereditary disorders to a whole-genome molecular-interaction network.

    Science.gov (United States)

    Iossifov, Ivan; Zheng, Tian; Baron, Miron; Gilliam, T Conrad; Rzhetsky, Andrey

    2008-07-01

    Common hereditary neurodevelopmental disorders such as autism, bipolar disorder, and schizophrenia are most likely both genetically multifactorial and heterogeneous. Because of these characteristics, traditional methods for genetic analysis fail when applied to such diseases. To address the problem, we propose a novel probabilistic framework that combines the standard genetic linkage formalism with whole-genome molecular-interaction data to predict pathways or networks of interacting genes that contribute to common heritable disorders. We apply the model to three large genotype-phenotype data sets, identify a small number of significant candidate genes for autism (24), bipolar disorder (21), and schizophrenia (25), and predict a number of gene targets likely to be shared among the disorders.

  16. A flexible and fully integrated system for amplification, detection and genotyping of genomic DNA targets based on microfluidic oligonucleotide arrays.

    Science.gov (United States)

    Summerer, Daniel; Hevroni, Dona; Jain, Amit; Oldenburger, Olga; Parker, Jefferson; Caruso, Anthony; Stähler, Cord F; Stähler, Peer F; Beier, Markus

    2010-05-31

    A strategy allowing for amplification, detection and genotyping of different genomic DNA targets in a single reaction container is described. The method makes use of primer-directed solution-phase amplification with integrated labeling in a closed, microfluidic oligonucleotide array. Selective array probes allow for subsequent detection and genotyping of generated amplicons by hybridization. The array contains up to 15,624 programmable features that can be designed, de novo synthesized and tested within 24 hours using an automated benchtop microarray synthesizer. This enables rapid prototyping and adaptation of the system to newly emerging targets such as pathogenic bacterial or viral subtypes. The system was evaluated by amplifying and detecting different loci of viral (HPV), bacterial (Bacillus sp.) and eukaryotic (human) genomes. Multiplex PCR and semi-quantitative detection with excellent detection limits of … automation grade of the system reduces contamination risk and workload and should enhance safety and reproducibility. © 2010 Elsevier B.V. All rights reserved.

  17. Whole genome sequencing and evolutionary analysis of human respiratory syncytial virus A and B from Milwaukee, WI 1998-2010.

    Directory of Open Access Journals (Sweden)

    Cecilia Rebuffo-Scheer

    BACKGROUND: Respiratory Syncytial Virus (RSV) is the leading cause of lower respiratory-tract infections in infants and young children worldwide. Despite this, only six complete genome sequences of original strains have been previously published, the most recent of which date back 35 and 26 years for RSV group A and group B, respectively. METHODOLOGY/PRINCIPAL FINDINGS: We present a semi-automated sequencing method that allows four RSV whole genomes to be sequenced simultaneously. We were able to sequence the complete coding sequences of 13 RSV A and 4 RSV B strains from Milwaukee collected from 1998 to 2010. Another 12 RSV A and 5 RSV B strains sequenced in this study cover the majority of the genome. All RSV A and RSV B sequences were analyzed by neighbor-joining, maximum parsimony and Bayesian phylogeny methods. Genetic diversity was high among RSV A viruses in Milwaukee, including the circulation of multiple genotypes (GA1, GA2, GA5, GA7), with GA2 persisting throughout the 13 years of the study. However, RSV B genomes showed little variation, all belonging to the BA genotype. For RSV A, the same evolutionary patterns and clades were seen consistently across the whole genome, including intergenic, coding, and non-coding regions. CONCLUSIONS/SIGNIFICANCE: The sequencing strategy presented in this work allows RSV A and B genomes to be sequenced simultaneously in two working days and at low cost. We have significantly increased the amount of genomic data available for both RSV A and B, providing the basic molecular characteristics of RSV strains circulating in Milwaukee over the last 13 years. This information can be used for comparative analysis with strains circulating in other communities around the world, which should also help with the development of new strategies for control of RSV, specifically vaccine development and improvement of RSV diagnostics.

  18. Whole genome sequencing for typing and characterisation of Listeria monocytogenes isolated in a rabbit meat processing plant

    Science.gov (United States)

    Palma, Federica; Pasquali, Frédérique; Lucchi, Alex; Cesare, Alessandra De; Manfreda, Gerardo

    2017-01-01

    Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments, including food processing plants, where it can persist for months or years. In the present study, the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared with that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and by screening against a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes comprised 13 to 28 contigs, with N50 values ranging from 456,298 to 580,604. Coverage ranged from 41× to 187×. cgMLST showed significantly superior discriminatory power only in comparison with ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose 7-locus concatenated sequence differs by only 4 nucleotides from that of VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a full-length inlA in all ST14 isolates and of a mutated version including a premature stop codon (PMSC), associated with attenuated virulence, in all ST121 isolates. PMID:29071246
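    The N50 values quoted above summarise assembly contiguity: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running total first reaches half of the total assembly size. A minimal sketch of that standard definition follows; the contig lengths in the example are made up.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length of the contig at which the cumulative sum first reaches half the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs")
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```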

  19. Whole genome sequencing for typing and characterisation of Listeria monocytogenes isolated in a rabbit meat processing plant.

    Science.gov (United States)

    Palma, Federica; Pasquali, Frédérique; Lucchi, Alex; Cesare, Alessandra De; Manfreda, Gerardo

    2017-08-16

    Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments, including food processing plants, where it can persist for months or years. In the present study, the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared with that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and by screening against a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes comprised 13 to 28 contigs, with N50 values ranging from 456,298 to 580,604. Coverage ranged from 41× to 187×. cgMLST showed significantly superior discriminatory power only in comparison with ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose 7-locus concatenated sequence differs by only 4 nucleotides from that of VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a full-length inlA in all ST14 isolates and of a mutated version including a premature stop codon (PMSC), associated with attenuated virulence, in all ST121 isolates.

  20. Whole genome sequencing for typing and characterisation of Listeria monocytogenes isolated in a rabbit meat processing plant

    Directory of Open Access Journals (Sweden)

    Federica Palma

    2017-09-01

    Full Text Available Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments, including food processing plants, where it can persist for months or years. In the present study the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared to that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and by investigating a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. The cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes comprised 13 to 28 contigs, with N50 values ranging from 456,298 to 580,604 bp and coverage ranging from 41 to 187X. The cgMLST showed significantly superior discriminatory power only in comparison with ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose concatenated 7-locus sequence differs from that of VT1 (Epidemic clone III) by only 4 nucleotides. Analysis of virulence genes showed a full-length inlA in all ST14 isolates and, in all ST121 isolates, a mutated version carrying a premature stop codon (PMSC) associated with attenuated virulence.

  1. Impact of whole-genome amplification on the reliability of pre-transfer cattle embryo breeding value estimates.

    Science.gov (United States)

    Shojaei Saadi, Habib A; Vigneault, Christian; Sargolzaei, Mehdi; Gagné, Dominic; Fournier, Éric; de Montera, Béatrice; Chesnais, Jacques; Blondin, Patrick; Robert, Claude

    2014-10-12

    Genome-wide profiling of single-nucleotide polymorphisms is receiving increasing attention as a method of pre-implantation genetic diagnosis in humans and of commercial genotyping of pre-transfer embryos in cattle. However, the very small quantity of genomic DNA in biopsy material from early embryos poses daunting technical challenges, and a reliable whole-genome amplification (WGA) procedure would greatly facilitate this work. Several PCR-based and non-PCR-based WGA technologies, namely multiple displacement amplification, quasi-random primed library synthesis followed by PCR, ligation-mediated PCR, and single-primer isothermal amplification, were tested in combination with different DNA extraction protocols for various quantities of genomic DNA input. The efficiency of each method was evaluated by comparing the genotypes obtained from 15 cultured cells (representative of an embryonic biopsy) with those from unamplified reference gDNA. The gDNA input, gDNA extraction method and amplification technology were all found to be critical for successful genome-wide genotyping. The selected WGA platform was then tested on embryo biopsies (n = 226), and the results were compared with those from biopsies collected after birth. Although WGA inevitably leads to a random loss of information and to the introduction of erroneous genotypes, after genomic imputation the genetic indices derived from both sources of DNA were highly correlated (r = 0.99, P < 0.001). It is possible to generate high-quality DNA in sufficient quantities for successful genome-wide genotyping starting from an early embryo biopsy. However, imputation from parental and population genotypes is required to complete and correct the genotypic data. Together, judicious selection of the WGA platform, careful sample handling and genomic imputation make it possible to perform extremely reliable genomic evaluations of pre-transfer embryos.
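
Each WGA method was judged by comparing genotypes from amplified DNA with those from unamplified reference gDNA. One simple way to express such a comparison is a call rate plus the concordance over markers called in both samples. This is a sketch with a hypothetical function and made-up genotype vectors, not the authors' pipeline.

```python
def call_rate_and_concordance(test_calls, reference_calls):
    """Hypothetical helper: compare amplified-DNA genotypes with a reference.

    Genotypes are coded as allele dosages (0, 1, 2); None marks a no-call.
    Returns (call_rate, concordance), where concordance is computed only over
    markers called in both samples.
    """
    if len(test_calls) != len(reference_calls):
        raise ValueError("genotype vectors must be the same length")
    call_rate = sum(t is not None for t in test_calls) / len(test_calls)
    pairs = [(t, r) for t, r in zip(test_calls, reference_calls)
             if t is not None and r is not None]
    if not pairs:
        return call_rate, float("nan")
    matches = sum(t == r for t, r in pairs)
    return call_rate, matches / len(pairs)


# Invented data: one heterozygote miscalled as homozygous, two no-calls.
reference = [0, 1, 2, 1, 0, 2, 1, 1]
amplified = [0, 2, 2, None, 0, 2, 1, None]
print(call_rate_and_concordance(amplified, reference))
```

Missing and erroneous calls are kept separate here because the abstract treats them differently: imputation fills the gaps and corrects errors after genotyping.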

  2. Whole genome sequencing reveals genetic heterogeneity of G3P[8] rotaviruses circulating in Italy.

    Science.gov (United States)

    Medici, Maria Cristina; Tummolo, Fabio; Martella, Vito; Arcangeletti, Maria Cristina; De Conto, Flora; Chezzi, Carlo; Magrì, Alessandro; Fehér, Enikő; Marton, Szilvia; Calderaro, Adriana; Bányai, Krisztián

    2016-06-01

    After sporadic detection in the 1990s, G3P[8] rotaviruses have emerged in recent years as a predominant genotype in many areas worldwide, including parts of Italy. The present study describes the molecular epidemiology and evolution of G3P[8] rotaviruses detected in Italian children with gastroenteritis during two survey periods (2004-2005 and 2008-2013). The whole genomes of selected G3P[8] strains were determined, and antigenic differences between these strains and rotavirus vaccine strains were analyzed. Among 819 rotaviruses (271 in 2004-2005 and 548 in 2008-2013) genotyped during the survey periods, the number of G3P[8] rotaviruses varied markedly over the years (0/83 in 2004, 30/188 in 2005, 0/96 in 2008, 6/88 in 2009, 4/97 in 2010, 0/83 in 2011, 9/82 in 2012 and 56/102 in 2013). The genotypes of the 11 gene segments of 15 selected strains were assigned to G3-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1; thus, all strains belonged to the Wa genogroup. Phylogenetic analysis of the Italian G3P[8] strains showed a peculiar pattern of segregation, with a 2012 lineage for the VP1-VP3, NSP1, NSP2, NSP4 and NSP5 genes and a 2013 lineage for the VP6, NSP1 and NSP3 genes, and a 1.3-20.2% nucleotide difference from the oldest Italian G3P[8] strains. The genetic variability of the Italian G3P[8] strains, compared with rotavirus sequences available in GenBank, suggested a process of selection acting on a global scale rather than the emergence of local strains, as several lineages were already circulating globally. Compared with the vaccine strains, the Italian G3P[8] rotaviruses segregated in different lineages (5-5.3% and 7.2-11.4% nucleotide differences in VP7 and VP4, respectively), with some mismatches in the putative neutralizing epitopes of the VP7 and VP4 antigens. The accumulation of point mutations and amino acid differences between vaccine strains and currently circulating rotaviruses might, over the years, generate vaccine-resistant variants. Copyright © 2016 Elsevier B.V. All rights reserved.
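
The per-year counts can be checked against the survey totals (271 + 548 = 819) and turned into proportions with a few lines of Python. The dictionary simply restates the abstract's numbers; the helper name is hypothetical.

```python
# G3P[8] detections and total rotaviruses genotyped per year, as reported in the abstract.
g3p8_by_year = {
    2004: (0, 83), 2005: (30, 188),
    2008: (0, 96), 2009: (6, 88), 2010: (4, 97),
    2011: (0, 83), 2012: (9, 82), 2013: (56, 102),
}


def period_total(years):
    """Hypothetical helper: total rotaviruses genotyped over the given years."""
    return sum(g3p8_by_year[y][1] for y in years)


for year, (g3p8, total) in sorted(g3p8_by_year.items()):
    print(f"{year}: {g3p8}/{total} = {100 * g3p8 / total:.1f}%")

print(period_total([2004, 2005]), period_total(range(2008, 2014)))
```

The jump in 2013 (56 of 102 strains) is what the abstract describes as G3P[8] becoming predominant.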

  3. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery

    Directory of Open Access Journals (Sweden)

    Stothard Paul

    2011-11-01

    Full Text Available Abstract Background One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variation present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the total SNPs found in Holstein, Black Angus, and in both animals, 81%, 81%, and 75%, respectively, are novel. In-depth annotation of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten
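
The abstract reports detection rates of 91% (homozygous) and 81% (heterozygous) against BovineHD genotypes without giving the exact definition. One plausible reading, assumed here, is the fraction of chip-polymorphic sites in each genotype class that sequencing also called as a SNP. The helper and the site IDs are hypothetical.

```python
def detection_rates(chip_genotypes, seq_variant_sites):
    """Hypothetical helper: per-class fraction of chip SNPs also found by sequencing.

    chip_genotypes maps site id -> "hom" (homozygous for the non-reference
    allele) or "het"; homozygous-reference sites are assumed to be excluded.
    seq_variant_sites is the set of site ids where sequencing called a SNP.
    """
    rates = {}
    for cls in ("hom", "het"):
        sites = [s for s, g in chip_genotypes.items() if g == cls]
        if sites:
            detected = sum(s in seq_variant_sites for s in sites)
            rates[cls] = detected / len(sites)
    return rates


# Invented example: all homozygous sites found, one heterozygous site missed.
chip = {"s1": "hom", "s2": "hom", "s3": "het", "s4": "het", "s5": "het", "s6": "het"}
seq_calls = {"s1", "s2", "s3", "s4", "s5"}
print(detection_rates(chip, seq_calls))
```

Heterozygous sites tend to be missed more often at moderate coverage because only part of the reads carry the alternate allele, which is consistent with the lower heterozygous rate reported.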

  4. Strategies for Enriching Variant Coverage in Candidate Disease Loci on a Multiethnic Genotyping Array.

    Directory of Open Access Journals (Sweden)

    Stephanie A Bien

    Full Text Available Investigating the genetic architecture of complex traits in ancestrally diverse populations is imperative to understanding the etiology of disease. However, the current paucity of genetic research in people of African and Latin American ancestry and in Hispanic and indigenous peoples in the United States is likely to exacerbate existing health disparities for many common diseases. The Population Architecture using Genomics and Epidemiology, Phase II (PAGE II) Study was initiated in 2013 by the National Human Genome Research Institute to expand our understanding of complex trait loci in ethnically diverse and well-characterized study populations. To meet this goal, the Multi-Ethnic Genotyping Array (MEGA) was designed to substantially improve fine-mapping and functional discovery by increasing variant coverage across multiple ethnicities at known loci for metabolic, cardiovascular, renal, inflammatory, anthropometric, and a variety of lifestyle traits. Studying the frequency distribution of clinically relevant mutations, putative risk alleles, and known functional variants across multiple populations will provide important insight into the genetic architecture of complex diseases and facilitate the discovery of novel, sometimes population-specific, disease associations. DNA samples from 51,650 participants, comprising self-identified African ancestry (17,328), Hispanic/Latino (22,379), Asian/Pacific Islander (8,640) and American Indian (653) participants plus an additional 2,650 participants of either South Asian or European ancestry, as well as other reference panels, have been genotyped on MEGA by PAGE II. MEGA was designed as a new resource for studying ancestrally diverse populations. Here, we describe the methodology for selecting trait-specific content for use in multi-ethnic populations and how enriching MEGA for this content may contribute to a deeper biological understanding of the genetic etiology of complex disease.
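
The group sizes listed in the abstract add up to the stated 51,650 only when the 2,650 South Asian or European-ancestry participants are included, as a quick tally of the reported numbers shows:

```python
# Sample counts restated from the abstract.
self_identified = {
    "African ancestry": 17_328,
    "Hispanic/Latino": 22_379,
    "Asian/Pacific Islander": 8_640,
    "American Indian": 653,
}
south_asian_or_european = 2_650

subtotal = sum(self_identified.values())
total = subtotal + south_asian_or_european
print(subtotal, total)
```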

  5. Whole-genome sequencing overcomes pseudogene homology to diagnose autosomal dominant polycystic kidney disease.

    Science.gov (United States)

    Mallawaarachchi, Amali C; Hort, Yvonne; Cowley, Mark J; McCabe, Mark J; Minoche, André; Dinger, Marcel E; Shine, John; Furlong, Timothy J

    2016-11-01

    Autosomal dominant polycystic kidney disease (ADPKD) is the most common monogenic kidney disorder and is due to disease-causing variants in PKD1 or PKD2. Strong genotype-phenotype correlation exists, although diagnostic sequencing is not part of routine clinical practice. This is because PKD1 bears 97.7% sequence similarity with six pseudogenes, so that sequencing it requires laborious and error-prone long-range PCR and Sanger sequencing. We hypothesised that whole-genome sequencing (WGS) would be able to overcome the problem of this sequence homology because of its 150 bp paired-end reads and its avoidance of the capture bias that arises from targeted sequencing. We prospectively recruited a cohort of 28 unique pedigrees with the ADPKD phenotype. Standard DNA extraction, library preparation and WGS were performed using the Illumina HiSeq X, and variants were classified following standard guidelines. A molecular diagnosis was made in 24 patients (86%), with 100% of variants confirmed by the current gold standard of long-range PCR and Sanger sequencing. We demonstrated unique alignment of sequencing reads over the pseudogene-homologous region. In addition to identifying function-affecting single-nucleotide variants and indels, we identified single- and multi-exon deletions affecting PKD1 and PKD2, which would have been challenging to identify using exome sequencing. We report the first use of WGS to diagnose ADPKD. This method overcomes pseudogene homology, provides uniform coverage, detects all variant types in a single test and is less labour-intensive than current techniques. This technique is translatable to a diagnostic setting, allows clinicians to make better-informed management decisions and has implications for other disease groups that are challenged by regions of confounding sequence homology.
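
Why read length matters for pseudogene homology can be illustrated with a toy exact-match model: a read can be placed uniquely only if its sequence does not also occur in a paralogous copy. Real aligners tolerate mismatches and report mapping quality, so this is a deliberate simplification; the helper and the sequences are invented.

```python
def unique_window_fraction(gene, paralogs, read_len):
    """Hypothetical helper: fraction of read-length windows in `gene` found in no paralog.

    A window that also appears verbatim in a pseudogene cannot be placed
    uniquely by an exact-match aligner, so a higher fraction means more of
    the gene is resolvable with reads of this length.
    """
    paralog_kmers = set()
    for seq in paralogs:
        for i in range(len(seq) - read_len + 1):
            paralog_kmers.add(seq[i:i + read_len])
    windows = [gene[i:i + read_len] for i in range(len(gene) - read_len + 1)]
    if not windows:
        raise ValueError("read_len is longer than the gene")
    unique = sum(w not in paralog_kmers for w in windows)
    return unique / len(windows)


# Invented toy: the "gene" differs from its "pseudogene" at a single base.
pseudogene = "A" * 20
gene = "A" * 10 + "C" + "A" * 9
for k in (4, 10):
    print(k, round(unique_window_fraction(gene, [pseudogene], k), 3))
```

Longer reads are more likely to span at least one base that distinguishes the gene from its pseudogenes, which is the intuition behind the hypothesis that 150 bp paired-end reads resolve the homologous region.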

  6. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants.

    Science.gov (United States)

    Belkadi, Aziz; Bolze, Alexandre; Itan, Yuval; Cobat, Aurélie; Vincent, Quentin B; Antipenko, Alexander; Shang, Lei; Boisson, Bertrand; Casanova, Jean-Laurent; Abel, Laurent

    2015-04-28

    We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs.
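
Splitting calls into "called by both platforms", "WES only" and "WGS only", and then scoring the Sanger-checked exclusive calls, reduces to set operations once variants share a common key. The sketch assumes each call is a (chrom, pos, ref, alt) tuple; real comparisons must first normalize indel representation. The helpers and data are hypothetical.

```python
def compare_callsets(wes_calls, wgs_calls):
    """Hypothetical helper: partition two variant call sets.

    Each call is a (chrom, pos, ref, alt) tuple; identical tuples are treated
    as the same variant.
    """
    wes, wgs = set(wes_calls), set(wgs_calls)
    return {"both": wes & wgs, "wes_only": wes - wgs, "wgs_only": wgs - wes}


def false_positive_rate(validated):
    """Hypothetical helper: fraction of Sanger-checked calls that were not confirmed.

    validated maps each checked variant to True (confirmed) or False.
    """
    if not validated:
        raise ValueError("no validated variants")
    return sum(not ok for ok in validated.values()) / len(validated)


# Invented example.
wes = [("1", 100, "A", "G"), ("1", 200, "C", "T")]
wgs = [("1", 100, "A", "G"), ("2", 50, "G", "A")]
print(compare_callsets(wes, wgs))
print(false_positive_rate({"v1": True, "v2": False, "v3": False, "v4": False}))
```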

  7. The American cranberry: first insights into the whole genome of a species adapted to bog habitat.

    Science.gov (United States)

    Polashock, James; Zelzion, Ehud; Fajardo, Diego; Zalapa, Juan; Georgi, Laura; Bhattacharya, Debashish; Vorsa, Nicholi

    2014-06-13

    The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely cultivated fruit crops native to North America; the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). Taxonomically, cranberries belong to the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Although next-generation sequencing technology is advancing whole-genome sequencing, one major obstacle to the successful assembly of complex diploid (and higher-ploidy) organisms from short-read sequence data is heterozygosity. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize the issue of heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived from five generations of selfing originating from the cultivar Ben Lear. The genome size of V. macrocarpon has been estimated to be about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The 36,364 predicted genes represent 17.7% of the assembled genome. Of the predicted genes, 30,090 were assigned to candidate genes based on homology. Genes supported by transcriptome data totaled 13,170 (36%). Shotgun sequencing of the cranberry genome, with an average sequencing coverage of 20X, allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection for further study of important biochemical pathways and cellular processes, and for marker development for breeding and the study of horticultural characteristics, such as disease resistance.
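
N50, reported here for the scaffolds, is the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal sketch of that definition follows; conventions differ slightly on ties and on the denominator (using the estimated genome size instead of the assembly size gives NG50). The function name is hypothetical.

```python
def n50(contig_lengths):
    """Hypothetical helper: N50 of an assembly from its sequence lengths.

    Sorts lengths in decreasing order and returns the first length at which
    the running total reaches half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("empty assembly")


# Invented scaffold lengths (bp): total 290, half 145, reached at the second scaffold.
print(n50([100, 80, 50, 30, 20, 10]))
```

A scaffold N50 of about 4 kb over 229,745 scaffolds indicates a highly fragmented draft, even though the total length is close to the estimated genome size.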

  8. Whole-genome DNA methylation characteristics in pediatric precursor B cell acute lymphoblastic leukemia (BCP ALL).

    Directory of Open Access Journals (Sweden)

    Radosław Chaber

    Full Text Available In addition to genetic alterations, epigenetic abnormalities have been shown to underlie the pathogenesis of acute lymphoblastic leukemia (ALL), the most common pediatric cancer. The purpose of this study was to characterize the whole-genome DNA methylation profile in children with precursor B-cell ALL (BCP ALL) and to compare this profile with the methylation observed in normal bone marrow samples. Additional efforts were made to correlate the observed methylation patterns with selected clinical features. We assessed DNA methylation in bone marrow samples obtained from 38 children with BCP ALL at the time of diagnosis, along with 4 samples of normal bone marrow cells as controls, using the Infinium MethylationEPIC BeadChip array. Patients were diagnosed and stratified into prognosis groups according to the BFM ALL IC 2009 protocol. The analysis of differentially methylated sites across the genome, as well as of promoter methylation profiles, allowed clear separation of the leukemic and control samples into two clusters. Of the promoter-associated differentially methylated sites, 86.6% were hypermethylated in BCP ALL. Seven sites were found to correlate with the BFM ALL IC 2009 high-risk group; among these, one was located within the gene body of the MBP gene and another within the promoter region of the PSMF1 gene. Differentially methylated sites were also significantly associated with the subsets of patients with the ETV6-RUNX1 fusion and with hyperdiploidy. The analyzed translocations, and the resulting change in the genes' sequence context, did not affect methylation, and methylation does not appear to be a mechanism regulating the expression of the resulting fusion genes.
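
Infinium BeadChips report each CpG site as a beta value, the fraction of signal coming from the methylated probe. The abstract does not describe the statistics used, so the sketch below only illustrates two common building blocks, assumed rather than taken from the paper: the M-value transform, often preferred for statistical testing, and the difference in mean beta (delta beta) between groups. A real analysis would add normalization, a per-site test and multiple-testing correction. Both helpers and the values are hypothetical.

```python
import math


def beta_to_m(beta, offset=1e-6):
    """Hypothetical helper: convert a beta value (0..1) to an M-value.

    M = log2(beta / (1 - beta)); beta is clamped to [offset, 1 - offset]
    so that fully unmethylated or fully methylated sites stay finite.
    """
    b = min(max(beta, offset), 1.0 - offset)
    return math.log2(b / (1.0 - b))


def mean_delta_beta(case_betas, control_betas):
    """Hypothetical helper: mean beta in cases minus mean beta in controls at one site."""
    return sum(case_betas) / len(case_betas) - sum(control_betas) / len(control_betas)


# Invented values for one CpG site: leukemic samples vs. normal marrow controls.
leukemia = [0.82, 0.78, 0.91]
controls = [0.21, 0.25]
print(round(mean_delta_beta(leukemia, controls), 3))
print([round(beta_to_m(b), 2) for b in leukemia])
```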

  9. Recent advances in understanding the roles of whole genome duplications in evolution.

    Science.gov (United States)

    MacKintosh, Carol; Ferrier, David E K

    2017-01-01

    Ancient whole-genome duplications (WGDs), or paleopolyploidy events, are key to solving Darwin's 'abominable mystery' of how flowering plants evolved and radiated into a rich variety of species. The vertebrates also emerged from their invertebrate ancestors via two WGDs, and genomes of diverse gymnosperm trees, unicellular eukaryotes, invertebrates, fishes, amphibians and even a rodent carry evidence of lineage-specific WGDs. Modern polyploidy is common in eukaryotes, and it can be induced, enabling mechanisms and short-term cost-benefit assessments of polyploidy to be studied experimentally. However, the ancient WGDs can be reconstructed only by comparative genomics: these studies are difficult because the DNA duplicates have been through tens or hundreds of millions of years of gene losses, mutations, and chromosomal rearrangements that culminate in resolution of the polyploid genomes back into diploid ones (rediploidisation). Intriguing asymmetries in patterns of post-WGD gene loss and retention between duplicated sets of chromosomes have been discovered recently, and elaborations of signal transduction systems are lasting legacies from several WGDs. The data imply that simpler signalling pathways in the pre-WGD ancestors were converted via WGDs into multi-stranded parallelised networks. Genetic and biochemical studies in plants, yeasts and vertebrates suggest a paradigm in which different combinations of sister paralogues in the post-WGD regulatory networks are co-regulated under different conditions. In principle, such networks can respond to a wide array of environmental, sensory and hormonal stimuli and integrate them to generate phenotypic variety in cell types and behaviours. Patterns are also being discerned in how the post-WGD signalling networks are reconfigured in human cancers and neurological conditions. It is fascinating to unpick how ancient genomic events impact on complexity, variety and disease in modern life.

  10. Whole-genome sequencing reveals complex mechanisms of intrinsic resistance to BRAF inhibition.

    Science.gov (United States)

    Turajlic, S; Furney, S J; Stamp, G; Rana, S; Ricken, G; Oduko, Y; Saturno, G; Springer, C; Hayes, A; Gore, M; Larkin, J; Marais, R

    2014-05-01

    BRAF is mutated in ∼42% of human melanomas (COSMIC, http://www.sanger.ac.uk/genetics/CGP/cosmic/), and pharmacological BRAF inhibitors such as vemurafenib and dabrafenib achieve dramatic responses in patients whose tumours harbour BRAF(V600) mutations. Objective responses occur in ∼50% of patients and disease stabilisation in a further ∼30%, but ∼20% of patients present primary or innate resistance and do not respond. Here, we investigated the underlying cause of treatment failure in a patient with BRAF mutant melanoma who presented primary resistance. We carried out whole-genome sequencing and single nucleotide polymorphism (SNP) array analysis of five metastatic tumours from the patient. We validated mechanisms of resistance in a cell line derived from the patient's tumour. We observed that the majority of the single-nucleotide variants identified were shared across all tumour sites, but also saw site-specific copy-number alterations in discrete cell populations at different sites. We found that two ubiquitous mutations mediated resistance to BRAF inhibition in these tumours. A mutation in GNAQ sustained mitogen-activated protein kinase (MAPK) signalling, whereas a mutation in PTEN activated the PI3K/AKT pathway. Inhibition of both pathways synergised to block the growth of the cells. Our analyses show that the five metastases arose from a common progenitor and acquired additional alterations after disease dissemination. We demonstrate that a distinct combination of mutations mediated primary resistance to BRAF inhibition in this patient. These mutations were present in all five tumours and in a tumour sample taken before BRAF inhibitor treatment was administered. Inhibition of both pathways was required to block tumour cell growth, suggesting that combined targeting of these pathways could have been a valid therapeutic approach for this patient.
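
Separating variants shared by all metastases from site-specific ones, as described above, reduces to set operations once calls are available per tumour. The sketch uses hypothetical helper and variant identifiers; a real analysis must also check coverage, since a variant that is not called in a sample may simply be undetected there.

```python
def classify_variants(calls_by_sample):
    """Hypothetical helper: split variant calls into shared and sample-private sets.

    calls_by_sample maps a tumour sample name to the set of variant ids called
    in it. Returns (ubiquitous, private): variants present in every sample, and
    a dict mapping each sample to variants found in no other sample.
    """
    if not calls_by_sample:
        raise ValueError("need at least one sample")
    samples = list(calls_by_sample)
    ubiquitous = set.intersection(*(calls_by_sample[s] for s in samples))
    private = {}
    for s in samples:
        others = set().union(*(calls_by_sample[o] for o in samples if o != s))
        private[s] = calls_by_sample[s] - others
    return ubiquitous, private


# Invented calls for three metastases.
calls = {
    "met_A": {"GNAQ_mut", "PTEN_mut", "v1"},
    "met_B": {"GNAQ_mut", "PTEN_mut", "v2"},
    "met_C": {"GNAQ_mut", "PTEN_mut"},
}
shared, private = classify_variants(calls)
print(sorted(shared), {k: sorted(v) for k, v in private.items()})
```

Variants in the shared set are the candidates for events in the common progenitor, which is how a resistance mechanism present in every metastasis can be identified.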

  11. Whole-genome sequences of Chlamydia trachomatis directly from clinical samples without culture

    Science.gov (United States)

    Seth-Smith, Helena M.B.; Harris, Simon R.; Skilton, Rachel J.; Radebe, Frans M.; Golparian, Daniel; Shipitsyna, Elena; Duy, Pham Thanh; Scott, Paul; Cutcliffe, Lesley T.; O’Neill, Colette; Parmar, Surendra; Pitt, Rachel; Baker, Stephen; Ison, Catherine A.; Marsh, Peter; Jalal, Hamid; Lewis, David A.; Unemo, Magnus; Clarke, Ian N.; Parkhill, Julian; Thomson, Nicholas R.

    2013-01-01

    The use of whole-genome sequencing as a tool for the study of infectious bacteria is of growing clinical interest. Chlamydia trachomatis is responsible for sexually transmitted infections and the blinding disease trachoma, which affect hundreds of millions of people worldwide. Recombination is widespread within the genome of C. trachomatis, so whole-genome sequencing is necessary to understand the evolution, diversity, and epidemiology of this pathogen. Culture of C. trachomatis has, until now, been a prerequisite to obtain DNA for whole-genome sequencing; however, as C. trachomatis is an obligate intracellular pathogen, this procedure is technically demanding and time consuming. Discarded clinical samples represent a large resource for sequencing the genomes of pathogens, yet clinical swabs frequently contain very low levels of C. trachomatis DNA and large amounts of contaminating microbial and human DNA. To determine whether it is possible to obtain whole-genome sequences from bacteria without the need for culture, we have devised an approach that combines immunomagnetic separation (IMS) for targeted bacterial enrichment with multiple displacement amplification (MDA) for whole-genome amplification. Using IMS-MDA in conjunction with high-throughput multiplexed Illumina sequencing, we have produced the first whole bacterial genome sequences direct from clinical samples. We also show that this method can be used to generate genome data from nonviable archived samples. This method will prove a useful tool in answering questions relating to the biology of many difficult-to-culture or fastidious bacteria of clinical concern. PMID:23525359

  12. Whole genome sequencing in the prevention and control of Staphylococcus aureus infection.

    Science.gov (United States)

    Price, J R; Didelot, X; Crook, D W; Llewelyn, M J; Paul, J

    2013-01-01

    Staphylococcus aureus remains a leading cause of hospital-acquired infection, but weaknesses inherent in currently available typing methods impede effective infection prevention and control. The high resolution offered by whole genome sequencing has the potential to revolutionise our understanding and management of S. aureus infection. This review aims to outline the practicalities of whole genome sequencing and to discuss how it might shape future infection control practice. We review conventional typing methods and compare these with the potential offered by whole genome sequencing. In contrast with conventional methods, whole genome sequencing discriminates down to single nucleotide differences, allows accurate characterisation of transmission events and outbreaks, and additionally provides information about the genetic basis of phenotypic characteristics, including antibiotic susceptibility and virulence. However, translating its potential into routine practice will depend on affordability, acceptable turnaround times and the creation of a reliable, standardised bioinformatic infrastructure. Whole genome sequencing has the potential to provide a universal test that facilitates outbreak investigation, enables the detection of emerging strains and predicts their clinical importance. Copyright © 2012 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
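
Discrimination "down to single nucleotide differences" is usually used in practice by counting SNP differences between isolates' aligned core genomes and flagging closely related pairs as possible transmission. The abstract gives no cutoff, so the threshold below is a hypothetical parameter, as are the helper names and sequences.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Hypothetical helper: differences between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped. Sequences are assumed to be upper-case.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a not in "-N" and b not in "-N" and a != b
    )


def linked_pairs(alignment, threshold):
    """Hypothetical helper: isolate pairs within `threshold` SNPs of each other.

    alignment maps isolate name -> aligned core-genome sequence.
    """
    pairs = []
    for x, y in combinations(sorted(alignment), 2):
        d = snp_distance(alignment[x], alignment[y])
        if d <= threshold:
            pairs.append((x, y, d))
    return pairs


# Invented 8-bp "core genomes" for three isolates.
alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACGA"}
print(linked_pairs(alignment, threshold=2))
```

A single fixed threshold is a simplification; the appropriate cutoff depends on mutation rate, sampling interval and within-host diversity.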

  13. A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6.

    Science.gov (United States)

    Bengtsson, Henrik; Wirapati, Pratyaksha; Speed, Terence P

    2009-09-01

    High-resolution copy-number (CN) analysis has in recent years gained much attention, not only for the purpose of identifying CN aberrations associated with a certain phenotype, but also for identifying CN polymorphisms. For such studies to be successful and cost-effective, the statistical methods have to be optimized. We propose a single-array preprocessing method (CRMA v2) for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is needed only at the last step, when calculating relative CNs. As with our method for earlier generations of arrays, this one controls for allelic crosstalk, probe affinities and PCR fragment-length effects. It additionally corrects for probe sequence effects and for the co-hybridization of fragments digested by multiple enzymes that occurs on the latest chips. We compare our method with Affymetrix's CN5 method and the dChip method by assessing how well they differentiate between various CN states at full resolution and at various amounts of smoothing. Although CRMA v2 is a single-array method, we observe that it performs as well as or better than alternative methods that use data from all arrays for their preprocessing. This shows that it is possible to do online analysis in large-scale projects where additional arrays are introduced over time.
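
The abstract notes that a reference signal is needed only at the final step, when relative CNs are calculated. A common convention, assumed here and not taken from the paper, is C = 2 * theta / theta_R for a diploid reference, with theta_R the per-locus median over a pool of reference arrays. Averaging adjacent loci then trades resolution for noise, which corresponds to the "amounts of smoothing" compared in the abstract. Helper names and signals are hypothetical.

```python
import statistics


def relative_copy_numbers(sample_totals, reference_totals_by_array, ploidy=2):
    """Hypothetical helper: relative total CN of one array against a reference pool.

    sample_totals: per-locus total signal (theta) for the array of interest.
    reference_totals_by_array: list of per-locus total signals, one list per
        reference array; the reference at each locus is their median.
    Returns ploidy * theta / theta_ref at each locus.
    """
    reference = [statistics.median(values) for values in zip(*reference_totals_by_array)]
    return [ploidy * t / r for t, r in zip(sample_totals, reference)]


def bin_average(values, width):
    """Hypothetical helper: mean of consecutive non-overlapping bins of `width` loci."""
    return [statistics.fmean(values[i:i + width]) for i in range(0, len(values), width)]


# Invented signals for three loci on one array and three reference arrays.
sample = [100.0, 200.0, 50.0]
references = [[100.0, 100.0, 100.0], [90.0, 110.0, 100.0], [110.0, 90.0, 100.0]]
cn = relative_copy_numbers(sample, references)
print(cn, bin_average(cn, 2))
```

Because each array is preprocessed on its own and only this last division involves other arrays, new arrays can be added to a project without reprocessing earlier ones.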

  14. Whole Genome Sequence Analysis of Salmonella Typhi Isolated in Thailand before and after the Introduction of a National Immunization Program.

    Directory of Open Access Journals (Sweden)

    Zoe A Dyson

    2017-01-01

    Full Text Available Vaccines against Salmonella Typhi, the causative agent of typhoid fever, are commonly used by travellers; however, there are few examples of national immunization programs in endemic areas. There is therefore a paucity of data on the impact of typhoid immunization programs on localised populations of S. Typhi. Here we have used whole genome sequencing (WGS) to characterise 44 historical bacterial isolates collected before and after a national typhoid immunization program that was implemented in Thailand in 1977 in response to a large outbreak; the program was highly effective in reducing typhoid case numbers. Thai isolates were highly diverse, including 10 distinct phylogenetic lineages or genotypes. Novel prophage and plasmids were also detected, including examples that had previously been reported only in Shigella sonnei and Escherichia coli. The majority of S. Typhi genotypes observed prior to the immunization program were not observed following it. Post-vaccine era isolates were more closely related to S. Typhi isolated from neighbouring countries than to earlier Thai isolates, providing no evidence for the local persistence of endemic S. Typhi following the national immunization program. Rather, later cases of typhoid appeared to be caused by the occasional importation of common genotypes from neighbouring Vietnam, Laos, and Cambodia. These data show the value of WGS in understanding the impacts of vaccination on pathogen populations and provide support for the proposal that large-scale typhoid immunization programs in endemic areas could result in lasting local disease elimination, although larger prospective studies are needed to test this directly.

  15. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly

    Energy Technology Data Exchange (ETDEWEB)

    Shou, S. [Univ. Wisc.-Madison; Kvikstad, E. [Univ. Wisc.-Madison; Kile, A. [Univ. Wisc.-Madison; Severin, J.; Forrest, D. [Univ. Wisc.-Madison; Runnheim, R. [Univ. Wisc.-Madison; Churas, C. [Univ. Wisc.-Madison; Hickman, J. W. [Univ. Wisc.-Madison; Mackenzie, C. [University of Texas–Houston Medical School; Choudhary, M. [University of Texas–Houston Medical School; Donohue, T. [Univ. Wisc.-Madison; Kaplan, S. [University of Texas–Houston Medical School; Schwartz, D. C. [Univ. Wisc.-Madison

    2003-09-01

    Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.
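
Aligning sequence contigs to a restriction map generally starts from an in silico digest of each contig, which predicts the fragment sizes the enzyme would produce. The sketch below does this for the two enzymes named in the abstract, treating the sequence as linear and scanning one strand (both recognition sites are palindromic). It is an illustration with a hypothetical helper, not the software used in the study.

```python
import re

# Recognition site and cut offset within the site for each enzyme:
# EcoRI cuts G^AATTC and HindIII cuts A^AGCTT, i.e. one base into the site.
ENZYMES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1)}


def in_silico_digest(sequence, enzyme):
    """Hypothetical helper: fragment lengths from cutting a linear sequence at every site."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # A zero-width lookahead finds every occurrence of the site, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Invented 20 bp sequence with two EcoRI sites and no HindIII site.
contig = "AAGAATTCAAAAGAATTCAA"
print(in_silico_digest(contig, "EcoRI"))
print(in_silico_digest(contig, "HindIII"))
```

Optical maps record fragment sizes rather than sequence, so contigs are placed by matching these predicted size patterns, with tolerance for sizing error, against the map.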

  16. Whole Genome Sequencing Based Characterization of Extensively Drug-Resistant Mycobacterium tuberculosis Isolates from Pakistan

    KAUST Repository

    Ali, Asho

    2015-02-26

    Improved molecular diagnostic methods for detecting drug resistance in Mycobacterium tuberculosis (MTB) strains are required. Resistance to first- and second-line anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular genes. However, these SNPs can vary between MTB lineages; therefore, local data are required to describe different strain populations. We used whole genome sequencing (WGS) to characterize 37 extensively drug-resistant (XDR) MTB isolates from Pakistan and investigated 40 genes associated with drug resistance. Rifampicin resistance was attributable to SNPs in the rpoB hot-spot region. Isoniazid resistance was most commonly associated with the katG codon 315 mutation (92%), followed by inhA S94A (8%); however, one strain did not have SNPs in katG, inhA or oxyR-ahpC. All strains were pyrazinamide resistant, but only 43% had pncA SNPs. Ethambutol-resistant strains predominantly had embB codon 306 mutations (62%), but additional SNPs at embB codons 406, 378 and 328 were also present. Fluoroquinolone resistance was associated with gyrA codons 91-94 in 81% of strains; four strains had only gyrB mutations, while others did not have SNPs in either gyrA or gyrB. Streptomycin-resistant strains had mutations in ribosomal RNA genes: rpsL codon 43 (42%), the rrs 500 region (16%) and gidB (34%), while six strains did not have mutations in any of these genes. Amikacin/kanamycin/capreomycin resistance was associated with SNPs in rrs at nt1401 (78%) and nt1484 (3%), except in seven (19%) strains. We estimate that if only the common hot-spot region targets of current commercial assays were used, the concordance between phenotypic and genotypic testing for these XDR strains would be 100% for rifampicin, 92% for isoniazid, 81% for fluoroquinolones, 78% for aminoglycosides and 62% for ethambutol, while pncA sequencing would provide genotypic resistance in less than half of the isolates. This work highlights the importance of expanded

  17. Whole Genome Analysis of a Wine Yeast Strain

    Science.gov (United States)

    Hauser, Nicole C.; Fellenberg, Kurt; Gil, Rosario; Bastuck, Sonja; Hoheisel, Jörg D.

    2001-01-01

    Saccharomyces cerevisiae strains frequently exhibit rather specific phenotypic features needed for adaptation to a special environment. Wine yeast strains are able to ferment musts, for example, while other industrial or laboratory strains fail to do so. The genetic differences that characterize wine yeast strains are poorly understood, however. As a first search for genetic differences between wine and laboratory strains, we performed DNA-array analyses on the typical wine yeast strain T73 and the standard laboratory background S288c. Our analysis shows that even under normal conditions (logarithmic growth in YPD medium), the two strains have expression patterns that differ significantly in more than 40 genes. Subsequent studies indicated that these differences correlate with small changes in promoter regions or variations in gene copy number. Plotting copy numbers vs. transcript levels produced patterns that were specific for the individual strains and could be used to characterize unknown samples. PMID:18628902

  18. Whole genome association mapping by incompatibilities and local perfect phylogenies

    Directory of Open Access Journals (Sweden)

    Besenbacher Søren

    2006-10-01

    Background: With current technology, vast amounts of data can be cheaply and efficiently produced in association studies, and to prevent data analysis from becoming the bottleneck of such studies, fast and efficient analysis methods that scale to these data set sizes must be developed. Results: We present a fast method for accurate localisation of disease-causing variants in high-density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status and is scored by its accuracy as a decision tree. The rationale is that the perfect phylogeny near a disease-affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, due to e.g. large marker spacing, the algorithm can allow the inclusion of incompatible markers in order to enlarge the regions prior to estimating their phylogeny. Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on (1) simulated genotype data under different models of disease determination, (2) artificial data sets created from the HapMap resource, and (3) data sets used for testing other methods, in order to compare with these. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease-causing mutation and a constant recombination rate. However, in more complex scenarios of mutation heterogeneity and more complex haplotype structure, such as those found in the HapMap data, our method outperforms SMA as well as other fast data mining approaches such as HapMiner and Haplotype Pattern Mining (HPM)
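
    For binary (0/1) haplotype markers under the infinite-sites assumption, a set of markers fits a single perfect phylogeny exactly when every pair passes the four-gamete test, i.e. no pair shows all four combinations 00, 01, 10 and 11. The sketch below uses that classical result to find a longest contiguous window around a focal marker in which all markers are pairwise compatible. It is a hedged illustration of the compatibility step only, not the authors' implementation: the function names are hypothetical, and building the local tree, scoring it as a decision tree and admitting incompatible markers are not reproduced.

```python
def compatible(a, b) -> bool:
    """Four-gamete test for two binary (0/1) markers observed on the same haplotypes.

    Under the infinite-sites assumption, two markers fit one perfect phylogeny
    unless all four gametes 00, 01, 10 and 11 occur.
    """
    return len(set(zip(a, b))) < 4


def largest_compatible_window(haplotypes, focal):
    """Return inclusive (left, right) marker indices of a longest contiguous window
    that contains `focal` and in which every pair of markers is compatible.

    haplotypes: list of equal-length 0/1 sequences, one per chromosome.
    """
    n = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n)]
    ok = [[compatible(columns[i], columns[j]) for j in range(n)] for i in range(n)]

    best = (focal, focal)
    for left in range(focal, -1, -1):
        # Adding marker `left` must keep [left, focal] pairwise compatible;
        # if it does not, no window reaching further left can work either.
        if not all(ok[left][k] for k in range(left + 1, focal + 1)):
            break
        right = focal
        # Extend to the right while the new marker is compatible with the whole window.
        while right + 1 < n and all(ok[k][right + 1] for k in range(left, right + 1)):
            right += 1
        if right - left > best[1] - best[0]:
            best = (left, right)
    return best


if __name__ == "__main__":
    H = [
        [0, 0, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [1, 0, 0, 1, 1],
        [0, 0, 1, 1, 0],
    ]
    # Marker pairs (0, 3) and (3, 4) fail the four-gamete test.
    print(largest_compatible_window(H, 2))  # (1, 3)
```

    Because any sub-window of a compatible window is itself compatible, it is enough to try each left boundary and extend greedily to the right; this costs a quadratic number of pairwise tests, which is acceptable for a sketch but not tuned for genome-wide data.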

  19. Large-scale whole genome sequencing identifies country-wide spread of an emerging G9P[8] rotavirus strain in Hungary, 2012.

    Science.gov (United States)

    Dóró, Renáta; Mihalov-Kovács, Eszter; Marton, Szilvia; László, Brigitta; Deák, Judit; Jakab, Ferenc; Juhász, Ágnes; Kisfali, Péter; Martella, Vito; Melegh, Béla; Molnár, Péter; Sántha, Ildikó; Schneider, Ferenc; Bányai, Krisztián

    2014-12-01

    With the availability of rotavirus vaccines, routine strain surveillance has been launched or continued in many countries worldwide. In this study, relevant information is provided from Hungary in order to extend knowledge about circulating rotavirus strains. Direct sequencing of the RT-PCR products obtained with VP7- and VP4-specific primer sets was used as the routine laboratory method. In addition, we explored the advantages of random-primed RT-PCR and semiconductor sequencing of the whole genome of selected strains. During the study year, 2012, we identified an increase in the prevalence of G9P[8] strains across the country. This genotype combination predominated in seven out of nine study sites (detection rates, 45-83%). In addition to G9P[8] strains, epidemiologically major strains included genotypes G1P[8] (34.2%), G2P[4] (13.5%), and G4P[8] (7.4%), whereas unusual and rare strains were G3P[8] (1%), G2P[8] (0.5%), G1P[4] (0.2%), G3P[4] (0.2%), and G3P[9] (0.2%). Whole genome analysis of 125 Hungarian human rotaviruses identified nine major genotype constellations and uncovered both intra- and intergenogroup reassortment events in circulating strains. Intergenogroup reassortment resulted in several unusual genotype constellations, including mono-reassortant G1P[8] and G9P[8] strains whose genotype 1 (Wa-like) backbone gene constellations contained DS-1-like NSP2 and VP3 genes, respectively, as well as a putative bovine-feline G3P[9] reassortant strain. The conserved genomic constellations of epidemiologically major genotypes suggested the clonal spread of the re-emerging G9P[8] genotype and several co-circulating strains (e.g., G1P[8] and G2P[4]) in many study sites during 2012. Of interest, medically important G2P[4] strains carried bovine-like VP1 and VP6 genes in their genotype constellation. No evidence of vaccine-associated selection, or of interaction between wild-type and vaccine strains, was obtained.
In conclusion, this study reports the reemergence of G9P[8

  20. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for the papaya genome sequence. Transgenic construct- and event-specific sequences were identified as those of a GM papaya developed to resist infection by Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled the identification of unknown transgenic construct- and event-specific sequences in GM papaya and the development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    Science.gov (United States)

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family has been studied predominantly in Arabidopsis, and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolution of the SWEET gene family in diverse plant species, from primitive single-cell algae to angiosperms, with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes mapped to 15 chromosomes of the soybean genome, including genes arising from tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real-time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotypes) were identified in the GmSWEET genes by analysis of whole genome re-sequencing data from 106 soybean genotypes. A significant association was observed between SNP haplogroup and seed sucrose content in three gene clusters on chromosome 6. The present investigation used comparative genomics, transcriptome profiling and whole genome re-sequencing approaches to provide a systematic description of soybean SWEET genes and identified putative candidates with probable roles in reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will serve as an important resource for the soybean research

  2. Whole-Genome SNP Association in the Horse: Identification of a Deletion in Myosin Va Responsible for Lavender Foal Syndrome

    Science.gov (United States)

    Brooks, Samantha A.; Gabreski, Nicole; Miller, Donald; Brisbin, Abra; Brown, Helen E.; Streeter, Cassandra; Mezey, Jason; Cook, Deborah; Antczak, Douglas F.

    2010-01-01

    Lavender Foal Syndrome (LFS) is a lethal inherited disease of horses with a suspected autosomal recessive mode of inheritance. LFS has been primarily diagnosed in a subgroup of the Arabian breed, the Egyptian Arabian horse. The condition is characterized by multiple neurological abnormalities and a dilute coat color. Candidate genes based on comparative phenotypes in mice and humans include the ras-associated protein RAB27a (RAB27A) and myosin Va (MYO5A). Here we report mapping of the locus responsible for LFS using a small set of 36 horses segregating for LFS. These horses were genotyped using a newly available single nucleotide polymorphism (SNP) chip containing 56,402 discriminatory elements. The whole genome scan identified an associated region containing these two functional candidate genes. Exon sequencing of the MYO5A gene from an affected foal revealed a single base deletion in exon 30 that changes the reading frame and introduces a premature stop codon. A PCR-based Restriction Fragment Length Polymorphism (PCR-RFLP) assay was designed and used to investigate the frequency of the mutant allele. All affected horses tested were homozygous for this mutation. Heterozygous carriers were detected at high frequency in families segregating for this trait, and the frequency of carriers in unrelated Egyptian Arabians was 10.3%. The mapping and discovery of the LFS mutation represents the first successful use of whole-genome SNP scanning in the horse for any trait. The RFLP assay can be used to assist breeders in avoiding carrier-to-carrier matings and thus in preventing the birth of affected foals. PMID:20419149
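
    To put the 10.3% carrier frequency in context, here is a worked estimate under stated assumptions: random mating, the carrier frequency measured in unrelated Egyptian Arabians applying to the breeding population, and affected foals never reproducing (LFS is lethal). Both parents are then carriers with probability c², and each carrier-to-carrier mating yields an affected foal with probability 1/4. The function name is hypothetical and the estimate is not reported in the abstract.

```python
def expected_affected_fraction(carrier_freq: float) -> float:
    """Expected fraction of affected foals under random mating.

    Assumes a fully recessive lethal allele (affected animals never breed), so every
    parent is either homozygous normal or a carrier. Both parents are carriers with
    probability c**2, and a carrier x carrier mating gives an affected foal with
    probability 1/4.
    """
    if not 0.0 <= carrier_freq <= 1.0:
        raise ValueError("carrier_freq must lie in [0, 1]")
    return carrier_freq ** 2 / 4


if __name__ == "__main__":
    risk = expected_affected_fraction(0.103)
    print(f"{risk:.5f}")    # 0.00265
    print(round(1 / risk))  # 377
```

    Under these assumptions roughly 1 foal in 377 would be affected; screening with the RFLP assay and avoiding carrier-to-carrier matings removes this term entirely, which is the breeding use the abstract describes.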

  3. Whole-genome SNP association in the horse: identification of a deletion in myosin Va responsible for Lavender Foal Syndrome.

    Directory of Open Access Journals (Sweden)

    Samantha A Brooks

    2010-04-01

    Lavender Foal Syndrome (LFS) is a lethal inherited disease of horses with a suspected autosomal recessive mode of inheritance. LFS has been primarily diagnosed in a subgroup of the Arabian breed, the Egyptian Arabian horse. The condition is characterized by multiple neurological abnormalities and a dilute coat color. Candidate genes based on comparative phenotypes in mice and humans include the ras-associated protein RAB27a (RAB27A) and myosin Va (MYO5A). Here we report mapping of the locus responsible for LFS using a small set of 36 horses segregating for LFS. These horses were genotyped using a newly available single nucleotide polymorphism (SNP) chip containing 56,402 discriminatory elements. The whole genome scan identified an associated region containing these two functional candidate genes. Exon sequencing of the MYO5A gene from an affected foal revealed a single base deletion in exon 30 that changes the reading frame and introduces a premature stop codon. A PCR-based Restriction Fragment Length Polymorphism (PCR-RFLP) assay was designed and used to investigate the frequency of the mutant allele. All affected horses tested were homozygous for this mutation. Heterozygous carriers were detected at high frequency in families segregating for this trait, and the frequency of carriers in unrelated Egyptian Arabians was 10.3%. The mapping and discovery of the LFS mutation represents the first successful use of whole-genome SNP scanning in the horse for any trait. The RFLP assay can be used to assist breeders in avoiding carrier-to-carrier matings and thus in preventing the birth of affected foals.

  4. Whole-Genome Sequences of Two Borrelia afzelii and Two Borrelia garinii Lyme Disease Agent Isolates

    Energy Technology Data Exchange (ETDEWEB)

    Casjens, S.R.; Dunn, J.; Mongodin, E. F.; Qiu, W.-G.; Luft, B. J.; Fraser-Liggett, C. M.; Schutzer, S. E.

    2011-12-01

    Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04.

  5. Whole-Genome de novo Sequencing Of Quail And Grey Partridge

    DEFF Research Database (Denmark)

    Holm, Lars-Erik; Panitz, Frank; Burt, Dave

    2011-01-01

    Developments in sequencing methods have made it possible to perform whole genome de novo sequencing of species without large commercial interest. Within the EU-financed QUANTOMICS project (KBBE-2A-222664), we have performed de novo sequencing of quail (Coturnix coturnix) and grey partridge...... comparative studies with the chicken genome and will aid in identifying evolutionarily conserved sequences within the Galliformes. The sequences obtained from quail and partridge represent a first step towards generating the whole genome sequence for these species. The continuation of establishing the genome...

  6. Genotyping of 75 SNPs using arrays for individual identification in five population groups.

    Science.gov (United States)

    Hwa, Hsiao-Lin; Wu, Lawrence Shih Hsin; Lin, Chun-Yen; Huang, Tsun-Ying; Yin, Hsiang-I; Tseng, Li-Hui; Lee, James Chun-I

    2016-01-01

    Single nucleotide polymorphism (SNP) typing offers promise for forensic genetics. Various strategies and panels for analyzing SNP markers for individual identification have been published. However, the best panels with fewer identity SNPs for all major population groups are still under discussion. This study aimed to find more autosomal SNPs with high heterozygosity for individual identification among Asian populations. Ninety-six autosomal SNPs in 502 DNA samples from unrelated individuals of five population groups (208 Taiwanese Han, 83 Filipinos, 62 Thais, 69 Indonesians, and 80 individuals with European, Near Eastern, or South Asian ancestry) were analyzed using arrays in an initial screening, and 75 SNPs (group A, 46 newly selected SNPs; group B, 29 SNPs based on a previous SNP panel) were selected for further statistical analyses. Some SNPs with high heterozygosity in Asian populations were identified. The combined random match probability of the best 40 and 45 SNPs was between 3.16 × 10⁻¹⁷ and 7.75 × 10⁻¹⁷ and between 2.33 × 10⁻¹⁹ and 7.00 × 10⁻¹⁹, respectively, in all five populations. These loci offer power comparable to that of short tandem repeats (STRs) for routine forensic profiling. In this study, we demonstrated the population genetic characteristics and forensic parameters of 75 SNPs with high heterozygosity from five population groups. This SNP panel can provide valuable genotypic information and can be helpful in forensic casework for individual identification among these populations.
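
    The combined random match probability quoted above follows the usual product rule. As a minimal sketch, assuming Hardy-Weinberg equilibrium at each biallelic SNP and independence between loci (the abstract does not state the exact estimator used), the per-locus match probability is the sum of squared genotype frequencies, p⁴ + (2pq)² + q⁴, and the panel value is the product over loci. Function names are hypothetical.

```python
import math


def snp_match_probability(p: float) -> float:
    """Random match probability at one biallelic SNP with allele frequency p.

    Assumes Hardy-Weinberg equilibrium, so genotype frequencies are p^2, 2pq and q^2;
    two unrelated people match if they share a genotype, giving the sum of squares.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def combined_match_probability(allele_freqs) -> float:
    """Product rule across loci, assuming the SNPs are independent
    (unlinked and in linkage equilibrium). Computed in log space."""
    return math.exp(sum(math.log(snp_match_probability(p)) for p in allele_freqs))


if __name__ == "__main__":
    print(snp_match_probability(0.5))  # 0.375, the smallest value a single SNP can reach
    print(f"{combined_match_probability([0.5] * 40):.2e}")
```

    Even an ideal SNP with allele frequency 0.5 has a per-locus match probability of 0.375, which is why panels need tens of high-heterozygosity SNPs to approach the discriminating power of STR profiles.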

  7. Rice–arsenate interactions in hydroponics: whole genome transcriptional analysis

    Science.gov (United States)

    Norton, Gareth J.; Lou-Hing, Daniel E.; Meharg, Andrew A.; Price, Adam H.

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthetases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala×Azucena mapping population. PMID:18453530
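
    The selection criterion "up-regulated at least 2-fold in both varieties" corresponds to a log2 fold change of at least 1 in each variety (and at most -1 for down-regulation). A minimal sketch with hypothetical probe-set IDs and expression values; only the fold-change filter is shown, and the statistical significance test the authors also applied is not reproduced.

```python
import math


def log2_fold_change(treated: float, control: float) -> float:
    """log2(treated / control) for positive expression values."""
    return math.log2(treated / control)


def classify_probe_sets(expr):
    """Split probe sets into those up- or down-regulated at least 2-fold in BOTH varieties.

    expr maps probe_set_id -> {variety: (control_mean, treated_mean)}, positive values.
    Only the fold-change criterion is applied; no significance test is performed.
    """
    up, down = [], []
    for probe_id, by_variety in expr.items():
        lfcs = [log2_fold_change(treated, control) for control, treated in by_variety.values()]
        if all(lfc >= 1.0 for lfc in lfcs):
            up.append(probe_id)
        elif all(lfc <= -1.0 for lfc in lfcs):
            down.append(probe_id)
    return sorted(up), sorted(down)


if __name__ == "__main__":
    demo = {  # hypothetical values
        "ps_up": {"Bala": (100.0, 250.0), "Azucena": (80.0, 200.0)},
        "ps_down": {"Bala": (100.0, 40.0), "Azucena": (90.0, 30.0)},
        "ps_mixed": {"Bala": (100.0, 300.0), "Azucena": (100.0, 150.0)},
    }
    print(classify_probe_sets(demo))  # (['ps_up'], ['ps_down'])
```

    Requiring the threshold in both varieties, rather than on the average, is what excludes a probe set such as `ps_mixed`, which responds strongly in one variety but only 1.5-fold in the other.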

  8. Rice-arsenate interactions in hydroponics: whole genome transcriptional analysis.

    Science.gov (United States)

    Norton, Gareth J; Lou-Hing, Daniel E; Meharg, Andrew A; Price, Adam H

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthetases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala×Azucena mapping population.

  9. Comparison of genome-wide array genomic hybridization platforms for the detection of copy number variants in idiopathic mental retardation

    Directory of Open Access Journals (Sweden)

    Marra Marco

    2011-03-01

    Background: Clinical laboratories are adopting array genomic hybridization as a standard clinical test. A number of whole genome array genomic hybridization platforms are available, but little is known about their comparative performance in a clinical context. Methods: We studied 30 children with idiopathic mental retardation (MR) and both unaffected parents of each child using Affymetrix 500 K GeneChip SNP arrays, Agilent Human Genome 244 K oligonucleotide arrays and NimbleGen 385 K Whole-Genome oligonucleotide arrays. We also determined whether copy number variants (CNVs) called on these platforms were detected by Illumina Hap550 beadchips or SMRT 32 K BAC whole genome tiling arrays, and tested 15 of the 30 trios on Affymetrix 6.0 SNP arrays. Results: The Affymetrix 500 K, Agilent and NimbleGen platforms identified 3061 autosomal and 117 X-chromosomal CNVs in the 30 trios. Of these CNVs, 147 appeared to be de novo, but only 34 (22%) were found on more than one platform. Performing genotype-phenotype correlations, we identified 7 most likely pathogenic and 2 possibly pathogenic CNVs for MR. All 9 of these putatively pathogenic CNVs were detected by the Affymetrix 500 K, Agilent, NimbleGen and Illumina arrays, and 5 were found by the SMRT BAC array. Both putatively pathogenic CNVs identified in the 15 trios tested with the Affymetrix 6.0 were identified by this platform. Conclusions: Our findings demonstrate that different results are obtained with different platforms and illustrate the trade-off that exists between sensitivity and specificity. The large number of apparently false positive CNV calls on each of the platforms supports the need to validate clinically important findings with a different technology.

  10. Whole genome sequencing of fecal samples as a tool for the diagnosis and genetic characterization of norovirus.

    Science.gov (United States)

    Bavelaar, Herjan H J; Rahamat-Langendoen, Janette; Niesters, Hubert G M; Zoll, Jan; Melchers, Willem J G

    2015-11-01

    Norovirus is a major cause of gastroenteritis, causing yearly epidemics and hospital outbreaks that place a high burden on health care. Detection and characterization of norovirus directly from clinical samples could provide a powerful tool in infection control and norovirus epidemiology. The aim was to determine whether next-generation sequencing performed directly on fecal samples can accurately detect and characterize norovirus. Whole genome sequencing was performed on fecal samples from 10 patients with gastroenteritis in whom norovirus infection had previously been confirmed by RT-PCR. Genotyping was performed using phylogenetic analysis. Sufficient RNA was retrieved from all clinical samples to perform whole-transcriptome sequencing for the detection of RNA viruses. Complete genomic norovirus sequences were obtained from all clinical samples, permitting accurate genotyping by phylogenetic analysis. In addition, a complete coxsackie B1 virus genome was isolated. Detailed information on viral content can be obtained from fecal samples in a single-step approach, supporting clinical and epidemiological purposes. Next-generation sequencing performed directly on clinical samples can become a powerful tool in patient care and infection control. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Landscape of somatic mutations in 560 breast cancer whole-genome sequences

    NARCIS (Netherlands)

    S. Nik-Zainal (Serena); H. Davies (Helen); J. Staaf (Johan); M. Ramakrishna (Manasa); D. Glodzik (Dominik); X. Zou (Xueqing); I. Martincorena (Inigo); L.B. Alexandrov (Ludmil); S. Martin (Sandra); D.C. Wedge (David); P. van Loo (Peter); Y.S. Ju (Young Seok); M. Smid (Marcel); A.B. Brinkman (Arie B.); S. Morganella (Sandro); Aure, M.R. (Miriam R.); Lingjærde, O.C. (Ole Christian); A. Langerød (Anita); Ringnér, M. (Markus); Ahn, S.-M. (Sung-Min); S. Boyault (Sandrine); Brock, J.E. (Jane E.); A. Broeks (Annegien); A. Butler (Adam); C. Desmedt (Christine); L.Y. Dirix (Luc); S. Dronov (Serge); A. Fatima (Aquila); J.A. Foekens (John); M. Gerstung (Moritz); J. Hooijer; Jang, S.J. (Se Jin); Jones, D.R. (David R.); H.-Y. Kim (Hyung-Yong); King, T.A. (Tari A.); Krishnamurthy, S. (Savitri); Lee, H.J. (Hee Jin); Lee, J.-Y. (Jeong-Yeon); Y. Li (Yilong); S. McLaren (Stuart); D. Menzies; Mustonen, V. (Ville); S. O'Meara (Sarah); I. Pauporté (Iris); X. Pivot (Xavier); C.A. Purdie (Colin A.); J.W. Raine (John); Ramakrishnan, K. (Kamna); F.G. Rodriguez-Gonzalez (F. German); Romieu, G. (Gilles); A.M. Sieuwerts (Anieta); Simpson, P.T. (Peter T.); Shepherd, R. (Rebecca); L.A. Stebbings (Lucy); Stefansson, O.A. (Olafur A.); J. Teague (Jon); Tommasi, S. (Stefania); I. Treilleux (Isabelle); G. van den Eynden; P.B. Vermeulen; A. Vincent-Salomon (Anne); L.R. Yates (Lucy); C. Caldas (Carlos); L.J. van 't Veer (Laura); A. Tutt (Andrew); S. Knappskog (Stian); Tan, B.K.T. (Benita Kiat Tee); J. Jonkers (Jos); Å. Borg (Åke); Ueno, N.T. (Naoto T.); C. Sotiriou (Christos); Viari, A. (Alain); P.A. Futreal (Andrew); P.J. Campbell (Peter); P.N. Span (Paul); S.J. van Laere (Steven); S. Lakhani (Sunil); J. Eyfjord; A.M. Thompson (Alastair M.); E. Birney (Ewan); H. Stunnenberg (Henk); M.J. Vijver (Marc); J.W.M. Martens (John); A.-L. Borresen-Dale (Anne-Lise); A.L. Richardson (Andrea); G. Kong (Gu); G. Thomas (Gilles); M.R. Stratton (Michael)

    2016-01-01

    We analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding

  12. Determination of Elizabethkingia Diversity by MALDI-TOF Mass Spectrometry and Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Eriksen, Helle Brander; Gumpert, Heidi; Faurholt, Cecilie Haase

    2017-01-01

    In a hospital-acquired infection with multidrug-resistant Elizabethkingia, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry and 16S rRNA gene analysis identified the pathogen as Elizabethkingia miricola. Whole-genome sequencing, genus-level core genome analysis...

  13. Diagnosis of Capnocytophaga canimorsus Sepsis by Whole-Genome Next-Generation Sequencing.

    Science.gov (United States)

    Abril, Maria K; Barnett, Adam S; Wegermann, Kara; Fountain, Eric; Strand, Andrew; Heyman, Benjamin M; Blough, Britton A; Swaminathan, Aparna C; Sharma-Kuinkel, Batu; Ruffin, Felicia; Alexander, Barbara D; McCall, Chad M; Costa, Sylvia F; Arcasoy, Murat O; Hong, David K; Blauwkamp, Timothy A; Kertesz, Michael; Fowler, Vance G; Kraft, Bryan D

    2016-09-01

    We report the case of a 60-year-old man with septic shock due to Capnocytophaga canimorsus that was diagnosed in 24 hours by a novel whole-genome next-generation sequencing assay. This technology shows great promise in identifying fastidious pathogens, and, if validated, it has profound implications for infectious disease diagnosis.

  14. The effect of whole genome amplification on samples originating from more than one donor

    DEFF Research Database (Denmark)

    Thacker, C.R.; Balogh, M.K.; Børsting, Claus

    2006-01-01

    In this study, the GenomiPhi™ DNA Amplification Kit (Amersham Biosciences) was used to investigate the potential of whole genome amplification (WGA) when considering samples originating from more than one donor. DNA was extracted from blood samples, quantified and normalised before being mixed...

  15. Toxicological effects of benzo[a]pyrene on DNA methylation of whole genome in ICR mice.

    Science.gov (United States)

    Zhao, L; Zhang, S; An, X; Tan, W; Pang, D; Ouyang, H

    2015-10-30

    It is well known that alterations in DNA methylation, an important regulator of gene transcription, can lead to cancer. Therefore, a change in the level of whole-genome DNA methylation has been considered a biomarker of carcinogenesis. Many earlier experimental results in genetic toxicology have shown that benzo[a]pyrene can cause DNA mutation and fragmentation. However, there have been few, if any, studies of alterations in genomic DNA methylation resulting directly from benzo[a]pyrene exposure. In this paper, possible mechanisms of alterations in whole-genome DNA methylation by benzo[a]pyrene were investigated in ICR mice. The blood, liver, pancreas, skin, lung and bladder of ICR mice were removed and examined at a fixed time interval (6 hours) after benzo[a]pyrene exposure, and the whole-genome DNA methylation level was determined by high performance liquid chromatography (HPLC). The results exhibited tissue specificity: the level of whole-genome DNA methylation decreased significantly in blood and liver, but not in pancreas, lung, skin or bladder. This study investigated the direct relationship between aberrant DNA methylation levels and benzo[a]pyrene exposure, which might help clarify the toxicological mechanism of benzo[a]pyrene from an epigenetic perspective.
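
    Global methylation measured by HPLC is commonly reported as the percentage of 5-methyl-2′-deoxycytidine (5mdC) among all deoxycytidines, 100 × 5mdC / (5mdC + dC). A minimal sketch, assuming the peak areas have already been converted to molar amounts with standard curves; the abstract does not give the paper's exact calculation, and the function names and example numbers are hypothetical.

```python
def global_methylation_percent(mdc: float, dc: float) -> float:
    """Percent of deoxycytidines that are methylated: 100 * 5mdC / (5mdC + dC).

    `mdc` and `dc` are molar amounts (e.g. peak areas already converted with
    standard curves), not raw HPLC peak areas.
    """
    total = mdc + dc
    if mdc < 0 or dc < 0 or total == 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return 100.0 * mdc / total


def relative_change(treated: float, control: float) -> float:
    """Relative change of the treated methylation level versus control, in percent."""
    return 100.0 * (treated - control) / control


if __name__ == "__main__":
    control = global_methylation_percent(4.0, 96.0)  # 4.0 (hypothetical tissue values)
    exposed = global_methylation_percent(3.0, 97.0)  # 3.0
    print(relative_change(exposed, control))         # -25.0
```

    Expressing the result as a ratio of 5mdC to total cytosine makes tissues with different DNA yields directly comparable, which is what a tissue-by-tissue comparison such as blood versus pancreas requires.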

  16. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    NARCIS (Netherlands)

    Yuen, Ryan K C; Merico, Daniele; Bookman, Matt; Howe, Jennifer L.; Thiruvahindrapuram, Bhooma; Patel, Rohan V.; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A.; Walker, Susan; Marshall, Christian R.; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L.; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J.; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R.; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J.; Wei, John; Xu, Lizhen; Tasse, Anne Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie Mackinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M.; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H.; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A.; Parr, Jeremy R.; Spence, Sarah J.; Vorstman, Jacob; Frey, Brendan J.; Robinson, James T.; Strug, Lisa J.; Fernandez, Bridget A.; Elsabbagh, Mayada; Carter, Melissa T.; Hallmayer, Joachim; Knoppers, Bartha M.; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H.; Glazer, David; Pletcher, Mathew T.; Scherer, Stephen W.

    2017-01-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information,

  17. Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data

    NARCIS (Netherlands)

    Farmery, James H. R.; Smith, Mike L.; Lynch, Andy G.; Huissoon, Aarnoud; Furnell, Abigail; Mead, Adam; Levine, Adam P.; Manzur, Adnan; Thrasher, Adrian; Greenhalgh, Alan; Parker, Alasdair; Sanchis-Juan, Alba; Richter, Alex; Gardham, Alice; Lawrie, Allan; Sohal, Aman; Creaser-Myers, Amanda; Frary, Amy; Greinacher, Andreas; Themistocleous, Andreas; Peacock, Andrew J.; Marshall, Andrew; Mumford, Andrew; Rice, Andrew; Webster, Andrew; Brady, Angie; Koziell, Ania; Manson, Ania; Chandra, Anita; Hensiek, Anke; Veld, Anna Huis In't; Maw, Anna; Kelly, Anne M.; Moore, Anthony; Vonk Noordegraaf, Anton; Attwood, Antony; Herwadkar, Archana; Ghofrani, Ardi; Houweling, Arjan C.; Girerd, Barbara; Furie, Bruce; Treacy, Carmen M.; Millar, Carolyn M.; Sewell, Carrock; Roughley, Catherine; Titterton, Catherine; Williamson, Catherine; Hadinnapola, Charaka; Deshpande, Charu; Toh, Cheng-Hock; Bacchelli, Chiara; Patch, Chris; Geet, Chris Van; Babbs, Christian; Bryson, Christine; Penkett, Christopher J.; Rhodes, Christopher J.; Watt, Christopher; Bethune, Claire; Booth, Claire; Lentaigne, Claire; McJannet, Coleen; Church, Colin; French, Courtney; Samarghitean, Crina; Halmagyi, Csaba; Gale, Daniel; Greene, Daniel; Hart, Daniel; Allsup, David; Bennett, David; Edgar, David; Kiely, David G.; Gosal, David; Perry, David J.; Keeling, David; Montani, David; Shipley, Debbie; Whitehorn, Deborah; Fletcher, Debra; Krishnakumar, Deepa; Grozeva, Detelina; Kumararatne, Dinakantha; Thompson, Dorothy; Josifova, Dragana; Maher, Eamonn; Wong, Edwin K. S.; Murphy, Elaine; Dewhurst, Eleanor; Louka, Eleni; Rosser, Elisabeth; Chalmers, Elizabeth; Colby, Elizabeth; Drewe, Elizabeth; McDermott, Elizabeth; Thomas, Ellen; Staples, Emily; Clement, Emma; Matthews, Emma; Wakeling, Emma; Oksenhendler, Eric; Turro, Ernest; Reid, Evan; Wassmer, Evangeline; Raymond, F. Lucy; Hu, Fengyuan; Kennedy, Fiona; Soubrier, Florent; Flinter, Frances; Kovacs, Gabor; Polwarth, Gary; Ambegaonkar, Gautum; Arno, Gavin; Hudson, Gavin; Woods, Geoff; Coghlan, Gerry; Hayman, Grant; Arumugakani, Gururaj; Schotte, Gwen; Cook, H. Terry; Alachkar, Hana; Lango Allen, Hana; Lango-Allen, Hana; Stark, Hannah; Stauss, Hans; Schulze, Harald; Boggard, Harm J.; Baxendale, Helen; Dolling, Helen; Firth, Helen; Gall, Henning; Watson, Henry; Longhurst, Hilary; Markus, Hugh S.; Watkins, Hugh; Simeoni, Ilenia; Emmerson, Ingrid; Roberts, Irene; Quinti, Isabella; Wanjiku, Ivy; Gibbs, J. Simon R.; Thaventhiran, James; Whitworth, James; Hurst, Jane; Collins, Janine; Suntharalingam, Jay; Payne, Jeanette; Thachil, Jecko; Martin, Jennifer M.; Martin, Jennifer; Carmichael, Jenny; Maimaris, Jesmeen; Paterson, Joan; Pepke-Zaba, Joanna; Heemskerk, Johan W. M.; Gebhart, Johanna; Davis, John; Pasi, John; Bradley, John R.; Wharton, John; Stephens, Jonathan; Rankin, Julia; Anderson, Julie; Vogt, Julie; von Ziegenweldt, Julie; Rehnstrom, Karola; Megy, Karyn; Talks, Kate; Peerlinck, Kathelijne; Yates, Katherine; Freson, Kathleen; Stirrups, Kathleen; Gomez, Keith; Smith, Kenneth G. C.; Carss, Keren; Rue-Albrecht, Kevin; Gilmour, Kimberley; Masati, Larahmie; Scelsi, Laura; Southgate, Laura; Ranganathan, Lavanya; Ginsberg, Lionel; Devlin, Lisa; Willcocks, Lisa; Ormondroyd, Liz; Lorenzo, Lorena; Harper, Lorraine; Allen, Louise; Daugherty, Louise; Chitre, Manali; Kurian, Manju; Humbert, Marc; Tischkowitz, Marc; Bitner-Glindzicz, Maria; Erwood, Marie; Scully, Marie; Veltman, Marijke; Caulfield, Mark; Layton, Mark; McCarthy, Mark; Ponsford, Mark; Toshner, Mark; Bleda, Marta; Wilkins, Martin; Mathias, Mary; Reilly, Mary; Afzal, Maryam; Brown, Matthew; Rondina, Matthew; Stubbs, Matthew; Haimel, Matthias; Lees, Melissa; Laffan, Michael A.; Browning, Michael; Gattens, Michael; Richards, Michael; Michaelides, Michel; Lambert, Michele P.; Makris, Mike; de Vries, Minka; Mahdi-Rogers, Mohamed; Saleem, Moin; Thomas, Moira; Holder, Muriel; Eyries, Mélanie; Clements-Brod, Naomi; Canham, Natalie; Dormand, Natalie; Zuydam, Natalie Van; Kingston, Nathalie; Ghali, Neeti; Cooper, Nichola; Morrell, Nicholas W.; Yeatman, Nigel; Roy, Noémi; Shamardina, Olga; Alavijeh, Omid S.; Gresele, Paolo; Nurden, Paquita; Chinnery, Patrick; Deegan, Patrick; Yong, Patrick; Man, Patrick Yu Wai; Corris, Paul A.; Calleja, Paul; Gissen, Paul; Bolton-Maggs, Paula; Rayner-Matthews, Paula; Ghataorhe, Pavandeep K.; Gordins, Pavel; Stein, Penelope; Collins, Peter; Dixon, Peter; Kelleher, Peter; Ancliff, Phil; Yu, Ping; Tait, R. Campbell; Linger, Rachel; Doffinger, Rainer; Machado, Rajiv; Kazmi, Rashid; Sargur, Ravishankar; Favier, Remi; Tan, Rhea; Liesner, Ri; Antrobus, Richard; Sandford, Richard; Scott, Richard; Trembath, Richard; Horvath, Rita; Hadden, Rob; MackenzieRoss, Rob V.; Henderson, Robert; MacLaren, Robert; James, Roger; Ghurye, Rohit; DaCosta, Rosa; Hague, Rosie; Mapeta, Rutendo; Armstrong, Ruth; Noorani, Sadia; Murng, Sai; Santra, Saikat; Tuna, Salih; Johnson, Sally; Chong, Sam; Lear, Sara; Walker, Sara; Goddard, Sarah; Mangles, Sarah; Westbury, Sarah; Mehta, Sarju; Hackett, Scott; Nejentsev, Sergey; Moledina, Shahin; Bibi, Shahnaz; Meehan, Sharon; Othman, Shokri; Revel-Vilk, Shoshana; Holden, Simon; McGowan, Simon; Staines, Simon; Savic, Sinisa; Burns, Siobhan; Grigoriadou, Sofia; Papadia, Sofia; Ashford, Sofie; Schulman, Sol; Ali, Sonia; Park, Soo-Mi; Davies, Sophie; Stock, Sophie; Ali, Souad; Deevi, Sri V. V.; Gräf, Stefan; Ghio, Stefano; Wort, Stephen J.; Jolles, Stephen; Austin, Steve; Welch, Steve; Meacham, Stuart; Rankin, Stuart; Walker, Suellen; Seneviratne, Suranjith; Holder, Susan; Sivapalaratnam, Suthesh; Richardson, Sylvia; Kuijpers, Taco; Bariana, Tadbir K.; Bakchoul, Tamam; Everington, Tamara; Renton, Tara; Young, Tim; Aitman, Timothy; Warner, Timothy Q.; Vale, Tom; Hammerton, Tracey; Pollock, Val; Matser, Vera; Cookson, Victoria; Clowes, Virginia; Qasim, Waseem; Wei, Wei; Erber, Wendy N.; Ouwehand, Willem H.; Astle, William; Egner, William; Turek, Wojciech; Henskens, Yvonne; Tan, Yvonne

    2018-01-01

    Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously

  18. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of

  19. Genomic Epidemiology: Whole-Genome-Sequencing–Powered Surveillance and Outbreak Investigation of Foodborne Bacterial Pathogens

    DEFF Research Database (Denmark)

    Deng, Xiangyu; den Bakker, Henk C.; Hendriksen, Rene S.

    2016-01-01

    ...so-called next-generation sequencing (NGS) technologies that have made whole-genome sequencing (WGS) of foodborne bacterial pathogens a realistic and superior alternative to traditional subtyping methods. Routine, real-time, and widespread application of WGS in food safety and public health is on the horizon...

  20. Whole Genome Selection Project Involving 2,000 Industry AI Sires

    Science.gov (United States)

    Whole genome selection (WGS) uses markers spanning the genome to predict genetic merit for economically important traits. WGS may increase the rate of genetic progress through improved accuracy and reduced generation interval especially for traits that cannot be measured on breeding animals. In cont...

  1. The effect of rare alleles on estimated genomic relationships from whole genome sequence data

    NARCIS (Netherlands)

    Eynard, S.E.; Windig, J.J.; Leroy, G.; Binsbergen, van R.; Calus, M.P.L.

    2015-01-01

    Relationships between individuals and inbreeding coefficients are commonly used for breeding decisions, but may be affected by the type of data used for their estimation. The proportion of variants with low Minor Allele Frequency (MAF) is larger in whole genome sequence (WGS) data compared to Single

  2. Characterization of C. jejuni and C. coli broiler isolates by whole genome sequencing

    DEFF Research Database (Denmark)

    Cantero, G.; Correa-Fiz, F.; Ronco, Troels

    ...vast majority of infections, which may subsequently lead to serious neuropathologies such as Guillain-Barré syndrome. The aim of this study was to take advantage of whole genome sequencing (WGS) to characterize in depth a subset of 16 C. jejuni and C. coli isolates from broilers from five farms....

  3. Whole-genome sequence of Bacillus solimangrovi GH 2-4T, isolated from mangrove soil.

    Science.gov (United States)

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-12-01

    Bacillus solimangrovi GH 2-4T was isolated from mangrove soil and subjected to whole genome sequencing on the HiSeq platform and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession MJEH00000000.

  4. Whole genome analysis of Klebsiella pneumoniae T2-1-1 from human oral cavity.

    Science.gov (United States)

    Chan, Kok-Gan; Yin, Wai-Fong; Chan, Xin-Yue

    2016-03-01

    Klebsiella pneumoniae T2-1-1 was isolated from the human tongue debris and subjected to whole genome sequencing on the HiSeq platform and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession JAQL00000000.

  5. Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle

    NARCIS (Netherlands)

    Binsbergen, van R.; Calus, M.P.L.; Bink, M.C.A.M.; Eeuwijk, van F.A.; Schrooten, C.; Veerkamp, R.F.

    2015-01-01

    Background In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to

  6. The use of whole genome sequence data to estimate genetic relationships including rare alleles information

    NARCIS (Netherlands)

    Eynard, S.E.; Windig, J.J.; Leroy, G.; Verrier, E.; Hiemstra, S.J.; Binsbergen, van R.; Calus, M.P.L.

    2014-01-01

    Whole genome sequencing technologies are rapidly developing. In some ways, the speed of this development has outstripped our capacity to use this type of data in selection strategies, especially in livestock diversity conservation. In this study, relationship matrices were computed for 118 Holstein

  7. Whole-Genome Sequence of the Spodoptera frugiperda Sf9 Insect Cell Line

    OpenAIRE

    Nandakumar, Subhiksha; Ma, Hailun; Khan, Arifa S

    2017-01-01

    ABSTRACT The draft whole-genome sequence of the Spodoptera frugiperda Sf9 insect cell line was obtained using long-read PacBio sequence technology and Canu assembly. The final assembled genome consisted of 451 Mbp in 4,577 contigs, with 12,716× mean coverage and a G+C content of 36.53%.

  8. Whole-genome sequence of Aeromonas hydrophila strain AH-1 (Serotype O11)

    OpenAIRE

    Forn-Cuní, Gabriel; Tomás, Juan M.; Merino, Susana

    2016-01-01

    Aeromonas hydrophila is an emerging pathogen of aquatic and terrestrial animals, including humans. Here, we report the whole-genome sequence of the septicemic A. hydrophila AH-1 strain, belonging to the serotype O11, and the first mesophilic Aeromonas with surface layer (S-layer) to be sequenced.

  9. The whole genome sequence assembly of the soybean aphid, Aphis glycines

    Science.gov (United States)

    Aphids are emerging as model organisms for both basic and applied research. Of the 5,000 estimated species, only two aphids have published whole genome sequences: the pea aphid Acyrthosiphon pisum, and the Russian wheat aphid, Diuraphis noxia. The soybean aphid (Aphis glycines) is an extreme special...

  10. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy

    NARCIS (Netherlands)

    Bouwman, A.C.; Veerkamp, R.F.

    2014-01-01

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken

  11. Whole-genome sequence of Bacillus solimangrovi GH 2-4T, isolated from mangrove soil

    OpenAIRE

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-01-01

    Bacillus solimangrovi GH 2-4T was isolated from mangrove soil and subjected to whole genome sequencing on the HiSeq platform and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession MJEH00000000.

  12. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data

    National Research Council Canada - National Science Library

    Oesper, Layla; Satas, Gryte; Raphael, Benjamin J

    2014-01-01

    .... We describe an algorithm called THetA2 that infers the composition of a tumor sample, including not only tumor purity but also the number and content of tumor subpopulations, directly from both whole-genome (WGS) and whole-exome (WXS...

  13. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer

    NARCIS (Netherlands)

    Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi

    Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and

  14. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans)

    OpenAIRE

    Tran, Phuong N.; Tan, Nicholas E. H.; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J.; Dailey, Lucas K.; Hudson, André O.; Savka, Michael A.

    2015-01-01

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy.

  15. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans).

    Science.gov (United States)

    Tran, Phuong N; Tan, Nicholas E H; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J; Dailey, Lucas K; Hudson, André O; Savka, Michael A

    2015-11-19

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy. Copyright © 2015 Tran et al.

  16. Whole-Genome Scans Provide Evidence of Adaptive Evolution in Malawian Plasmodium falciparum Isolates

    DEFF Research Database (Denmark)

    Ocholla, Harold; Preston, Mark D; Mipando, Mwapatsa

    2014-01-01

    BACKGROUND:  Selection by host immunity and antimalarial drugs has driven extensive adaptive evolution in Plasmodium falciparum and continues to produce ever-changing landscapes of genetic variation. METHODS:  We performed whole-genome sequencing of 69 P. falciparum isolates from Malawi and used...

  17. Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced

    Science.gov (United States)

    Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...

  18. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  19. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored owing to great challenges in computation and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites and post-transcriptional RNA editing across the entire gene, and in the transcription rates of cells, RNA-seq measurements show large expression variability and collectively create the observed position-level read count curves. A single number measuring gene expression, as widely used in microarray-based expression analysis, is highly unlikely to account sufficiently for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for interaction between all possible pairs of genes and uses all accessible information to collectively test interaction between all possible pairs of SNPs within two genomic regions. Through large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type 1 error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction

  20. The Use of Non-Variant Sites to Improve the Clinical Assessment of Whole-Genome Sequence Data.

    Directory of Open Access Journals (Sweden)

    Alberto Ferrarini

    Genetic testing, which is now a routine part of clinical practice and disease management protocols, is often based on the assessment of small panels of variants or genes. On the other hand, continuous improvements in the speed and per-base costs of sequencing have now made whole exome sequencing (WES) and whole genome sequencing (WGS) viable strategies for targeted or complete genetic analysis, respectively. Standard WGS/WES data analytical workflows generally rely on calling sequence variants with respect to the reference genome sequence. However, the reference genome sequence contains a large number of sites represented by rare alleles, by known pathogenic alleles and by alleles strongly associated with disease by GWAS. It is thus critical, for clinical applications of WGS and WES, to determine whether non-variant sites are homozygous for the reference allele or whether the corresponding genotype cannot be reliably called. Here we show that an alternative analytical approach based on the analysis of both variant and non-variant sites from WGS data makes it possible to genotype more than 92% of sites corresponding to known SNPs, compared to 6% genotyped by standard variant analysis. These include homozygous reference sites of clinical interest, thus leading to a broad and comprehensive characterization of variation necessary for an accurate evaluation of disease risk. Altogether, our findings indicate that characterization of both variant and non-variant clinically informative sites in the genome is necessary to allow an accurate clinical assessment of a personal genome. Finally, we propose a highly efficient extended VCF (eVCF) file format that can store genotype calls for sites of clinical interest while remaining compatible with current variant interpretation software.

  1. The Use of Non-Variant Sites to Improve the Clinical Assessment of Whole-Genome Sequence Data.

    Science.gov (United States)

    Ferrarini, Alberto; Xumerle, Luciano; Griggio, Francesca; Garonzi, Marianna; Cantaloni, Chiara; Centomo, Cesare; Vargas, Sergio Marin; Descombes, Patrick; Marquis, Julien; Collino, Sebastiano; Franceschi, Claudio; Garagnani, Paolo; Salisbury, Benjamin A; Harvey, John Max; Delledonne, Massimo

    2015-01-01

    Genetic testing, which is now a routine part of clinical practice and disease management protocols, is often based on the assessment of small panels of variants or genes. On the other hand, continuous improvements in the speed and per-base costs of sequencing have now made whole exome sequencing (WES) and whole genome sequencing (WGS) viable strategies for targeted or complete genetic analysis, respectively. Standard WGS/WES data analytical workflows generally rely on calling sequence variants with respect to the reference genome sequence. However, the reference genome sequence contains a large number of sites represented by rare alleles, by known pathogenic alleles and by alleles strongly associated with disease by GWAS. It is thus critical, for clinical applications of WGS and WES, to determine whether non-variant sites are homozygous for the reference allele or whether the corresponding genotype cannot be reliably called. Here we show that an alternative analytical approach based on the analysis of both variant and non-variant sites from WGS data makes it possible to genotype more than 92% of sites corresponding to known SNPs, compared to 6% genotyped by standard variant analysis. These include homozygous reference sites of clinical interest, thus leading to a broad and comprehensive characterization of variation necessary for an accurate evaluation of disease risk. Altogether, our findings indicate that characterization of both variant and non-variant clinically informative sites in the genome is necessary to allow an accurate clinical assessment of a personal genome. Finally, we propose a highly efficient extended VCF (eVCF) file format that can store genotype calls for sites of clinical interest while remaining compatible with current variant interpretation software.

  2. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling.

    Science.gov (United States)

    Pidsley, Ruth; Zotenko, Elena; Peters, Timothy J; Lawrence, Mitchell G; Risbridger, Gail P; Molloy, Peter; Van Dijk, Susan; Muhlhausler, Beverly; Stirzaker, Clare; Clark, Susan J

    2016-10-07

    In recent years the Illumina HumanMethylation450 (HM450) BeadChip has provided a user-friendly platform to profile DNA methylation in human samples. However, HM450 lacked coverage of distal regulatory elements. Illumina have now released the MethylationEPIC (EPIC) BeadChip, with new content specifically designed to target these regions. We have used HM450 and whole-genome bisulphite sequencing (WGBS) to perform a critical evaluation of the new EPIC array platform. EPIC covers over 850,000 CpG sites, including >90 % of the CpGs from the HM450 and an additional 413,743 CpGs. Even though the additional probes improve the coverage of regulatory elements, including 58 % of FANTOM5 enhancers, only 7 % of distal and 27 % of proximal ENCODE regulatory elements are represented. Detailed comparisons of regulatory elements from EPIC and WGBS show that a single EPIC probe is not always informative for those distal regulatory elements showing variable methylation across the region. However, overall data from the EPIC array at single loci are highly reproducible across technical and biological replicates and demonstrate high correlation with HM450 and WGBS data. We show that the HM450 and EPIC arrays distinguish differentially methylated probes, but the absolute agreement depends on the threshold set for each platform. Finally, we provide an annotated list of probes whose signal could be affected by cross-hybridisation or underlying genetic variation. The EPIC array is a significant improvement over the HM450 array, with increased genome coverage of regulatory regions and high reproducibility and reliability, providing a valuable tool for high-throughput human methylome analyses from diverse clinical samples.

  3. Evaluating the performance of commercial whole-genome marker sets for capturing common genetic variation

    Science.gov (United States)

    Mägi, Reedik; Pfeufer, Arne; Nelis, Mari; Montpetit, Alexandre; Metspalu, Andres; Remm, Maido

    2007-01-01

    Background New technologies have enabled genome-wide association studies to be conducted with hundreds of thousands of genotyped SNPs. Several different first-generation genome-wide panels of SNPs have been commercialized. The total amount of common genetic variation is still unknown; however, the coverage of commercial panels can be evaluated against reference population samples genotyped by the International HapMap project. Less information is available about coverage in samples from other populations. Results In this study we compare four commercial panels: the HumanHap 300 and HumanHap 550 Array Sets from the Illumina Infinium series and the Mapping 100 K and Mapping 500 K Array Sets from the Affymetrix GeneChip series. Tagging performance is compared among HapMap CEPH (CEU), Asian (JPT, CHB) and Yoruba (YRI) population samples. It is also evaluated in an Estonian population sample with more than 1000 individuals genotyped in two 500-kbp ENCODE regions of chromosome 2: ENr112 on 2p16.3 and ENr131 on 2q37.1. Conclusion We found that in a non-reference Caucasian population, commercial SNP panels provide levels of coverage similar to those in the HapMap CEPH population sample. We present the proportions of universal and population-specific SNPs in all the commercial platforms studied. PMID:17562002

  4. Selective Whole-Genome Amplification Is a Robust Method That Enables Scalable Whole-Genome Sequencing of Plasmodium vivax from Unprocessed Clinical Samples.

    Science.gov (United States)

    Cowell, Annie N; Loy, Dorothy E; Sundararaman, Sesh A; Valdivia, Hugo; Fisch, Kathleen; Lescano, Andres G; Baldeviano, G Christian; Durand, Salomon; Gerbasi, Vince; Sutherland, Colin J; Nolder, Debbie; Vinetz, Joseph M; Hahn, Beatrice H; Winzeler, Elizabeth A

    2017-02-07

    Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens. Malaria is a disease caused by Plasmodium parasites that caused 214 million symptomatic cases and 438,000 deaths in 2015. Plasmodium vivax is the most widely distributed species, causing the majority of malaria infections outside sub-Saharan Africa. Whole-genome sequencing (WGS) of Plasmodium parasites from clinical samples has revealed important insights into the epidemiology and mechanisms of drug resistance of malaria

  5. Canaries in the coal mine: Personal and professional impact of undergoing whole genome sequencing on medical professionals.

    Science.gov (United States)

    Zierhut, Heather; McCarthy Veach, Patricia; LeRoy, Bonnie

    2015-11-01

    Public interest in personal whole genome sequencing is increasing. The technology is publicly available and is being used as an educational tool in higher education. Empirical evidence regarding its utility is vital. The goals of this study were to characterize the process of whole genome sequencing in a population of medical and basic science professionals undergoing whole genome sequencing as a part of an educational symposium. Thirty-eight individuals completed one or more surveys from the time of informed consent for whole genome sequencing to 3 months post-symposium. The four surveys assessed demographics, decision-making, communication, decision regret, and personal and professional impact. The most prevalent motivation to participate was professional enhancement, followed by curiosity about the technology, and personal health benefits. The most important initial impact concerned medical implications. Over time, however, impact on professional development was greater than on personal health. Anticipated reactions to receiving whole genome sequencing results generally matched participants' actual reactions and decision regret remained low over time. Benefits and risks of whole genome sequencing included medically actionable results and misunderstanding by healthcare providers. Whole genome sequencing generally had a positive impact professionally and personally on participants. Further education of providers and the public about whole genome sequencing and psychosocial support is warranted. © 2015 Wiley Periodicals, Inc.

  6. A 34K SNP genotyping array for Populus trichocarpa: design, application to the study of natural populations and transferability to other Populus species

    Energy Technology Data Exchange (ETDEWEB)

    Geraldes, Armando [University of British Columbia, Vancouver; Hannemann, Jan [University of Victoria, Canada; Grassa, Chris [University of British Columbia, Vancouver; Farzaneh, Nima [University of British Columbia, Vancouver; Porth, Ilga [University of British Columbia, Vancouver; McKown, Athena [University of British Columbia, Vancouver; Skyba, Oleksandr [University of British Columbia, Vancouver; Li, Eryang [University of British Columbia, Vancouver; Mike, Fujita [University of British Columbia, Vancouver; Friedmann, Michael [University of British Columbia, Vancouver; Wasteneys, Geoffrey [University of British Columbia, Vancouver; Guy, Robert [University of British Columbia, Vancouver; El-Kassaby, Yousry [University of British Columbia, Vancouver; Mansfield, Shawn [University of British Columbia, Vancouver; Cronk, Quentin [University of British Columbia, Vancouver; Ehlting, Juergen [University of Victoria, Canada; Douglas, Carl [University of British Columbia, Vancouver; DiFazio, Stephen P [West Virginia University, Morgantown; Slavov, Gancho [West Virginia University, Morgantown; Ranjan, Priya [ORNL; Muchero, Wellington [ORNL; Gunter, Lee E [ORNL; Wymore, Ann [ORNL; Tuskan, Gerald A [ORNL; Martin, Joel [U.S. Department of Energy, Joint Genome Institute; Schackwitz, Wendy [U.S. Department of Energy, Joint Genome Institute; Pennacchio, Christa [U.S. Department of Energy, Joint Genome Institute; Rokhsar, Daniel [U.S. Department of Energy, Joint Genome Institute

    2013-01-01

    Genetic mapping of quantitative traits requires genotypic data for large numbers of markers in many individuals. Despite the declining costs of genotyping by sequencing, for most studies, the use of large SNP genotyping arrays still offers the most cost-effective solution for large-scale targeted genotyping. Here we report on the design and performance of a SNP genotyping array for Populus trichocarpa (black cottonwood). This genotyping array was designed with SNPs pre-ascertained in 34 wild accessions covering most of the species range. Due to the rapid decay of linkage disequilibrium in P. trichocarpa, we adopted a candidate gene approach to the array design that resulted in the selection of 34,131 SNPs, the majority of which are located in, or within 2 kb of, 3,543 candidate genes. A subset of the SNPs (539) was selected based on patterns of variation among the SNP discovery accessions. We show that more than 95% of the loci produce high-quality genotypes and that the genotyping error rate for these is likely below 2%, indicating that high-quality data are generated with this array. We demonstrate that even among small numbers of samples (n=10) from local populations, over 84% of loci are polymorphic. We also tested the applicability of the array to other species in the genus and found that due to ascertainment bias the number of polymorphic loci decreases rapidly with genetic distance, with the largest numbers detected in other species in section Tacamahaca (P. balsamifera and P. angustifolia). Finally, we provide evidence for the utility of the array for intraspecific studies of genetic differentiation and for species assignment and the detection of natural hybrids.
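
    The claim that over 84% of loci are polymorphic among 10 local samples comes down to a per-locus count. The Python sketch below shows one way to make that count; it is not the authors' code. It assumes genotypes are coded as alternate-allele counts (0, 1, 2), with `None` for a failed call. It also assumes a locus is polymorphic when both alleles are observed among the called samples. The function name and toy matrix are hypothetical.

```python
def fraction_polymorphic(genotypes):
    """Fraction of scored loci at which both alleles are observed.

    `genotypes` is a list of per-sample rows; each row holds one
    alternate-allele count (0, 1 or 2) or None (no call) per locus.
    Loci with no calls at all are skipped.
    """
    n_loci = len(genotypes[0])
    scored = polymorphic = 0
    for j in range(n_loci):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            continue
        scored += 1
        alt_alleles = sum(calls)
        # Monomorphic if every called allele is ref (0) or every one is alt.
        if 0 < alt_alleles < 2 * len(calls):
            polymorphic += 1
    if scored == 0:
        raise ValueError("no locus has any genotype call")
    return polymorphic / scored


# Toy data: 3 samples x 4 loci (illustrative values only).
toy = [
    [0, 1, 2, 0],
    [0, 2, 2, None],
    [0, 0, 2, 2],
]
print(fraction_polymorphic(toy))
```

    Here loci 2 and 4 carry both alleles, while loci 1 and 3 are fixed. Restricting the count to called samples keeps failed calls from masquerading as monomorphic sites.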

  7. Evaluation of the OvineSNP50 genotyping array in four South ...

    African Journals Online (AJOL)

    Sandenbergh, Lise

    2016-03-21

    Mar 21, 2016 ... genotyped to determine the utility of the OvineSNP50 chip for these important South ... disequilibrium (LD), population genetic structure, association studies and ..... factor 9 (GDF9) is strongly associated with litter size in sheep.

  8. Whole genome sequencing of an African American family highlights toll like receptor 6 variants in Kawasaki disease susceptibility.

    Directory of Open Access Journals (Sweden)

    Jihoon Kim

    Kawasaki disease (KD) is the most common acquired pediatric heart disease. We analyzed Whole Genome Sequences (WGS) from a 6-member African American family in which KD affected two of four children. We sought rare, potentially causative genotypes by sequentially applying the following WGS filters: sequence quality scores, inheritance model (recessive homozygous and compound heterozygous), predicted deleteriousness, allele frequency, genes in KD-associated pathways or with significant associations in published KD genome-wide association studies (GWAS), and with differential expression in KD blood transcriptomes. Biologically plausible genotypes were identified in twelve variants in six genes in the two affected children. The affected siblings were compound heterozygous for the rare variants p.Leu194Pro and p.Arg247Lys in Toll-like receptor 6 (TLR6), which affect TLR6 signaling. The affected children were also homozygous for three common, linked (r² = 1) intronic single nucleotide variants (SNVs) in TLR6 (rs56245262, rs56083757 and rs7669329) that have previously shown association with KD in cohorts of European descent. Using transcriptome data from pre-treatment whole blood of KD subjects (n = 146), expression quantitative trait loci (eQTL) analyses were performed. Subjects homozygous for the intronic risk allele (A allele) of TLR6 rs56245262 had differential expression of Interleukin-6 (IL-6) as a function of genotype (p = 0.0007) and a higher erythrocyte sedimentation rate at diagnosis. TLR6 plays an important role in pathogen-associated molecular pattern recognition, and sequence variations may affect binding affinities that in turn influence KD susceptibility. This integrative genomic approach illustrates how the analysis of WGS in multiplex families with a complex genetic disease allows examination of both the common disease-common variant and common disease-rare variant hypotheses.

  9. Use of whole-genome sequencing to distinguish relapse from reinfection in a completed tuberculosis clinical trial.

    Science.gov (United States)

    Witney, Adam A; Bateson, Anna L E; Jindani, Amina; Phillips, Patrick P J; Coleman, David; Stoker, Neil G; Butcher, Philip D; McHugh, Timothy D

    2017-03-29

    RIFAQUIN was a tuberculosis chemotherapy trial in southern Africa including regimens with high-dose rifapentine with moxifloxacin. Here, the application of whole-genome sequencing (WGS) is evaluated within RIFAQUIN for identifying new infections in treated patients as either relapses or reinfections. WGS is further compared with mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR) typing. This is the first report of WGS being used to evaluate new infections in a completed clinical trial for which all treatment and epidemiological data are available for analysis. DNA from 36 paired samples of Mycobacterium tuberculosis cultured from patients before and after treatment was typed using 24-loci MIRU-VNTR, in silico spoligotyping and WGS. Following WGS, the sequences were mapped against the reference strain H37Rv, the single-nucleotide polymorphism (SNP) differences between pairs were identified, and a phylogenetic reconstruction was performed. WGS indicated that 32 of the paired samples had a very low number of SNP differences (0-5; likely relapses). One pair had an intermediate number of SNP differences, and was likely the result of a mixed infection with a pre-treatment minor genotype that was highly related to the post-treatment genotype; this was reclassified as a relapse, in contrast to the MIRU-VNTR result. The remaining three pairs had very high SNP differences (>750; likely reinfections). WGS and MIRU-VNTR both similarly differentiated relapses and reinfections, but WGS provided significant extra information. The low proportion of reinfections seen suggests that in standard chemotherapy trials with up to 24 months of follow-up, typing the strains brings little benefit to an analysis of the trial outcome in terms of differentiating relapse and reinfection. However, there is a benefit to using WGS as compared to MIRU-VNTR in terms of the additional genotype information obtained, in particular for defining the presence of mixed
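
    The relapse/reinfection call above rests on counting SNP differences between pre- and post-treatment isolates. The count is then compared against the reported bands: 0-5 for a likely relapse and >750 for a likely reinfection. The Python sketch below is only an illustration under stated assumptions, not the study's pipeline. It assumes the two sequences are already aligned to the same reference, with N or "-" marking uncalled positions. It also assumes that anything between the two bands is flagged for review, as the intermediate (mixed-infection) pair was. Function names and sequences are hypothetical.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two reference-aligned sequences differ,
    ignoring positions that are uncalled (N) or gapped (-) in either."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    uncalled = set("N-")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in uncalled and b not in uncalled
    )


def classify_pair(snp_diff: int, relapse_max: int = 5,
                  reinfection_min: int = 750) -> str:
    """Map a pairwise SNP count onto the bands reported in the abstract."""
    if snp_diff < 0:
        raise ValueError("SNP difference cannot be negative")
    if snp_diff <= relapse_max:
        return "likely relapse"
    if snp_diff > reinfection_min:
        return "likely reinfection"
    return "intermediate: review (e.g. possible mixed infection)"


# Toy pair (illustrative only): one real difference, one uncalled site.
pre, post = "ACGTNACGTA", "ACCTAACGTA"
print(snp_distance(pre, post), classify_pair(snp_distance(pre, post)))
```

    Keeping an explicit intermediate band mirrors the abstract's experience: a fixed cut-off alone would have misclassified the mixed-infection pair.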

  10. Whole genome analysis of Vietnamese G2P[4] rotavirus strains possessing the NSP2 gene sharing an ancestral sequence with Chinese sheep and goat rotavirus strains.

    Science.gov (United States)

    Do, Loan Phuong; Doan, Yen Hai; Nakagomi, Toyoko; Gauchan, Punita; Kaneko, Miho; Agbemabiese, Chantal; Dang, Anh Duc; Nakagomi, Osamu

    2015-10-01

    Because imminent introduction into Vietnam of a vaccine against Rotavirus A is anticipated, baseline information on the whole genome of representative strains is needed to understand changes in circulating strains that may occur after vaccine introduction. In this study, the whole genomes of two G2P[4] strains detected in Nha Trang, Vietnam in 2008 were sequenced, this being the last period during which virtually no rotavirus vaccine was used in this country. The two strains were found to be >99.9% identical in sequence and had a typical DS-1 like G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2 genotype constellation. Analysis of the Vietnamese strains with >184 G2P[4] strains retrieved from GenBank/EMBL/DDBJ DNA databases placed the Vietnamese strains in one of the lineages commonly found among contemporary strains, with the exception of the NSP2 and NSP4 genes. The NSP2 genes were found to belong to a previously undescribed lineage that diverged from Chinese sheep and goat rotavirus strains, including a Chinese rotavirus vaccine strain LLR with 95% nucleotide identity; the time of their most recent common ancestor was 1975. The NSP4 genes were found to belong, together with Thai and USA strains, to an emergent lineage (VIII), adding further diversity to ever diversifying NSP4 lineages. Thus, there is a need to enhance surveillance of locally-circulating strains from both children and animals at the whole genome level to address the effect of rotavirus vaccines on changing strain distribution. © 2015 The Societies and Wiley Publishing Asia Pty Ltd.

  11. A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease.

    Science.gov (United States)

    Smedley, Damian; Schubach, Max; Jacobsen, Julius O B; Köhler, Sebastian; Zemojtel, Tomasz; Spielmann, Malte; Jäger, Marten; Hochheiser, Harry; Washington, Nicole L; McMurry, Julie A; Haendel, Melissa A; Mungall, Christopher J; Lewis, Suzanna E; Groza, Tudor; Valentini, Giorgio; Robinson, Peter N

    2016-09-01

    The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants to specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated to specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease. Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  12. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast

    DEFF Research Database (Denmark)

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L.

    2015-01-01

    interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering for construction of improved strains. Here we used high-throughput microfluidics for the screening of yeast libraries, generated by UV...... mutagenesis. Several screening and sorting rounds resulted in the selection of eight yeast clones with significantly improved secretion of recombinant α-amylase. Efficient secretion was genetically stable in the selected clones. We performed whole-genome sequencing of the eight clones and identified 330...... to construct efficient cell factories for protein secretion. The combined use of microfluidics screening and whole-genome sequencing to map the mutations associated with the improved phenotype can easily be adapted for other products and cell types to identify novel engineering targets, and this approach could...

  13. Tolerance of Whole-Genome Doubling Propagates Chromosomal Instability and Accelerates Cancer Genome Evolution

    DEFF Research Database (Denmark)

    Dewhurst, Sally M.; McGranahan, Nicholas; Burrell, Rebecca A.

    2014-01-01

    The contribution of whole-genome doubling to chromosomal instability (CIN) and tumor evolution is unclear. We use long-term culture of isogenic tetraploid cells from a stable diploid colon cancer progenitor to investigate how a genome-doubling event affects genome stability over time. Rare cells ...... [discovery data: hazard ratio (HR), 4.70, 95% confidence interval (CI), 1.04–21.37; validation data: HR, 1.59, 95% CI, 1.05–2.42]. These data highlight an important role for the tolerance of genome doubling in driving cancer genome evolution.

  14. Downsizing genomic medicine: approaching the ethical complexity of whole-genome sequencing by starting small.

    Science.gov (United States)

    Sharp, Richard R

    2011-03-01

    As we look to a time when whole-genome sequencing is integrated into patient care, it is possible to anticipate a number of ethical challenges that will need to be addressed. The most intractable of these concern informed consent and the responsible management of very large amounts of genetic information. Given the range of possible findings, it remains unclear to what extent it will be possible to obtain meaningful patient consent to genomic testing. Equally unclear is how clinicians will disseminate the enormous volume of genetic information produced by whole-genome sequencing. Toward developing practical strategies for managing these ethical challenges, we propose a research agenda that approaches multiplexed forms of clinical genetic testing as natural laboratories in which to develop best practices for managing the ethical complexities of genomic medicine.

  15. Whole genome multilocus sequence typing as an epidemiologic tool for Yersinia pestis.

    Science.gov (United States)

    Kingry, Luke C; Rowe, Lori A; Respicio-Kingry, Laurel B; Beard, Charles B; Schriefer, Martin E; Petersen, Jeannine M

    2016-04-01

    Human plague is a severe and often fatal zoonotic disease caused by Yersinia pestis. For public health investigations of human cases, nonintensive whole genome molecular typing tools, capable of defining epidemiologic relationships, are advantageous. Whole genome multilocus sequence typing (wgMLST) is a recently developed methodology that simplifies genomic analyses by transforming millions of base pairs of sequence into character data for each gene. We sequenced 13 US Y. pestis isolates with known epidemiologic relationships. Sequences were assembled de novo, and multilocus sequence typing alleles were assigned by comparison against 3979 open reading frames from the reference strain CO92. Allele-based cluster analysis accurately grouped the 13 isolates, as well as 9 publicly available Y. pestis isolates, by their epidemiologic relationships. Our findings indicate wgMLST is a simplified, sensitive, and scalable tool for epidemiologic analysis of Y. pestis strains. Published by Elsevier Inc.
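
    wgMLST turns each genome into a profile of allele numbers, one per gene, and isolates are then compared locus by locus. The abstract does not state which clustering method was used. The Python sketch below therefore assumes a swapped-in technique: single-linkage grouping of isolates whose allele distance is at or below a chosen threshold. Distance is taken as the number of shared loci with different allele numbers. Locus names, isolate names and the threshold are hypothetical.

```python
from itertools import combinations


def allele_distance(p1: dict[str, int], p2: dict[str, int]) -> int:
    """Number of loci present in both profiles whose allele numbers differ."""
    shared = set(p1) & set(p2)
    return sum(1 for locus in shared if p1[locus] != p2[locus])


def cluster(profiles: dict[str, dict[str, int]], threshold: int) -> list[list[str]]:
    """Single-linkage grouping: isolates within `threshold` allele
    differences of any member end up in the same group (union-find)."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy profiles over three hypothetical loci (illustrative only).
profiles = {
    "iso1": {"g1": 1, "g2": 1, "g3": 1},
    "iso2": {"g1": 1, "g2": 1, "g3": 2},
    "iso3": {"g1": 2, "g2": 3, "g3": 4},
}
print(cluster(profiles, threshold=1))
```

    Comparing only loci present in both profiles keeps missing or unassembled genes from inflating distances. That is one simple way to handle incomplete assemblies.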

  16. Whole genome sequence of Enterobacter ludwigii type strain EN-119T, isolated from clinical specimens.

    Science.gov (United States)

    Li, Gengmi; Hu, Zonghai; Zeng, Ping; Zhu, Bing; Wu, Lijuan

    2015-04-01

    Enterobacter ludwigii strain EN-119(T) is the type strain of E. ludwigii, which belongs to the E. cloacae complex (Ecc). This strain was first reported and named in 2005 and has since been found in many hospitals. In this paper, we report the whole genome sequence of this strain. The total genome size of EN-119(T) is 4,952,770 bp, with 4578 coding sequences, 88 tRNAs and 10 rRNAs. The genome sequence of EN-119(T) is the first whole genome sequence of E. ludwigii and will further our understanding of Ecc. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Comparison of whole genome amplification techniques for human single cell exome sequencing.

    Directory of Open Access Journals (Sweden)

    Erik Borgström

    Whole genome amplification (WGA) is currently a prerequisite for single cell whole genome or exome sequencing. Depending on the method used, the rate of artifact formation, allelic dropout and sequence coverage over the genome may differ significantly. In this study four commercial kits for WGA (AMPLI1, MALBAC, Repli-G and PicoPlex) were used to amplify human single cells. The WGA products were exome sequenced together with non-amplified bulk samples from the same source. The resulting data were evaluated in terms of genomic coverage, allelic dropout and SNP calling. The largest difference between the evaluated protocols was observed when analyzing the target coverage and read depth distribution. These differences also had an impact on the downstream variant calling. In conclusion, the products from the AMPLI1 and MALBAC kits were shown to be most similar to the bulk samples and are therefore recommended for WGA of single cells.
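
    Allelic dropout, one of the metrics used above, is commonly measured against a matched bulk sample. A site that is heterozygous in bulk but called homozygous for one of those two alleles in the single cell counts as a dropout. The Python sketch below illustrates that definition only; it is not the study's evaluation code. It assumes genotypes are given as allele tuples per site, such as `(0, 1)`. It also assumes only bulk-heterozygous sites with a single-cell call are evaluable. Site names and data are hypothetical.

```python
def allelic_dropout_rate(bulk: dict[str, tuple[int, int]],
                         cell: dict[str, tuple[int, int]]) -> float:
    """Share of bulk-heterozygous sites where the single cell shows
    only one of the two bulk alleles (homozygous call)."""
    evaluable = dropouts = 0
    for site, (a, b) in bulk.items():
        if a == b or site not in cell:
            continue  # only heterozygous bulk sites with a cell call count
        evaluable += 1
        c1, c2 = cell[site]
        if c1 == c2 and c1 in (a, b):
            dropouts += 1
    if evaluable == 0:
        raise ValueError("no evaluable heterozygous sites")
    return dropouts / evaluable


# Toy calls (illustrative only): s2 lost allele 0, s4 has no cell call.
bulk = {"s1": (0, 1), "s2": (0, 1), "s3": (1, 1), "s4": (0, 1)}
cell = {"s1": (0, 1), "s2": (1, 1), "s3": (1, 1)}
print(allelic_dropout_rate(bulk, cell))
```

    Sites with no single-cell call are excluded rather than counted as dropouts. Coverage gaps and dropout are therefore reported as separate metrics, in line with how the abstract lists them.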

  18. Efficiency of methylated DNA immunoprecipitation bisulphite sequencing for whole-genome DNA methylation analysis.

    Science.gov (United States)

    Jeong, Hae Min; Lee, Sangseon; Chae, Heejoon; Kim, RyongNam; Kwon, Mi Jeong; Oh, Ensel; Choi, Yoon-La; Kim, Sun; Shin, Young Kee

    2016-08-01

    We compared four common methods for measuring DNA methylation levels and recommended the most efficient method in terms of cost and coverage. The DNA methylation status of liver and stomach tissues was profiled using four different methods: whole-genome bisulphite sequencing (WG-BS), targeted bisulphite sequencing (Targeted-BS), methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA immunoprecipitation bisulphite sequencing (MeDIP-BS). We calculated DNA methylation levels using each method and compared the results. MeDIP-BS yielded the most similar DNA methylation profile to WG-BS, with 20 times less data, suggesting remarkable cost savings and coverage efficiency compared with the other methods. MeDIP-BS is a practical cost-effective method for analyzing whole-genome DNA methylation that is highly accurate at base-pair resolution.

  19. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin

    2013-01-01

    BACKGROUND: Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re......-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations, were analyzed. RESULTS: A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertion/deletions were identified. Among the SNPs detected, 25.5% were not described previously. We found...... that artificial selection during domestication led to more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genomic regions appear to be affected by artificial selection for preferred agricultural traits...

  20. Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors.

    Science.gov (United States)

    MacLeod, Iona M; Larkin, Denis M; Lewin, Harris A; Hayes, Ben J; Goddard, Mike E

    2013-09-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
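
    The method above summarizes the distribution of runs of homozygosity, that is, stretches of consecutive homozygous sites. The Python sketch below shows only how such runs can be pulled out of a single genome's calls. It does not implement the paper's error correction or its multilocus LD predictor. It assumes genotypes are coded 0 or 2 for homozygous and 1 for heterozygous, and it measures lengths in sites rather than base pairs. The function name and data are hypothetical.

```python
def runs_of_homozygosity(genotypes, min_len=1):
    """Return (start, end) index pairs (inclusive) of maximal runs of
    homozygous calls (0 or 2) at least `min_len` sites long."""
    runs, start = [], None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i  # a new homozygous run begins
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None  # a heterozygous call ends any open run
    if start is not None and len(genotypes) - start >= min_len:
        runs.append((start, len(genotypes) - 1))
    return runs


# Toy genotype string (illustrative only).
calls = [0, 0, 1, 2, 2, 2, 1, 0]
print(runs_of_homozygosity(calls, min_len=2))
```

    A single false heterozygous call splits one long run into two shorter ones. This shows one way hidden sequencing errors can distort the observed run-length distribution if left uncorrected.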

  1. Whole-genome shotgun sequence of phenazine-producing endophytic Streptomyces kebangsaanensis SUK12

    OpenAIRE

    Juwairiah Remali; Kok-Keong Loke; Chyan Leong Ng; Wan Mohd Aizat; John Tiong; Noraziah Mohamad Zin

    2017-01-01

    Streptomyces species produce bioactive compounds with a broad spectrum of activities. Streptomyces kebangsaanensis SUK12 has been identified as a novel endophytic bacterium isolated from the ethnomedicinal plant Portulaca oleracea, and was found to produce the phenazine class of biologically active antimicrobial metabolites. The potential use of the phenazines has led to our research interest in determining the genome sequence of Streptomyces kebangsaanensis SUK12. This Whole Genome Shotgun project has...

  2. Whole-Genome Analysis in Korean Patients with Autoimmune Myasthenia Gravis

    OpenAIRE

    Na, Sang-Jun; Lee, Ji Hyun; Kim, So Won; Kim, Dae-Seong; Shon, Eun Hee; Park, Hyung Jun; Shin, Ha Young; Kim, Seung Min; Choi, Young-Chul

    2014-01-01

    Purpose: The underlying cause of myasthenia gravis (MG) is unknown, although it likely involves a genetic component. However, no common genetic variants have been unequivocally linked to autoimmune MG. We sought to identify the genetic variants associated with an increased or decreased risk of developing MG in samples from a Korean Multicenter MG Cohort. Materials and Methods: To determine new genetic targets related to autoimmune MG, a whole genome-based single nucleotide polymorphisms (SNP) a...

  3. Demographic history and biologically relevant genetic variation of Native Mexicans inferred from whole-genome sequencing

    OpenAIRE

    Romero-Hidalgo, Sandra; Ochoa-Leyva, Adrián; Garcíarrubio, Alejandro; Acuña-Alonzo, Victor; Antúnez-Argüelles, Erika; Balcazar-Quintero, Martha; Barquera-Lozano, Rodrigo; Carnevale, Alessandra; Cornejo-Granados, Fernanda; Fernández-López, Juan Carlos; García-Herrera, Rodrigo; García-Ortíz, Humberto; Granados-Silvestre, Ángeles; Granados, Julio; Guerrero-Romero, Fernando

    2017-01-01

    Understanding the genetic structure of Native American populations is important to clarify their diversity, demographic history, and to identify genetic factors relevant for biomedical traits. Here, we show a demographic history reconstruction from 12 Native American whole genomes belonging to six distinct ethnic groups representing the three main described genetic clusters of Mexico (Northern, Southern, and Maya). Effective population size estimates of all Native American groups remained bel...

  4. High Depth, Whole-Genome Sequencing of Cholera Isolates from Haiti and the Dominican Republic

    Science.gov (United States)

    2012-09-11

    glycerol at −80 degrees C. Illumina-based whole genome sequencing. We extracted DNA from V. cholerae strains using Qiagen DNeasy (Qiagen, Valencia, CA...distinct DNA mismatch repair proteins, and two mutations in two outer membrane proteins, OmpV and OmpH. In order to identify purifying or positive...spontaneously passed human stool samples of patients with a diagnosis of cholera. All patients received standard medical treatment for cholera

  5. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis

    OpenAIRE

    Riaz, Tiayyba; Shehzad, Wasim; Viari, Alain; Pompanon, François; Taberlet, Pierre; Coissac, Eric

    2011-01-01

    Using non-conventional markers, DNA metabarcoding allows biodiversity assessment from complex substrates. In this article, we present ecoPrimers, a software for identifying new barcode markers and their associated PCR primers. ecoPrimers scans whole genomes to find such markers without a priori knowledge. ecoPrimers optimizes two quality indices measuring taxonomical range and discrimination to select the most efficient markers from a set of reference sequences, according to specific experime...

  6. Attitudes of African Americans toward Return of Results from Exome and Whole Genome Sequencing

    OpenAIRE

    Yu, Joon-Ho; Crouch, Julia; Jamal, Seema M.; Tabor, Holly K.; Bamshad, Michael J.

    2013-01-01

    Exome sequencing and whole genome sequencing (ES/WGS) present patients and research participants with the opportunity to benefit from a broad scope of genetic results of clinical and personal utility. Yet, this potential for benefit also risks disenfranchising populations such as African Americans (AAs) that are already underrepresented in genetic research and utilize genetic tests at lower rates than other populations. Understanding a diverse range of perspectives on consenting for ES/WGS an...

  7. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

    Science.gov (United States)

    MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.

    2013-01-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals. PMID:23842528

  8. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob

    2016-01-01

    be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject...... from variant calls. No differences were observed substituting 2x3.2 with 2x1.6 mm discs, allowing for additional reduction of sample material in future projects....

  9. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing

    OpenAIRE

    Helman, Elena; Lawrence, Michael S.; Stewart, Chip; Sougnez, Carrie; Getz, Gad; Meyerson, Matthew

    2014-01-01

    Retrotransposons constitute a major source of genetic variation, and somatic retrotransposon insertions have been reported in cancer. Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part of The Cancer Genome Atlas (TCGA) Pan-Cancer Project. In addition to novel germline polymorphisms, we find 810 somatic retrotransposon insertions primarily in lung squa...

  10. Comparative performance of two whole-genome capture methodologies on ancient DNA Illumina libraries

    OpenAIRE

    Ávila-Arcos, María C.; Sandoval-Velasco, Marcela; Schroeder, Hannes; Carpenter, Meredith L.; Malaspinas, Anna-Sapfo; Wales, Nathan; Peñaloza, Fernando; Bustamante, Carlos D.; Gilbert, M. Thomas P.

    2015-01-01

    Application of whole genome capture (WGC) methods to ancient DNA (aDNA) promises to increase efficiency of ancient genome sequencing. We compared the performance of two recent WGC methods in enriching human aDNA within Illumina libraries built using both double stranded and single stranded build protocols. Although both methods effectively enriched aDNA we observed consistent differences between the methods providing the opportunity to further explore parameters influencing WGC experiments. ...

  11. Real-Time Whole-Genome Sequencing for Surveillance of Listeria monocytogenes, France

    OpenAIRE

    Moura, Alexandra; Tourdjman, Mathieu; Leclercq, Alexandre; Hamelin, Estelle; Laurent, Edith; Fredriksen, Nathalie; Van Cauteren, Dieter; Bracq-Dieye, Hélène; Thouvenot, Pierre; Vales, Guillaume; Tessaud-Rita, Nathalie; Maury, Mylène M.; Alexandru, Andreea; Criscuolo, Alexis; Quevillon, Emmanuel

    2017-01-01

    During 2015–2016, we evaluated the performance of whole-genome sequencing (WGS) as a routine typing tool. Its added value for microbiological and epidemiologic surveillance of listeriosis was compared with that for pulsed-field gel electrophoresis (PFGE), the current standard method. A total of 2,743 Listeria monocytogenes isolates collected as part of routine surveillance were characterized in parallel by PFGE and core genome multilocus sequence typing (cgMLST) extracted from WGS. We investi...

  12. Prospective Whole-Genome Sequencing Enhances National Surveillance of Listeria monocytogenes

    OpenAIRE

    Kwong, Jason C.; Mercoulia, Karolina; Tomita, Takehiro; Easton, Marion; Li, Hua Y.; Bulach, Dieter M.; Stinear, Timothy P.; Seemann, Torsten; Howden, Benjamin P.

    2016-01-01

    Whole-genome sequencing (WGS) has emerged as a powerful tool for comparing bacterial isolates in outbreak detection and investigation. Here we demonstrate that WGS performed prospectively for national epidemiologic surveillance of Listeria monocytogenes has the capacity to be superior to our current approaches using pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable-number tandem-repeat analysis (MLVA), binary typing, and serotyping. Initially 423 ...

  13. Comparison of whole genome sequencing to restriction endonuclease analysis and gel diffusion precipitin-based serotyping of Pasteurella multocida.

    Science.gov (United States)

    LeCount, Karen J; Schlater, Linda K; Stuber, Tod; Robbe Austerman, Suelee; Frana, Timothy S; Griffith, Ronald W; Erdman, Matthew M

    2018-01-01

    The gel diffusion precipitin test (GDPT) and restriction endonuclease analysis (REA) have commonly been used in the serotyping and genotyping of Pasteurella multocida. Whole genome sequencing (WGS) and single nucleotide polymorphism (SNP) analysis has become the gold standard for other organisms, offering higher resolution than previously available methods. We compared WGS to REA and GDPT on 163 isolates of P. multocida to determine if WGS produced more precise results. The isolates used represented the 16 reference serovars, isolates with REA profiles matching an attenuated fowl cholera vaccine strain, and isolates from 10 different animal species. Isolates originated from across the United States and from Chile. Identical REA profiles clustered together in the phylogenetic tree. REA profiles that differed by only a few bands had fewer SNP differences than REA profiles with more differences, as expected. The GDPT results were diverse but it was common to see a single serovar show up repeatedly within clusters. Several errors were found when examining the REA profiles. WGS was able to confirm these errors and compensate for the subjectivity in analysis of REA. Also, results of WGS and SNP analysis correlated more closely with the epidemiologic data than GDPT. In silico results were also compared to a lipopolysaccharide rapid multiplex PCR test. From the data produced in our study, WGS and SNP analysis was superior to REA and GDPT and highlighted some of the issues with the older tests.

  14. Evaluation of a Stenotrophomonas maltophilia bacteremia cluster in hematopoietic stem cell transplantation recipients using whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Stefanie Kampmeier

    2017-11-01

    Background. Stenotrophomonas maltophilia occurs ubiquitously in the hospital environment. This opportunistic pathogen can cause severe infections in immunocompromised hosts such as hematopoietic stem cell transplantation (HSCT) recipients. Between February and July 2016, a cluster of four patients on the HSCT unit suffered from S. maltophilia bloodstream infections (BSI). Methods. For epidemiological investigation we retrospectively identified the colonization status of patients admitted to the ward during this time period and performed environmental monitoring of shower heads, shower outlets, washbasins and toilets in patient rooms. We tested the antibiotic susceptibility of detected S. maltophilia isolates. Environmental and blood culture samples were subjected to whole genome sequence (WGS)-based typing. Results. Of four patients with S. maltophilia BSI, three were found to have been colonized previously. In addition, retrospective investigations revealed two patients who were colonized (anal swab samples) but not infected. Environmental monitoring revealed one shower outlet contaminated with S. maltophilia. Antibiotic susceptibility testing of seven S. maltophilia strains yielded two trimethoprim/sulfamethoxazole-resistant and five susceptible isolates, which did not exclude an outbreak scenario. WGS-based typing did not reveal any close genotypic relationship among the patients’ isolates. In contrast, one environmental isolate from a shower outlet was closely related to a single patient’s isolate. Conclusion. WGS-based typing successfully refuted an outbreak of S. maltophilia on an HSCT ward but uncovered that sanitary installations can be an actual source of S. maltophilia transmission.

  15. Rapid Identification of Potential Drugs for Diabetic Nephropathy Using Whole-Genome Expression Profiles of Glomeruli

    Directory of Open Access Journals (Sweden)

    Jingsong Shi

    2016-01-01

    Objective. To investigate potential drugs for diabetic nephropathy (DN) using whole-genome expression profiles and the Connectivity Map (CMAP). Methodology. Eighteen Chinese Han DN patients and six normal controls were included in this study. Whole-genome expression profiles of microdissected glomeruli were measured using the Affymetrix Human U133 Plus 2.0 chip. Differentially expressed genes (DEGs) between late-stage and early-stage DN samples and the CMAP database were used to identify potential drugs for DN using bioinformatics methods. Results. (1) A total of 1065 DEGs (FDR … 1.5) were found in late-stage DN patients compared with early-stage DN patients. (2) Piperlongumine, 15d-PGJ2 (15-delta prostaglandin J2), vorinostat, and trichostatin A were predicted to be the most promising potential drugs for DN, acting as NF-κB inhibitors, histone deacetylase inhibitors (HDACIs), PI3K pathway inhibitors, or PPARγ agonists, respectively. Conclusion. Using whole-genome expression profiles and the CMAP database, we rapidly predicted potential DN drugs, and therapeutic potential was confirmed by previously published studies. Animal experiments and clinical trials are needed to confirm both the safety and efficacy of these drugs in the treatment of DN.

  16. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    Science.gov (United States)

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. Copyright © 2016 Teng et al.

  17. Targeted analysis of whole genome sequence data to diagnose genetic cardiomyopathy.

    Science.gov (United States)

    Golbus, Jessica R; Puckelwartz, Megan J; Dellefave-Castillo, Lisa; Fahrenbach, John P; Nelakuditi, Viswateja; Pesce, Lorenzo L; Pytel, Peter; McNally, Elizabeth M

    2014-12-01

    Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of >50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift toward comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1 to 14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and segregation analysis, where available. Three of 3 previously identified primary mutations were detected by this analysis. In 6 subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and had additional pathological correlation to provide evidence for causality. For 2 subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. These pilot data demonstrate that ≈30 to 40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes. © 2014 American Heart Association, Inc.
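
    The filtering step described above (restricting calls to a cardiomyopathy gene panel, then scoring by prediction algorithms and public population frequency) can be illustrated with a minimal sketch. The `Variant` fields, the 1% allele-frequency cutoff and the "at least two damaging predictions" rule are hypothetical choices for illustration only; they are not the thresholds or scoring scheme used in the study.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    pop_af: float         # hypothetical field: allele frequency from a public database
    damaging_calls: int   # hypothetical field: number of predictors calling it damaging


def candidate_variants(variants, panel, max_af=0.01, min_damaging=2):
    """Keep rare, predicted-damaging variants in panel genes (illustrative thresholds)."""
    return [
        v for v in variants
        if v.gene in panel                    # targeted analysis: panel genes only
        and v.pop_af < max_af                 # rare in the population
        and v.damaging_calls >= min_damaging  # multiple predictors agree
    ]


if __name__ == "__main__":
    panel = {"MYH7", "TTN", "LMNA"}
    calls = [
        Variant("MYH7", 0.0001, 3),
        Variant("TTN", 0.05, 4),     # too common
        Variant("BRCA2", 0.0, 5),    # outside the panel
        Variant("LMNA", 0.0002, 1),  # too few damaging predictions
    ]
    for v in candidate_variants(calls, panel):
        print(v.gene)  # prints MYH7
```

    As the abstract notes, such a filter only narrows the list; segregation and clinical correlation are still needed to assign causality.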

  18. Personalized oncogenomics: clinical experience with malignant peritoneal mesothelioma using whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Brandon S Sheffield

    Peritoneal mesothelioma is a rare and sometimes lethal malignancy that presents a clinical challenge for both diagnosis and management. Recent studies have led to a better understanding of the molecular biology of peritoneal mesothelioma. Translation of the emerging data into better treatments and outcome is needed. From two patients with peritoneal mesothelioma, we derived whole genome sequences, RNA expression profiles, and targeted deep sequencing data. Molecular data were made available for translation into a clinical treatment plan. Treatment responses and outcomes were later examined in the context of molecular findings. Molecular studies presented here provide the first reported whole genome sequences of peritoneal mesothelioma. Mutations in known mesothelioma-related genes NF2, CDKN2A, LATS2, amongst others, were identified. Activation of MET-related signaling pathways was demonstrated in both cases. A hypermutated phenotype was observed in one case (434 vs. 18 single nucleotide variants) and was associated with a favourable outcome despite sarcomatoid histology and multifocal disease. This study represents the first report of whole genome analyses of peritoneal mesothelioma, a key step in the understanding and treatment of this disease.

  19. Evaluation of the OvineSNP50 genotyping array in four South

    African Journals Online (AJOL)

    Sandenbergh, Lise

    2016-03-21

    The OvineSNP50 chip was developed by Illumina in collaboration with the International Sheep Genomics Consortium (ISGC) and became commercially available in 2009. This microarray-based system is designed to determine the genotype of approximately 54 000 single nucleotide polymorphisms (SNPs).

  20. Copy number analysis by low coverage whole genome sequencing using ultra low-input DNA from formalin-fixed paraffin embedded tumor tissue.

    Science.gov (United States)

    Kader, Tanjina; Goode, David L; Wong, Stephen Q; Connaughton, Jacquie; Rowley, Simone M; Devereux, Lisa; Byrne, David; Fox, Stephen B; Mir Arnau, Gisela; Tothill, Richard W; Campbell, Ian G; Gorringe, Kylie L

    2016-11-15

    Unlocking clinically translatable genomic information, including copy number alterations (CNA), from formalin-fixed paraffin-embedded (FFPE) tissue is challenging due to low yields and degraded DNA. We describe a robust, cost-effective low-coverage whole genome sequencing (LC WGS) method for CNA detection using 5 ng of FFPE-derived DNA. CN profiles using 100 ng or 5 ng input DNA were highly concordant and comparable with molecular inversion probe (MIP) array profiles. LC WGS improved CN profiles of samples that performed poorly using MIP arrays. Our technique enables identification of driver and prognostic CNAs in archival patient samples previously deemed unsuitable for genomic analysis due to DNA limitations.
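
    Low-coverage WGS copy-number analysis generally rests on counting reads in fixed-size genomic bins and comparing those counts with a reference. A minimal sketch of that comparison follows, assuming per-bin read counts for a tumor and a reference sample are already available; the function name, pseudocount and median-centering are illustrative conventions, not the authors' published pipeline. Real pipelines typically also correct for GC content and mappability before segmentation.

```python
import math
import statistics


def log2_copy_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Median-centred log2(tumor/reference) ratio for each genomic bin.

    Counts are scaled by library size so that sequencing depth does not
    masquerade as copy-number change; the pseudocount avoids log2(0).
    """
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("count vectors must cover the same bins")
    t_total = sum(tumor_counts)
    n_total = sum(normal_counts)
    raw = [
        math.log2(((t + pseudocount) / t_total) / ((n + pseudocount) / n_total))
        for t, n in zip(tumor_counts, normal_counts)
    ]
    # Centre on the median bin, assuming most of the genome sits at baseline ploidy.
    centre = statistics.median(raw)
    return [r - centre for r in raw]


if __name__ == "__main__":
    # Hypothetical counts: bin 3 carries twice the relative read share of the others.
    ratios = log2_copy_ratios([100, 100, 200, 100], [100, 100, 100, 100], pseudocount=0)
    print([round(r, 3) for r in ratios])
```

    With these hypothetical counts the third bin comes out at about +1 (a relative doubling) and the others at 0; a segmentation step would then merge adjacent bins into gains and losses.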

  1. Selective Whole-Genome Amplification Is a Robust Method That Enables Scalable Whole-Genome Sequencing of Plasmodium vivax from Unprocessed Clinical Samples

    Directory of Open Access Journals (Sweden)

    Annie N. Cowell

    2017-02-01

    Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens.
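
    The two coverage figures quoted above (mean coverage of 24× and the share of the core genome covered by ≥5 reads) are summary statistics of per-position read depth. A minimal sketch, assuming depths come as tab-separated chrom/position/depth lines such as those written by `samtools depth -a` (which also reports zero-depth positions) over the core-genome regions; the function names and the contig name in the example are hypothetical.

```python
def read_depths(lines):
    """Parse per-position depths from chrom<TAB>pos<TAB>depth lines."""
    depths = []
    for line in lines:
        if line.strip():
            _chrom, _pos, depth = line.rstrip("\n").split("\t")[:3]
            depths.append(int(depth))
    return depths


def coverage_summary(depths, min_depth=5):
    """Return (mean depth, fraction of positions covered by >= min_depth reads)."""
    if not depths:
        raise ValueError("no positions supplied")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)


if __name__ == "__main__":
    example = ["contig1\t1\t0\n", "contig1\t2\t4\n", "contig1\t3\t5\n", "contig1\t4\t10\n",
               "contig1\t5\t30\n", "contig1\t6\t50\n", "contig1\t7\t20\n", "contig1\t8\t1\n"]
    mean_depth, frac = coverage_summary(read_depths(example))
    print(f"mean depth {mean_depth:.1f}x, {frac:.1%} of positions at >=5 reads")
```

    Counting zero-depth positions matters: omitting them would inflate both the mean depth and the covered fraction.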

  2. Whole-genome analysis of herbicide-tolerant mutant rice generated by Agrobacterium-mediated gene targeting.

    Science.gov (United States)

    Endo, Masaki; Kumagai, Masahiko; Motoyama, Ritsuko; Sasaki-Yamagata, Harumi; Mori-Hosokawa, Satomi; Hamada, Masao; Kanamori, Hiroyuki; Nagamura, Yoshiaki; Katayose, Yuichi; Itoh, Takeshi; Toki, Seiichi

    2015-01-01

    Gene targeting (GT) is a technique used to modify endogenous genes in target genomes precisely via homologous recombination (HR). Although GT plants are produced using genetic transformation techniques, if the difference between the endogenous and the modified gene is limited to point mutations, GT crops can be considered equivalent to non-genetically modified mutant crops generated by conventional mutagenesis techniques. However, it is difficult to guarantee the non-incorporation of DNA fragments from Agrobacterium in GT plants created by Agrobacterium-mediated GT despite screening with conventional Southern blot and/or PCR techniques. Here, we report a comprehensive analysis of herbicide-tolerant rice plants generated by inducing point mutations in the rice ALS gene via Agrobacterium-mediated GT. We performed genome comparative genomic hybridization (CGH) array analysis and whole-genome sequencing to evaluate the molecular composition of GT rice plants. Thus far, no integration of Agrobacterium-derived DNA fragments has been detected in GT rice plants. However, >1,000 single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) were found in GT plants. Among these mutations, 20-100 variants might have some effect on expression levels and/or protein function. Information about additive mutations should be useful in clearing out unwanted mutations by backcrossing. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  3. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Lai-Ping Wong

    2014-05-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  4. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    Science.gov (United States)

    Wong, Lai-Ping; Lai, Jason Kuan-Han; Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2014-05-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.
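
    The heterozygous-to-homozygous genotype ratio mentioned above is a per-individual count over called genotypes. A minimal sketch, assuming diploid VCF-style `GT` strings as input; the abstract does not state the exact definition used, so taking homozygous-alternate (non-reference) genotypes as the denominator, a common convention in sequencing QC, is an assumption here, as are the function names.

```python
from collections import Counter


def classify_gt(gt):
    """Classify a diploid GT string as 'het', 'hom_alt', 'hom_ref' or 'missing'."""
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if len(alleles) != 2 or "." in alleles:
        return "missing"
    a, b = alleles
    if a != b:
        return "het"
    return "hom_ref" if a == "0" else "hom_alt"


def het_hom_ratio(genotypes):
    """Heterozygous / homozygous-alternate ratio for one individual (assumed definition)."""
    counts = Counter(classify_gt(g) for g in genotypes)
    if counts["hom_alt"] == 0:
        raise ValueError("no homozygous-alternate genotypes")
    return counts["het"] / counts["hom_alt"]


if __name__ == "__main__":
    print(het_hom_ratio(["0/1", "0|1", "1/1", "0/0", "./."]))
```

    Reference-homozygous and missing calls are excluded, so the ratio is comparable only between samples genotyped over the same sites.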

  5. A new sieving matrix for DNA sequencing, genotyping and mutation detection and high-throughput genotyping with a 96-capillary array system

    Energy Technology Data Exchange (ETDEWEB)

    Gao, David [Iowa State Univ., Ames, IA (United States)]

    1999-11-08

    Capillary electrophoresis has been widely accepted as a fast separation technique in DNA analysis. In this dissertation, a new sieving matrix is described for DNA analysis, especially DNA sequencing, genetic typing and mutation detection. A high-throughput 96-capillary array electrophoresis system was also demonstrated for simultaneous multiple genotyping. The authors first evaluated the influence of different capillary coatings on the performance of DNA sequencing. A bare capillary was compared with a DB-wax, an FC-coated and a polyvinylpyrrolidone (PVP) dynamically coated capillary with PEO as sieving matrix. Covalently coated capillaries performed no better than bare capillaries, while the PVP coating provided excellent and reproducible results. The authors also developed a new sieving matrix for DNA separation based on commercially available poly(vinylpyrrolidone). This sieving matrix has a very low viscosity and an excellent self-coating effect. Successful separations were achieved in uncoated capillaries. Sequencing of M13mp18 showed good resolution up to 500 bases in treated PVP solution. Temperature gradient capillary electrophoresis and PVP solution were applied to mutation detection. A heteroduplex sample and a homoduplex reference were injected during a pair of continuous runs. A temperature gradient of 10 °C with a ramp of 0.7 °C/min was swept throughout the capillary. Detection was accomplished by laser-induced fluorescence. Mutation detection was performed by comparing the pattern changes between the homoduplex and the heteroduplex samples. High throughput, high detection rate and easy operation were achieved in this system. They further demonstrated fast and reliable genotyping based on the CTTv STR system by multiple-capillary array electrophoresis. The PCR products from individuals were mixed with pooled allelic ladder as an absolute standard and coinjected with a 96-vial tray. Simultaneous one-color laser-induced fluorescence...

  6. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria: report from the EUCAST Subcommittee

    DEFF Research Database (Denmark)

    Ellington, M J; Ekelund, O; Aarestrup, Frank Møller

    2017-01-01

    Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility...

  7. Novel Degenerate PCR Method for Whole-Genome Amplification Applied to Peru Margin (ODP Leg 201) Subsurface Samples

    National Research Council Canada - National Science Library

    Martino, Amanda J; Rhodes, Matthew E; Biddle, Jennifer F; Brandt, Leah D; Tomsho, Lynn P; House, Christopher H

    2012-01-01

    A degenerate polymerase chain reaction (PCR)-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples...

  8. Algorithms to Model Single Gene, Single Chromosome, and Whole Genome Copy Number Changes Jointly in Tumor Phylogenetics: e1003740

    National Research Council Canada - National Science Library

    Salim Akhter Chowdhury; Stanley E Shackney; Kerstin Heselmeyer-Haddad; Thomas Ried; Alejandro A Schäffer; Russell Schwartz

    2014-01-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome...

  9. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics

    National Research Council Canada - National Science Library

    Chowdhury, Salim Akhter; Shackney, Stanley E; Heselmeyer-Haddad, Kerstin; Ried, Thomas; Schäffer, Alejandro A; Schwartz, Russell

    2014-01-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome...

  10. Evidence and evolutionary analysis of ancient whole-genome duplication in barley predating the divergence from rice

    OpenAIRE

    Grosse Ivo; Waugh Robbie; Graner Andreas; Thiel Thomas; Close Timothy J; Stein Nils

    2009-01-01

    Abstract Background Well preserved genomic colinearity among agronomically important grass species such as rice, maize, Sorghum, wheat and barley provides access to whole-genome structure information even in species lacking a reference genome sequence. We investigated footprints of whole-genome duplication (WGD) in barley that shaped the cereal ancestor genome by analyzing shared synteny with rice using a ~2000 gene-based barley genetic map and the rice genome reference sequence. Results Base...

  11. Whole-Genome Sequencing Allows for Improved Identification of Persistent Listeria monocytogenes in Food-Associated Environments

    OpenAIRE

    Stasiewicz, Matthew J.; Oliver, Haley F.; Wiedmann, Martin; den Bakker, Henk C.

    2015-01-01

    While the food-borne pathogen Listeria monocytogenes can persist in food associated environments, there are no whole-genome sequence (WGS) based methods to differentiate persistent from sporadic strains. Whole-genome sequencing of 188 isolates from a longitudinal study of L. monocytogenes in retail delis was used to (i) apply single-nucleotide polymorphism (SNP)-based phylogenetics for subtyping of L. monocytogenes, (ii) use SNP counts to differentiate persistent from repeatedly reintroduced ...
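
    SNP-based subtyping of this kind compares isolates by the number of SNP differences between them in a core-genome alignment. A minimal sketch, assuming isolates are already represented as equal-length strings of bases at SNP sites with `N` marking missing calls; the function names and the `threshold` argument are hypothetical, and the excerpt does not state the criteria the study used to separate persistent from reintroduced strains.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count sites where two aligned SNP sequences differ, ignoring 'N' (missing) calls."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a != "N" and b != "N"
    )


def pairwise_snp_distances(isolates):
    """Return {(id1, id2): distance} for every unordered pair of isolates."""
    return {
        (i, j): snp_distance(isolates[i], isolates[j])
        for i, j in combinations(sorted(isolates), 2)
    }


def close_pairs(isolates, threshold):
    """Pairs at or below a SNP threshold (the threshold is a user choice, not a standard)."""
    return [pair for pair, d in pairwise_snp_distances(isolates).items() if d <= threshold]


if __name__ == "__main__":
    isolates = {"A": "ACGTAC", "B": "ACGTAA", "C": "TCGAAN"}
    print(pairwise_snp_distances(isolates))
    print(close_pairs(isolates, threshold=1))
```

    Ignoring `N` sites keeps missing data from inflating distances, at the cost of comparing pairs over slightly different numbers of sites.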

  12. Genotyping of high-risk anal human papillomavirus (HPV): ion torrent-next generation sequencing vs. linear array.

    Science.gov (United States)

    Nowak, Rebecca G; Ambulos, Nicholas P; Schumaker, Lisa M; Mathias, Trevor J; White, Ruth A; Troyer, Jennifer; Wells, David; Charurat, Manhattan E; Bentzen, Søren M; Cullen, Kevin J

    2017-06-13

    Our next generation sequencing (NGS)-based human papillomavirus (HPV) genotyping assay showed a high degree of concordance with the Roche Linear Array (LA) with as little as 1.25 ng formalin-fixed paraffin-embedded-derived genomic DNA in head and neck and cervical cancer samples. This sensitive genotyping assay uses barcoded HPV PCR broad-spectrum general primers (BSGP) 5+/6+ applicable to population studies, but its diagnostic performance has not been tested in cases with multiple concurrent HPV infections. We conducted a cross-sectional study to compare the positive and negative predictive value (PPV and NPV), sensitivity and specificity of the NGS assay to detect HPV genotype infections as compared to the LA. DNA was previously extracted from ten anal swab samples from men who have sex with men in Nigeria enrolled on the TRUST/RV368 cohort study. Two-sample tests of proportions were used to examine differences in the diagnostic performance of the NGS assay to detect high vs. low-risk HPV type-specific infections. In total there were 94 type-specific infections detected in 10 samples, with a median of 9.5 (range 9 to 10) per sample. Using the LA as the gold standard, 84.4% (95% CI: 75.2-91.2) of the same anal type-specific infections detected on the NGS assay had been detected by LA. The PPV and sensitivity differed significantly for high risk (PPV: 90%, 95% CI: 79.5-96.2; sensitivity: 93.1%, 95% CI: 83.3-98.1) as compared to low risk HPV (PPV: 73%, 95% CI: 54.1-87.7; sensitivity: 61.1%, 95% CI: 43.5-76.9) (all p < 0.05). The NGS assay detected 10 HPV genotypes that were not among the 37 genotypes found on LA (30, 32, 43, 44, 74, 86, 87, 90, 91, 114). The NGS assay accurately detects multiple HPV infections in individual clinical specimens with limited sample volume and has extended coverage compared to LA.
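
    The diagnostic measures above follow from a 2×2 table of the NGS result against the reference assay (here the LA), and the high- vs. low-risk comparison is a test of two proportions. A minimal sketch of the standard formulas, using hypothetical counts rather than the study's data; the pooled z-test shown is one common form of the two-sample test of proportions (the abstract does not say which variant was used), and the confidence intervals reported in the abstract are not computed here.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """PPV, NPV, sensitivity and specificity against a reference test."""
    return {
        "ppv": tp / (tp + fp),          # positives on the new assay that the reference confirms
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),  # reference positives the new assay also finds
        "specificity": tn / (tn + fp),
    }


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for equal proportions; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical 2x2 counts, not taken from the study.
    print(diagnostic_metrics(tp=8, fp=2, fn=2, tn=8))
    print(two_proportion_z_test(50, 100, 30, 100))
```

    With only ten specimens, exact (e.g. Clopper-Pearson) intervals are usually preferred over normal approximations for the individual proportions.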

  13. Taxonomic revision of Harveyi clade bacteria (family Vibrionaceae) based on analysis of whole genome sequences.

    Science.gov (United States)

    Urbanczyk, Henryk; Ogura, Yoshitoshi; Hayashi, Tetsuya

    2013-07-01

    Use of inadequate methods for classification of bacteria in the so-called Harveyi clade (family Vibrionaceae, Gammaproteobacteria) has led to incorrect assignment of strains and proliferation of synonymous species. In order to resolve taxonomic ambiguities within the Harveyi clade and to test usefulness of whole genome sequence data for classification of Vibrionaceae, draft genome sequences of 12 strains were determined and analysed. The sequencing included type strains of seven species: Vibrio sagamiensis NBRC 104589(T), Vibrio azureus NBRC 104587(T), Vibrio harveyi NBRC 15634(T), Vibrio rotiferianus LMG 21460(T), Vibrio campbellii NBRC 15631(T), Vibrio jasicida LMG 25398(T), and Vibrio owensii LMG 25443(T). Draft genome sequences of strain LMG 25430, previously designated the type strain of [Vibrio communis], and two strains (MWB 21 and 090810c) from the 'beijerinckii' lineage were also determined. Whole genomes of two additional strains (ATCC 25919 and 200612B) that previously could not be assigned to any Harveyi clade species were also sequenced. Analysis of the genome sequence data revealed a clear case of synonymy between V. owensii and [V. communis], confirming an earlier proposal to synonymize both species. Both strains from the 'beijerinckii' lineage were classified as V. jasicida, while the strains ATCC 25919 and 200612B were classified as V. owensii and V. campbellii, respectively. We also found that two strains, AND4 and Ex25, are closely related to Harveyi clade bacteria, but could not be assigned to any species of the family Vibrionaceae. The use of whole genome sequence data for the taxonomic classification of the Harveyi clade bacteria and other members of the family Vibrionaceae is also discussed.

  14. Whole genome sequence study of cannabis dependence in two independent cohorts.

    Science.gov (United States)

    Gizer, Ian R; Bizon, Chris; Gilder, David A; Ehlers, Cindy L; Wilhelmsen, Kirk C

    2017-01-23

    Recent advances in genome wide sequencing techniques and analytical methods allow for more comprehensive examinations of the genome than microarray-based genome-wide association studies (GWAS). The present report provides the first application of whole genome sequencing (WGS) to identify low frequency variants involved in cannabis dependence across two independent cohorts. The present study used low-coverage whole genome sequence data to conduct set-based association and enrichment analyses of low frequency variation in protein-coding regions as well as regulatory regions in relation to cannabis dependence. Two cohorts were studied: a population-based Native American tribal community consisting of 697 participants nested within large multi-generational pedigrees and a family-based sample of 1832 predominantly European ancestry participants largely nested within nuclear families. Participants in both samples were assessed for Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV) lifetime cannabis dependence, with 168 and 241 participants receiving a positive diagnosis in each sample, respectively. Sequence kernel association tests identified one protein-coding region, C1orf110 and one regulatory region in the MEF2B gene that achieved significance in a meta-analysis of both samples. A regulatory region within the PCCB gene, a gene previously associated with schizophrenia, exhibited a suggestive association. Finally, a significant enrichment of regions within or near genes with multiple splice variants or involved in cell adhesion or potassium channel activity were associated with cannabis dependence. This initial study demonstrates the potential utility of low pass whole genome sequencing for identifying genetic variants involved in the etiology of cannabis use disorders. © 2017 Society for the Study of Addiction.

  15. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma.

    Science.gov (United States)

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-02-04

    Comprehensive identification of somatic structural variations (SVs) and understanding of their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, the characterization of complex SVs across the whole genome, and of the mutational mechanisms underlying them in esophageal squamous cell carcinoma (ESCC), remains largely unclear. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs, using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures to be the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) through different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes, which occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations revealed a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in the malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  16. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling.

    Science.gov (United States)

    Meinel, Thomas; Krause, Antje

    2012-01-01

    In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models, together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes, has led to a huge diversity in published phylogenies. Comparing these phylogenies, and identifying the impact of inference properties (underlying data model, inference technique) on particular reconstructions, is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. The particular topology alternatives found for an ordered list of bacterial clans form a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies, we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling of a total of 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with their inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.

  17. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.

    Directory of Open Access Journals (Sweden)

    Elizabeth M Driebe

    Staphylococcus aureus is an important clinical pathogen worldwide, and understanding this organism's phylogeny and, in particular, the role of recombination is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade, with clear groupings of the major clonal complexes CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages differed from each other: while the CC8 lineage rate was similar to those reported in previous studies, the CC5 lineage rate was 100-fold greater. Fifty known virulence genes were screened in silico in all genomes to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss.
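
    The abstract does not specify how the homoplasy and SNP density plots were built. As a rough illustration only, a sliding-window density of variant positions along a chromosome might be computed as below; the function name `window_density`, the window and step sizes, and the per-kilobase scaling are hypothetical choices, not the authors' method. The same function could be applied to a list of homoplastic site positions, assuming those have already been identified on the tree.

```python
from bisect import bisect_left


def window_density(positions, genome_length, window=10_000, step=5_000):
    """Hypothetical sliding-window variant density (not the authors' method).

    positions: 0-based variant coordinates on one chromosome.
    Returns (window_start, variants_per_kb) for windows [start, start + window),
    with the last windows truncated at the chromosome end.
    """
    pos = sorted(positions)
    densities = []
    for start in range(0, genome_length, step):
        end = min(start + window, genome_length)
        # Count positions p with start <= p < end using binary search.
        count = bisect_left(pos, end) - bisect_left(pos, start)
        densities.append((start, count * 1000 / (end - start)))
    return densities


# Toy example: four SNPs on a 3 kb sequence, 1 kb non-overlapping windows.
print(window_density([100, 200, 1500, 2500], 3000, window=1000, step=1000))
# [(0, 2.0), (1000, 1.0), (2000, 1.0)]
```

    Choosing a step smaller than the window gives overlapping windows and a smoother profile, at the cost of neighbouring values no longer being independent.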

  18. Whole-genome resequencing of 100 healthy individuals using DNA pooling.

    Science.gov (United States)

    Wang, Xiaobin; Sui, Weiguo; Wu, Weiqing; Hou, Xianliang; Ou, Minglin; Xiang, Yueying; Dai, Yong

    2016-11-01

    With the advent of next-generation sequencing technology, the cost of sequencing has decreased significantly. However, sequencing costs remain high for large-scale studies. In the present study, DNA pooling was applied as a cost-effective sequencing strategy, and the results of whole-genome resequencing of 100 healthy individuals using DNA pooling are presented. In order to minimise the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples and then subjected to whole-genome sequencing using four lanes for each library, resulting in at least 30-fold haploid coverage for each sample. The NCBI human genome build 37 (hg19) was used as the reference genome, and alignment of the short reads to it achieved 99.84% coverage. The average sequencing depth was 32.76-fold. In total, ~3 million single-nucleotide polymorphisms were identified, of which 99.88% were in the NCBI dbSNP database. In addition, ~600,000 small insertions/deletions, 500,000 structural variants, 5,000 copy number variations and 13,000 single nucleotide variants were identified. According to the present study, whole genomes were sequenced for a small sample of subjects from southern China for the first time. Furthermore, new variation sites were identified by comparison with the reference sequence, adding new knowledge of human genome variation to human genomic databases. Finally, the distribution of variation across genomic regions was illustrated by analyzing different types of variant sites, such as single-nucleotide polymorphisms.
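
    Fold-coverage figures such as the 30-fold and 32.76-fold values above are conventionally estimated as C = N × L / G, where N is the number of sequenced reads, L the read length and G the haploid genome size; under the idealized Lander-Waterman (Poisson) model, the expected fraction of bases left with no reads is e^(-C). The sketch below applies these textbook relations only. The ~3.1 Gb genome size and 100 bp read length in the example are illustrative assumptions, not parameters reported by the study, and real coverage is less uniform than the Poisson model predicts.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean depth C = N * L / G (each mate of a pair counts as one read)."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest read count N such that N * L / G >= target_coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_uncovered(coverage):
    """Idealized Lander-Waterman (Poisson) estimate of bases with zero reads."""
    return math.exp(-coverage)


# Illustrative assumptions: 100 bp reads, ~3.1 Gb haploid human genome.
G, L = 3_100_000_000, 100
n = reads_needed(30, L, G)
print(n)                                # 930000000
print(mean_coverage(n, L, G))           # 30.0
print(fraction_uncovered(30) < 1e-12)   # True
```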

  19. P-Hint-Hunt: a deep parallelized whole genome DNA methylation detection tool.

    Science.gov (United States)

    Peng, Shaoliang; Yang, Shunyun; Gao, Ming; Liao, Xiangke; Liu, Jie; Yang, Canqun; Wu, Chengkun; Yu, Wenqiang

    2017-03-14

    An increasing number of studies have used whole genome DNA methylation detection, one of the most important parts of epigenetics research, to find significant relationships between DNA methylation and several typical diseases, such as cancers and diabetes. In many of those studies, mapping bisulfite-treated sequences to the whole genome has been the main method for studying DNA cytosine methylation. However, most current tools suffer from inaccuracy and long run times. In our study, we designed a new DNA methylation prediction tool ("Hint-Hunt") to solve this problem. Using optimal complex alignment computation and Smith-Waterman matrix dynamic programming, Hint-Hunt can analyze and predict DNA methylation status. However, when Hint-Hunt is used to predict DNA methylation status on large-scale datasets, it still suffers from slow speed and low time and space efficiency. To address the cost of Smith-Waterman dynamic programming and this low time and space efficiency, we further designed a deep parallelized whole genome DNA methylation detection tool ("P-Hint-Hunt") on the Tianhe-2 (TH-2) supercomputer. To the best of our knowledge, P-Hint-Hunt is the first parallel DNA methylation detection tool with a high speed-up for processing large-scale datasets, and it can run on both CPUs and Intel Xeon Phi coprocessors. Moreover, we deployed and evaluated Hint-Hunt and P-Hint-Hunt on the TH-2 supercomputer at different scales. The experimental results show that our tools eliminate the deviation caused by bisulfite treatment in the mapping procedure and that the multi-level parallel program yields a 48-fold speed-up with 64 threads. P-Hint-Hunt achieves substantial acceleration on the heterogeneous CPU and Intel Xeon Phi platform, taking full advantage of both multi-core (CPU) and many-core (Phi) processors.
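
    For readers unfamiliar with the Smith-Waterman recurrence named in the abstract, the textbook local-alignment score with a linear gap penalty is sketched below. This is not Hint-Hunt's implementation: the scoring values are arbitrary, and the `bisulfite` flag is a hypothetical simplification of how bisulfite-aware aligners avoid penalizing a read T opposite a reference C (an unmethylated cytosine converted by bisulfite treatment; forward strand only).

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2, bisulfite=False):
    """Best local alignment score between read and ref (textbook Smith-Waterman).

    With the hypothetical bisulfite=True, a read 'T' against a reference 'C'
    scores as a match, mimicking C->T conversion of unmethylated cytosines.
    """
    cols = len(ref) + 1
    prev = [0] * cols  # previous row of the DP matrix
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            a, b = read[i - 1], ref[j - 1]
            same = a == b or (bisulfite and a == "T" and b == "C")
            diag = prev[j - 1] + (match if same else mismatch)
            up = prev[j] + gap         # read base aligned to a gap
            left = curr[j - 1] + gap   # reference base aligned to a gap
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("TTGT", "CCGT"))                  # 4
print(smith_waterman("TTGT", "CCGT", bisulfite=True))  # 8
```

    Only two rows of the matrix are kept, so memory grows with the reference length, but the running time is still proportional to the product of the two sequence lengths; that quadratic cost is the bottleneck the abstract cites as the motivation for parallelizing the method.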

  20. Unlocking the diversity of genebanks: whole-genome marker analysis of Swiss bread wheat and spelt.

    Science.gov (United States)

    Müller, Thomas; Schierscher-Viret, Beate; Fossati, Dario; Brabant, Cécile; Schori, Arnold; Keller, Beat; Krattinger, Simon G

    2017-11-04

    High-throughput genotyping of Swiss bread wheat and spelt accessions revealed differences in their gene pools and identified bread wheat landraces that were not used in breeding. Genebanks play a pivotal role in preserving the genetic diversity present among old landraces and wild progenitors of modern crops, and they represent sources of agriculturally important genes that were lost during domestication and in modern breeding. However, undesirable genes that negatively affect crop performance are often co-introduced when landraces and wild crop progenitors are crossed with elite cultivars, which often limits the use of genebank material in modern breeding programs. A detailed genetic characterization is an important prerequisite to solve this problem and to make genebank material more accessible to breeding. Here, we genotyped 502 bread wheat and 293 spelt accessions held in the Swiss National Genebank using a 15K wheat SNP array. The material included both spring and winter wheats and consisted of old landraces and modern cultivars. Genome- and sub-genome-wide analyses revealed that spelt and bread wheat form two distinct gene pools. In addition, we identified bread wheat landraces that were genetically distinct from modern cultivars. Such accessions were possibly missed in the early Swiss wheat breeding program and are promising targets for the identification of novel genes. The genetic information obtained in this study is appropriate for genome-wide association studies, which will facilitate the identification and transfer of agriculturally important genes from the genebank into modern cultivars through marker-assisted selection.

  1. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples this could further reduce diagnostic time and thereby improve control and treatment. A major bottle-neck is the availability of fast and reliable bioinformatics...... microbiology, WGS of isolated bacteria and by directly sequencing on pellets from the urine. A rapid method for analyzing the sequence data was developed. Bacteria were cultivated from 19 samples, but only in pure culture from 17. WGS improved the identification of the cultivated bacteria and almost complete...

  2. Whole-genome shotgun sequence of phenazine-producing endophytic Streptomyces kebangsaanensis SUK12

    Directory of Open Access Journals (Sweden)

    Juwairiah Remali

    2017-09-01

    Streptomyces species produce bioactive compounds with a broad spectrum of activities. Streptomyces kebangsaanensis SUK12 has been identified as a novel endophytic bacterium isolated from the ethnomedicinal plant Portulaca oleracea, and was found to produce the phenazine class of biologically active antimicrobial metabolites. The potential uses of the phenazines led to our interest in determining the genome sequence of Streptomyces kebangsaanensis SUK12. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession number PRJNA269542. The raw sequence data are available [https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP105770].

  3. The Promise of Whole Genome Pathogen Sequencing for the Molecular Epidemiology of Emerging Aquaculture Pathogens

    Science.gov (United States)

    Bayliss, Sion C.; Verner-Jeffreys, David W.; Bartie, Kerry L.; Aanensen, David M.; Sheppard, Samuel K.; Adams, Alexandra; Feil, Edward J.

    2017-01-01

    Aquaculture is the fastest growing food-producing sector, and the sustainability of this industry is critical both for global food security and economic welfare. The management of infectious disease represents a key challenge. Here, we discuss the opportunities afforded by whole genome sequencing of bacterial and viral pathogens of aquaculture to mitigate disease emergence and spread. We outline, by way of comparison, how sequencing technology is transforming the molecular epidemiology of pathogens of public health importance, emphasizing the importance of community-oriented databases and analysis tools. PMID:28217117

  4. Long insert whole genome sequencing for copy number variant and translocation detection

    OpenAIRE

    Liang, Winnie S.; Aldrich, Jessica; Tembe, Waibhav; Kurdoglu, Ahmet; Cherni, Irene; Phillips, Lori; Reiman, Rebecca; Baker, Angela; Weiss, Glen J.; Carpten, John D.; Craig, David W.

    2013-01-01

    As next-generation sequencing continues to have an expanding presence in the clinic, the identification of the most cost-effective and robust strategy for identifying copy number changes and translocations in tumor genomes is needed. We hypothesized that performing shallow whole genome sequencing (WGS) of 900–1000-bp inserts (long insert WGS, LI-WGS) improves our ability to detect these events, compared with shallow WGS of 300–400-bp inserts. A priori analyses show that LI-WGS requires less s...

  5. Whole-genome shotgun sequence of phenazine-producing endophytic Streptomyces kebangsaanensis SUK12.

    Science.gov (United States)

    Remali, Juwairiah; Loke, Kok-Keong; Ng, Chyan Leong; Aizat, Wan Mohd; Tiong, John; Zin, Noraziah Mohamad

    2017-09-01

    Streptomyces species produce bioactive compounds with a broad spectrum of activities. Streptomyces kebangsaanensis SUK12 has been identified as a novel endophytic bacterium isolated from the ethnomedicinal plant Portulaca oleracea, and was found to produce the phenazine class of biologically active antimicrobial metabolites. The potential uses of the phenazines led to our interest in determining the genome sequence of Streptomyces kebangsaanensis SUK12. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession number PRJNA269542. The raw sequence data are available [https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP105770].

  6. Whole genome sequence of Pseudomonas aeruginosa F9676, an antagonistic bacterium isolated from rice seed.

    Science.gov (United States)

    Shi, Zhenyuan; Ren, Deyong; Hu, Shikai; Hu, Xingming; Wu, Liwen; Lin, Haiyan; Hu, Jiang; Zhang, Guangheng; Guo, Longbiao

    2015-10-10

    Pseudomonas aeruginosa is a bacterial species that can be isolated from diverse ecological niches. P. aeruginosa strain F9676 was first isolated from a rice seed sample in 2003 and showed strong antagonism against several plant pathogens. In this study, its whole genome was sequenced. The total genome size of F9676 is 6,368,008 bp, with 5586 coding genes (CDS), 67 tRNAs and 3 rRNAs. The genome sequence of F9676 may shed light on antagonism in P. aeruginosa. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci

    DEFF Research Database (Denmark)

    Rasmusen, L. H.; Dargis, R.; Iversen, Katrine Højholt

    2016-01-01

    Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients...... with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were...

  8. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation

    DEFF Research Database (Denmark)

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography their contribution to panda population...... dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two...

  9. Whole genome shotgun sequence of Bacillus amyloliquefaciens TF28, a biocontrol entophytic bacterium.

    Science.gov (United States)

    Zhang, Shumei; Jiang, Wei; Li, Jing; Meng, Liqiang; Cao, Xu; Hu, Jihua; Liu, Yushuai; Chen, Jingyu; Sha, Changqing

    2016-01-01

    Bacillus amyloliquefaciens TF28 is a biocontrol endophytic bacterium that is capable of inhibiting a broad range of plant pathogenic fungi. The strain has the potential to be developed into a biocontrol agent for use in agriculture. Here we report the whole-genome shotgun sequence of the strain. The genome size of B. amyloliquefaciens TF28 is 3,987,635 bp, and the genome contains 3754 protein-coding genes, 65 tandem repeat sequences, 47 minisatellite DNA sequences, 2 microsatellite DNA sequences, 63 tRNAs, 7 rRNAs, 6 sRNAs, 3 prophages and CRISPR domains.

  10. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    OpenAIRE

    Ruibang Luo; Yiu-Lun Wong; Wai-Chun Law; Lap-Kei Lee; Jeanno Cheung; Chi-Man Liu; Tak-Wah Lam

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of GPUs and intricate memory management to give fast and accurate analysis. From raw reads to variants (including SNPs and indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome ...

  11. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome.

    Science.gov (United States)

    Chapman, Jarrod A; Mascher, Martin; Buluç, Aydın; Barry, Kerrie; Georganas, Evangelos; Session, Adam; Strnadova, Veronika; Jenkins, Jerry; Sehgal, Sunish; Oliker, Leonid; Schmutz, Jeremy; Yelick, Katherine A; Scholz, Uwe; Waugh, Robbie; Poland, Jesse A; Muehlbauer, Gary J; Stein, Nils; Rokhsar, Daniel S

    2015-01-31

    Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species for which it is possible to construct a mapping population.

  12. Whole-genome sequencing for identification of the source in hospital-acquired Legionnaires' disease

    DEFF Research Database (Denmark)

    Rosendahl Madsen, A M; Holm, A; Jensen, T G

    2017-01-01

    Acquisition of Legionnaires' disease is a serious complication of hospitalization. Rapid determination of whether or not the infection is caused by strains of Legionella pneumophila in the hospital environment is crucial to avoid further cases. This study investigated the use of whole-genome sequencing to identify the source of infection in hospital-acquired Legionnaires' disease. Phylogenetic analyses showed close relatedness between one patient isolate and a strain found in hospital water, confirming suspicion of nosocomial infection. It was found that whole-genome sequencing can be a useful......

  13. Whole genome sequencing as a means to assess pathogenic mutations in medical genetics and cancer.

    Science.gov (United States)

    Royer-Bertrand, Beryl; Rivolta, Carlo

    2015-04-01

    The past decade has seen the emergence of next-generation sequencing (NGS) technologies, which have revolutionized the field of human molecular genetics. With NGS, significant portions of the human genome can now be assessed by direct sequence analysis, highlighting normal and pathological variants of our DNA. Recent advances have also allowed the sequencing of complete genomes, by a method referred to as whole genome sequencing (WGS). In this work, we review the use of WGS in medical genetics, with specific emphasis on the benefits and the disadvantages of this technique for detecting genomic alterations leading to Mendelian human diseases and to cancer.

  14. A green-cotyledon/stay-green mutant exemplifies the ancient whole-genome duplications in soybean.

    Science.gov (United States)

    Nakano, Michiharu; Yamada, Tetsuya; Masuda, Yu; Sato, Yutaka; Kobayashi, Hideki; Ueda, Hiroaki; Morita, Ryouhei; Nishimura, Minoru; Kitamura, Keisuke; Kusaba, Makoto

    2014-10-01

    The recent whole-genome sequencing of soybean (Glycine max) revealed that soybean experienced whole-genome duplications 59 million and 13 million years ago, and it has an octoploid-like genome in spite of its diploid nature. We analyzed a natural green-cotyledon mutant line, Tenshin-daiseitou. The physiological analysis revealed that Tenshin-daiseitou shows a non-functional stay-green phenotype in senescent leaves, which is similar to that of the mutant of Mendel's green-cotyledon gene I, the ortholog of SGR in pea. The identification of gene mutations and genetic segregation analysis suggested that defects in GmSGR1 and GmSGR2 were responsible for the green-cotyledon/stay-green phenotype of Tenshin-daiseitou, which was confirmed by RNA interference (RNAi) transgenic soybean experiments using GmSGR genes. The characterized green-cotyledon double mutant d1d2 was found to have the same mutations, suggesting that GmSGR1 and GmSGR2 are D1 and D2. Among the examined d1d2 strains, the d1d2 strain K144a showed a lower Chl a/b ratio in mature seeds than other strains but not in senescent leaves, suggesting a seed-specific genetic factor of the Chl composition in K144a. Analysis of the soybean genome sequence revealed four genomic regions with microsynteny to the Arabidopsis SGR1 region, which included the GmSGR1 and GmSGR2 regions. The other two regions contained GmSGR3a/GmSGR3b and GmSGR4, respectively, which might be pseudogenes or genes with a function that is unrelated to Chl degradation during seed maturation and leaf senescence. These GmSGR genes were thought to be produced by the two whole-genome duplications, and they provide a good example of such whole-genome duplication events in the evolution of the soybean genome. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  15. Identification of emergent blaCMY-2-carrying Proteus mirabilis lineages by whole-genome sequencing

    Directory of Open Access Journals (Sweden)

    M. Mac Aogáin

    2016-01-01

    Whole-genome sequencing of 24 Proteus mirabilis isolates revealed the clonal expansion of two cefoxitin-resistant strains among patients with community-onset infection. These strains harboured blaCMY-2 within a chromosomally located integrative and conjugative element and exhibited multidrug resistance phenotypes. A predominant strain, identified in 18 patients, also harboured the PGI-1 genomic island and associated resistance genes, accounting for its broader antibiotic resistance profile. The identification of these novel multidrug-resistant strains among community-onset infections suggests that they are endemic to this region and represent emergent P. mirabilis lineages of clinical significance.

  16. Whole genome sequencing of a rare rotavirus from archived stool sample demonstrates independent zoonotic origin of human G8P[14] strains in Hungary.

    Science.gov (United States)

    Marton, Szilvia; Dóró, Renáta; Fehér, Enikő; Forró, Barbara; Ihász, Katalin; Varga-Kugler, Renáta; Farkas, Szilvia L; Bányai, Krisztián

    2017-01-02

    Genotype P[14] rotaviruses in humans are thought to be zoonotic strains originating from bovine or ovine host species. Over the past 30 years, only a few genotype P[14] strains were identified in Hungary, totaling <0.1% of all human rotaviruses whose genotype had been determined. In this study we report the genome sequence and phylogenetic analysis of a human genotype G8P[14] strain, RVA/Human-wt/HUN/182-02/2001/G8P[14]. The whole genome constellation (G8-P[14]-I2-R2-C2-M2-A11-N2-T6-E2-H3) of this strain was shared with another Hungarian zoonotic G8P[14] strain, RVA/Human-wt/HUN/BP1062/2004/G8P[14], although phylogenetic analyses revealed that the two rotaviruses likely had different progenitors. Overall, our findings indicate that the human G8P[14] rotaviruses detected in Hungary in the past originated from independent zoonotic events. Further studies are needed to assess the public health risk associated with infections by various animal rotavirus strains. Copyright © 2016. Published by Elsevier B.V.

  17. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria: report from the EUCAST Subcommittee.

    Science.gov (United States)

    Ellington, M J; Ekelund, O; Aarestrup, F M; Canton, R; Doumith, M; Giske, C; Grundman, H; Hasman, H; Holden, M T G; Hopkins, K L; Iredell, J; Kahlmeter, G; Köser, C U; MacGowan, A; Mevius, D; Mulvey, M; Naas, T; Peto, T; Rolain, J-M; Samuelsen, Ø; Woodford, N

    2017-01-01

    Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility testing (AST). The published evidence for using WGS as a tool to infer antimicrobial susceptibility accurately is currently either poor or non-existent, and the evidence/knowledge base requires significant expansion. The primary comparators for assessing genotypic-phenotypic concordance from WGS data should be changed to epidemiological cut-off values in order to improve differentiation of wild-type from non-wild-type isolates (harbouring an acquired resistance). Clinical breakpoints should be a secondary comparator. This assessment will reveal whether genetic predictions could also be used to guide clinical decision making. Internationally agreed principles and quality control (QC) metrics will facilitate early harmonization of analytical approaches and interpretive criteria for WGS-based predictive AST. Only data sets that pass agreed QC metrics should be used in AST predictions. Minimum performance standards should exist, and comparative accuracies across different WGS laboratories and processes should be measured. To facilitate comparisons, a single public database of all known resistance loci should be established, regularly updated and strictly curated using minimum standards for the inclusion of resistance loci. For most bacterial species, the major limitations to widespread adoption of WGS-based AST in clinical laboratories remain the current high cost and limited speed of inferring antimicrobial susceptibility from WGS data, as well as the dependency on previous culture, because analysis directly on specimens remains challenging. For most bacterial species there is currently insufficient evidence to support the use of WGS-inferred AST to guide clinical decision making.

  18. Exploring the areas of applicability of whole-genome prediction methods for Asian rice (Oryza sativa L.).

    Science.gov (United States)

    Onogi, Akio; Ideta, Osamu; Inoshita, Yuto; Ebana, Kaworu; Yoshioka, Takuma; Yamasaki, Masanori; Iwata, Hiroyoshi

    2015-01-01

    Our simulation results clarify the areas of applicability of nine prediction methods and suggest the factors that affect their accuracy at predicting empirical traits. Whole-genome prediction is used to predict genetic value from genome-wide markers. The choice of method is important for successful prediction. We compared nine methods using empirical data for eight phenological and morphological traits of Asian rice cultivars (Oryza sativa L.) and data simulated from real marker genotype data. The methods were genomic BLUP (GBLUP), reproducing kernel Hilbert spaces regression (RKHS), Lasso, elastic net, random forest (RForest), Bayesian lasso (Blasso), extended Bayesian lasso (EBlasso), weighted Bayesian shrinkage regression (wBSR), and the average of all methods (Ave). The objectives were to evaluate the predictive ability of these methods in a cultivar population, to characterize them by exploring the area of applicability of each method using simulation, and to investigate the causes of their different accuracies for empirical traits. GBLUP was the most accurate for one trait, RKHS and Ave for two, and RForest for three traits. In the simulation, Blasso, EBlasso, and Ave showed stable performance across the simulated scenarios, whereas the other methods, except wBSR, had specific areas of applicability; wBSR performed poorly in most scenarios. For each method, the accuracy ranking for the empirical traits was largely consistent with that in one of the simulated scenarios, suggesting that the simulation conditions reflected the factors that affected the method accuracy for the empirical results. This study will be useful for genomic prediction not only in Asian rice, but also in populations from other crops with relatively small training sets and strong linkage disequilibrium structures.

  19. Population and Whole Genome Sequence Based Characterization of Invasive Group A Streptococci Recovered in the United States during 2015

    Directory of Open Access Journals (Sweden)

    Sopio Chochua

    2017-09-01

    Group A streptococci (GAS) are genetically diverse. Determination of strain features can reveal associations with disease and resistance and assist in vaccine formulation. We employed whole-genome sequence (WGS)-based characterization of 1,454 invasive GAS isolates recovered in 2015 by Active Bacterial Core Surveillance and performed conventional antimicrobial susceptibility testing. Predictions were made for genotype, GAS carbohydrate, antimicrobial resistance, surface proteins (M family, fibronectin binding, T, R28), secreted virulence proteins (Sda1, Sic), exotoxins, hyaluronate capsule, and an upregulated nga operon (encodes NADase and streptolysin O) promoter (Pnga3). Sixty-four M protein gene (emm) types were identified among 69 clonal complexes (CCs), including one CC of Streptococcus dysgalactiae subsp. equisimilis. emm types predicted the presence or absence of active sof determinants and were segregated into sof-positive or sof-negative genetic complexes. Only one “emm type switch” between strains was apparent. sof-negative strains showed a propensity to cause infections in the first quarter of the year, while sof+ strain infections were more likely in summer. Of 1,454 isolates, 808 (55.6%) were Pnga3 positive and 637 (78.9%) were accounted for by types emm1, emm89, and emm12. Theoretical coverage of a 30-valent M vaccine combined with an M-related protein (Mrp) vaccine encompassed 98% of the isolates. WGS data predicted that 15.3, 13.8, 12.7, and 0.6% of the isolates were nonsusceptible to tetracycline, erythromycin plus clindamycin, erythromycin, and fluoroquinolones, respectively, with only 19 discordant phenotypic results. Close phylogenetic clustering of emm59 isolates was consistent with recent regional emergence. This study revealed strain traits informative for GAS disease incidence tracking, outbreak detection, vaccine strategy, and antimicrobial therapy.

  20. Whole-genome expression analyses of type 2 diabetes in human skin reveal altered immune function and burden of infection.

    Science.gov (United States)

    Wu, Chun; Chen, Xiaopan; Shu, Jing; Lee, Chun-Ting

    2017-05-23

    Skin disorders are among the most common complications associated with type 2 diabetes mellitus (T2DM). Although T2DM patients are known to have an increased risk of infections and other T2DM-related skin disorders, the molecular mechanisms are largely unknown. This study aims to identify dysregulated genes and gene networks that are associated with T2DM in human skin. We compared the expression profiles of 56,318 transcribed genes in 74 T2DM cases and 148 gender-, age-, and race-matched non-diabetic controls from the Genotype-Tissue Expression (GTEx) database. RNA-sequencing data indicate that diabetic skin is characterized by increased expression of genes related to immune responses (CCL20, CXCL9, CXCL10, CXCL11, CXCL13, and CCL18), the JAK/STAT signaling pathway (JAK3, STAT1, and STAT2), the tumor necrosis factor superfamily (TNFSF10 and TNFSF15), and infectious disease pathways (OAS1, OAS2, OAS3, and IFIH1). Genes in the cell adhesion molecules pathway (NCAM1 and L1CAM) and the collagen family (PCOLCE2 and COL9A3) are downregulated, suggesting structural changes in the skin of T2DM patients. To the best of our knowledge, this pioneering analytic study is the first to report comprehensive, unbiased gene expression changes and dysregulated pathways in the non-diseased skin of T2DM patients. This comprehensive understanding derived from whole-genome expression profiles could advance our knowledge in determining molecular targets for the prevention and treatment of T2DM-associated skin disorders.

  1. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

  2. Comparative whole genome sequence analysis of wild-type and cidofovir-resistant monkeypoxvirus

    Directory of Open Access Journals (Sweden)

    Huggins John

    2010-05-01

    We performed whole genome sequencing of a cidofovir {[(S)-1-(3-hydroxy-2-phosphonylmethoxypropyl)cytosine] [HPMPC]}-resistant (CDV-R) strain of Monkeypoxvirus (MPV). Whole-genome comparison with the wild-type (WT) strain revealed 55 single-nucleotide polymorphisms (SNPs) and one tandem-repeat contraction. Over one-third of all identified SNPs were located within genes comprising the poxvirus replication complex, including the DNA polymerase, RNA polymerase, mRNA capping methyltransferase, DNA processivity factor, and poly-A polymerase. Four polymorphic sites were found within the DNA polymerase gene. DNA polymerase mutations observed at positions 314 and 684 in MPV were consistent with CDV-R loci previously identified in Vaccinia virus (VACV). These data suggest the mechanism of CDV resistance may be highly conserved across Orthopoxvirus (OPV) species. SNPs were also identified within virulence genes such as the A-type inclusion protein, serine protease inhibitor-like protein SPI-3, Schlafen ATPase and thymidylate kinase, among others. Aberrant chain extension induced by CDV may lead to diverse alterations in gene expression and viral replication that may result in both adaptive and attenuating mutations. Defining the potential contribution of substitutions in the replication complex and RNA processing machinery reported here may yield further insight into CDV resistance and may augment current therapeutic development strategies.

  3. HAPLOWSER: a whole-genome haplotype browser for personal genome and metagenome.

    Science.gov (United States)

    Kim, Jong Hyun; Kim, Woo-Cheol; Waterman, Michael S; Park, Sanghyun; Li, Lei M

    2009-09-15

    Haplotype assembly is becoming a very important tool in genome sequencing of human and other organisms. Although haplotypes were previously inferred from genome assemblies, there has never been a comparative haplotype browser that depicts a global picture of whole-genome alignments among haplotypes of different organisms. We introduce a whole-genome HAPLotype brOWSER (HAPLOWSER), providing evolutionary perspectives from multiple aligned haplotypes and functional annotations. Haplowser enables the comparison of haplotypes from metagenomes, and associates conserved regions or the bases at the conserved regions with functional annotations and custom tracks. The associations are quantified for further analysis and presented as pie charts. Functional annotations and custom tracks that are projected onto haplotypes are saved as multiple files in FASTA format. Haplowser provides a user-friendly interface, and can display alignments of haplotypes with functional annotations at any resolution. Haplowser, written in Java, supports multiple platforms including Windows and Linux. Haplowser is publicly available at http://embio.yonsei.ac.kr/haplowser .

  4. Rapid construction of a whole-genome transposon insertion collection for Shewanella oneidensis by Knockout Sudoku

    Science.gov (United States)

    Baym, Michael; Shaket, Lev; Anzai, Isao A.; Adesina, Oluwakemi; Barstow, Buz

    2016-01-01

    Whole-genome knockout collections are invaluable for connecting gene sequence to function, yet traditionally, their construction has required an extraordinary technical effort. Here we report a method for the construction and purification of a curated whole-genome collection of single-gene transposon disruption mutants termed Knockout Sudoku. Using simple combinatorial pooling, a highly oversampled collection of mutants is condensed into a next-generation sequencing library in a single day, a 30- to 100-fold improvement over prior methods. The identities of the mutants in the collection are then solved by a probabilistic algorithm that uses internal self-consistency within the sequencing data set, followed by rapid algorithmically guided condensation to a minimal representative set of mutants, validation, and curation. Starting from a progenitor collection of 39,918 mutants, we compile a quality-controlled knockout collection of the electroactive microbe Shewanella oneidensis MR-1 containing representatives for 3,667 genes that is functionally validated by high-throughput kinetic measurements of quinone reduction. PMID:27830751
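
The pooling idea behind the method can be illustrated with a simplified, deterministic row/column decode. This is not the probabilistic Knockout Sudoku solver described above. It assumes each well of a plate is sequenced in exactly one row pool and one column pool, and that each insertion site occurs in only one well. The function and data layout are hypothetical.

```python
from collections import defaultdict


def decode_row_column_pools(row_pools, col_pools):
    """Assign insertion sites to wells from row and column pool observations.

    row_pools / col_pools: {pool_label: set of insertion sites seen in that pool}.
    Returns {site: (row, col)} for sites seen in exactly one row pool and one
    column pool; sites seen in several pools are ambiguous and left out.
    """
    rows_for = defaultdict(set)
    cols_for = defaultdict(set)
    for row, sites in row_pools.items():
        for site in sites:
            rows_for[site].add(row)
    for col, sites in col_pools.items():
        for site in sites:
            cols_for[site].add(col)
    decoded = {}
    for site in rows_for.keys() & cols_for.keys():
        if len(rows_for[site]) == 1 and len(cols_for[site]) == 1:
            decoded[site] = (next(iter(rows_for[site])), next(iter(cols_for[site])))
    return decoded
```

Sites that land in several pools are where a probabilistic, self-consistency-based solver like the one in the abstract becomes necessary.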

  5. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota eMorota

    2014-10-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
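
As an illustration of the non-parametric kernels the review discusses, here is a minimal sketch of a Gaussian kernel on marker genotypes: K(i, k) = exp(-d²(i, k) / (h·m)). Here d² is the squared Euclidean distance between genotype vectors, m is the number of markers and h is a bandwidth. Scaling by m and the default h = 1 are assumptions for illustration, not values taken from the review.

```python
import math


def gaussian_kernel(genotypes, bandwidth=1.0):
    """Return K[i][k] = exp(-d2(i, k) / (bandwidth * m)) for m markers.

    genotypes: list of individuals, each a list of marker codes (e.g. 0/1/2).
    """
    m = len(genotypes[0])

    def sq_dist(a, b):
        # Squared Euclidean distance between two genotype vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        [math.exp(-sq_dist(a, b) / (bandwidth * m)) for b in genotypes]
        for a in genotypes
    ]
```

In RKHS regression a matrix like this takes the place of a linear relationship matrix. The bandwidth h is typically tuned, for example by cross-validation.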

  6. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts.

    Science.gov (United States)

    Guo, Yi-Cheng; Zhang, Lin; Dai, Shao-Xing; Li, Wen-Xing; Zheng, Jun-Juan; Li, Gong-Hua; Huang, Jing-Fei

    2016-01-01

    Dekkera yeasts have often been considered alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya). However, the Dekkera yeasts, which separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD) but still occupy a niche similar to that of S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to those of the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in transcription and translation exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified 12 vacuolar H+-ATPase (V-ATPase) function genes under positive selection, which assist in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights into the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage.

  7. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts.

    Directory of Open Access Journals (Sweden)

    Yi-Cheng Guo

    Dekkera yeasts have often been considered alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya). However, the Dekkera yeasts, which separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD) but still occupy a niche similar to that of S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to those of the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in transcription and translation exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified 12 vacuolar H+-ATPase (V-ATPase) function genes under positive selection, which assist in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights into the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage.

  8. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    Science.gov (United States)

    Robins-Browne, Roy M; Holt, Kathryn E; Ingle, Danielle J; Hocking, Dianna M; Yang, Ji; Tauschek, Marija

    2016-01-01

    The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods.

  9. Molecular characterization of avian polyomavirus isolated from psittacine birds based on the whole genome sequence analysis.

    Science.gov (United States)

    Katoh, Hiroshi; Ohya, Kenji; Une, Yumi; Yamaguchi, Tsuyoshi; Fukushi, Hideto

    2009-07-02

    Seven avian polyomaviruses (APVs) were isolated from seven psittacine birds of four species, and their whole genome sequences were genetically analyzed. Compared with the sequence of the BFDV1 strain, the sequences of the seven APV isolates carried nucleotide substitutions at 63 loci, and a high level of amino acid sequence conservation was predicted for each viral protein (VP1, VP2, VP3, VP4, and t/T antigen). An A-to-T nucleotide substitution was observed in the non-control region of all seven APV sequences in comparison with the BFDV1 strain. Two C-to-T nucleotide substitutions were also detected in non-coding regions of one isolate. A phylogenetic analysis of the whole genome sequences indicated that the sequences from the same species of bird were closely related. APV has been reported to have distinct tropism for cell cultures of various avian species. The present study indicated that a single amino acid substitution at position 221 in VP2 was essential for propagation in chicken embryonic fibroblast culture and that this substitution was promoted by propagation on budgerigar embryonic fibroblast culture. For two isolates, three consecutive amino acids appeared to be deleted in VP4; however, this deletion had little effect on virus propagation.

  10. Analysis of a Streptococcus pyogenes puerperal sepsis cluster by use of whole-genome sequencing.

    Science.gov (United States)

    Ben Zakour, Nouri L; Venturini, Carola; Beatson, Scott A; Walker, Mark J

    2012-07-01

    Between June and November 2010, a concerning rise in the number of cases of puerperal sepsis, a postpartum pelvic bacterial infection contracted by women after childbirth, was observed in the New South Wales, Australia, hospital system. Group A streptococcus (GAS; Streptococcus pyogenes) isolates PS001 to PS011 were recovered from nine patients. Pulsed-field gel electrophoresis and emm sequence typing revealed that GAS of emm1.40, emm75.0, emm77.0, emm89.0, and emm89.9 were each recovered from a single patient, ruling out a single source of infection. However, emm28.8 GAS were recovered from four different patients. To investigate the relatedness of these emm28 isolates, whole-genome sequencing was undertaken and the genome sequences were compared to the genome sequence of the emm28.4 reference strain, MGAS6180. A total of 186 single nucleotide polymorphisms were identified, for which the phylogenetic reconstruction indicated an outbreak of a polyclonal nature. While two isolates collected from different hospitals were not closely related, isolates from two puerperal sepsis patients from the same hospital were indistinguishable, suggesting patient-to-patient transmission or infection from a common source. The results of this study indicate that traditional typing protocols, such as pulsed-field gel electrophoresis, may not be sensitive enough to allow fine epidemiological discrimination of closely related bacterial isolates. Whole-genome sequencing presents a valid alternative that allows accurate fine-scale epidemiological investigation of bacterial infectious disease.
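
Judging whether isolates are "indistinguishable" comes down to counting SNP differences between genomes that were called against the same reference. Below is a minimal sketch of a pairwise SNP distance. It assumes each isolate is stored as a hypothetical dict mapping a 0-based reference position to the called base. Real pipelines also handle missing calls, indels and recombination, which this sketch ignores.

```python
from itertools import combinations


def snp_distance(calls_a, calls_b, reference):
    """Count reference positions where two isolates carry different bases.

    calls_a / calls_b: {position: base} for variant sites only; positions
    absent from a dict are assumed to match the reference base.
    reference: reference sequence as a string (0-based positions).
    """
    positions = set(calls_a) | set(calls_b)
    return sum(
        1
        for pos in positions
        if calls_a.get(pos, reference[pos]) != calls_b.get(pos, reference[pos])
    )


def distance_matrix(isolates, reference):
    """Pairwise SNP distances for {isolate_name: calls} dictionaries."""
    names = sorted(isolates)
    return {
        (a, b): snp_distance(isolates[a], isolates[b], reference)
        for a, b in combinations(names, 2)
    }
```

A distance of zero between two isolates matches what the abstract describes as indistinguishable. Any threshold for "closely related" would have to be set per organism and study.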

  11. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    Science.gov (United States)

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution [1,2]. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes [3,4]. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  12. Pathway Processor: A Tool for Integrating Whole-Genome Expression Results into Metabolic Networks

    Science.gov (United States)

    Grosu, Paul; Townsend, Jeffrey P.; Hartl, Daniel L.; Cavalieri, Duccio

    2002-01-01

    We have developed a new tool to visualize expression data on metabolic pathways and to evaluate which metabolic pathways are most affected by transcriptional changes in whole-genome expression experiments. Using the Fisher Exact Test, the method scores biochemical pathways according to the probability that as many or more genes in a pathway would be significantly altered in a given experiment by chance alone. This method has been validated on diauxic shift experiments and reproduces well known effects of carbon source on yeast metabolism. The analysis is implemented with Pathway Analyzer, one of the tools of Pathway Processor, a new statistical package for the analysis of whole-genome expression data. Results from multiple experiments can be compared, reducing the analysis from the full set of individual genes to a limited number of pathways of interest. The pathways are visualized with OpenDX, an open-source visualization software package, and the relationship between genes in the pathways can be examined in detail using Expression Mapper, the second program of the package. This program features a graphical output displaying differences in expression on metabolic charts of the biochemical pathways to which the open reading frames are assigned. [Supplementary materials are available at http://www.cgr.harvard.edu/cavalieri/pp.html and http://www.genome.org.] PMID:12097350
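
Scoring a pathway by the probability that as many or more of its genes would be significantly altered by chance alone is the upper tail of the hypergeometric distribution. This is what a one-sided Fisher exact test computes. Below is a minimal standard-library sketch. The function and parameter names are illustrative and are not Pathway Processor's own.

```python
from math import comb


def pathway_p_value(total_genes, altered_genes, pathway_size, altered_in_pathway):
    """One-sided Fisher exact (hypergeometric upper-tail) p-value.

    Probability of seeing at least `altered_in_pathway` altered genes when
    `pathway_size` genes are drawn without replacement from `total_genes`,
    of which `altered_genes` are significantly altered.
    """
    denom = comb(total_genes, pathway_size)
    upper = min(pathway_size, altered_genes)
    hits = sum(
        comb(altered_genes, k) * comb(total_genes - altered_genes, pathway_size - k)
        for k in range(altered_in_pathway, upper + 1)
    )
    return hits / denom
```

Ranking pathways by this value reduces a genome-wide list of altered genes to a short list of pathways. When many pathways are scored at once, a multiple-testing correction is usually applied to the resulting p-values.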

  13. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin

    2016-08-01

    Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid, there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and a 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next-generation sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.
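
The abstract does not spell out how fold enrichment was calculated. One common definition, assumed here, compares the ratio of microbial to human reads after enrichment with the same ratio before enrichment. A minimal sketch:

```python
def fold_enrichment(microbial_before, human_before, microbial_after, human_after):
    """Microbial:human read ratio after enrichment divided by the ratio before.

    This definition is an assumption; the study may have used a different one.
    """
    if min(microbial_before, human_before, human_after) <= 0:
        raise ValueError("counts used as denominators must be positive")
    return (microbial_after / human_after) / (microbial_before / human_before)
```

Under this definition, going from 1% microbial reads to an even split between microbial and human reads is roughly a 99-fold enrichment, even though the microbial share of reads only rises to 50%.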

  14. Expansion by whole genome duplication and evolution of the sox gene family in teleost fish.

    Science.gov (United States)

    Voldoire, Emilien; Brunet, Frédéric; Naville, Magali; Volff, Jean-Nicolas; Galiana, Delphine

    2017-01-01

    It is now recognized that several rounds of whole genome duplication (WGD) occurred during the evolution of vertebrates, but the link between WGDs and phenotypic diversification remains unresolved. In this study, we investigated the impact of the teleost-specific WGD on the evolution of the sox gene family in teleost fishes. The sox gene family, which encodes transcription factors, has an essential role in the morphology, physiology and behavior of vertebrates, including teleosts, currently the largest group of vertebrates. We first reconstructed the evolution of all sox genes identified in eleven teleost genomes using a comparative genomic approach including phylogenetic and synteny analyses. Compared with tetrapods, we observed a substantial expansion of the sox family: 58% (11/19) of sox genes are duplicated in teleost genomes. Furthermore, all duplicated sox genes, except the sox17 paralogs, are derived from the teleost-specific WGD. Then, focusing on five sox genes and analyzing the evolution of their coding and non-coding sequences as well as their expression patterns in fish embryos and adult tissues, we demonstrated that these paralogs followed lineage-specific evolutionary trajectories in teleost genomes. This work, based on whole genome data from multiple teleost species, supports the contribution of WGDs to the expansion of gene families, as well as to the emergence of genomic differences between lineages that might promote genetic and phenotypic diversity in teleosts.

  15. A comprehensive whole-genome integrated cytogenetic map for the alpaca (Lama pacos).

    Science.gov (United States)

    Avila, Felipe; Baily, Malorie P; Perelman, Polina; Das, Pranab J; Pontius, Joan; Chowdhary, Renuka; Owens, Elaine; Johnson, Warren E; Merriwether, David A; Raudsepp, Terje

    2014-01-01

    Genome analysis of the alpaca (Lama pacos, LPA) has progressed slowly compared to other domestic species. Here, we report the development of the first comprehensive whole-genome integrated cytogenetic map for the alpaca using fluorescence in situ hybridization (FISH) and CHORI-246 BAC library clones. The map comprises 230 linearly ordered markers distributed among all 36 alpaca autosomes and the sex chromosomes. For the first time, markers were assigned to LPA14, 21, 22, 28, and 36. Additionally, 86 genes from 15 alpaca chromosomes were mapped in the dromedary camel (Camelus dromedarius, CDR), demonstrating exceptional synteny and linkage conservation between the 2 camelid genomes. Cytogenetic mapping of 191 protein-coding genes improved and refined the known Zoo-FISH homologies between camelids and humans: we discovered new homologous synteny blocks (HSBs) corresponding to HSA1-LPA/CDR11, HSA4-LPA/CDR31 and HSA7-LPA/CDR36, and revised the location of breakpoints for others. Overall, gene mapping was in good agreement with the Zoo-FISH and revealed remarkable evolutionary conservation of gene order within many human-camelid HSBs. Most importantly, 91 FISH-mapped markers effectively integrated the alpaca whole-genome sequence and the radiation hybrid maps with physical chromosomes, thus facilitating the improvement of the sequence assembly and the discovery of genes of biological importance. © 2015 S. Karger AG, Basel.

  16. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, a reasonable method of using complete genome sequences for the construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  17. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs that had opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample was in the range of 39.29-49.36 million. Approximately 188 genes were differentially expressed in adipose tissue and were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, metabolism of fatty acids, retinol, caffeine and arachidonic acid, and immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  18. Whole-genome sequence comparison as a method for improving bacterial species definition.

    Science.gov (United States)

    Zhang, Wen; Du, Pengcheng; Zheng, Han; Yu, Weiwen; Wan, Li; Chen, Chen

    2014-01-01

    We compared pairs of 1,226 bacterial strains with whole genome sequences and calculated their average nucleotide identity (ANI) between genomes to determine whether whole genome comparison can be directly used for bacterial species definition. We found that genome comparisons of two bacterial strains from the same species (SGC) have a significantly higher ANI than those of two strains from different species (DGC), and that the ANI between the query and the reference genomes can be used to determine whether two genomes come from the same species. Bacterial species definition based on ANI with a cut-off value of 0.92 matched well (81.5%) with the current bacterial species definition. The ANI value was shown to be consistent with the standard for traditional bacterial species definition, and it could be used in bacterial taxonomy for species definition. A new bioinformatics program (ANItools) was also provided in this study for users to obtain the ANI value of any two bacterial genome pairs (http://genome.bioinfo-icdc.org/). This program can match a query strain to all bacterial genomes, and identify the highest ANI value of the strain at the species, genus and family levels respectively, providing valuable insights for species definition.
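    The species test described above reduces to averaging identities and comparing the result with a threshold. A minimal sketch, assuming the query genome has already been cut into fragments and aligned to the reference (the fragmenting and alignment steps, and ANItools' exact procedure, are not shown and may differ), using the 0.92 cut-off reported in the abstract:

```python
from statistics import mean
from typing import Sequence

# Species cut-off reported in the abstract, expressed as a fraction.
SPECIES_ANI_CUTOFF = 0.92


def average_nucleotide_identity(fragment_identities: Sequence[float]) -> float:
    """Mean identity (fraction in [0, 1]) over aligned query-genome fragments."""
    if not fragment_identities:
        raise ValueError("no aligned fragments to average")
    return mean(fragment_identities)


def same_species(fragment_identities: Sequence[float],
                 cutoff: float = SPECIES_ANI_CUTOFF) -> bool:
    """Call two genomes conspecific when their ANI reaches the cut-off."""
    return average_nucleotide_identity(fragment_identities) >= cutoff
```

    For example, toy fragment identities of 0.95, 0.93 and 0.91 give an ANI of 0.93, which passes the 0.92 threshold.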

  19. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification.

    Science.gov (United States)

    Direito, Susana O L; Zaura, Egija; Little, Miranda; Ehrenfreund, Pascale; Röling, Wilfred F M

    2014-03-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used method, multiple displacement amplification (MDA), and one new primer-free method, primase-based whole genome amplification (pWGA), were compared using a polymerase chain reaction (PCR)-based method as a control. Pyrosequencing of an environmental sample and principal component analysis revealed that MDA impacted community profiles more strongly than pWGA and indicated that this was related to species GC content, although an influence of DNA integrity could not be excluded. Subsequently, biases by species GC content, DNA integrity and fragment size were separately analysed using defined mixtures of DNA from various species. We found significantly less amplification of species with the highest GC content for MDA-based templates and, to a lesser extent, for pWGA. DNA fragmentation also interfered severely: species with more fragmented DNA were less amplified with MDA and pWGA. pWGA was unable to amplify low molecular weight DNA (< 1.5 kb), whereas MDA amplified it inefficiently. We conclude that pWGA is the most promising method for characterization of microbial communities in low-biomass environments and for currently planned astrobiological missions to Mars. © 2013 Society for Applied Microbiology and John Wiley & Sons Ltd.

  20. Whole genome analysis of linezolid resistance in Streptococcus pneumoniae reveals resistance and compensatory mutations

    Directory of Open Access Journals (Sweden)

    Légaré Danielle

    2011-10-01

    Full Text Available Abstract Background Several mutations were present in the genomes of linezolid-resistant Streptococcus pneumoniae strains, but the roles of many of these mutations had not been tested experimentally. To analyze the role of these mutations, we reconstituted resistance by serial whole genome transformation of a novel resistant isolate into two strains with sensitive background. We sequenced the parent mutant and two independent transformants exhibiting similar minimum inhibitory concentration to linezolid. Results Comparative genomic analyses revealed that transformants acquired G2576T transversions in every gene copy of 23S rRNA and that the number of altered copies correlated with the level of linezolid resistance and cross-resistance to florfenicol and chloramphenicol. One of the transformants also acquired a mutation present in the parent mutant leading to the overexpression of an ABC transporter (spr1021). The acquisition of these mutations conferred a fitness cost, however, which was further enhanced by the acquisition of a mutation in an RNA methyltransferase implicated in resistance. Interestingly, the fitness of the transformants could be restored in part by the acquisition of altered copies of the L3 and L16 ribosomal proteins and by mutations leading to the overexpression of the spr1887 ABC transporter that were present in the original linezolid-resistant mutant. Conclusions Our results demonstrate the usefulness of whole genome approaches at detecting major determinants of resistance as well as compensatory mutations that alleviate the fitness cost associated with resistance.

  1. Computel: computation of mean telomere length from whole-genome next-generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Lilit Nersisyan

    Full Text Available Telomeres are the ends of eukaryotic chromosomes, consisting of consecutive short repeats that protect chromosome ends from degradation. Telomeres shorten with each cell division, leading to replicative cell senescence. Deregulation of telomere length homeostasis is associated with the development of various age-related diseases and cancers. A number of experimental techniques exist for telomere length measurement; however, until recently, the absence of tools for extracting telomere lengths from high-throughput sequencing data has significantly obscured the association of telomere length with molecular processes in normal and diseased conditions. We have developed Computel, a program in R for computing mean telomere length from whole-genome next-generation sequencing data. Computel is open source, and is freely available at https://github.com/lilit-nersisyan/computel. It utilizes a short-read alignment-based approach and integrates various popular tools for sequencing data analysis. We validated it with synthetic and experimental data, and compared its performance with the previously available software. The results have shown that Computel outperforms existing software in accuracy, independence of results from sequencing conditions, stability against inherent sequencing errors, and ability to distinguish pure telomeric sequences from interstitial telomeric repeats. By providing a highly reliable methodology for determining telomere lengths from whole-genome sequencing data, Computel should help to elucidate the role of telomeres in cellular health and disease.
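    The core arithmetic behind sequencing-based telomere length estimates can be sketched simply: each telomere contributes roughly (genome coverage) x (its length) telomeric bases to the read set, so the mean length is about telomeric bases / (coverage x number of telomeres). The sketch below is a simplified count-based estimate, not Computel's alignment-based method (which is written in R); the 7-repeat read filter is a hypothetical threshold, and 92 ends assumes a diploid human cell (46 chromosomes x 2 ends).

```python
from typing import Iterable

TELOMERE_REPEATS = ("TTAGGG", "CCCTAA")  # vertebrate repeat, both strands


def is_telomeric(read: str, min_repeats: int = 7) -> bool:
    """Hypothetical read filter: at least `min_repeats` tandem telomeric repeats."""
    return any(unit * min_repeats in read for unit in TELOMERE_REPEATS)


def mean_telomere_length(reads: Iterable[str], genome_coverage: float,
                         n_chromosome_ends: int = 92) -> float:
    """Rough estimate: telomeric bases / (coverage * number of telomeres).

    92 = 46 chromosomes x 2 ends in a diploid human cell.
    """
    telomeric_bases = sum(len(r) for r in reads if is_telomeric(r))
    return telomeric_bases / (genome_coverage * n_chromosome_ends)
```

    A count-based filter like this cannot tell terminal telomeres from interstitial telomeric repeats, which is one of the problems the abstract says Computel's alignment-based approach handles better.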

  2. Epigenetic regulation of subgenome dominance following whole genome triplication in Brassica rapa.

    Science.gov (United States)

    Cheng, Feng; Sun, Chao; Wu, Jian; Schnable, James; Woodhouse, Margaret R; Liang, Jianli; Cai, Chengcheng; Freeling, Michael; Wang, Xiaowu

    2016-07-01

    Subgenome dominance is an important phenomenon observed in allopolyploids after whole genome duplication, in which one subgenome retains more genes and more often contributes the more highly expressed copy of paralogous gene pairs. To dissect the mechanism of subgenome dominance, we systematically investigated the relationships of gene expression, transposable element (TE) distribution and small RNA targeting, relating to the multicopy paralogous genes generated from whole genome triplication in Brassica rapa. The subgenome dominance was found to be regulated by a relatively stable factor established previously, then inherited by and shared among B. rapa varieties. In addition, we found a biased distribution of TEs between flanking regions of paralogous genes. Furthermore, the 24-nt small RNAs target TEs and are negatively correlated to the dominant expression of individual paralogous gene pairs. The biased distribution of TEs among subgenomes and the targeting of 24-nt small RNAs together produce the dominant expression phenomenon at a subgenome scale. Based on these findings, we propose a bucket hypothesis to illustrate subgenome dominance and hybrid vigor. Our findings and hypothesis are valuable for the evolutionary study of polyploids, and may shed light on studies of hybrid vigor, which is common to most species. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  3. Kernel-based whole-genome prediction of complex traits: a review

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics. PMID:25360145
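    To make the kernel idea concrete, here is a minimal sketch of kernel ridge (RKHS-style) regression with a Gaussian kernel, one of the kernel families used in whole-genome prediction. Assumptions: genotypes are coded as 0/1/2 allele counts, and `bandwidth` and `lam` are hypothetical tuning values; a tiny pure-Python solver keeps the block self-contained, whereas real analyses would use a linear-algebra library and tune parameters by cross-validation.

```python
import math
from typing import Callable, List

Vector = List[float]


def gaussian_kernel(x: Vector, z: Vector, bandwidth: float) -> float:
    """K(x, z) = exp(-||x - z||^2 / bandwidth)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / bandwidth)


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_kernel_ridge(X: List[Vector], y: Vector, bandwidth: float,
                     lam: float) -> Callable[[Vector], float]:
    """Fit f(x) = mu + sum_i alpha_i K(x, x_i) with alpha = (K + lam I)^-1 (y - mu)."""
    n = len(X)
    mu = sum(y) / n
    k_reg = [[gaussian_kernel(X[i], X[j], bandwidth) + (lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    alpha = solve(k_reg, [v - mu for v in y])

    def predict(x: Vector) -> float:
        return mu + sum(a * gaussian_kernel(x, xi, bandwidth)
                        for a, xi in zip(alpha, X))

    return predict
```

    Because the Gaussian kernel compares whole genotype vectors non-linearly, it can in principle absorb interactions among loci; a linear kernel (genomic relationship matrix) would instead reduce to an additive model. Large `lam` shrinks predictions toward the phenotypic mean.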

  4. Whole-Genome Transcriptional Analysis of Chemolithoautotrophic Thiosulfate Oxidation by Thiobacillus denitrificans Under Aerobic vs. Denitrifying Conditions

    Energy Technology Data Exchange (ETDEWEB)

    Beller, H R; Letain, T E; Chakicherla, A; Kane, S R; Legler, T C; Coleman, M A

    2006-04-22

    Thiobacillus denitrificans is one of the few known obligate chemolithoautotrophic bacteria capable of energetically coupling thiosulfate oxidation to denitrification as well as aerobic respiration. As very little is known about the differential expression of genes associated with key chemolithoautotrophic functions (such as sulfur-compound oxidation and CO2 fixation) under aerobic versus denitrifying conditions, we conducted whole-genome, cDNA microarray studies to explore this topic systematically. The microarrays identified 277 genes (approximately ten percent of the genome) as differentially expressed using Robust Multi-array Average statistical analysis and a 2-fold cutoff. Genes upregulated (ca. 6- to 150-fold) under aerobic conditions included a cluster of genes associated with iron acquisition (e.g., siderophore-related genes), a cluster of cytochrome cbb3 oxidase genes, cbbL and cbbS (encoding the large and small subunits of form I ribulose 1,5-bisphosphate carboxylase/oxygenase, or RubisCO), and multiple molecular chaperone genes. Genes upregulated (ca. 4- to 95-fold) under denitrifying conditions included nar, nir, and nor genes (associated respectively with nitrate reductase, nitrite reductase, and nitric oxide reductase, which catalyze successive steps of denitrification), cbbM (encoding form II RubisCO), and genes involved with sulfur-compound oxidation (including two physically separated but highly similar copies of sulfide:quinone oxidoreductase and of dsrC, associated with dissimilatory sulfite reductase). Among genes associated with denitrification, relative expression levels (i.e., degree of upregulation with nitrate) tended to decrease in the order nar > nir > nor > nos. Reverse transcription quantitative PCR (RT-qPCR) analysis was used to validate these trends.
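    A 2-fold cutoff on log2-scale data means keeping genes with |log2 fold change| >= 1. A minimal sketch of that filtering step, assuming the inputs are already RMA-style log2 expression values per gene (the RMA normalization itself and any significance testing used in the study are not reproduced here; gene names and values in the test are toy placeholders):

```python
import math
from statistics import mean
from typing import Dict, List


def fold_change_hits(log2_a: Dict[str, List[float]],
                     log2_b: Dict[str, List[float]],
                     fold_cutoff: float = 2.0) -> Dict[str, float]:
    """Genes whose mean expression differs by at least `fold_cutoff`-fold.

    Inputs are assumed to be RMA-style log2 expression values per gene.
    Returns the linear fold change (condition A / condition B).
    """
    threshold = math.log2(fold_cutoff)  # 2-fold -> |log2 FC| >= 1
    hits = {}
    for gene in sorted(log2_a.keys() & log2_b.keys()):
        lfc = mean(log2_a[gene]) - mean(log2_b[gene])
        if abs(lfc) >= threshold:
            hits[gene] = 2.0 ** lfc
    return hits
```

    A value above 1 means higher expression in condition A (for example aerobic), and a value below 1 means higher expression in condition B (for example denitrifying); a 0.125 result corresponds to 8-fold upregulation in B.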

  5. Recent advances in understanding the roles of whole genome duplications in evolution [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Carol MacKintosh

    2017-08-01

    Full Text Available Ancient whole-genome duplications (WGDs), or paleopolyploidy events, are key to solving Darwin’s ‘abominable mystery’ of how flowering plants evolved and radiated into a rich variety of species. The vertebrates also emerged from their invertebrate ancestors via two WGDs, and genomes of diverse gymnosperm trees, unicellular eukaryotes, invertebrates, fishes, amphibians and even a rodent carry evidence of lineage-specific WGDs. Modern polyploidy is common in eukaryotes, and it can be induced, enabling mechanisms and short-term cost-benefit assessments of polyploidy to be studied experimentally. However, the ancient WGDs can be reconstructed only by comparative genomics: these studies are difficult because the DNA duplicates have been through tens or hundreds of millions of years of gene losses, mutations, and chromosomal rearrangements that culminate in resolution of the polyploid genomes back into diploid ones (rediploidisation). Intriguing asymmetries in patterns of post-WGD gene loss and retention between duplicated sets of chromosomes have been discovered recently, and elaborations of signal transduction systems are lasting legacies from several WGDs. The data imply that simpler signalling pathways in the pre-WGD ancestors were converted via WGDs into multi-stranded parallelised networks. Genetic and biochemical studies in plants, yeasts and vertebrates suggest a paradigm in which different combinations of sister paralogues in the post-WGD regulatory networks are co-regulated under different conditions. In principle, such networks can respond to a wide array of environmental, sensory and hormonal stimuli and integrate them to generate phenotypic variety in cell types and behaviours. Patterns are also being discerned in how the post-WGD signalling networks are reconfigured in human cancers and neurological conditions. It is fascinating to unpick how ancient genomic events impact on complexity, variety and disease in modern life.

  6. Whole genome prediction of bladder cancer risk with the Bayesian LASSO.

    Science.gov (United States)

    de Maturana, Evangelina López; Chanok, Stephen J; Picornell, Antoni C; Rothman, Nathaniel; Herranz, Jesús; Calle, M Luz; García-Closas, Montserrat; Marenne, Gaëlle; Brand, Angela; Tardón, Adonina; Carrato, Alfredo; Silverman, Debra T; Kogevinas, Manolis; Gianola, Daniel; Real, Francisco X; Malats, Núria

    2014-07-01

    To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve (AUC) was used to evaluate the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUCtest = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved by 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions. © 2014 WILEY PERIODICALS, INC.
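    The evaluation scheme above (AUC under 10-fold cross-validation) can be sketched independently of the Bayesian threshold models themselves, which are not reproduced here. This minimal sketch computes AUC through its Mann-Whitney interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half) and assigns samples to folds; the fixed `seed` is an illustrative choice.

```python
import random
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random case outscores a random control.

    This is the Mann-Whitney formulation; ties count one half.
    """
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]
```

    In each round, a model would be fitted on nine folds and scored on the held-out fold, giving a test AUC; an AUC of 0.5 corresponds to random guessing, which puts the reported genomic-only AUC of 0.53 in context.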

  7. Genome-wide SNP-genotyping array to study the evolution of the human pathogen Vibrio vulnificus biotype 3.

    Science.gov (United States)

    Raz, Nili; Danin-Poleg, Yael; Hayman, Ryan B; Bar-On, Yudi; Linetsky, Alex; Shmoish, Michael; Sanjuán, Eva; Amaro, Carmen; Walt, David R; Kashi, Yechezkel

    2014-01-01

    Vibrio vulnificus is an aquatic bacterium and an important human pathogen. Strains of V. vulnificus are classified into three different biotypes. The newly emerged biotype 3 has been found to be clonal and restricted to Israel. In the family Vibrionaceae, horizontal gene transfer is the main mechanism responsible for the emergence of new pathogen groups. To better understand the evolution of the bacterium, and in particular to trace the evolution of biotype 3, we performed genome-wide SNP genotyping of 254 clinical and environmental V. vulnificus isolates with worldwide distribution recovered over a 30-year period, representing all phylogeny groups. A custom single-nucleotide polymorphism (SNP) array implemented on the Illumina GoldenGate platform was developed based on 570 SNPs randomly distributed throughout the genome. In general, the genotyping results divided the V. vulnificus species into three main phylogenetic lineages and an additional subgroup, clade B, consisting of environmental and clinical isolates from Israel. Data analysis suggested that 69% of biotype 3 SNPs are similar to SNPs from clade B, indicating that biotype 3 and clade B have a common ancestor. The rest of the biotype 3 SNPs were scattered along the biotype 3 genome, probably representing multiple chromosomal segments that may have been horizontally inserted into the clade B recipient core genome from other phylogroups or bacterial species sharing the same ecological niche. Results emphasize the continuous evolution of V. vulnificus and support the emergence of new pathogenic groups within this species as a recurrent phenomenon. Our findings contribute to a broader understanding of the evolution of this human pathogen.
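    The abstract does not define how SNP "similarity" between biotype 3 and clade B was scored. As a minimal sketch, assuming one allele call per SNP (bacterial genomes are haploid) and simple allele matching over SNPs called in both profiles, the shared fraction could be computed as follows; the SNP identifiers and the `"N"` missing-call code are hypothetical.

```python
from typing import Dict


def fraction_matching(query: Dict[str, str], reference: Dict[str, str],
                      missing: str = "N") -> float:
    """Fraction of SNPs called in both profiles whose alleles agree."""
    shared = [snp for snp, allele in query.items()
              if allele != missing and reference.get(snp, missing) != missing]
    if not shared:
        raise ValueError("no SNPs called in both profiles")
    return sum(query[s] == reference[s] for s in shared) / len(shared)
```

    Excluding missing calls keeps failed GoldenGate assays from being counted as mismatches; a consensus clade B profile would be needed as `reference` when comparing against a group rather than a single isolate.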

  8. Genome-Wide SNP-Genotyping Array to Study the Evolution of the Human Pathogen Vibrio vulnificus Biotype 3

    Science.gov (United States)

    Hayman, Ryan B.; Bar-On, Yudi; Linetsky, Alex; Shmoish, Michael; Sanjuán, Eva; Amaro, Carmen; Walt, David R.; Kashi, Yechezkel

    2014-01-01

    Vibrio vulnificus is an aquatic bacterium and an important human pathogen. Strains of V. vulnificus are classified into three different biotypes. The newly emerged biotype 3 has been found to be clonal and restricted to Israel. In the family Vibrionaceae, horizontal gene transfer is the main mechanism responsible for the emergence of new pathogen groups. To better understand the evolution of the bacterium, and in particular to trace the evolution of biotype 3, we performed genome-wide SNP genotyping of 254 clinical and environmental V. vulnificus isolates with worldwide distribution recovered over a 30-year period, representing all phylogeny groups. A custom single-nucleotide polymorphism (SNP) array implemented on the Illumina GoldenGate platform was developed based on 570 SNPs randomly distributed throughout the genome. In general, the genotyping results divided the V. vulnificus species into three main phylogenetic lineages and an additional subgroup, clade B, consisting of environmental and clinical isolates from Israel. Data analysis suggested that 69% of biotype 3 SNPs are similar to SNPs from clade B, indicating that biotype 3 and clade B have a common ancestor. The rest of the biotype 3 SNPs were scattered along the biotype 3 genome, probably representing multiple chromosomal segments that may have been horizontally inserted into the clade B recipient core genome from other phylogroups or bacterial species sharing the same ecological niche. Results emphasize the continuous evolution of V. vulnificus and support the emergence of new pathogenic groups within this species as a recurrent phenomenon. Our findings contribute to a broader understanding of the evolution of this human pathogen. PMID:25526263

  9. "GenotypeColour™": colour visualisation of SNPs and CNVs

    Science.gov (United States)

    Barlati, Sergio; Chiesa, Sergio; Magri, Chiara

    2009-01-01

    Background The volume of data available on genetic variations has increased considerably with the recent development of high-density, single-nucleotide polymorphism (SNP) arrays. Several software programs have been developed to assist researchers in the analysis of this huge amount of data, but few offer a whole-genome variability visualisation system that could aid data interpretation. Results We have developed GenotypeColour™ as a rapid user-friendly tool able to upload, visualise and compare the huge amounts of data produced by Affymetrix Human Mapping GeneChips without losing the overall view of the data. Some features of GenotypeColour™ include visualising the entire genome variability in a single screenshot for one or more samples, the simultaneous display of the genotype and Copy Number state for thousands of SNPs, and the comparison of large numbers of samples by producing "consensus" images displaying regions of complete or partial identity. The software is also useful for genotype analysis of trios and to show regions of potential uniparental disomy (UPD). All information can then be exported in a tabular format for analysis with dedicated software. At present, the software can handle data from 10 K, 100 K, 250 K, 5.0 and 6.0 Affymetrix chips. Conclusion We have created software that offers a new way of displaying and comparing SNP and CNV genomic data. The software is available free at and is especially useful for the analysis of multiple samples. PMID:19193232

  10. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity
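    The 30% read-frequency cut-off described above amounts to a simple per-site filter. A minimal sketch, assuming per-position counts of variant-supporting reads and total depth are already available from a pileup (the `VariantCall` record and the `min_depth` guard are hypothetical additions, not part of the study's in-house pipeline):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantCall:
    position: int   # genome coordinate
    alt_reads: int  # reads supporting the variant allele
    depth: int      # total reads covering the position


def high_confidence_variants(calls: List[VariantCall], cutoff: float = 0.30,
                             min_depth: int = 1) -> List[VariantCall]:
    """Keep calls whose variant read frequency reaches the cut-off."""
    return [c for c in calls
            if c.depth >= min_depth and c.alt_reads / c.depth >= cutoff]
```

    The depth check runs first, so zero-coverage positions are skipped rather than causing a division by zero; raising `min_depth` would further guard against frequencies estimated from very few reads.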

  11. Whole genome methylation profiles as independent markers of survival in stage IIIC melanoma patients

    Directory of Open Access Journals (Sweden)

    Sigalotti Luca

    2012-09-01

    Full Text Available Abstract Background The clinical course of cutaneous melanoma (CM) can differ significantly for patients with identical stages of disease, defined clinico-pathologically, and no molecular markers differentiate patients with such a diverse prognosis. This study aimed to define the prognostic value of whole genome DNA methylation profiles in stage III CM. Methods Genome-wide methylation profiles were evaluated by the Illumina Human Methylation 27 BeadChip assay in short-term neoplastic cell cultures from 45 stage IIIC CM patients. Unsupervised K-means partitioning clustering was used to sort patients into 2 groups based on their methylation profiles. Methylation patterns related to the discovered groups were determined using the nearest shrunken centroid classification algorithm. The impact of genome-wide methylation patterns on overall survival (OS) was assessed using Cox regression and Kaplan-Meier analyses. Results Unsupervised K-means partitioning by whole genome methylation profiles identified classes with significantly different OS in stage IIIC CM patients. Patients with a “favorable” methylation profile had increased OS (P = 0.001, log-rank = 10.2 by Kaplan-Meier analysis). Median OS of stage IIIC patients with a “favorable” vs. “unfavorable” methylation profile was 31.5 and 10.4 months, respectively. The 5-year OS for stage IIIC patients with a “favorable” methylation profile was 41.2% as compared to 0% for patients with an “unfavorable” methylation profile. Among the variables examined by multivariate Cox regression analysis, classification defined by methylation profile was the only predictor of OS (Hazard Ratio = 2.41, for “unfavorable” methylation profile; 95% Confidence Interval: 1.02-5.70; P = 0.045). A 17 gene methylation signature able to correctly assign prognosis (overall error rate = 0) in stage IIIC patients on the basis of distinct methylation-defined groups was also identified
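    The survival curves and medians reported above come from the Kaplan-Meier (product-limit) estimator. A minimal sketch of that estimator and of reading off the median follows; the log-rank test and Cox regression used in the study are not shown, and the follow-up times in the test are toy values.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time.

    events[i] is 1 if patient i died at times[i], 0 if censored then.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored both leave the risk set
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up
```

    For toy times [5, 10, 10, 15, 20] months with events [1, 1, 0, 1, 0], survival steps to 0.8, 0.6 and 0.3, so the estimated median is 15 months. Censored patients shrink the risk set without lowering S(t).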

  12. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic

  13. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Directory of Open Access Journals (Sweden)

    Alexander T Dilthey

    2016-10-01

    Full Text Available Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a
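    The third step, choosing the most likely allele pair at a locus, can be illustrated with a common simple genotype-likelihood model. The sketch below assumes each read arises from either allele with probability 1/2 and that per-read likelihoods P(read | allele) have already been computed (in HLA*PRG these would come from graph alignment; here they are plain inputs). This mixing model is an assumption for illustration, not necessarily HLA*PRG's exact formulation, and the allele names in the test are illustrative.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

# Floor for P(read | allele) so one unexplained read does not give log(0).
EPSILON = 1e-12


def most_likely_pair(read_likelihoods: Sequence[Dict[str, float]],
                     alleles: List[str]) -> Tuple[str, str]:
    """Return the unordered allele pair maximising the read log-likelihood.

    Each read is assumed to come from either allele with probability 1/2:
    P(read | a1, a2) = 0.5 * P(read | a1) + 0.5 * P(read | a2).
    """
    best_pair, best_ll = (alleles[0], alleles[0]), -math.inf
    # combinations_with_replacement includes homozygous pairs (a, a).
    for a1, a2 in itertools.combinations_with_replacement(alleles, 2):
        ll = sum(math.log(max(0.5 * r.get(a1, 0.0) + 0.5 * r.get(a2, 0.0), EPSILON))
                 for r in read_likelihoods)
        if ll > best_ll:
            best_pair, best_ll = (a1, a2), ll
    return best_pair
```

    Enumerating all unordered pairs is quadratic in the number of candidate alleles per locus, which stays manageable at G group resolution; reads that fit neither allele of a candidate pair are penalised heavily through the log-likelihood.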

  14. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources required for large-scale whole-genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. This time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log running metrics during data processing and to monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available
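
    One of Rainbow's listed improvements is splitting large sequence files so that downstream work is balanced across instances. The following is not Rainbow's code. It is a minimal Python illustration of cutting a file into chunks of whole reads, so that no record is split across workers. It assumes plain four-line FASTQ records, with no multi-line sequences. The function name `split_fastq` and the chunk size are hypothetical.

```python
from itertools import islice


def split_fastq(lines, reads_per_chunk):
    """Yield chunks of complete FASTQ records (assumed to be 4 lines each).

    `lines` is any iterable of FASTQ lines; each yielded chunk holds at most
    `reads_per_chunk` reads, so chunks can be processed independently.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be positive")
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


# Usage: five reads split into chunks of at most two reads each.
reads = []
for i in range(5):
    reads += ["@read%d" % i, "ACGT", "+", "IIII"]
chunks = list(split_fastq(reads, 2))
```

    Cutting only on record boundaries (multiples of four lines) is what keeps each chunk self-contained. A production pipeline would stream chunks to storage rather than hold them in memory.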

  15. Whole-Genome Thermodynamic Analysis Reduces siRNA Off-Target Effects

    Science.gov (United States)

    Chen, Xi; Liu, Peng; Chou, Hui-Hsien

    2013-01-01

    Small interfering RNAs (siRNAs) are important tools for knocking down targeted genes and have been widely applied to biological and biomedical research. To design siRNAs, two important aspects must be considered: the potency in knocking down target genes and the off-target effect on any nontarget genes. Although many studies have produced useful tools to design potent siRNAs, off-target prevention has mostly been delegated to sequence-level alignment tools such as BLAST. We hypothesize that whole-genome thermodynamic analysis can identify potential off-targets with higher precision and help us avoid siRNAs that may have strong off-target effects. To validate this hypothesis, two siRNA sets were designed to target three human genes: IDH1, ITPR2 and TRIM28. They were selected from the output of two popular siRNA design tools, siDirect and siDesign. Both siRNA design tools incorporate sequence-level screening to avoid off-targets, so their output is believed to be optimal. However, one of the sets we tested has off-target genes predicted by Picky, a whole-genome thermodynamic analysis tool. Picky can identify off-target genes that may hybridize to an siRNA within a user-specified melting temperature range. Our experiments validated that some off-target genes predicted by Picky can indeed be inhibited by siRNAs. Similar experiments were performed using commercially available siRNAs, and a few off-target genes were also found to be inhibited as predicted by Picky. In summary, we demonstrate that whole-genome thermodynamic analysis can identify off-target genes that are missed in sequence-level screening. Because Picky prediction is deterministic according to thermodynamics, if an siRNA candidate has no Picky-predicted off-targets, it is unlikely to cause off-target effects. Therefore, we recommend including Picky as an additional screening step in siRNA design. PMID:23484018

  16. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Bellod Cisneros, Jose Luis

    2016-01-01

    web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes...... and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services...... and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https...

  17. A Danish Salmonella Bareilly outbreak investigated by the use of whole genome sequencing

    DEFF Research Database (Denmark)

    Torpdahl, M.; Kiil, K.; Litrup, E.

    2013-01-01

    In 2012, we saw an increase in the Salmonella serotype Bareilly isolated from human infections. Bareilly is a rare serotype in Denmark, isolated from human infections between 2 and 9 times annually over the last 10 years. As a routine for rare serotypes, we use PFGE as the molecular method...... and broilers differed by two bands. When using PFGE in outbreak investigations, there are some interpretative implications that have to be considered. There are differences in how important band changes are when defining clusters of different serotypes. Some outbreaks have been reported to include PFGE profiles...... with several band changes, while others are defined by a single PFGE profile, thereby excluding closely related profiles. We decided to investigate whether whole genome sequencing (WGS) could resolve this issue and be useful in outbreak investigations. Several analyses were performed, including a SNP tree based...

  18. Whole-genome sequencing of a malignant granular cell tumor with metabolic response to pazopanib

    Science.gov (United States)

    Wei, Lei; Liu, Song; Conroy, Jeffrey; Wang, Jianmin; Papanicolau-Sengos, Antonios; Glenn, Sean T.; Murakami, Mitsuko; Liu, Lu; Hu, Qiang; Conroy, Jacob; Miles, Kiersten Marie; Nowak, David E.; Liu, Biao; Qin, Maochun; Bshara, Wiam; Omilian, Angela R.; Head, Karen; Bianchi, Michael; Burgher, Blake; Darlak, Christopher; Kane, John; Merzianu, Mihai; Cheney, Richard; Fabiano, Andrew; Salerno, Kilian; Talati, Chetasi; Khushalani, Nikhil I.; Trump, Donald L.; Johnson, Candace S.; Morrison, Carl D.

    2015-01-01

    Granular cell tumors are an uncommon soft tissue neoplasm. Malignant granular cell tumors comprise...... T transitions, particularly when immediately preceded by a 5′ G. A loss-of-function mutation was detected in a newly recognized tumor suppressor candidate, BRD7. No mutations were found in known targets of pazopanib. However, we identified a receptor tyrosine kinase pathway mutation in GFRA2 that warrants further evaluation. To the best of our knowledge, this is only the second reported case of a malignant granular cell tumor exhibiting a response to pazopanib, and the first whole-genome sequencing of this uncommon tumor type. The findings provide insight into the genetic basis of malignant granular cell tumors and identify potential targets for further investigation. PMID:27148567

  19. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation

    DEFF Research Database (Denmark)

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography their contribution to panda population...... dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two...... panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years....

  20. Whole-genome regression and prediction methods applied to plant and animal breeding.

    Science.gov (United States)

    de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L

    2013-02-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist for implementing these large-p, small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
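
    Ridge regression is one parametric WGR model and can illustrate the large-p, small-n regressions described above. This is an illustrative sketch, not the authors' code. It assumes centered phenotypes and marker codes (so there is no intercept) and a user-chosen shrinkage parameter `lam`. It uses the identity beta = X^T (X X^T + lam*I)^(-1) y, so only an n-by-n system over individuals, not markers, has to be solved. The function names are hypothetical, and the block is pure Python to stay self-contained.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_marker_effects(X, y, lam):
    """Ridge estimates of marker effects for an n x p marker matrix X (p >> n).

    beta = X^T (X X^T + lam*I)^(-1) y, so only an n x n system is solved.
    Assumes X and y are already centered (no intercept) and lam > 0.
    """
    n, p = len(X), len(X[0])
    K = [[sum(X[i][k] * X[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]


def predict(x_new, beta):
    """Genomic prediction for a new individual: weighted sum of marker effects."""
    return sum(xk * bk for xk, bk in zip(x_new, beta))
```

    Working in the individual space keeps the problem tractable when markers vastly outnumber phenotyped individuals. The estimate is the same as the one from the usual ridge normal equations, (X^T X + lam*I) beta = X^T y.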

  1. Clinical decision support for whole genome sequence information leveraging a service-oriented architecture: a prototype.

    Science.gov (United States)

    Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time.

  2. Evaluation of whole genome sequencing for outbreak detection of Salmonella enterica

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Nielsen, Eva M.; Kaas, Rolf Sommer

    2014-01-01

    Salmonella enterica is a common cause of minor and large food-borne outbreaks. To achieve successful and nearly ‘real-time’ monitoring and identification of outbreaks, reliable sub-typing is essential. Whole genome sequencing (WGS) shows great promise for use as a routine epidemiological typing...... analyses were also compared to PFGE, revealing that WGS typing outperformed the traditional method. In conclusion, for S. Typhimurium, SNP analysis and the nucleotide difference approach applied to WGS data seem to be superior methods for epidemiological typing compared to other phylogenetic...... analytic approaches that may be used on WGS. These approaches were also superior to the more classical typing method, PFGE. Our study also indicates that WGS alone is insufficient to determine whether strains are related or unrelated to outbreaks. This still requires the combination of epidemiological...

  3. Bioinformatics Workflow for Clinical Whole Genome Sequencing at Partners HealthCare Personalized Medicine

    Directory of Open Access Journals (Sweden)

    Ellen A. Tsai

    2016-02-01

    Effective implementation of precision medicine will be enhanced by a thorough understanding of each patient’s genetic composition to better treat his or her presenting symptoms or mitigate the onset of disease. This ideally includes the sequence information of a complete genome for each individual. At Partners HealthCare Personalized Medicine, we have developed a clinical process for whole genome sequencing (WGS), with application in both healthy individuals and those with disease. In this manuscript, we describe our bioinformatics strategy to efficiently process and deliver genomic data to geneticists for clinical interpretation. We describe the handling of data from FASTQ to the final variant list for clinical review for the final report. We also discuss our methodology for validating this workflow and the cost implications of running WGS.

  4. SPlinted Ligation Adapter Tagging (SPLAT), a novel library preparation method for whole genome bisulphite sequencing

    Science.gov (United States)

    Manlig, Erika; Wahlberg, Per

    2017-01-01

    Sodium bisulphite treatment of DNA combined with next generation sequencing (NGS) is a powerful combination for the interrogation of genome-wide DNA methylation profiles. Library preparation for whole genome bisulphite sequencing (WGBS) is challenging due to side effects of the bisulphite treatment, which leads to extensive DNA damage. Recently, a new generation of methods for bisulphite sequencing library preparation has been devised. These methods are based on initial bisulphite treatment of the DNA, followed by adaptor tagging of single-stranded DNA fragments, and enable WGBS using low quantities of input DNA. In this study, we present a novel approach for quick and cost-effective WGBS library preparation that is based on splinted adaptor tagging (SPLAT) of bisulphite-converted single-stranded DNA. Moreover, we validate SPLAT against three commercially available WGBS library preparation techniques, two of which are based on bisulphite treatment prior to adaptor tagging and one of which is a conventional WGBS method. PMID:27899585

  5. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced...... genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context......

  6. Accurate and robust prediction of genetic relationship from whole-genome sequences.

    Directory of Open Access Journals (Sweden)

    Hong Li

    Computing the genetic relationship between two humans is important to studies in genetics, genomics, genealogy, and forensics. Relationship algorithms may be sensitive to noise, such as that arising from sequencing errors or imperfect reference genomes. We developed an algorithm for estimation of genetic relationship by averaged blocks (GRAB) that is designed for whole-genome sequencing (WGS) data. GRAB segments the genome into blocks, calculates the fraction of blocks sharing identity, and then uses a classification tree to infer first- to fifth-degree relationships and unrelated individuals. We evaluated GRAB on simulated and real sequenced families, and compared it with other software. GRAB achieves similar performance and does not require knowledge of population background or phasing. GRAB can be used in workflows for identifying unreported relationships, validating reported relationships in family-based studies, and detecting sample-tracking errors or duplicate inclusion. The software is available at familygenomics.systemsbiology.net/grab.
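
    The abstract describes GRAB as segmenting the genome into blocks and computing the fraction of blocks that share identity. This Python sketch illustrates only that first step, under stated assumptions. Genotypes for the two individuals are aligned at the same ordered sites, and each block is a fixed number of consecutive sites. A block counts as shared when its mismatch fraction is at most `max_mismatch`, a hypothetical tolerance for sequencing error. This is not GRAB's actual block definition, and the classification tree that maps the fraction to a relationship degree is omitted.

```python
def shared_block_fraction(g1, g2, block_size, max_mismatch=0.0):
    """Fraction of fixed-size blocks in which two genotype vectors agree.

    g1, g2: genotypes (e.g. 0/1/2 alternate-allele counts) at the same sites.
    A block is 'shared' if its mismatch fraction is <= max_mismatch
    (a hypothetical tolerance for sequencing errors, not GRAB's rule).
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same sites")
    if block_size < 1:
        raise ValueError("block_size must be positive")
    n_blocks = n_shared = 0
    for start in range(0, len(g1), block_size):
        a = g1[start:start + block_size]
        b = g2[start:start + block_size]
        mismatches = sum(x != y for x, y in zip(a, b))
        n_blocks += 1
        if mismatches / len(a) <= max_mismatch:
            n_shared += 1
    return n_shared / n_blocks if n_blocks else 0.0
```

    Allowing a small per-block mismatch tolerance, rather than requiring exact identity at every site, is one way to make a block statistic robust to the kind of noise the abstract mentions.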

  7. Identification of genomic regions associated with female fertility in Danish Jersey using whole genome sequence data

    DEFF Research Database (Denmark)

    Höglund, Johanna; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2015-01-01

    ......(AIS), 56-day non-return rate (NRR), number of days from first to last insemination (IFL), and number of days between calving and first insemination (ICF). The objective of this study was to identify associations between sequence variants and fertility traits in Jersey cattle based on 1,225 Jersey sires from Denmark with official breeding values for female fertility traits. The association analyses were carried out in two steps: first the cattle genome was scanned for quantitative trait loci using a sire model for FTI using imputed whole genome sequence variants; second the significant...... for cows on BTA20, BTA23 and BTA25, IFL for heifers on BTA7 and QTL9-2 on BTA9, NRR for heifers on BTA7 and BTA23, and NRR for cows on BTA23. Conclusion: The genome wide association study presented here revealed 6 genomic regions associated with FTI. Screening these 6 QTL regions for the underlying female...

  8. Whole genome sequence of Pantoea ananatis R100, an antagonistic bacterium isolated from rice seed.

    Science.gov (United States)

    Wu, Liwen; Liu, Ruifang; Niu, Yaofang; Lin, Haiyan; Ye, Weijun; Guo, Longbiao; Hu, Xingming

    2016-05-10

    Pantoea ananatis is a bacterial species that was first reported as a plant pathogen. Recently, several papers have also described its biocontrol ability. In 2003, P. ananatis R100, which showed strong antagonism against several plant pathogens, was isolated from rice seeds. In this study, the whole genome sequence of this strain was determined by SMRT Cell technology. The total genome size of R100 is 4,857,861 bp, with 4659 coding genes (CDS), 82 tRNAs and 22 rRNAs. The genome sequence of R100 may shed light on research into antagonistic P. ananatis. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation

    DEFF Research Database (Denmark)

    Michaelson, Jacob J.; Shi, Yujian; Gujral, Madhusudan

    2012-01-01

    De novo mutation plays an important role in autism spectrum disorders (ASDs). Notably, pathogenic copy number variants (CNVs) are characterized by high mutation rates. We hypothesize that hypermutability is a property of ASD genes and may also include nucleotide-substitution hot spots. We investigated global patterns of germline mutation by whole-genome sequencing of monozygotic twins concordant for ASD and their parents. Mutation rates varied widely throughout the genome (by 100-fold) and could be explained by intrinsic characteristics of DNA sequence and chromatin structure. Dense clusters...... of mutations within individual genomes were attributable to compound mutation or gene conversion. Hypermutability was a characteristic of genes involved in ASD and other diseases. In addition, genes impacted by mutations in this study were associated with ASD in independent exome-sequencing data sets. Our......

  10. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates.

    Science.gov (United States)

    Berthelot, Camille; Brunet, Frédéric; Chalopin, Domitille; Juanchich, Amélie; Bernard, Maria; Noël, Benjamin; Bento, Pascal; Da Silva, Corinne; Labadie, Karine; Alberti, Adriana; Aury, Jean-Marc; Louis, Alexandra; Dehais, Patrice; Bardou, Philippe; Montfort, Jérôme; Klopp, Christophe; Cabau, Cédric; Gaspin, Christine; Thorgaard, Gary H; Boussaha, Mekki; Quillet, Edwige; Guyomard, René; Galiana, Delphine; Bobe, Julien; Volff, Jean-Nicolas; Genêt, Carine; Wincker, Patrick; Jaillon, Olivier; Roest Crollius, Hugues; Guiguen, Yann

    2014-04-22

    Vertebrate evolution has been shaped by several rounds of whole-genome duplications (WGDs) that are often suggested to be associated with adaptive radiations and evolutionary innovations. Due to an additional round of WGD, the rainbow trout genome offers a unique opportunity to investigate the early evolutionary fate of a duplicated vertebrate genome. Here we show that after 100 million years of evolution the two ancestral subgenomes have remained extremely collinear, despite the loss of half of the duplicated protein-coding genes, mostly through pseudogenization. In striking contrast is the fate of miRNA genes that have almost all been retained as duplicated copies. The slow and stepwise rediploidization process characterized here challenges the current hypothesis that WGD is followed by massive and rapid genomic reorganizations and gene deletions.

  11. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

    KAUST Repository

    Coll, Francesc

    2015-05-27

    Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data complexity has restricted their clinical application. A library (1,325 mutations) predictive of DR for 15 anti-tuberculosis drugs was compiled and validated for 11 of them using genomic-phenotypic data from 792 strains. A rapid online ‘TB-Profiler’ tool was developed to report DR and strain-type profiles directly from raw sequences. Using our DR mutation library, in silico diagnostic accuracy was superior to some commercial diagnostics and alternative databases. The library will facilitate sequence-based drug-susceptibility testing.

  12. Real-Time Whole-Genome Sequencing for Surveillance of Listeria monocytogenes, France.

    Science.gov (United States)

    Moura, Alexandra; Tourdjman, Mathieu; Leclercq, Alexandre; Hamelin, Estelle; Laurent, Edith; Fredriksen, Nathalie; Van Cauteren, Dieter; Bracq-Dieye, Hélène; Thouvenot, Pierre; Vales, Guillaume; Tessaud-Rita, Nathalie; Maury, Mylène M; Alexandru, Andreea; Criscuolo, Alexis; Quevillon, Emmanuel; Donguy, Marie-Pierre; Enouf, Vincent; de Valk, Henriette; Brisse, Sylvain; Lecuit, Marc

    2017-09-01

    During 2015-2016, we evaluated the performance of whole-genome sequencing (WGS) as a routine typing tool. Its added value for microbiological and epidemiologic surveillance of listeriosis was compared with that for pulsed-field gel electrophoresis (PFGE), the current standard method. A total of 2,743 Listeria monocytogenes isolates collected as part of routine surveillance were characterized in parallel by PFGE and core genome multilocus sequence typing (cgMLST) extracted from WGS. We investigated PFGE and cgMLST clusters containing human isolates. Discrimination of isolates was significantly higher by cgMLST than by PFGE (p...... WGS-based typing should replace PFGE as the primary typing method for L. monocytogenes.
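
    Comparing cgMLST profiles comes down to counting allelic differences between isolates and grouping isolates that fall within a distance cut-off. This is a hypothetical illustration, not the surveillance pipeline used in the study. Profiles are dictionaries from locus name to allele number, with None for a missing call. The distance counts loci that are called in both isolates and carry different alleles. Clusters are formed by single linkage at a user-chosen threshold. The function names and the threshold are assumptions.

```python
def allele_distance(profile_a, profile_b):
    """Number of loci called in both isolates whose alleles differ."""
    diffs = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue  # skip loci missing in either isolate
        if allele_a != allele_b:
            diffs += 1
    return diffs


def single_linkage_clusters(profiles, threshold):
    """Group isolates linked by chains of distance <= threshold (union-find)."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allele_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

    Single linkage is a simple and common choice for this kind of clustering, but the cut-off strongly affects which isolates end up grouped together.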

  13. Characterization of a Wheat Breeders' Array suitable for high-throughput SNP genotyping of global accessions of hexaploid bread wheat (Triticum aestivum).

    Science.gov (United States)

    Allen, Alexandra M; Winfield, Mark O; Burridge, Amanda J; Downie, Rowena C; Benbow, Harriet R; Barker, Gary L A; Wilkinson, Paul A; Coghill, Jane; Waterfall, Christy; Davassi, Alessandro; Scopes, Geoff; Pirani, Ali; Webster, Teresa; Brew, Fiona; Bloor, Claire; Griffiths, Simon; Bentley, Alison R; Alda, Mark; Jack, Peter; Phillips, Andrew L; Edwards, Keith J

    2017-03-01

    Targeted selection and inbreeding have resulted in a lack of genetic diversity in elite hexaploid bread wheat accessions. Reduced diversity can be a limiting factor in the breeding of high yielding varieties and crucially can mean reduced resilience in the face of changing climate and resource pressures. Recent technological advances have enabled the development of molecular markers for use in the assessment and utilization of genetic diversity in hexaploid wheat. Starting with a large collection of 819 571 previously characterized wheat markers, here we describe the identification of 35 143 single nucleotide polymorphism-based markers, which are highly suited to the genotyping of elite hexaploid wheat accessions. To assess their suitability, the markers have been validated using a commercial high-density Affymetrix Axiom® genotyping array (the Wheat Breeders' Array), in a high-throughput 384 microplate configuration, to characterize a diverse global collection of wheat accessions including landraces and elite lines derived from commercial breeding communities. We demonstrate that the Wheat Breeders' Array is also suitable for generating high-density genetic maps of previously uncharacterized populations and for characterizing novel genetic diversity produced by mutagenesis. To facilitate the use of the array by the wheat community, the markers, the associated sequence and the genotype information have been made available through the interactive web site 'CerealsDB'. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  14. Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs)

    Science.gov (United States)

    Sims, Gregory E.; Kim, Sung-Hou

    2011-01-01

    A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenetic distances. We present two phylogenies that accentuate different aspects of E. coli/Shigella genomic evolution: (i) one based on the compositions of all possible features of length l = 24 (∼8.4 million features), which are likely to reveal the phenetic grouping and relationship among the organisms and (ii) the other based on the compositions of core features with low frequency and low variability (∼0.56 million features), which account for ∼69% of all commonly shared features among 38 taxa examined and are likely to have genome-wide lineal evolutionary signal. Shigella appears as a single clade when all possible features are used without filtering of noncore features. However, results using core features show that Shigella consists of at least two distantly related subclades, implying that the subclades evolved into a single clade because of a high degree of convergence influenced by mobile genetic elements and niche adaptation. In both FFP trees, the basal group of the E. coli/Shigella phylogeny is the B2 phylogroup, which contains primarily uropathogenic strains, suggesting that the E. coli/Shigella ancestor was likely a facultative or opportunistic pathogen. The extant commensal strains diverged relatively late and appear to be the result of reductive evolution of genomes. We also identify clade distinguishing features and their associated genomic regions within each phylogroup. Such features may provide useful information for understanding evolution of the groups and for quick diagnostic identification of each phylogroup. PMID:21536867
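
    The FFP idea of comparing genomes through l-mer frequencies rather than alignments can be sketched in a few lines. This minimal Python version assumes the distance between two frequency profiles is their Jensen-Shannon divergence. That is a common choice for FFP, but treat it as an assumption here. The sketch ignores the core-feature filtering described above and does not merge reverse complements. The function names are hypothetical.

```python
import math
from collections import Counter


def ffp(seq, l):
    """Feature frequency profile: relative frequency of every l-mer in seq."""
    if l < 1 or len(seq) < l:
        raise ValueError("need 1 <= l <= len(seq)")
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) between two profiles."""
    js = 0.0
    for feature in set(p) | set(q):
        pk = p.get(feature, 0.0)
        qk = q.get(feature, 0.0)
        mk = (pk + qk) / 2  # mixture distribution
        if pk > 0:
            js += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            js += 0.5 * qk * math.log2(qk / mk)
    return js


# Usage: distance between two short sequences using 3-mer profiles.
d = jensen_shannon(ffp("ACGTACGTTGCA", 3), ffp("ACGTACGATGCA", 3))
```

    A pairwise matrix of such distances can then be passed to any standard tree-building method, such as neighbor joining, to obtain an alignment-free phylogeny.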

  15. Uncovering the novel characteristics of Asian honey bee, Apis cerana, by whole genome sequencing.

    Science.gov (United States)

    Park, Doori; Jung, Je Won; Choi, Beom-Soon; Jayakodi, Murukarthick; Lee, Jeongsoo; Lim, Jongsung; Yu, Yeisoo; Choi, Yong-Soo; Lee, Myeong-Lyeol; Park, Yoonseong; Choi, Ik-Young; Yang, Tae-Jin; Edwards, Owain R; Nah, Gyoungju; Kwon, Hyung Wook

    2015-01-02

    The honey bee is an important model system for increasing understanding of molecular and neural mechanisms underlying social behaviors relevant to the agricultural industry and basic science. The western honey bee, Apis mellifera, has served as a model species, and its genome sequence has been published. In contrast, the genome of the Asian honey bee, Apis cerana, has not yet been sequenced. A. cerana has been raised in Asian countries for thousands of years and has brought considerable economic benefits to the apicultural industry. A. cerana has divergent biological traits compared to A. mellifera, and it has played a key role in maintaining biodiversity in eastern and southern Asia. Here we report the first whole genome sequence of A. cerana. Using de novo assembly methods, we produced a 238 Mbp draft of the A. cerana genome and predicted 10,651 genes. A. cerana-specific genes were analyzed to better understand the novel characteristics of this honey bee species. Seventy-two percent of the A. cerana-specific genes had more than one GO term, and 1,696 enzymes were categorized into 125 pathways. Genes involved in chemoreception and immunity were carefully identified and compared to those from other sequenced insect models. These included 10 gustatory receptors, 119 odorant receptors, 10 ionotropic receptors, and 160 immune-related genes. This first report of the whole genome sequence of A. cerana provides resources for comparative sociogenomics, especially in the field of social insect communication. These important tools will contribute to a better understanding of the complex behaviors and natural biology of the Asian honey bee and to anticipate its future evolutionary trajectory.

  16. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie, E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and "clusters of orthologous groups" (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  17. Views of American OB/GYNs on the ethics of prenatal whole-genome sequencing.

    Science.gov (United States)

    Bayefsky, Michelle J; White, Amina; Wakim, Paul; Hull, Sara Chandros; Wasserman, David; Chen, Stephanie; Berkman, Benjamin E

    2016-12-01

    Given public demand for genetic information, the potential to perform prenatal whole-genome sequencing (PWGS) non-invasively in the future, and decreasing costs of whole-genome sequencing, it is likely that OB/GYN practice will include PWGS. The goal of this project was to explore OB/GYNs' views on the ethical issues surrounding PWGS and their preparedness for counseling patients on its use. A national survey was administered to 2500 members of the American Congress of Obstetricians and Gynecologists. A total of 1114 respondents completed the survey (response rate = 45%). OB/GYNs are most concerned with ordering non-medical fetal genetic information, are worried about increasing parental anxiety, and feel it is appropriate to be directive when counseling parents about PWGS. Furthermore, most OB/GYNs have limited knowledge of genetics, rely heavily on genetic counselors and would like more guidance regarding the clinical adoption of PWGS. OB/GYNs do not completely accept or reject PWGS, but a substantial number have significant ethical and practical concerns. They are most concerned with issues that will directly affect their practices and interactions with patients, such as increasing parental anxiety and costs of care. Professional guidance would be instrumental in directing the adoption of PWGS and alleviating the ethical burden posed by PWGS on individual OB/GYNs. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  18. Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

    Directory of Open Access Journals (Sweden)

    Abiyad eBaig

    2015-11-01

    Full Text Available Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but whose genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of the core genome from 793 to only 397 genes. Divergence was clear when phylogenetic analysis was performed on the reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates and normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six of nine divergent isolates possessed an S. suis-like capsule region with variation in capsular gene sequences, but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of the S. suis species, but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates are warranted to understand the diversity of the species.
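
    The drop from 793 to 397 core genes follows from how a core genome is defined: the gene families present in every genome compared, so adding divergent genomes can only keep or shrink it. A minimal Python sketch, assuming each genome has already been reduced to a set of orthologous-group identifiers; the identifiers below are invented for illustration, and the protein clustering step itself is not shown.

```python
from functools import reduce


def core_genome(genomes):
    """Return the gene families shared by every genome.

    genomes: dict mapping genome name -> set of orthologous-group IDs
    (hypothetical representation produced by an upstream clustering step).
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set()
    # The core genome is the intersection of all per-genome gene sets.
    return set(reduce(set.intersection, gene_sets))


normal = {
    "normal_1": {"og1", "og2", "og3", "og4"},
    "normal_2": {"og1", "og2", "og3"},
}
print(sorted(core_genome(normal)))  # ['og1', 'og2', 'og3']

# Adding a divergent genome can only keep or remove core families.
with_divergent = dict(normal, divergent_1={"og1", "og5"})
print(sorted(core_genome(with_divergent)))  # ['og1']
```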

  19. Practical Issues in Implementing Whole-Genome-Sequencing in Routine Diagnostic Microbiology.

    Science.gov (United States)

    Rossen, John W A; Friedrich, Alexander W; Moran-Gilad, Jacob

    2017-11-05

    Next generation sequencing (NGS) is increasingly being used in clinical microbiology. Like every new technology adopted in microbiology, the integration of NGS into clinical and routine workflows needs to be carefully managed. The aim was to review the practical aspects of implementing bacterial whole genome sequencing (WGS) in routine diagnostic laboratories, based on a review of the literature and expert opinion. In this review, we discuss when and how to integrate WGS in the routine workflow of the clinical laboratory. In addition, as microbiology laboratories have to adhere to various national and international regulations and criteria for their accreditation, we deliberate on quality control issues for using WGS in microbiology, including the importance of proficiency testing. Furthermore, the current and future place of this technology in the diagnostic hierarchy of microbiology is described, as well as the necessity of maintaining backwards compatibility with already established methods. Finally, we speculate on whether WGS can entirely replace routine microbiology in the future, and on the tension between the fact that most sequencers are designed to process multiple samples in parallel whereas optimal diagnosis favors processing samples one by one. Special reference is made to the cost and turnaround time of WGS in diagnostic laboratories. Further development is required to improve the WGS workflow, particularly to shorten turnaround time, reduce costs and streamline downstream data analyses. Only when these processes reach maturity will reliance on WGS for routine patient management and infection control management become feasible, enabling the transformation of clinical microbiology into a genome-based and personalised diagnostic field. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  20. The Whole-Genome and Transcriptome of the Manila Clam (Ruditapes philippinarum).

    Science.gov (United States)

    Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong

    2017-06-01

    The manila clam, Ruditapes philippinarum, is an important bivalve species in aquaculture worldwide, including in Korea. Aquaculture production of R. philippinarum is threatened by diverse environmental factors, including viruses, microorganisms, parasites and water conditions, and production has declined as a result. Despite its importance as a marine resource, a reference genome of R. philippinarum for comprehensive genetic studies has been lacking. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide basic data for advanced studies in selective breeding and disease control to support successful aquaculture systems. A high-quality whole genome of approximately 2.56 Gb was assembled using various library construction methods. A total of 108,034 protein-coding gene models were predicted, and repetitive elements, including simple sequence repeats, and noncoding RNAs were identified to further the understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with other bivalve marine invertebrates revealed that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis of the three tissues to support genome annotation and identified 41,275 annotated transcripts. The R. philippinarum genome resource will markedly advance a wide range of genetic studies, serving as a reference genome for comparative analysis of bivalve species and for unraveling mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  1. Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly

    Science.gov (United States)

    Li, Heng

    2012-01-01

    Motivation: Eugene Myers in his string graph paper suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs. Results: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of the information in the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method to 35-fold human resequencing data, we showed that, in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly-based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. On the methodological side, we propose the FMD-index for forward–backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches and one-pass construction of unitigs from an FMD-index. Availability: http://github.com/lh3/fermi Contact: hengli@broadinstitute.org PMID:22569178
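
    The FMD-index extends the FM-index so that a match can be extended in both directions. As background, here is a minimal Python sketch of the ordinary, one-directional FM-index backward search it builds on. This is not fermi's FMD-index; the naive suffix sorting and rank counting are for illustration only, since real indexes use compressed rank structures.

```python
def bwt(text):
    """Burrows-Wheeler transform via naive suffix sorting ('$' terminator)."""
    text += "$"
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[j] is the character preceding the j-th smallest suffix.
    return "".join(text[i - 1] for i in suffix_array)


def fm_count(bwt_str, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    # C[c]: number of characters in the text that sort before c.
    C, total = {}, 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # Occurrences of c in bwt_str[:i] (naive rank query).
        return bwt_str.count(c, 0, i)

    lo, hi = 0, len(bwt_str)  # suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("ACGTACGA")
print(fm_count(index, "ACG"), fm_count(index, "A"), fm_count(index, "TT"))  # 2 3 0
```

    Each step narrows the suffix-array interval using only the C table and rank counts; the FMD-index described in the abstract adds the machinery to extend a match forward as well as backward.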

  2. Whole genome sequencing and methylome analysis of the wild guinea pig.

    Science.gov (United States)

    Weyrich, Alexandra; Schüllermann, Tino; Heeger, Felix; Jeschek, Marie; Mazzoni, Camila J; Chen, Wei; Schumann, Kathrin; Fickel, Joerns

    2014-11-28

    DNA methylation is a heritable mechanism that acts in response to environmental changes, lifestyle and diseases by influencing gene expression in eukaryotes. Epigenetic studies of wild organisms are essential for understanding the role of epigenetics in, e.g., adaptive processes across the great variety of ecological niches. However, strategies for addressing such questions on a methylome scale are largely missing. In this study we present such a strategy and describe a whole genome sequence and methylome analysis of the wild guinea pig. We generated a full wild guinea pig (Cavia aperea) genome sequence with enhanced coverage of methylated regions, benefiting from the available sequence of the domesticated relative Cavia porcellus. This new genome sequence was then used as a reference to map the sequence reads of bisulfite-treated wild guinea pig sequencing libraries and to investigate DNA methylation patterns at a nucleotide-specific level, using the method described here, named 'DNA-enrichment-bisulfite-sequencing' (MEBS). The results achieved using MEBS matched those of standard methods in other mammalian model species. The technique is cost-efficient and combines methylation enrichment with nucleotide-specific resolution, even when no whole genome sequence is available. Thus MEBS can easily be applied to extend methylation enrichment studies to a nucleotide-specific level. The approach is suited to studying the methylomes of not-yet-sequenced mammals at single-nucleotide resolution. The strategy is transferable to other mammalian species by using the nuclear genome sequence of a close relative. It is therefore of interest for studies on a variety of wild species that seek to answer evolutionary, adaptive, ecological or medical questions through epigenetic mechanisms.
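
    The nucleotide-level readout in bisulfite sequencing rests on a simple conversion rule: unmethylated cytosines are read as T after bisulfite treatment, while methylated cytosines remain C. A minimal Python sketch of the per-site methylation level that follows from that rule, assuming the read bases covering one forward-strand reference cytosine have already been extracted from the alignments; the function name is hypothetical and not part of MEBS.

```python
def methylation_level(bases):
    """Estimate methylation at one reference cytosine from bisulfite reads.

    bases: read bases aligned to a forward-strand reference C.
    C = protected (methylated); T = converted (unmethylated).
    Any other base (sequencing error or SNP) is ignored.
    """
    methylated = bases.count("C")
    unmethylated = bases.count("T")
    informative = methylated + unmethylated
    if informative == 0:
        return None  # no usable coverage at this site
    return methylated / informative


print(methylation_level("CCCT"))  # 0.75
print(methylation_level("GA"))    # None
```

    Real callers also handle reverse-strand cytosines and apply minimum-coverage and base-quality filters, which are omitted here.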

  3. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    Science.gov (United States)

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: sequencing individuals to a high depth of coverage with unresolved haplotypes; sequencing individuals to a high depth with resolved haplotypes; sequencing population genomes to a high depth by mixing equimolar amounts of unlabelled individual DNA (Pool-seq); and sequencing multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still-high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons, and their potential applications in conservation biology. WGR offers unprecedented marker density and surveys a wide diversity of genetic variation not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing its power to detect signatures of selection and local adaptation and to identify the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a near future in which the analysis of whole genomes becomes a routine task in many nonmodel species and fields, including conservation biology. © 2017 John Wiley & Sons Ltd.

  4. A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum deduced from comparative whole genome analysis.

    Directory of Open Access Journals (Sweden)

    Peter G Kroth

    Full Text Available BACKGROUND: Diatoms are unicellular algae responsible for approximately 20% of global carbon fixation. Their evolution by secondary endocytobiosis resulted in a complex cellular structure and metabolism compared to algae with primary plastids. METHODOLOGY/PRINCIPAL FINDINGS: The whole genome sequence of the diatom Phaeodactylum tricornutum has recently been completed. We identified and annotated genes for enzymes involved in carbohydrate pathways based on extensive EST support and comparison to the whole genome sequence of a second diatom, Thalassiosira pseudonana. Protein localization to mitochondria was predicted based on identified similarities to mitochondrial localization motifs in other eukaryotes, whereas protein localization to plastids was based on the presence of signal peptide motifs in combination with plastid localization motifs previously shown to be required in diatoms. We identified genes potentially involved in a C4-like photosynthesis in P. tricornutum and, on the basis of sequence-based putative localization of relevant proteins, discuss possible differences in carbon concentrating mechanisms and CO2 fixation between the two diatoms. We also identified genes encoding enzymes involved in photorespiration with one interesting exception: glycerate kinase was not found in either P. tricornutum or T. pseudonana. Various Calvin cycle enzymes were found in up to five different isoforms, distributed between plastids, mitochondria and the cytosol. Diatoms store energy either as lipids or as chrysolaminaran (a beta-1,3-glucan) outside of the plastids. We identified various beta-glucanases and large membrane-bound glucan synthases. Interestingly most of the glucanases appear to contain C-terminal anchor domains that may attach the enzymes to membranes. 
CONCLUSIONS/SIGNIFICANCE: Here we present a detailed synthesis of carbohydrate metabolism in diatoms based on the genome sequences of Thalassiosira pseudonana and Phaeodactylum tricornutum

  5. Unlocking the diversity of genebanks: whole-genome marker analysis of Swiss bread wheat and spelt

    KAUST Repository

    Müller, Thomas

    2017-11-04

    Genebanks play a pivotal role in preserving the genetic diversity present among old landraces and wild progenitors of modern crops, and they represent sources of agriculturally important genes that were lost during domestication and in modern breeding. However, undesirable genes that negatively affect crop performance are often co-introduced when landraces and wild crop progenitors are crossed with elite cultivars, which often limits the use of genebank material in modern breeding programs. A detailed genetic characterization is an important prerequisite to solve this problem and to make genebank material more accessible to breeding. Here, we genotyped 502 bread wheat and 293 spelt accessions held in the Swiss National Genebank using a 15K wheat SNP array. The material included both spring and winter wheats and consisted of old landraces and modern cultivars. Genome- and sub-genome-wide analyses revealed that spelt and bread wheat form two distinct gene pools. In addition, we identified bread wheat landraces that were genetically distinct from modern cultivars. Such accessions were possibly missed in the early Swiss wheat breeding program and are promising targets for the identification of novel genes. The genetic information obtained in this study is appropriate for genome-wide association studies, which will facilitate the identification and transfer of agriculturally important genes from the genebank into modern cultivars through marker-assisted selection.

  6. Whole-Genome Analysis of Diversity and SNP-Major Gene Association in Peach Germplasm.

    Directory of Open Access Journals (Sweden)

    Diego Micheletti

    Full Text Available Peach was domesticated in China more than four millennia ago and from there it spread worldwide. Since the middle of the last century, peach breeding programs have been very dynamic, generating hundreds of new commercial varieties; however, in most cases such varieties derive from a limited collection of parental lines (founders). This is one reason for the observed low levels of variability in the commercial gene pool, implying that knowledge of the extent and distribution of genetic variability in peach is critical for choosing adequate parents to confer enhanced productivity, adaptation and quality on improved varieties. With this aim, we genotyped 1,580 peach accessions (including a few closely related Prunus species) maintained and phenotyped in five germplasm collections (four European and one Chinese) with the International Peach SNP Consortium 9K SNP peach array. The study of population structure revealed the subdivision of the panel into three main populations: one mainly made up of Occidental varieties from breeding programs (POP1OCB), one of Occidental landraces (POP2OCT) and a third of Oriental accessions (POP3OR). Analysis of linkage disequilibrium (LD) identified differential patterns of genome-wide LD blocks in each of the populations. Phenotypic data for seven monogenic traits were integrated in a genome-wide association study (GWAS). The significantly associated SNPs were always in the regions predicted by linkage analysis, forming haplotypes of markers. These diagnostic haplotypes could be used for marker-assisted selection (MAS) in modern breeding programs.
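
    LD between SNP pairs is commonly summarized by r². When haplotype phase is unknown, as with array genotypes, a widely used approximation is the squared Pearson correlation between genotype dosages (0, 1 or 2 copies of one allele). The abstract does not say which estimator was used; this Python sketch with invented genotypes only illustrates that approximation.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' genotype dosages (0/1/2).

    A phase-free approximation of LD r^2; not necessarily the estimator
    used in the study.
    """
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two equal-length genotype vectors with >= 2 samples")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) * (a - m1) for a in g1)
    v2 = sum((b - m2) * (b - m2) for b in g2)
    if v1 == 0 or v2 == 0:
        return float("nan")  # monomorphic SNP: r^2 is undefined
    return cov * cov / (v1 * v2)


print(genotype_r2([0, 1, 2], [2, 1, 0]))        # 1.0 (perfect LD, opposite coupling)
print(genotype_r2([0, 0, 2, 2], [0, 2, 0, 2]))  # 0.0 (no association)
```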

  7. Efficient oligonucleotide probe selection for pan-genomic tiling arrays

    Directory of Open Access Journals (Sweden)

    Zhang Wei

    2009-09-01

    Full Text Available Abstract Background Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome. Results This paper presents a new probe selection algorithm (PanArray) that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multi-alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome. The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. Conclusion PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. 
It is capable of fully tiling all genomes of a species on

  8. Comparison of In-House Multiplex Real Time PCR, Diagcor GenoFlow HPV Array Test and INNO-LiPA HPV Genotyping Extra Assays with LCD- Array Kit for Human Papillomavirus Genotyping in Cervical Liquid Based Cytology Specimens and Genital Lesions in Tehran, Iran.

    Science.gov (United States)

    Sohrabi, Amir; Rahnamaye-Farzami, Marjan; Mirab-Samiee, Siamak; Mahdavi, Saeed; Babaei, Monireh

    2016-01-01

    Human papillomavirus (HPV) is a major etiologic agent of some common human cancers. Cervical precancer and cancer are the most prevalent lesions caused by HPV genotypes. Various rapid and sensitive methods have been developed for HPV genotyping. In the present study, we compared the performance of Real Time PCR, the GenoFlow HPV Array and the INNO-LiPA HPV Genotyping Extra assay with the LCD-Array. Of 108 cervical samples, HPV was detected in 33 women (30.55%). Among the detected HPV genotypes, HPV 6 and 11 were dominant. Compared with the LCD-Array, sensitivity and specificity were 94.2% and 93% for Real Time PCR, 76.7% and 93% for GenoFlow, and 64% and 96.5% for INNO-LiPA, respectively. Overall, the accuracy and precision of these methods were more than 80% and 90%, respectively. These methods appear reliable and suitable for detection and genotyping of HPVs in cervical disorders and other dysplasias associated with human papillomaviruses.
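
    Sensitivity and specificity here are measured against the LCD-Array as the reference method. A minimal Python sketch of that calculation from paired per-sample calls (True = HPV detected); the function name and the example calls are invented, not data from the study.

```python
def sensitivity_specificity(assay_calls, reference_calls):
    """Compare an assay's positive/negative calls with a reference method.

    Both arguments are equal-length sequences of booleans, one per sample.
    Returns (sensitivity, specificity).
    """
    if len(assay_calls) != len(reference_calls):
        raise ValueError("call lists must be the same length")
    pairs = list(zip(assay_calls, reference_calls))
    tp = sum(1 for a, r in pairs if a and r)          # both positive
    fn = sum(1 for a, r in pairs if not a and r)      # missed by the assay
    tn = sum(1 for a, r in pairs if not a and not r)  # both negative
    fp = sum(1 for a, r in pairs if a and not r)      # assay-only positive
    sensitivity = tp / (tp + fn)  # share of reference positives detected
    specificity = tn / (tn + fp)  # share of reference negatives confirmed
    return sensitivity, specificity


assay = [True, True, True, False, False, False, False, True]
reference = [True, True, True, True, False, False, False, False]
print(sensitivity_specificity(assay, reference))  # (0.75, 0.75)
```

    The sketch assumes the reference finds at least one positive and one negative sample; otherwise one of the ratios is undefined.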

  9. Whole-genome typing and characterization of blaVIM19-harbouring ST383 Klebsiella pneumoniae by PFGE, whole-genome mapping and WGS.

    Science.gov (United States)

    Sabirova, Julia S; Xavier, Basil Britto; Coppens, Jasmine; Zarkotou, Olympia; Lammens, Christine; Janssens, Lore; Burggrave, Ronald; Wagner, Trevor; Goossens, Herman; Malhotra-Kumar, Surbhi

    2016-06-01

    We utilized whole-genome mapping (WGM) and WGS to characterize 12 clinical carbapenem-resistant Klebsiella pneumoniae strains (TGH1-TGH12). All strains were screened for carbapenemase genes by PCR, and typed by MLST, PFGE (XbaI) and WGM (AflII) (OpGen, USA). WGS (Illumina) was performed on TGH8 and TGH10. Reads were de novo assembled and annotated [SPAdes, Rapid Annotations using Subsystems Technology (RAST)]. Contigs were aligned directly, and after in silico AflII restriction, with corresponding WGMs (MapSolver, OpGen; BioNumerics, Applied Maths). All 12 strains were ST383. Of the 12 strains, 11 were carbapenem resistant, 7 harboured blaKPC-2 and 11 harboured blaVIM-19. Varying the parameters for assigning WGM clusters showed that these were comparable to STs and to the eight PFGE types or subtypes (difference of three or more bands). A 95% similarity coefficient assigned all 12 WGMs to a single cluster, whereas a 99% similarity coefficient (or ≥10 unmatched-fragment difference) assigned the 12 WGMs to eight (sub)clusters. Based on a difference of three or more bands between PFGE profiles, the Simpson's diversity indices (SDIs) of WGM (0.94, Jackknife pseudo-values CI: 0.883-0.996) and PFGE (0.93, Jackknife pseudo-values CI: 0.828-1.000) were similar (P = 0.649). However, the discriminatory power of WGM was significantly higher (SDI: 0.94, Jackknife pseudo-values CI: 0.883-0.996) than that of PFGE profiles typed on a difference of seven or more bands (SDI: 0.53, Jackknife pseudo-values CI: 0.212-0.849) (P = 0.007). This study demonstrates the application of WGM to understanding the epidemiology of hospital-associated K. pneumoniae. Utilizing a combination of WGM and WGS, we also present here the first longitudinal genomic characterization of the highly dynamic carbapenem-resistant ST383 K. pneumoniae clone that is rapidly gaining importance in Europe. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial
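
    For typing methods, Simpson's index of diversity is usually computed in the Hunter-Gaston form: the probability that two isolates drawn at random without replacement fall into different types, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N the total. The abstract does not give the formula, so this Python sketch assumes that standard form and omits the jackknife confidence intervals; the type labels are invented.

```python
from collections import Counter


def simpsons_diversity(type_labels):
    """Hunter-Gaston form of Simpson's index of diversity.

    type_labels: one type/cluster label per isolate.
    Returns the probability that two isolates drawn without replacement
    are assigned different types.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in Counter(type_labels).values())
    return 1 - same_type_pairs / (n * (n - 1))


# Twelve isolates in a single cluster cannot be discriminated at all.
print(simpsons_diversity(["c1"] * 12))  # 0.0
print(simpsons_diversity(["A", "A", "B", "C"]))
```

    A single cluster, as produced by the 95% similarity coefficient for the 12 WGMs, gives an index of 0; splitting the isolates into more, evenly sized types pushes the index toward 1.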

  10. Development of a large SNP genotyping array and generation of high-density genetic maps in tomato.

    Directory of Open Access Journals (Sweden)

    Sung-Chur Sim

    Full Text Available The concurrent development of high-throughput genotyping platforms and next generation sequencing (NGS) has increased the number and density of genetic markers, the efficiency of constructing detailed linkage maps, and our ability to overlay recombination and physical maps of the genome. We developed an array for tomato with 8,784 Single Nucleotide Polymorphisms (SNPs) mainly discovered based on NGS-derived transcriptome sequences. Of the SNPs, 7,720 (88%) passed manufacturing quality control and could be scored in tomato germplasm. The array was used to generate high-density linkage maps for three interspecific F2 populations: EXPEN 2000 (Solanum lycopersicum LA0925 x S. pennellii LA0716, 79 individuals), EXPEN 2012 (S. lycopersicum Moneymaker x S. pennellii LA0716, 160 individuals), and EXPIM 2012 (S. lycopersicum Moneymaker x S. pimpinellifolium LA0121, 183 individuals). The EXPEN 2000-SNP and EXPEN 2012 maps consisted of 3,503 and 3,687 markers representing 1,076 and 1,229 unique map positions (genetic bins), respectively. The EXPEN 2000-SNP map had an average marker bin interval of 1.6 cM, while the EXPEN 2012 map had an average bin interval of 0.9 cM. The EXPIM 2012 map was constructed with 4,491 markers (1,358 bins) and an average bin interval of 0.8 cM. All three linkage maps revealed an uneven distribution of markers across the genome. The dense EXPEN 2012 and EXPIM 2012 maps showed high levels of colinearity across all 12 chromosomes, and also revealed evidence of small inversions between LA0716 and LA0121. Physical positions of 7,666 SNPs were identified relative to the tomato genome sequence. The genetic and physical positions were mostly consistent. Exceptions were observed for chromosomes 3, 10 and 12. 
Comparing genetic positions relative to physical positions revealed that genomic regions with high recombination rates were consistent with the known distribution of euchromatin across the 12 chromosomes, while very low recombination rates

  11. A Site Specific Model And Analysis Of The Neutral Somatic Mutation Rate In Whole-Genome Cancer Data

    DEFF Research Database (Denmark)

    Bertl, Johanna; Guo, Qianyun; Rasmussen, Malene Juul

    2017-01-01

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation ra...

  12. Whole Genome Sequencing of High-Risk Families to Identify New Mutational Mechanisms of Breast Cancer Predisposition

    Science.gov (United States)

    2015-12-01

    principal discipline(s) of the project? Our approach integrated whole genome sequencing with experimental biology and with application and development of...pathogenicity of genetic variants. Bioinformatics. 31:761-763. 13 Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day IN, Gaunt TR, Campbell C. (2015

  13. ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Ye, Jia; Li, Songgang

    2005-01-01

    We describe an algorithm, ReAS, to recover ancestral sequences for transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must exist at high copy numbers across the genome and must not be so old that they are no longer recognizable...
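
    ReAS relies on transposable elements being present at high copy number, which means their sequence shows up as unusually frequent k-mers among unassembled reads. The following Python sketch illustrates only that premise by counting k-mers and keeping the frequent ones; it is not the ReAS algorithm, and the reads, k and threshold are invented.

```python
from collections import Counter


def high_copy_kmers(reads, k, min_count):
    """Return k-mers that occur at least min_count times across all reads.

    Simplification: forward-strand k-mers only; a real tool would also
    merge each k-mer with its reverse complement.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


reads = ["ACGTACGT", "TTACGTAA"]
print(high_copy_kmers(reads, 4, 2))  # {'ACGT': 3, 'CGTA': 2, 'TACG': 2}
```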

  14. Whole-genome amplified DNA from stored dried blood spots is reliable in high resolution melting curve and sequencing analysis

    DEFF Research Database (Denmark)

    Winkel, Bo G; Hollegaard, Mads Vilhelm; Olesen, Morten S

    2011-01-01

    The use of dried blood spot (DBS) samples in genomic workup has been limited by the relatively low amounts of genomic DNA (gDNA) they contain. It remains to be proven that whole genome amplified DNA (wgaDNA) from stored DBS samples constitutes a reliable alternative to gDNA. We wanted to compare...

  15. Direct DNA Extraction from Mycobacterium tuberculosis Frozen Stocks as a Reculture-Independent Approach to Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Zallet, J; Lillebaek, T

    2015-01-01

    Culturing before DNA extraction represents a major time-consuming step in whole-genome sequencing of slow-growing bacteria, such as Mycobacterium tuberculosis. We report a workflow to extract DNA from frozen isolates without reculturing. Prepared libraries and sequence data were comparable...

  16. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their preci...

  17. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland.

    Science.gov (United States)

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten; Lefort, François

    2016-10-06

    We report here the whole-genome shotgun sequence of strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered a potential and valuable source of biological control agents and biofertilizers for agriculture. Copyright © 2016 Crovadore et al.

  18. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus.

    Science.gov (United States)

    Lee, Yookyung; Lim, Sooyeon; Rhee, Moon-Soo; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-03-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus, collected in Yeong-dong, Chungcheongbuk-do, South Korea, and was subjected to whole genome sequencing on the HiSeq platform and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000.

  19. Whole-genome sequence of Dermabacter vaginalis AD1-86T, isolated from vaginal fluid of Korean woman.

    Science.gov (United States)

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-12-01

    Dermabacter vaginalis AD1-86T was isolated from the vaginal fluid of a Korean woman. Whole genome sequencing was performed on a PacBio RS II platform, and the genome was annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_CP012117.

  20. A whole genome sequence of ‘Candidatus Liberibacter asiaticus’ from Guangdong, China, where HLB was first described

    Science.gov (United States)

    Citrus Huanglongbing (HLB, yellow shoot disease) has been endemic in Guangdong Province, China, for >100 years. “Candidatus Liberibacter asiaticus” (CLas) is a putative pathogen of HLB and currently unculturable. Here, a draft whole genome sequence of CLas strain A4 from Guangdong is presented. Stra...

  1. Whole-Genome Sequences of Two Campylobacter coli Isolates from the Antimicrobial Resistance Monitoring Program in Colombia.

    Science.gov (United States)

    Bernal, Johan F; Donado-Godoy, Pilar; Valencia, María Fernanda; León, Maribel; Gómez, Yolanda; Rodríguez, Fernando; Agarwala, Richa; Landsman, David; Mariño-Ramírez, Leonardo

    2016-03-17

    Campylobacter coli, along with Campylobacter jejuni, is a major agent of gastroenteritis and acute enterocolitis in humans. We report the whole-genome sequences of two multidrug-resistant C. coli strains isolated from the Colombian poultry chain. The isolates contain a variety of antimicrobial resistance genes for aminoglycosides, lincosamides, fluoroquinolones, and tetracycline. Copyright © 2016 Bernal et al.

  2. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Thorup Nielsen, Mette

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely...

  3. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections.

    Directory of Open Access Journals (Sweden)

    Pimlapas Leekitcharoenphon

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections.

  4. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants

    NARCIS (Netherlands)

    Pandit, Aridaman; de Boer, Rob J

    2014-01-01

    BACKGROUND: Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers

  5. Implementation of Nationwide Real-time Whole-genome Sequencing to Enhance Listeriosis Outbreak Detection and Investigation

    OpenAIRE

    Jackson, Brendan R.; Tarr, Cheryl; Strain, Errol; Jackson, Kelly A.; Conrad, Amanda; Carleton, Heather; Katz, Lee S.; Stroika, Steven; Gould, L. Hannah; Mody, Rajal K.; Silk, Benjamin J.; Beal, Jennifer; Chen, Yi; Timme, Ruth; Doyle, Matthew

    2016-01-01

    Implementation of whole-genome sequencing (WGS)–based surveillance for Listeria monocytogenes in 2013 greatly improved detection and investigation of listeriosis outbreaks in the United States. Lessons from this intervention can guide WGS-based surveillance for other foodborne pathogens.

  6. Development of novel InDel markers and genetic diversity in Chenopodium quinoa through whole-genome re-sequencing.

    Science.gov (United States)

    Zhang, Tifu; Gu, Minfeng; Liu, Yuhe; Lv, Yuanda; Zhou, Ling; Lu, Haiyan; Liang, Shuaiqiang; Bao, Huabin; Zhao, Han

    2017-09-05

    Quinoa (Chenopodium quinoa Willd.) is a balanced nutritional crop, but its breeding improvement has been limited by the lack of information on its genetics and genomics. Therefore, it is necessary to obtain knowledge on genomic variation, population structure, and genetic diversity, and to develop novel Insertion/Deletion (InDel) markers for quinoa by whole-genome re-sequencing. We re-sequenced 11 quinoa accessions at a coverage depth of approximately 7× to 23× the quinoa genome. Based on the 1453-megabase (Mb) assembly from the reference accession Riobamba, 8,441,022 filtered bi-allelic single nucleotide polymorphisms (SNPs) and 842,783 filtered InDels were identified, with an estimated SNP and InDel density of 5.81 and 0.58 per kilobase (kb), respectively. From the genomic InDel variations, 85 dimorphic InDel markers were newly developed and validated. Together with 62 previously reported simple sequence repeat (SSR) markers, a total of 147 markers were used for genotyping the 129 quinoa accessions. Molecular grouping analysis, based on combined STRUCTURE, phylogenetic tree, and PCA (Principal Component Analysis) analyses, showed classification into two major groups: the Andean highland group (composed of the northern and southern highland subgroups) and the Chilean coastal group. Further analysis of genetic diversity showed a decreasing trend from the Chilean coastal group to the Andean highland group, and gene flow between the subgroups was more frequent than that between the two subgroups and the Chilean coastal group. An analysis of molecular variance (AMOVA) attributed the majority of the variation (approximately 70%) to diversity between the groups. This was congruent with the highly significant FST value (0.705) between the groups, demonstrating significant genetic differentiation between the Andean highland type of quinoa and the Chilean coastal type. Moreover, a core set of 16 quinoa germplasms that capture all 362 alleles was
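    As a quick check of the density figures, dividing each variant count by the assembly length reproduces the reported values. The sketch below is a minimal Python illustration that assumes the 1453-Mb assembly is exactly 1,453,000,000 bp (decimal megabases); it is not the authors' pipeline.

```python
# Variant counts and assembly size as reported in the abstract above.
# Assumption: 1453 Mb is taken as exactly 1,453,000,000 bp.
ASSEMBLY_BP = 1_453_000_000
SNP_COUNT = 8_441_022
INDEL_COUNT = 842_783


def density_per_kb(count: int, genome_bp: int) -> float:
    """Return the number of variants per kilobase of assembled sequence."""
    return count / (genome_bp / 1_000)


snp_density = density_per_kb(SNP_COUNT, ASSEMBLY_BP)
indel_density = density_per_kb(INDEL_COUNT, ASSEMBLY_BP)
print(f"SNPs: {snp_density:.2f}/kb, InDels: {indel_density:.2f}/kb")
# -> SNPs: 5.81/kb, InDels: 0.58/kb
```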

  7. Whole genome sequencing-based characterization of extensively drug resistant (XDR) strains of Mycobacterium tuberculosis from Pakistan

    KAUST Repository

    Hasan, Zahra

    2015-03-01

    Objectives: The global increase in drug resistance in Mycobacterium tuberculosis (MTB) strains increases the focus on improved molecular diagnostics for MTB. Extensively drug-resistant (XDR) TB is caused by MTB strains resistant to rifampicin, isoniazid, fluoroquinolone and aminoglycoside antibiotics. Resistance to anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular MTB genes. However, there is regional variation between MTB lineages and the SNPs associated with resistance. Therefore, there is a need to identify common resistance-conferring SNPs so that effective molecular-based diagnostic tests for MTB can be developed. This study used whole genome sequencing (WGS) to characterize 37 XDR MTB isolates from Pakistan and investigated SNPs related to drug resistance. Methods: XDR-TB strains were selected. DNA was extracted from MTB strains, and samples underwent WGS with 76-base paired-end reads using Illumina HiSeq2000 technology. Raw sequence data were mapped uniquely to the H37Rv reference genome. The mappings allowed SNPs and small indels to be called using SAMtools/BCFtools. Results: This study found that in all XDR strains, rifampicin resistance was attributable to SNPs in the rpoB RDR region. Isoniazid resistance-associated mutations were primarily related to katG codon 315, followed by inhA S94A. Fluoroquinolone resistance was attributable to gyrA codons 91-94 in most strains, while one strain did not have SNPs in either gyrA or gyrB. Aminoglycoside resistance was mostly associated with SNPs in rrs, except in 6 strains. Ethambutol-resistant strains had embB codon 306 mutations, although many strains lacked this mutation. The SNPs were compared with those present in commercial assays such as LiPA Hain MDRTBsl, and the sensitivity of the assays for these strains was evaluated. Conclusions: If common drug resistance associated with SNPs evaluated the concordance between phenotypic and
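    The resistance loci summarized in the Results can be represented as a simple lookup table for annotating called variants. The Python sketch below is hypothetical: the table layout, the `drugs_for_variant` helper, and the example calls are illustrative assumptions, the entries cover only the loci named in this abstract (it is not a curated resistance catalogue), and rpoB is matched at gene level because the abstract does not give the RDR codon bounds.

```python
from typing import Dict, Optional, Set, Tuple

# Loci named in the abstract above, mapped to the drug class they were linked to.
# A codon range of None means the abstract names only the gene or region.
# Hypothetical structure for illustration only.
RESISTANCE_LOCI: Dict[Tuple[str, Optional[Tuple[int, int]]], str] = {
    ("rpoB", None): "rifampicin",         # "RDR region"; bounds not given in the abstract
    ("katG", (315, 315)): "isoniazid",
    ("inhA", (94, 94)): "isoniazid",      # reported as inhA S94A
    ("gyrA", (91, 94)): "fluoroquinolone",
    ("rrs", None): "aminoglycoside",
    ("embB", (306, 306)): "ethambutol",
}


def drugs_for_variant(gene: str, codon: Optional[int]) -> Set[str]:
    """Return the drug classes whose listed locus matches a called variant."""
    hits: Set[str] = set()
    for (locus_gene, codons), drug in RESISTANCE_LOCI.items():
        if locus_gene != gene:
            continue
        if codons is None:
            hits.add(drug)  # the whole gene/region is listed
        elif codon is not None and codons[0] <= codon <= codons[1]:
            hits.add(drug)
    return hits


# Hypothetical variant calls (gene, codon); rrs is an rRNA gene, so no codon is given.
calls = [("katG", 315), ("gyrA", 92), ("rrs", None), ("gyrA", 95)]
for gene, codon in calls:
    print(gene, codon, sorted(drugs_for_variant(gene, codon)) or "not listed")
```

Keying the table on (gene, codon range) keeps region-level and codon-level entries in one structure; a real annotation step would work from the VCF produced by SAMtools/BCFtools.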

  8. Whole genome sequencing reveals complex evolution patterns of multidrug-resistant Mycobacterium tuberculosis Beijing strains in patients.

    Directory of Open Access Journals (Sweden)

    Matthias Merker

    Multidrug-resistant (MDR) Mycobacterium tuberculosis complex (MTBC) strains represent a major threat to tuberculosis (TB) control. Treatment of MDR-TB patients is long and less effective, resulting in a significant number of treatment failures. The development of further resistance leads to extensively drug-resistant (XDR) variants. However, data on the individual reasons for treatment failure, e.g., an induced mutational burst, and on the evolution of bacteria within the patient are sparse. To address this question, we investigated the intra-patient evolution of serial MTBC isolates obtained from three MDR-TB patients undergoing longitudinal treatment that eventually progressed to XDR-TB. Sequential isolates displayed identical IS6110 fingerprint patterns, suggesting the absence of exogenous re-infection. We used whole genome sequencing (WGS) to screen for variations in three isolates from Patient A and four isolates each from Patients B and C. Acquired polymorphisms were subsequently validated in up to 15 serial isolates by Sanger sequencing. We identified eight (Patient A) and nine (Patient B) polymorphisms, which occurred in a stepwise manner during the course of therapy and were linked to resistance or a potential compensatory mechanism. For both patients, our analysis revealed the long-term co-existence of clonal subpopulations that displayed different combinations of drug resistance alleles. Of these, the most resistant clone became fixed in the population. In contrast, the baseline and follow-up isolates of Patient C were each distinguished by eleven unique polymorphisms, indicating exogenous re-infection with an XDR strain not detected by IS6110 RFLP typing. Our study demonstrates that intra-patient microevolution of MDR-MTBC strains under longitudinal treatment is more complex than previously anticipated. However, a mutator phenotype was not detected. The presence of different subpopulations might confound phenotypic and

  9. Reconstructing Native American migrations from whole-genome and whole-exome data.

    Directory of Open Access Journals (Sweden)

    Simon Gravel

    There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low-coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the South American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL ancestors split 12.2 kya, with a subsequent split of the ancestors of CLM and PUR 11.7 kya. The model also features effective population sizes of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. 
Finally, we compare IBD and ancestry assignments to find

  10. Evaluation of an Array-Based Method for Human Papillomavirus Detection and Genotyping in Comparison with Conventional Methods Used in Cervical Cancer Screening

    Science.gov (United States)

    García-Sierra, Nerea; Martró, Elisa; Castellà, Eva; Llatjós, Mariona; Tarrats, Antoni; Bascuñana, Elisabet; Díaz, Rosana; Carrasco, María; Sirera, Guillem; Matas, Lurdes; Ausina, Vicente

    2009-01-01

    Cervical cancer is the second-most prevalent cancer in young women around the world. Infection with human papillomavirus (HPV), especially high-risk HPV types (HR-HPV), is necessary for the development of this cancer. HPV-DNA detection is increasingly being used in cervical cancer screening programs, together with the Papanicolaou smear test. We evaluated the usefulness of introducing this new array-based HPV genotyping method (i.e., Clinical Arrays Papillomavirus Humano) in the cervical cancer screening algorithm in our center. The results obtained using this method were compared to those obtained by the hybrid capture II high-risk HPV DNA test (HC-II) and Papanicolaou in a selected group of 408 women. The array-based assay was performed in women who were HC-II positive or presented cytological alterations. Among 246 array-positive patients, 123 (50%) presented infection with ≥2 types, and HR-HPV types were detected in 206 (83.7%), mainly HPV-16 (24.0%). Up to 132 (33.2%) specimens were classified as ASCUS (for atypical squamous cells of undetermined significance), and only 48 (36.4%) of them were HPV-DNA positive by either assay; however, 78.7% of these cases were caused by HR-HPV types. The agreement between both HPV-DNA detection techniques was fairly good (n = 367). Screening with Papanicolaou smear and HC-II tests, followed by HPV detection and genotyping, provided an optimal identification of women at risk for the development of cervical cancer. Furthermore, with the identification of specific genotypes, either in single or multiple infections, a better prediction of disease progression was achieved. The array method also allowed us to determine the possible contribution of the available vaccines in our setting. PMID:19439534
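    The proportions reported above follow directly from the stated counts. A minimal Python check, using only counts given in the abstract (the `percent` helper is an illustrative name, not part of the study):

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 1)


# Counts as reported in the abstract above.
ARRAY_POSITIVE = 246
MULTIPLE_TYPES = 123      # infections with >= 2 HPV types
HIGH_RISK = 206           # HR-HPV types detected
ASCUS = 132
ASCUS_HPV_POSITIVE = 48   # HPV-DNA positive by either assay

print(percent(MULTIPLE_TYPES, ARRAY_POSITIVE))   # -> 50.0
print(percent(HIGH_RISK, ARRAY_POSITIVE))        # -> 83.7
print(percent(ASCUS_HPV_POSITIVE, ASCUS))        # -> 36.4
```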

  11. Evaluation of an array-based method for human papillomavirus detection and genotyping in comparison with conventional methods used in cervical cancer screening.

    Science.gov (United States)

    García-Sierra, Nerea; Martró, Elisa; Castellà, Eva; Llatjós, Mariona; Tarrats, Antoni; Bascuñana, Elisabet; Díaz, Rosana; Carrasco, María; Sirera, Guillem; Matas, Lurdes; Ausina, Vicente

    2009-07-01

    Cervical cancer is the second-most prevalent cancer in young women around the world. Infection with human papillomavirus (HPV), especially high-risk HPV types (HR-HPV), is necessary for the development of this cancer. HPV-DNA detection is increasingly being used in cervical cancer screening programs, together with the Papanicolaou smear test. We evaluated the usefulness of introducing this new array-based HPV genotyping method (i.e., Clinical Arrays Papillomavirus Humano) in the cervical cancer screening algorithm in our center. The results obtained using this method were compared to those obtained by the hybrid capture II high-risk HPV DNA test (HC-II) and Papanicolaou in a selected group of 408 women. The array-based assay was performed in women who were HC-II positive or presented cytological alterations. Among 246 array-positive patients, 123 (50%) presented infection with ≥2 types, and HR-HPV types were detected in 206 (83.7%), mainly HPV-16 (24.0%). Up to 132 (33.2%) specimens were classified as ASCUS (for atypical squamous cells of undetermined significance), and only 48 (36.4%) of them were HPV-DNA positive by either assay; however, 78.7% of these cases were caused by HR-HPV types. The agreement between both HPV-DNA detection techniques was fairly good (n = 367). Screening with Papanicolaou smear and HC-II tests, followed by HPV detection and genotyping, provided an optimal identification of women at risk for the development of cervical cancer. Furthermore, with the identification of specific genotypes, either in single or multiple infections, a better prediction of disease progression was achieved. The array method also allowed us to determine the possible contribution of the available vaccines in our setting.

  12. Effective normalization for copy number variation detection from whole genome sequencing.

    Science.gov (United States)

    Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka

    2012-01-01

    Whole genome sequencing enables a high-resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. A number of tools exist to infer copy number variation (CNV) in the genome. These tools, while validated, also include a number of parameters that can be configured for the genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates, but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are: GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07–0.3 range, whereas control genome normalization yields Jaccard index values of around 0.4 relative to normalization based on GC content. The most critical impact of using mappability as a normalization factor is a substantial reduction in deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable gene and CNV region calls
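    The abstract reports concordance between CNV call sets as Jaccard indices but does not state how overlap was measured. One common choice, assumed here, is base-pair overlap: the length covered by both call sets divided by the length covered by either. The Python sketch below merges each set of (chromosome, start, end) calls and computes that ratio with a two-pointer sweep; the function names and toy intervals are hypothetical and are not data from the study.

```python
from typing import List, Tuple

# A CNV call as (chromosome, start, end) using half-open coordinates [start, end).
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort calls and merge overlapping or touching intervals on each chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Total base pairs covered by a merged interval list."""
    return sum(end - start for _, start, end in intervals)


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs covered by both merged, sorted interval lists (two-pointer sweep)."""
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is on the lexicographically smaller chromosome.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        total += max(0, min(end_a, end_b) - max(start_a, start_b))
        # Advance the interval that ends first; the other may overlap the next one.
        if end_a < end_b:
            i += 1
        else:
            j += 1
    return total


def jaccard(calls_a: List[Interval], calls_b: List[Interval]) -> float:
    """Base-pair Jaccard index; defined as 0.0 here when both sets are empty."""
    a, b = merge(calls_a), merge(calls_b)
    inter = overlap_bp(a, b)
    union = covered_bp(a) + covered_bp(b) - inter
    return inter / union if union else 0.0


# Toy call sets for two hypothetical normalization configurations.
config_1 = [("chr1", 100, 500), ("chr2", 0, 200)]
config_2 = [("chr1", 300, 700), ("chr2", 100, 200)]
print(jaccard(config_1, config_2))  # -> 0.375
```

Measuring overlap in base pairs rewards agreement on long events; counting matched calls instead would weight every CNV equally, so the two choices can give quite different indices for the same call sets.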

  13. Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples.

    Directory of Open Access Journals (Sweden)

    Craig April

    2009-12-01

    We have developed a gene expression assay (Whole-Genome DASL) capable of generating whole-genome gene expression profiles from degraded samples such as formalin-fixed, paraffin-embedded (FFPE) specimens. We demonstrated a similar level of sensitivity in gene detection between matched fresh-frozen (FF) and FFPE samples, with the number and overlap of probes detected in the FFPE samples being approximately 88% and 95% of those in the corresponding FF samples, respectively; 74% of the differentially expressed probes overlapped between the FF and FFPE pairs. The WG-DASL assay is also able to detect 1.3- to 1.5-fold and 1.5- to 2-fold changes in intact and FFPE samples, respectively. The dynamic range for the assay is approximately 3 logs. Comparing the WG-DASL assay with an in vitro transcription-based labeling method yielded fold-change correlations of R² ≈ 0.83, while fold-change comparisons with quantitative RT-PCR assays yielded R² ≈ 0.86 and R² ≈ 0.55 for intact and FFPE samples, respectively. Additionally, the WG-DASL assay yielded high self-correlations (R² > 0.98) with low intact RNA inputs ranging from 1 ng to 100 ng; reproducible expression profiles were also obtained with 250 pg total RNA (R² ≈ 0.92), with approximately 71% of the probes detected in 100 ng total RNA also detected at the 250 pg level. When FFPE samples were assayed, 1 ng total RNA yielded self-correlations of R² ≈ 0.80, while still maintaining a correlation of R² ≈ 0.75 with standard FFPE inputs (200 ng). Taken together, these results show that the WG-DASL assay provides a reliable platform for genome-wide expression profiling in archived materials. It also possesses utility within clinical settings where only limited quantities of samples may be available (e.g., microdissected material) or when minimally invasive procedures are performed (e.g., biopsied specimens).
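    Cross-platform agreement of this kind is often summarized as the squared Pearson correlation of log-scale fold-changes measured on the same probes. The abstract does not describe its exact computation, so the Python sketch below is an assumption-based illustration with hypothetical values and helper names, not the authors' analysis.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, one common way to report R² between platforms."""
    return pearson_r(x, y) ** 2


# Hypothetical log2 fold-changes for the same three probes on two platforms.
dasl_log2fc = [1.0, 2.0, 3.0]
qpcr_log2fc = [1.0, 3.0, 2.0]
print(r_squared(dasl_log2fc, qpcr_log2fc))  # -> 0.25
```

Working on the log2 scale treats a twofold increase and a twofold decrease symmetrically, which is the usual reason fold-changes are log-transformed before correlating.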

  14. Whole genome identification of Mycobacterium tuberculosis vaccine candidates by comprehensive data mining and bioinformatic analyses

    Directory of Open Access Journals (Sweden)

    Sadoff Jerald C

    2008-05-01

    Background: Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects ~8 million people annually, culminating in ~2 million deaths. Moreover, about one third of the population is latently infected, 10% of whom develop disease during their lifetime. Currently approved prophylactic TB vaccines (BCG and derivatives thereof) are of variable efficiency in adult protection against pulmonary TB (0%–80%), and are directed essentially against early-phase infection. Methods: A genome-scale dataset was constructed by analyzing published data on: (1) global gene expression studies under conditions that simulate intra-macrophage stress, dormancy, persistence and/or reactivation; (2) cellular and humoral immunity, and vaccine potential. This information was compiled along with revised annotation/bioinformatic characterization of selected gene products and in silico mapping of T-cell epitopes. Protocols for scoring, ranking and prioritization of the antigens were developed and applied. Results: Cross-matching of literature and in silico-derived data, in conjunction with the prioritization scheme and biological rationale, allowed for selection of 189 putative vaccine candidates from the entire genome. Within the 189 set, the relative distribution of antigens in 3 functional categories differs significantly from their distribution in the whole genome, with reduction in the Conserved hypothetical category (due to improved annotation) and enrichment in the Lipid and Virulence categories. Other prominent representatives in the 189 set are the PE/PPE proteins; iron sequestration, nitroreductases and proteases, all within the Intermediary metabolism and respiration category; ESX secretion systems, resuscitation promoting factors and lipoproteins, all within the Cell wall category. Application of a ranking scheme based on qualitative and quantitative scores resulted in a list of 45 best-scoring antigens, of which: 74% belong to the dormancy

  15. Whole-genome sequencing reveals the mechanisms for evolution of streptomycin resistance in Lactobacillus plantarum.

    Science.gov (United States)

    Zhang, Fuxin; Gao, Jiayuan; Wang, Bini; Huo, Dongxue; Wang, Zhaoxia; Zhang, Jiachao; Shao, Yuyu

    2018-01-31

    In this research, we investigated the evolution of streptomycin resistance in Lactobacillus plantarum ATCC14917, which was passaged in medium containing a gradually increasing concentration of streptomycin. After 25 d, the minimum inhibitory concentration (MIC) of L. plantarum ATCC14917 had reached 131,072 µg/mL, which was 8,192-fold higher than the MIC of the original parent isolate. The highly resistant L. plantarum ATCC14917 isolate was then passaged in antibiotic-free medium to determine the stability of resistance. The MIC value of the L. plantarum ATCC14917 isolate decreased to 2,048 µg/mL after 35 d but remained constant thereafter, indicating that resistance was irreversible even in the absence of selection pressure. Whole-genome sequencing of parent isolates, control isolates, and isolates following passage was used to study the resistance mechanism of L. plantarum ATCC14917 to streptomycin and adaptation in the presence and absence of selection pressure. Five mutated genes (single nucleotide polymorphisms and structural variants) were verified in highly resistant L. plantarum ATCC14917 isolates, which were related to ribosomal protein S12, LPXTG-motif cell wall anchor domain protein, LrgA family protein, Ser/Thr phosphatase family protein, and a hypothetical protein that may correlate with resistance to streptomycin. After passage in streptomycin-free medium, only the mutant gene encoding ribosomal protein S12 remained; the other 4 mutant genes had reverted to the wild type as found in the parent isolate. Although the MIC value of L. plantarum ATCC14917 was reduced in the absence of selection pressure, it remained 128-fold higher than the MIC value of the parent isolate, indicating that ribosomal protein S12 may play an important role in streptomycin resistance. Using the mobile elements database, we demonstrated that streptomycin resistance-related genes in L. plantarum ATCC14917 were not located on mobile elements. This research offers a way of
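    The reported MIC figures are internally consistent. The parent MIC is not stated directly, but 131,072 / 8,192 implies 16 µg/mL, and 2,048 / 16 gives the reported 128-fold residual increase. Because MICs are usually measured on a two-fold dilution series, log2 of the fold change counts dilution steps. A minimal Python sketch of this arithmetic, using only values from the abstract (the `fold_change` helper is an illustrative name):

```python
import math


def fold_change(new_mic: float, reference_mic: float) -> float:
    """Ratio of two MIC values (both in the same units, e.g. µg/mL)."""
    return new_mic / reference_mic


# Values reported in the abstract above (µg/mL).
FINAL_MIC = 131_072      # after 25 d of passage with streptomycin
REPORTED_FOLD = 8_192    # reported increase over the parent isolate
STABLE_MIC = 2_048       # after 35 d without streptomycin

parent_mic = FINAL_MIC / REPORTED_FOLD            # implied parent MIC
residual_fold = fold_change(STABLE_MIC, parent_mic)
doubling_steps = math.log2(REPORTED_FOLD)         # two-fold dilution steps

print(f"implied parent MIC: {parent_mic:g} µg/mL")   # -> implied parent MIC: 16 µg/mL
print(f"residual fold change: {residual_fold:g}")    # -> residual fold change: 128
print(f"two-fold steps gained: {doubling_steps:g}")  # -> two-fold steps gained: 13
```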

  16. Whole-genome duplication and the functional diversification of teleost fish hemoglobins.

    Science.gov (United States)

    Opazo, Juan C; Butts, G Tyler; Nery, Mariana F; Storz, Jay F; Hoffmann, Federico G

    2013-01-01

    Subsequent to the two rounds of whole-genome duplication that occurred in the common ancestor of vertebrates, a third genome duplication occurred in the stem lineage of teleost fishes. This teleost-specific genome duplication (TGD) is thought to have provided genetic raw materials for the physiological, morphological, and behavioral diversification of this highly speciose group. The extreme physiological versatility of teleost fish is manifest in their diversity of blood-gas transport traits, which reflects the myriad solutions that have evolved to maintain tissue O₂ delivery in the face of changing metabolic demands and environmental O₂ availability during different ontogenetic stages. During the course of development, regulatory changes in blood-O₂ transport are mediated by the expression of multiple, functionally distinct hemoglobin (Hb) isoforms that meet the particular O₂-transport challenges encountered by the developing embryo or fetus (in viviparous or oviparous species) and in free-swimming larvae and adults. The main objective of the present study was to assess the relative contributions of whole-genome duplication, large-scale segmental duplication, and small-scale gene duplication in producing the extraordinary functional diversity of teleost Hbs. To accomplish this, we integrated phylogenetic reconstructions with analyses of conserved synteny to characterize the genomic organization and evolutionary history of the globin gene clusters of teleosts. These results were then integrated with available experimental data on functional properties and developmental patterns of stage-specific gene expression. Our results indicate that multiple α- and β-globin genes were present in the common ancestor of gars (order Lepisosteiformes) and teleosts. The comparative genomic analysis revealed that teleosts possess a dual set of TGD-derived globin gene clusters, each of which has undergone lineage-specific changes in gene content via repeated duplication and

  17. Rapid whole-genome sequencing for detection and characterization of microorganisms directly from clinical samples.

    Science.gov (United States)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Ponten, Thomas; Lund, Ole; Svendsen, Christina Aaby; Frimodt-Møller, Niels; Aarestrup, Frank M

    2014-01-01

    Whole-genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly to clinical samples, this could further reduce diagnostic times and thereby improve control and treatment. A major bottleneck is the availability of fast and reliable bioinformatic tools. This study was conducted to evaluate the applicability of WGS directly to clinical samples and to develop easy-to-use bioinformatic tools for the analysis of sequencing data. Thirty-five random urine samples from patients with suspected urinary tract infections were examined using conventional microbiology, WGS of isolated bacteria, and direct sequencing of pellets from the urine samples. A rapid method for analyzing the sequence data was developed. Bacteria were cultivated from 19 samples but in pure cultures from only 17 samples. WGS improved the identification of the cultivated bacteria, and almost complete agreement was observed between phenotypic and predicted antimicrobial susceptibilities. Complete agreement was observed between species identification, multilocus seq