WorldWideScience

Sample records for whole-genome genotyping arrays

  1. Whole-Genome Sequencing and iPLEX MassARRAY Genotyping Map an EMS-Induced Mutation Affecting Cell Competition in Drosophila melanogaster.

    Science.gov (United States)

    Lee, Chang-Hyun; Rimesso, Gerard; Reynolds, David M; Cai, Jinlu; Baker, Nicholas E

    2016-10-13

    Cell competition, the conditional loss of viable genotypes only when surrounded by other cells, is a phenomenon observed in certain genetic mosaic conditions. We conducted a chemical mutagenesis and screen to recover new mutations that affect cell competition between wild-type and RpS3 heterozygous cells. Mutations were identified by whole-genome sequencing, making use of software tools that greatly facilitate the distinction between newly induced mutations and other sources of apparent sequence polymorphism, thereby reducing false-positive and false-negative identification rates. In addition, we utilized iPLEX MassARRAY for genotyping recombinant chromosomes. These approaches permitted the mapping of a new mutation affecting cell competition when only a single allele existed, with a phenotype assessed only in genetic mosaics, without the benefit of complementation with existing mutations, deletions, or duplications. These techniques expand the utility of chemical mutagenesis and whole-genome sequencing for mutant identification. We discuss mutations in the Atm and Xrp1 genes identified in this screen. Copyright © 2016 Lee et al.

  2. Whole-Genome Sequencing and iPLEX MassARRAY Genotyping Map an EMS-Induced Mutation Affecting Cell Competition in Drosophila melanogaster

    Directory of Open Access Journals (Sweden)

    Chang-Hyun Lee

    2016-10-01

    Full Text Available Cell competition, the conditional loss of viable genotypes only when surrounded by other cells, is a phenomenon observed in certain genetic mosaic conditions. We conducted a chemical mutagenesis and screen to recover new mutations that affect cell competition between wild-type and RpS3 heterozygous cells. Mutations were identified by whole-genome sequencing, making use of software tools that greatly facilitate the distinction between newly induced mutations and other sources of apparent sequence polymorphism, thereby reducing false-positive and false-negative identification rates. In addition, we utilized iPLEX MassARRAY for genotyping recombinant chromosomes. These approaches permitted the mapping of a new mutation affecting cell competition when only a single allele existed, with a phenotype assessed only in genetic mosaics, without the benefit of complementation with existing mutations, deletions, or duplications. These techniques expand the utility of chemical mutagenesis and whole-genome sequencing for mutant identification. We discuss mutations in the Atm and Xrp1 genes identified in this screen.

  3. Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array.

    Science.gov (United States)

    Antanaviciute, Laima; Fernández-Fernández, Felicidad; Jansen, Johannes; Banchi, Elisa; Evans, Katherine M; Viola, Roberto; Velasco, Riccardo; Dunwell, Jim M; Troggio, Michela; Sargent, Daniel J

    2012-05-25

    A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the 'Golden Delicious' genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the 'Golden Delicious' pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been

  4. Development and validation of a 20K single nucleotide polymorphism (SNP) whole genome genotyping array for apple (Malus × domestica Borkh).

    Science.gov (United States)

    Bianco, Luca; Cestaro, Alessandro; Sargent, Daniel James; Banchi, Elisa; Derdak, Sophia; Di Guardo, Mario; Salvi, Silvio; Jansen, Johannes; Viola, Roberto; Gut, Ivo; Laurens, Francois; Chagné, David; Velasco, Riccardo; van de Weg, Eric; Troggio, Michela

    2014-01-01

    High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs.

  5. Development and validation of a 20K single nucleotide polymorphism (SNP whole genome genotyping array for apple (Malus × domestica Borkh.

    Directory of Open Access Journals (Sweden)

    Luca Bianco

    Full Text Available High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus. A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs. Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs.

  6. Development and Validation of a 20K Single Nucleotide Polymorphism (SNP) Whole Genome Genotyping Array for Apple (Malus × domestica Borkh)

    Science.gov (United States)

    Bianco, Luca; Cestaro, Alessandro; Sargent, Daniel James; Banchi, Elisa; Derdak, Sophia; Di Guardo, Mario; Salvi, Silvio; Jansen, Johannes; Viola, Roberto; Gut, Ivo; Laurens, Francois; Chagné, David; Velasco, Riccardo; van de Weg, Eric; Troggio, Michela

    2014-01-01

    High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs. PMID:25303088

  7. Genotype call for chromosomal deletions using read-depth from whole genome sequence variants in cattle

    DEFF Research Database (Denmark)

    Mesbah-Uddin, Md; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2018-01-01

    We presented a deletion genotyping (copy-number estimation) method that leverages population-scale whole genome sequence variants data from 1K bull genomes project (1KBGP) to build reference panel for imputation. To estimate deletion-genotype likelihood, we extracted read-depth (RD) data of all...

  8. SV2: accurate structural variation genotyping and de novo mutation detection from whole genomes.

    Science.gov (United States)

    Antaki, Danny; Brandler, William M; Sebat, Jonathan

    2018-05-15

    Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here, we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). jsebat@ucsd.edu. Supplementary data are available at Bioinformatics online.

  9. Whole genome DNA copy number changes identified by high density oligonucleotide arrays

    Directory of Open Access Journals (Sweden)

    Huang Jing

    2004-05-01

    Full Text Available Abstract Changes in DNA copy number are one of the hallmarks of the genetic instability common to most human cancers. Previous micro-array-based methods have been used to identify chromosomal gains and losses; however, they are unable to genotype alleles at the level of single nucleotide polymorphisms (SNPs. Here we describe a novel algorithm that uses a recently developed high-density oligonucleotide array-based SNP genotyping method, whole genome sampling analysis (WGSA, to identify genome-wide chromosomal gains and losses at high resolution. WGSA simultaneously genotypes over 10,000 SNPs by allele-specific hybridisation to perfect match (PM and mismatch (MM probes synthesised on a single array. The copy number algorithm jointly uses PM intensity and discrimination ratios between paired PM and MM intensity values to identify and estimate genetic copy number changes. Values from an experimental sample are compared with SNP-specific distributions derived from a reference set containing over 100 normal individuals to gain statistical power. Genomic regions with statistically significant copy number changes can be identified using both single point analysis and contiguous point analysis of SNP intensities. We identified multiple regions of amplification and deletion using a panel of human breast cancer cell lines. We verified these results using an independent method based on quantitative polymerase chain reaction and found that our approach is both sensitive and specific and can tolerate samples which contain a mixture of both tumour and normal DNA. In addition, by using known allele frequencies from the reference set, statistically significant genomic intervals can be identified containing contiguous stretches of homozygous markers, potentially allowing the detection of regions undergoing loss of heterozygosity (LOH without the need for a matched normal control sample. The coupling of LOH analysis, via SNP genotyping, with copy number

  10. Refining QTL with high-density SNP genotyping and whole genome sequence in three cattle breeds

    DEFF Research Database (Denmark)

    Sahana, Goutam; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2012-01-01

    Genome-wide association study was carried out in Nordic Holsteins, Nordic Red and Jersey breeds for functional traits using BovineHD Genotyping BreadChip (Illumina, San Diego, CA). The association analyses were carried out using both linear mixed model approach and a Bayesian variable selection...... method. Principal components were used to account for population structure. The QTL segregating in all three breeds were selected and a few of the most significant ones were followed in further analyses. The polymorphisms in the identified QTL regions were imputed using 90 whole genome sequences...

  11. Effects of DNA mass on multiple displacement whole genome amplification and genotyping performance

    Directory of Open Access Journals (Sweden)

    Haque Kashif A

    2005-09-01

    Full Text Available Abstract Background Whole genome amplification (WGA promises to eliminate practical molecular genetic analysis limitations associated with genomic DNA (gDNA quantity. We evaluated the performance of multiple displacement amplification (MDA WGA using gDNA extracted from lymphoblastoid cell lines (N = 27 with a range of starting gDNA input of 1–200 ng into the WGA reaction. Yield and composition analysis of whole genome amplified DNA (wgaDNA was performed using three DNA quantification methods (OD, PicoGreen® and RT-PCR. Two panels of N = 15 STR (using the AmpFlSTR® Identifiler® panel and N = 49 SNP (TaqMan® genotyping assays were performed on each gDNA and wgaDNA sample in duplicate. gDNA and wgaDNA masses of 1, 4 and 20 ng were used in the SNP assays to evaluate the effects of DNA mass on SNP genotyping assay performance. A total of N = 6,880 STR and N = 56,448 SNP genotype attempts provided adequate power to detect differences in STR and SNP genotyping performance between gDNA and wgaDNA, and among wgaDNA produced from a range of gDNA templates inputs. Results The proportion of double-stranded wgaDNA and human-specific PCR amplifiable wgaDNA increased with increased gDNA input into the WGA reaction. Increased amounts of gDNA input into the WGA reaction improved wgaDNA genotyping performance. Genotype completion or genotype concordance rates of wgaDNA produced from all gDNA input levels were observed to be reduced compared to gDNA, although the reduction was not always statistically significant. Reduced wgaDNA genotyping performance was primarily due to the increased variance of allelic amplification, resulting in loss of heterozygosity or increased undetermined genotypes. MDA WGA produces wgaDNA from no template control samples; such samples exhibited substantial false-positive genotyping rates. Conclusion The amount of gDNA input into the MDA WGA reaction is a critical determinant of genotyping performance of wgaDNA. At least 10 ng of

  12. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+".

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  13. Whole genome amplification and microsatellite genotyping of herbarium DNA revealed the identity of an ancient grapevine cultivar

    Science.gov (United States)

    Malenica, Nenad; Šimon, Silvio; Besendorfer, Višnja; Maletić, Edi; Karoglan Kontić, Jasminka; Pejić, Ivan

    2011-09-01

    Reconstruction of the grapevine cultivation history has advanced tremendously during the last decade. Identification of grapevine cultivars by using microsatellite DNA markers has mostly become a routine. The parentage of several renowned grapevine cultivars, like Cabernet Sauvignon and Chardonnay, has been elucidated. However, the assembly of a complete grapevine genealogy is not yet possible because missing links might no longer be in cultivation or are even extinct. This problem could be overcome by analyzing ancient DNA from grapevine herbarium specimens and other historical remnants of once cultivated varieties. Here, we present the first successful genotyping of a grapevine herbarium specimen and the identification of the corresponding grapevine cultivar. Using a set of nine grapevine microsatellite markers, in combination with a whole genome amplification procedure, we found the 90-year-old Tribidrag herbarium specimen to display the same microsatellite profile as the popular American cultivar Zinfandel. This work, together with information from several historical documents, provides a new clue of Zinfandel cultivation in Croatia as early as the beginning of fifteenth century, under the native name Tribidrag. Moreover, it emphasizes substantial information potential of existing grapevine and other herbarium collections worldwide.

  14. Whole genome analysis of porcine astroviruses detected in Japanese pigs reveals genetic diversity and possible intra-genotypic recombination.

    Science.gov (United States)

    Ito, Mika; Kuroda, Moegi; Masuda, Tsuneyuki; Akagami, Masataka; Haga, Kei; Tsuchiaka, Shinobu; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Omatsu, Tsutomu; Katayama, Yukie; Oba, Mami; Aoki, Hiroshi; Ichimaru, Toru; Mukono, Itsuro; Ouchi, Yoshinao; Yamasato, Hiroshi; Shirai, Junsuke; Katayama, Kazuhiko; Mizutani, Tetsuya; Nagai, Makoto

    2017-06-01

    Porcine astroviruses (PoAstVs) are ubiquitous enteric virus of pigs that are distributed in several countries throughout the world. Since PoAstVs are detected in apparent healthy pigs, the clinical significance of infection is unknown. However, AstVs have recently been associated with a severe neurological disorder in animals, including humans, and zoonotic potential has been suggested. To date, little is known about the epidemiology of PoAstVs among the pig population in Japan. In this report, we present an analysis of nearly complete genomes of 36 PoAstVs detected by a metagenomics approach in the feces of Japanese pigs. Based on a phylogenetic analysis and pairwise sequence comparison, 10, 5, 15, and 6 sequences were classified as PoAstV2, PoAstV3, PoAstV4, and PoAstV5, respectively. Co-infection with two or three strains was found in individual fecal samples from eight pigs. The phylogenetic trees of ORF1a, ORF1b, and ORF2 of PoAstV2 and PoAstV4 showed differences in their topologies. The PoAstV3 and PoAstV5 strains shared high sequence identities within each genotype in all ORFs; however, one PoAstV3 strain and one PoAstV5 strain showed considerable sequence divergence from the other PoAstV3 and PoAstV5 strains, respectively, in ORF2. Recombination analysis using whole genomes revealed evidence of multiple possible intra-genotype recombination events in PoAstV2 and PoAstV4, suggesting that recombination might have contributed to the genetic diversity and played an important role in the evolution of Japanese PoAstVs. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Whole-genome characterization in pedigreed non-human primates using Genotyping-By-Sequencing and imputation.

    OpenAIRE

    Cervera-Juanes, Rita; Vinson, Amanda; Ferguson, Betsy; Carbone, Lucia; Spindel, Eliot; Mccouch, Susan; Spindel, Jennifer; Nevonen, Kimberly; Letaw, John; Raboin, Michael; Bimber, Ben

    2016-01-01

    Background: Rhesus macaques are widely used in biomedical research, but the application of genomic information in this species to better understand human disease is still undeveloped. Whole-genome sequence (WGS) data in pedigreed macaque colonies could provide substantial experimental power, but the collection of WGS data in large cohorts remains a formidable expense. Here, we describe a cost-effective approach that selects the most informative macaques in a pedigree for whole-genome sequenci...

  16. Whole genome sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis outbreak: a longitudinal molecular epidemiological study.

    Directory of Open Access Journals (Sweden)

    Andreas Roetzer

    Full Text Available BACKGROUND: Understanding Mycobacterium tuberculosis (Mtb transmission is essential to guide efficient tuberculosis control strategies. Traditional strain typing lacks sufficient discriminatory power to resolve large outbreaks. Here, we tested the potential of using next generation genome sequencing for identification of outbreak-related transmission chains. METHODS AND FINDINGS: During long-term (1997 to 2010 prospective population-based molecular epidemiological surveillance comprising a total of 2,301 patients, we identified a large outbreak caused by an Mtb strain of the Haarlem lineage. The main performance outcome measure of whole genome sequencing (WGS analyses was the degree of correlation of the WGS analyses with contact tracing data and the spatio-temporal distribution of the outbreak cases. WGS analyses of the 86 isolates revealed 85 single nucleotide polymorphisms (SNPs, subdividing the outbreak into seven genome clusters (two to 24 isolates each, plus 36 unique SNP profiles. WGS results showed that the first outbreak isolates detected in 1997 were falsely clustered by classical genotyping. In 1998, one clone (termed "Hamburg clone" started expanding, apparently independently from differences in the social environment of early cases. Genome-based clustering patterns were in better accordance with contact tracing data and the geographical distribution of the cases than clustering patterns based on classical genotyping. A maximum of three SNPs were identified in eight confirmed human-to-human transmission chains, involving 31 patients. We estimated the Mtb genome evolutionary rate at 0.4 mutations per genome per year. This rate suggests that Mtb grows in its natural host with a doubling time of approximately 22 h (400 generations per year. Based on the genome variation discovered, emergence of the Hamburg clone was dated back to a period between 1993 and 1997, hence shortly before the discovery of the outbreak through epidemiological

  17. Copy number and loss of heterozygosity detected by SNP array of formalin-fixed tissues using whole-genome amplification.

    Directory of Open Access Journals (Sweden)

    Angela Stokes

    Full Text Available The requirement for large amounts of good quality DNA for whole-genome applications prohibits their use for small, laser capture micro-dissected (LCM, and/or rare clinical samples, which are also often formalin-fixed and paraffin-embedded (FFPE. Whole-genome amplification of DNA from these samples could, potentially, overcome these limitations. However, little is known about the artefacts introduced by amplification of FFPE-derived DNA with regard to genotyping, and subsequent copy number and loss of heterozygosity (LOH analyses. Using a ligation adaptor amplification method, we present data from a total of 22 Affymetrix SNP 6.0 experiments, using matched paired amplified and non-amplified DNA from 10 LCM FFPE normal and dysplastic oral epithelial tissues, and an internal method control. An average of 76.5% of SNPs were called in both matched amplified and non-amplified DNA samples, and concordance was a promising 82.4%. Paired analysis for copy number, LOH, and both combined, showed that copy number changes were reduced in amplified DNA, but were 99.5% concordant when detected, amplifications were the changes most likely to be 'missed', only 30% of non-amplified LOH changes were identified in amplified pairs, and when copy number and LOH are combined ∼50% of gene changes detected in the unamplified DNA were also detected in the amplified DNA and within these changes, 86.5% were concordant for both copy number and LOH status. However, there are also changes introduced as ∼20% of changes in the amplified DNA are not detected in the non-amplified DNA. An integrative network biology approach revealed that changes in amplified DNA of dysplastic oral epithelium localize to topologically critical regions of the human protein-protein interaction network, suggesting their functional implication in the pathobiology of this disease. Taken together, our results support the use of amplification of FFPE-derived DNA, provided sufficient samples are used

  18. Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array

    Directory of Open Access Journals (Sweden)

    Sugnet Charles

    2006-12-01

    Full Text Available Abstract Background Alternative splicing is a mechanism for increasing protein diversity by excluding or including exons during post-transcriptional processing. Alternatively spliced proteins are particularly relevant in oncology since they may contribute to the etiology of cancer, provide selective drug targets, or serve as a marker set for cancer diagnosis. While conventional identification of splice variants generally targets individual genes, we present here a new exon-centric array (GeneChip Human Exon 1.0 ST that allows genome-wide identification of differential splice variation, and concurrently provides a flexible and inclusive analysis of gene expression. Results We analyzed 20 paired tumor-normal colon cancer samples using a microarray designed to detect over one million putative exons that can be virtually assembled into potential gene-level transcripts according to various levels of prior supporting evidence. Analysis of high confidence (empirically supported transcripts identified 160 differentially expressed genes, with 42 genes occupying a network impacting cell proliferation and another twenty nine genes with unknown functions. A more speculative analysis, including transcripts based solely on computational prediction, produced another 160 differentially expressed genes, three-fourths of which have no previous annotation. We also present a comparison of gene signal estimations from the Exon 1.0 ST and the U133 Plus 2.0 arrays. Novel splicing events were predicted by experimental algorithms that compare the relative contribution of each exon to the cognate transcript intensity in each tissue. The resulting candidate splice variants were validated with RT-PCR. We found nine genes that were differentially spliced between colon tumors and normal colon tissues, several of which have not been previously implicated in cancer. Top scoring candidates from our analysis were also found to substantially overlap with EST-based bioinformatic

  19. Whole-genome analysis of human papillomavirus genotypes 52 and 58 isolated from Japanese women with cervical intraepithelial neoplasia and invasive cervical cancer.

    Science.gov (United States)

    Tenjimbayashi, Yuri; Onuki, Mamiko; Hirose, Yusuke; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao

    2017-01-01

    Human papillomavirus genotypes 52 and 58 (HPV52/58) are frequently detected in patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC) in East Asian countries including Japan. As with other HPV genotypes, HPV52/58 consist of multiple lineages of genetic variants harboring less than 10% differences between complete genome sequences of the same HPV genotype. However, site variations of nucleotide and amino acid sequences across the viral whole-genome have not been fully examined for HPV52/58. The aim of this study was to investigate genetic variations of HPV52/58 prevalent among Japanese women by analyzing the viral whole-genome sequences. The entire genomic region of HPV52/58 was amplified by long-range PCR with total cellular DNA extracted from cervical exfoliated cells isolated from Japanese patients with CIN or ICC. The amplified DNA was subjected to next generation sequencing to determine the complete viral genome sequences. Phylogenetic analyses were performed with the whole-genome sequences to assign variant lineages/sublineages to the HPV52/58 isolates. The variability in amino acid sequences of viral proteins was assessed by calculating the Shannon entropy scores at individual amino acid positions of HPV proteins. Among 52 isolates of HPV52 (CIN1, n  = 20; CIN2/3, n  = 21; ICC, n  = 11), 50 isolates belonged to lineage B (sublineage B2) and two isolates belonged to lineage A (sublineage A1). Among 48 isolates of HPV58 (CIN1, n  = 21; CIN2/3, n  = 19; ICC, n  = 8), 47 isolates belonged to lineage A (sublineages A1/A2/A3) and one isolate belonged to lineage C. Single nucleotide polymorphisms specific for individual variant lineages were determined throughout the viral genome based on multiple sequence alignments of the Japanese HPV52/58 isolates and reference HPV52/58 genomes. Entropy analyses revealed that the E1 protein was relatively variable among the HPV52 isolates, whereas the E7, E4, and L2 proteins showed

  20. Whole-genome sequence of a novel Chinese cyprinid herpesvirus 3 isolate reveals the existence of a distinct European genotype in East Asia.

    Science.gov (United States)

    Li, Wei; Lee, Xuezhu; Weng, Shaoping; He, Jianguo; Dong, Chuanfu

    2015-02-25

    Cyprinid herpesvirus 3 (CyHV3), also known as koi herpesvirus (KHV), can be subdivided primarily into European and Asian genotypes, which are represented by CyHV3-U or CyHV3-I and CyHV3-J, respectively. In this study, the whole genome sequence of a novel Chinese CyHV3 isolate (GZ11) was determined and annotated. CyHV3-GZ11 genome was found to contain 295,119 nucleotides with 52.9% G/C content, which is highly similar to those of published CyHV3-U, CyHV3-I, and CyHV3-J strains. With reference to CyHV3-U, CyHV3-I, and CyHV3-J, CyHV3-GZ11 was also classified into 164 open reading frames (ORF), which include eight repeated ORFs. On the basis of the 12 alloherpeviruses core genes, results from phylogenetic analysis showed that CyHV3-GZ11 had closer evolutionary relationships with CyHV3-U and CyHV3-I than with CyHV3/KHV-J, which were also supported by genome wide-based single nucleotide substitution analysis and the use of a series of developed molecular markers. This study was the first to reveal the presence of a distinct European CyHV3 genotype in East and Southeast Asia at a whole genome level, which will evoke new insights on exploring the origin, evolution, and epidemiology of the virus. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. Whole genome expression array profiling highlights differences in mucosal defense genes in Barrett's esophagus and esophageal adenocarcinoma.

    Directory of Open Access Journals (Sweden)

    Derek J Nancarrow

    Full Text Available Esophageal adenocarcinoma (EAC has become a major concern in Western countries due to rapid rises in incidence coupled with very poor survival rates. One of the key risk factors for the development of this cancer is the presence of Barrett's esophagus (BE, which is believed to form in response to repeated gastro-esophageal reflux. In this study we performed comparative, genome-wide expression profiling (using Illumina whole-genome Beadarrays on total RNA extracted from esophageal biopsy tissues from individuals with EAC, BE (in the absence of EAC and those with normal squamous epithelium. We combined these data with publically accessible raw data from three similar studies to investigate key gene and ontology differences between these three tissue states. The results support the deduction that BE is a tissue with enhanced glycoprotein synthesis machinery (DPP4, ATP2A3, AGR2 designed to provide strong mucosal defenses aimed at resisting gastro-esophageal reflux. EAC exhibits the enhanced extracellular matrix remodeling (collagens, IGFBP7, PLAU effects expected in an aggressive form of cancer, as well as evidence of reduced expression of genes associated with mucosal (MUC6, CA2, TFF1 and xenobiotic (AKR1C2, AKR1B10 defenses. When our results are compared to previous whole-genome expression profiling studies keratin, mucin, annexin and trefoil factor gene groups are the most frequently represented differentially expressed gene families. Eleven genes identified here are also represented in at least 3 other profiling studies. We used these genes to discriminate between squamous epithelium, BE and EAC within the two largest cohorts using a support vector machine leave one out cross validation (LOOCV analysis. While this method was satisfactory for discriminating squamous epithelium and BE, it demonstrates the need for more detailed investigations into profiling changes between BE and EAC.

  2. Whole genome transcription profiling of Anaplasma phagocytophilum in human and tick host cells by tiling array analysis

    Directory of Open Access Journals (Sweden)

    Chavez Adela

    2008-07-01

    Full Text Available Abstract Background Anaplasma phagocytophilum (Ap is an obligate intracellular bacterium and the agent of human granulocytic anaplasmosis, an emerging tick-borne disease. Ap alternately infects ticks and mammals and a variety of cell types within each. Understanding the biology behind such versatile cellular parasitism may be derived through the use of tiling microarrays to establish high resolution, genome-wide transcription profiles of the organism as it infects cell lines representative of its life cycle (tick; ISE6 and pathogenesis (human; HL-60 and HMEC-1. Results Detailed, host cell specific transcriptional behavior was revealed. There was extensive differential Ap gene transcription between the tick (ISE6 and the human (HL-60 and HMEC-1 cell lines, with far fewer differentially transcribed genes between the human cell lines, and all disproportionately represented by membrane or surface proteins. There were Ap genes exclusively transcribed in each cell line, apparent human- and tick-specific operons and paralogs, and anti-sense transcripts that suggest novel expression regulation processes. Seven virB2 paralogs (of the bacterial type IV secretion system showed human or tick cell dependent transcription. Previously unrecognized genes and coding sequences were identified, as were the expressed p44/msp2 (major surface proteins paralogs (of 114 total, through elevated signal produced to the unique hypervariable region of each – 2/114 in HL-60, 3/114 in HMEC-1, and none in ISE6. Conclusion Using these methods, whole genome transcription profiles can likely be generated for Ap, as well as other obligate intracellular organisms, in any host cells and for all stages of the cell infection process. Visual representation of comprehensive transcription data alongside an annotated map of the genome renders complex transcription into discernable patterns.

  3. Genome-wide association study using high-density single nucleotide polymorphism arrays and whole-genome sequences for clinical mastitis traits in dairy cattle.

    Science.gov (United States)

    Sahana, G; Guldbrandtsen, B; Thomsen, B; Holm, L-E; Panitz, F; Brøndum, R F; Bendixen, C; Lund, M S

    2014-11-01

    Mastitis is a mammary disease that frequently affects dairy cattle. Despite considerable research on the development of effective prevention and treatment strategies, mastitis continues to be a significant issue in bovine veterinary medicine. To identify major genes that affect mastitis in dairy cattle, 6 chromosomal regions on Bos taurus autosome (BTA) 6, 13, 16, 19, and 20 were selected from a genome scan for 9 mastitis phenotypes using imputed high-density single nucleotide polymorphism arrays. Association analyses using sequence-level variants for the 6 targeted regions were carried out to map causal variants using whole-genome sequence data from 3 breeds. The quantitative trait loci (QTL) discovery population comprised 4,992 progeny-tested Holstein bulls, and QTL were confirmed in 4,442 Nordic Red and 1,126 Jersey cattle. The targeted regions were imputed to the sequence level. The highest association signal for clinical mastitis was observed on BTA 6 at 88.97 Mb in Holstein cattle and was confirmed in Nordic Red cattle. The peak association region on BTA 6 contained 2 genes: vitamin D-binding protein precursor (GC) and neuropeptide FF receptor 2 (NPFFR2), which, based on known biological functions, are good candidates for affecting mastitis. However, strong linkage disequilibrium in this region prevented conclusive determination of the causal gene. A different QTL on BTA 6 located at 88.32 Mb in Holstein cattle affected mastitis. In addition, QTL on BTA 13 and 19 were confirmed to segregate in Nordic Red cattle and QTL on BTA 16 and 20 were confirmed in Jersey cattle. Although several candidate genes were identified in these targeted regions, it was not possible to identify a gene or polymorphism as the causal factor for any of these regions. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  4. Discovery of novel variants in genotyping arrays improves genotype retention and reduces ascertainment bias

    Directory of Open Access Journals (Sweden)

    Didion John P

    2012-01-01

    Full Text Available Abstract Background High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected pattern of clustering are often discarded, contributing to ascertainment bias and resulting in lost information - as much as 50% in a recent genome-wide association study in dogs. Results We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced highly concordant genotype calls when compared with other methods but it uniquely identified more than 786000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains. Conclusion The problems of ascertainment bias and missing

  5. Alignment of whole genomes.

    Science.gov (United States)

    Delcher, A L; Kasif, S; Fleischmann, R D; Peterson, J; White, O; Salzberg, S L

    1999-01-01

    A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycoplasma tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications. PMID:10325427

  6. Evaluation of whole genome amplified DNA to decrease material expenditure and increase quality

    Directory of Open Access Journals (Sweden)

    Marie Bækvad-Hansen

    2017-06-01

    Discussion: Whole genome amplified DNA samples from dried blood spots is well suited for array genotyping and produces robust and reliable genotype data. However, the amplification process introduces additional noise to the data, making detection of structural variants such as copy number variants difficult. With this study, we explore ways of optimizing the amplification protocol in order to reduce noise and increase data quality. We found, that the amplification process was very robust, and that changes in amplification time or temperature did not alter the genotyping calls or quality of the array data. Adding additional replicates of each sample also lead to insignificant changes in the array data. Thus, the amount of noise introduced by the amplification process was consistent regardless of changes made to the amplification protocol. We also explored ways of decreasing material expenditure by reducing the spot size or the amplification reaction volume. The reduction did not affect the quality of the genotyping data.

  7. Co-circulation of multiple subtypes of enterovirus A71 (EV- A71) genotype C, including novel recombinants characterised by use of whole genome sequencing (WGS), Denmark 2016

    DEFF Research Database (Denmark)

    Midgley, Sofie E; Nielsen, Astrid G; Trebbien, Ramona

    2017-01-01

    In Europe, enterovirus A71 (EV-A71) has primarily been associated with sporadic cases of neurological disease. The recent emergence of new genotypes and larger outbreaks with severely ill patients demonstrates a potential for the spread of new, highly pathogenic EV-A71 strains. Detection and char......In Europe, enterovirus A71 (EV-A71) has primarily been associated with sporadic cases of neurological disease. The recent emergence of new genotypes and larger outbreaks with severely ill patients demonstrates a potential for the spread of new, highly pathogenic EV-A71 strains. Detection...

  8. Whole Genome Amplification of Day 3 or Day 5 Human Embryos Biopsies Provides a Suitable DNA Template for PCR-Based Techniques for Genotyping, a Complement of Preimplantation Genetic Testing

    Directory of Open Access Journals (Sweden)

    Elizabeth Schaeffer

    2017-01-01

    Full Text Available Our objective was to determine if whole genome amplification (WGA provides suitable DNA for qPCR-based genotyping for human embryos. Single blastomeres (Day 3 or trophoblastic cells (Day 5 were isolated from 342 embryos for WGA. Comparative Genomic Hybridization determined embryo sex as well as Trisomy 18 or Trisomy 21. To determine the embryo’s sex, qPCR melting curve analysis for SRY and DYS14 was used. Logistic regression indicated a 4.4%, 57.1%, or 98.8% probability of a male embryo when neither gene, SRY only, or both genes were detected, respectively (accuracy = 94.1%, kappa = 0.882, and p<0.001. Fluorescent Capillary Electrophoresis for the amelogenin genes (AMEL was also used to determine sex. AMELY peak’s height was higher and this peak’s presence was highly predictive of male embryos (AUC = 0.93, accuracy = 81.7%, kappa = 0.974, and p<0.001. Trisomy 18 and Trisomy 21 were determined using the threshold cycle difference for RPL17 and TTC3, respectively, which were significantly lower in the corresponding embryos. The Ct difference for TTC3 specifically determined Trisomy 21 (AUC = 0.89 and RPL17 for Trisomy 18 (AUC = 0.94. Here, WGA provides adequate DNA for PCR-based techniques for preimplantation genotyping.

  9. Genetic mapping using the Diversity Arrays Technology (DArT) : application and validation using the whole-genome sequences of Arabidopsis thaliana and the fungal wheat pathogen Mycosphaerella graminicola

    NARCIS (Netherlands)

    Wittenberg, A.H.J.

    2007-01-01

    Diversity Arrays Technology (DArT) is a microarray-based DNA marker technique for genome-wide discovery and genotyping of genetic variation. DArT allows simultaneous scoring of hundreds- to thousands of restriction site based polymorphisms between genotypes and does not require DNA sequence

  10. A SNP Genotyping Array for Hexaploid Oat

    Directory of Open Access Journals (Sweden)

    Nicholas A. Tinker

    2014-11-01

    Full Text Available Recognizing a need in cultivated hexaploid oat ( L. for a reliable set of reference single nucleotide polymorphisms (SNPs, we have developed a 6000 (6K BeadChip design containing 257 Infinium I and 5486 Infinium II designs corresponding to 5743 SNPs. Of those, 4975 SNPs yielded successful assays after array manufacturing. These SNPs were discovered based on a variety of bioinformatics pipelines in complementary DNA (cDNA and genomic DNA originating from 20 or more diverse oat cultivars. The array was validated in 1100 samples from six recombinant inbred line (RIL mapping populations and sets of diverse oat cultivars and breeding lines, and provided approximately 3500 discernible Mendelian polymorphisms. Here, we present an annotation of these SNPs, including methods of discovery, gene identification and orthology, population-genetic characteristics, and tentative positions on an oat consensus map. We also evaluate a new cluster-based method of calling SNPs. The SNP design sequences are made publicly available, and the full SNP genotyping platform is available for commercial purchase from an independent third party.

  11. Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high incidence setting

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Soborg, B; Koch, A

    2016-01-01

    In East Greenland, a dramatic increase of tuberculosis (TB) incidence has been observed in recent years. Classical genotyping suggests a genetically similar Mycobacterium tuberculosis (Mtb) strain population as cause, however, precise transmission patterns are unclear. We performed whole genome...

  12. Specificity of the Linear Array HPV Genotyping Test for detecting human papillomavirus genotype 52 (HPV-52)

    OpenAIRE

    Kocjan, Boštjan; Poljak, Mario; Oštrbenk, Anja

    2015-01-01

    Introduction: HPV-52 is one of the most frequent human papillomavirus (HPV) genotypes causing significant cervical pathology. The most widely used HPV genotyping assay, the Roche Linear Array HPV Genotyping Test (Linear Array), is unable to identify HPV- 52 status in samples containing HPV-33, HPV-35, and/or HPV-58. Methods: Linear Array HPV-52 analytical specificity was established by testing 100 specimens reactive with the Linear Array HPV- 33/35/52/58 cross-reactive probe, but not with the...

  13. Large SNP arrays for genotyping in crop plants

    Indian Academy of Sciences (India)

    2012-10-15

    Oct 15, 2012 ... in human has been paralleled by the simultaneous develop- ment of ... In crop plants, the development of large genotyping arrays started much ..... via deep resequencing of reduced representation libraries with the Illumina ...

  14. Whole genome amplification - Review of applications and advances

    Energy Technology Data Exchange (ETDEWEB)

    Hawkins, Trevor L.; Detter, J.C.; Richardson, Paul

    2001-11-15

    The concept of Whole Genome Amplification is something that has arisen in the past few years as modifications to the polymerase chain reaction (PCR) have been adapted to replicate regions of genomes which are of biological interest. The applications here are many--forensics, embryonic disease diagnosis, bio terrorism genome detection, ''imoralization'' of clinical samples, microbial diversity, and genotyping. The key question is if DNA can be replicated a genome at a time without bias or non random distribution of the target. Several papers published in the last year and currently in preparation may lead to the conclusion that whole genome amplification may indeed be possible and therefore open up a new avenue to molecular biology.

  15. Diversity arrays technology (DArT) markers in apple for genetic linkage maps

    NARCIS (Netherlands)

    Schouten, H.J.; Weg, van de W.E.; Carling, J.; Khan, S.A.; McKay, S.J.; Kaauwen, van M.P.W.

    2012-01-01

    Diversity Arrays Technology (DArT) provides a high-throughput whole-genome genotyping platform for the detection and scoring of hundreds of polymorphic loci without any need for prior sequence information. The work presented here details the development and performance of a DArT genotyping array for

  16. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  17. Archived neonatal dried blood spot samples can be used for accurate whole genome and exome-targeted next-generation sequencing

    DEFF Research Database (Denmark)

    Hollegaard, Mads Vilhelm; Grauholm, Jonas; Nielsen, Ronni

    2013-01-01

    Dried blood spot samples (DBSS) have been collected and stored for decades as part of newborn screening programmes worldwide. Representing almost an entire population under a certain age and collected with virtually no bias, the Newborn Screening Biobanks are of immense value in medical studies......, for example, to examine the genetics of various disorders. We have previously demonstrated that DNA extracted from a fraction (2×3.2mm discs) of an archived DBSS can be whole genome amplified (wgaDNA) and used for accurate array genotyping. However, until now, it has been uncertain whether wgaDNA from DBSS...... can be used for accurate whole genome sequencing (WGS) and exome sequencing (WES). This study examined two individuals represented by three different types of samples each: whole-blood (reference samples), 3-year-old DBSS spotted with reference material (refDBSS), and 27- to 29-year-old archived...

  18. Light whole genome sequence for SNP discovery across domestic cat breeds

    Directory of Open Access Journals (Sweden)

    Driscoll Carlos

    2010-06-01

    Full Text Available Abstract Background The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus -- FeLV, feline coronavirus -- FECV, feline immunodeficiency virus - FIV that are homologues to human scourges (cancer, SARS, and AIDS respectively. However, to realize this bio-medical potential, a high density single nucleotide polymorphism (SNP map is required in order to accomplish disease and phenotype association discovery. Description To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper-limit was placed on sequence coverage and a high lower-limit on the quality of the discrepant bases at a potential variant site. In all domestic cats of different breeds: female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese, a male Ragdoll and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cats breeds. An empirical sampling of 94 discovered SNPs were tested in the sequenced cats resulting in a SNP validation rate of 99%. Conclusions These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.

  19. Whole Genome Epidemiological Typing of Escherichia coli

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer

    validating each position analyzed and ignoring the positions that cannot be validated thereby creating a distance matrix that is used as input to an UPGMA method that creates the final phylogeny. The ND method was also implemented as a web server and published. If whole genome sequencing is to be used...

  20. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Full Text Available Abstract Background Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results An unfiltered whole genome analysis (193,622 predicted proteins strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between

  1. Whole genome sequencing of genotype VI Newcastle disease viruses from formalin-fixed paraffinembedded tissues from wild pigeons reveals continuous evolution and previously unrecognized genetic diversity in the U.S.

    Science.gov (United States)

    Background: Newcastle disease viruses (NDV) are highly contagious and can cause disease in both wild birds and poultry. A pigeon-adapted variant of genotype VI NDV, termed pigeon paramyxovirus 1, is commonly isolated from Columbiform birds in the United States. Complete genomic characterization of t...

  2. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Full Text Available Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.

  3. Genomic View of Bipolar Disorder Revealed by Whole Genome Sequencing in a Genetic Isolate

    Science.gov (United States)

    Georgi, Benjamin; Craig, David; Kember, Rachel L.; Liu, Wencheng; Lindquist, Ingrid; Nasser, Sara; Brown, Christopher; Egeland, Janice A.; Paul, Steven M.; Bućan, Maja

    2014-01-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders. PMID:24625924

  4. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls....... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...

  5. SNPassoc: an R package to perform whole genome association studies.

    Science.gov (United States)

    González, Juan R; Armengol, Lluís; Solé, Xavier; Guinó, Elisabet; Mercader, Josep M; Estivill, Xavier; Moreno, Víctor

    2007-03-01

    The popularization of large-scale genotyping projects has led to the widespread adoption of genetic association studies as the tool of choice in the search for single nucleotide polymorphisms (SNPs) underlying susceptibility to complex diseases. Although the analysis of individual SNPs is a relatively trivial task, when the number is large and multiple genetic models need to be explored it becomes necessary a tool to automate the analyses. In order to address this issue, we developed SNPassoc, an R package to carry out most common analyses in whole genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Package SNPassoc is available at CRAN from http://cran.r-project.org. A tutorial is available on Bioinformatics online and in http://davinci.crg.es/estivill_lab/snpassoc.

  6. Whole-genome sequencing of veterinary pathogens

    DEFF Research Database (Denmark)

    Ronco, Troels

    -electrophoresis and single-locus sequencing has been widely used to characterize such types of veterinary pathogens. However, DNA sequencing techniques have become fast and cost effective in recent years and whole-genome sequencing data provide a much higher discriminative power and reproducibility than any...... genetic background. This indicates that dairy cows can be natural carriers of S. aureus subtypes that in certain cases lead to CM. A group of isolates that mostly belonged to ST151 carried three pathogenicity islands that were primarily found in this group. The prevalence of resistance genes was generally...

  7. Harnessing Whole Genome Sequencing in Medical Mycology.

    Science.gov (United States)

    Cuomo, Christina A

    2017-01-01

    Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

  8. HPV genotype-specific concordance between EuroArray HPV, Anyplex II HPV28 and Linear Array HPV Genotyping test in Australian cervical samples

    Directory of Open Access Journals (Sweden)

    Alyssa M. Cornall

    2017-12-01

    Full Text Available Purpose: To compare human papillomavirus genotype-specific performance of two genotyping assays, Anyplex II HPV28 (Seegene and EuroArray HPV (EuroImmun, with Linear Array HPV (Roche. Methods: DNA extracted from clinican-collected cervical brush specimens in PreservCyt medium (Hologic, from 403 women undergoing management for detected cytological abnormalities, was tested on the three assays. Genotype-specific agreement were assessed by Cohen's kappa statistic and Fisher's z-test of significance between proportions. Results: Agreement between Linear Array and the other 2 assays was substantial to almost perfect (κ = 0.60 − 1.00 for most genotypes, and was almost perfect (κ = 0.81 – 0.98 for almost all high-risk genotypes. Linear Array overall detected most genotypes more frequently, however this was only statistically significant for HPV51 (EuroArray; p = 0.0497, HPV52 (Anyplex II; p = 0.039 and HPV61 (Anyplex II; p=0.047. EuroArray detected signficantly more HPV26 (p = 0.002 and Anyplex II detected more HPV42 (p = 0.035 than Linear Array. Each assay performed differently for HPV68 detection: EuroArray and LA were in moderate to substantial agreement with Anyplex II (κ = 0.46 and 0.62, respectively, but were in poor disagreement with each other (κ = −0.01. Conclusions: EuroArray and Anyplex II had similar sensitivity to Linear Array for most high-risk genotypes, with slightly lower sensitivity for HPV 51 or 52. Keywords: Human papillomavirus, Genotyping, Linear Array, Anyplex II, EuroArray, Cervix

  9. Dirofilaria immitis JYD-34 isolate: whole genome analysis

    Directory of Open Access Journals (Sweden)

    Catherine Bourguinat

    2017-11-01

    Full Text Available Abstract Background Macrocyclic lactone (ML anthelmintics are used for chemoprophylaxis for heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms, despite apparent compliance to recommended chemoprophylaxis with approved preventives, has led to such cases being considered as suspected lack of efficacy (LOE. Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced susceptibility isolates or has a genetic profile similar to known ML-susceptible isolates. Methods In this study, the 90Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools to pooled whole genomes of four well-characterized susceptible D. immitis populations, one susceptible Missouri laboratory isolate, as well as the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST, which allow the genetic structure of each population (isolate to be compared at the level of single nucleotide polymorphisms (SNP across the genome, have been calculated. Forty-one previously reported SNP, that appeared to differentiate between susceptible and LOE and confirmed reduced susceptibility isolates, were also investigated in the JYD-34 isolate. Results The FST analysis, and the analysis of the 41 SNP that appeared to differentiate reduced susceptibility from fully susceptible isolates, confirmed that the JYD-34 isolate has a genome similar to previously

  10. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  11. Whole genome amplification in preimplantation genetic diagnosis*

    Science.gov (United States)

    Zheng, Ying-ming; Wang, Ning; Li, Lei; Jin, Fan

    2011-01-01

    Preimplantation genetic diagnosis (PGD) refers to a procedure for genetically analyzing embryos prior to implantation, improving the chance of conception for patients at high risk of transmitting specific inherited disorders. This method has been widely used for a large number of genetic disorders since the first successful application in the early 1990s. Polymerase chain reaction (PCR) and fluorescent in situ hybridization (FISH) are the two main methods in PGD, but there are some inevitable shortcomings limiting the scope of genetic diagnosis. Fortunately, different whole genome amplification (WGA) techniques have been developed to overcome these problems. Sufficient DNA can be amplified and multiple tasks which need abundant DNA can be performed. Moreover, WGA products can be analyzed as a template for multi-loci and multi-gene during the subsequent DNA analysis. In this review, we will focus on the currently available WGA techniques and their applications, as well as the new technical trends from WGA products. PMID:21194180

  12. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  13. Aligning the unalignable: bacteriophage whole genome alignments.

    Science.gov (United States)

    Bérard, Sèverine; Chateau, Annie; Pompidor, Nicolas; Guertin, Paul; Bergeron, Anne; Swenson, Krister M

    2016-01-13

    In recent years, many studies focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot plots diagrams give a mostly qualitative assessment of the similarity/dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. In this paper, we propose a multiple alignment strategy that exploits functional collinearity shared by related strains of bacteriophages, and uses partial orders to capture mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressive Mauve aligner - which implements a partial order strategy, but whose alignments are linearized - shows a greatly improved interactive graphic display, while avoiding misalignments. Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains. A python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on bitbucket (https://bitbucket.org/thekswenson/alpha).

  14. Whole-Genome Sequencing for National Surveillance of Shigella flexneri

    Directory of Open Access Journals (Sweden)

    Marie A. Chattaway

    2017-09-01

    Full Text Available National surveillance of Shigella flexneri ensures the rapid detection of outbreaks to facilitate public health investigation and intervention strategies. In this study, we used whole-genome sequencing (WGS to type S. flexneri in order to detect linked cases and support epidemiological investigations. We prospectively analyzed 330 isolates of S. flexneri received at the Gastrointestinal Bacteria Reference Unit at Public Health England between August 2015 and January 2016. Traditional phenotypic and WGS sub-typing methods were compared. PCR was carried out on isolates exhibiting phenotypic/genotypic discrepancies with respect to serotype. Phylogenetic relationships between isolates were analyzed by WGS using single nucleotide polymorphism (SNP typing to facilitate cluster detection. For 306/330 (93% isolates there was concordance between serotype derived from the genome and phenotypic serology. Discrepant results between the phenotypic and genotypic tests were attributed to novel O-antigen synthesis/modification gene combinations or indels identified in O-antigen synthesis/modification genes rendering them dysfunctional. SNP typing identified 36 clusters of two isolates or more. WGS provided microbiological evidence of epidemiologically linked clusters and detected novel O-antigen synthesis/modification gene combinations associated with two outbreaks. WGS provided reliable and robust data for monitoring trends in the incidence of different serotypes over time. SNP typing can be used to facilitate outbreak investigations in real-time thereby informing surveillance strategies and providing the opportunities for implementing timely public health interventions.

  15. Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Plant Ramona N

    2006-08-01

    Full Text Available Abstract Background Whole genome amplification is an increasingly common technique through which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis. Questions of amplification-induced error and template bias generated by these methods have previously been addressed through either small scale (SNPs or large scale (CGH array, FISH methodologies. Here we utilized whole genome sequencing to assess amplification-induced bias in both coding and non-coding regions of two bacterial genomes. Halobacterium species NRC-1 DNA and Campylobacter jejuni were amplified by several common, commercially available protocols: multiple displacement amplification, primer extension pre-amplification and degenerate oligonucleotide primed PCR. The amplification-induced bias of each method was assessed by sequencing both genomes in their entirety using the 454 Sequencing System technology and comparing the results with those obtained from unamplified controls. Results All amplification methodologies induced statistically significant bias relative to the unamplified control. For the Halobacterium species NRC-1 genome, assessed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 119 times greater than those from unamplified material, 164.0 times greater for Repli-G, 165.0 times greater for PEP-PCR and 252.0 times greater than the unamplified controls for DOP-PCR. For Campylobacter jejuni, also analyzed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 15 times greater than those from unamplified material, 19.8 times greater for Repli-G, 61.8 times greater for PEP-PCR and 220.5 times greater than the unamplified controls for DOP-PCR. Conclusion Of the amplification methodologies examined in this paper, the multiple displacement amplification products generated the least bias, and produced significantly higher yields of amplified DNA.

  16. HPV genotype-specific concordance between EuroArray HPV, Anyplex II HPV28 and Linear Array HPV Genotyping test in Australian cervical samples.

    Science.gov (United States)

    Cornall, Alyssa M; Poljak, Marin; Garland, Suzanne M; Phillips, Samuel; Machalek, Dorothy A; Tan, Jeffrey H; Quinn, Michael A; Tabrizi, Sepehr N

    2017-12-01

    To compare human papillomavirus genotype-specific performance of two genotyping assays, Anyplex II HPV28 (Seegene) and EuroArray HPV (EuroImmun), with Linear Array HPV (Roche). DNA extracted from clinican-collected cervical brush specimens in PreservCyt medium (Hologic), from 403 women undergoing management for detected cytological abnormalities, was tested on the three assays. Genotype-specific agreement were assessed by Cohen's kappa statistic and Fisher's z-test of significance between proportions. Agreement between Linear Array and the other 2 assays was substantial to almost perfect (κ = 0.60 - 1.00) for most genotypes, and was almost perfect (κ = 0.81 - 0.98) for almost all high-risk genotypes. Linear Array overall detected most genotypes more frequently, however this was only statistically significant for HPV51 (EuroArray; p = 0.0497), HPV52 (Anyplex II; p = 0.039) and HPV61 (Anyplex II; p=0.047). EuroArray detected signficantly more HPV26 (p = 0.002) and Anyplex II detected more HPV42 (p = 0.035) than Linear Array. Each assay performed differently for HPV68 detection: EuroArray and LA were in moderate to substantial agreement with Anyplex II (κ = 0.46 and 0.62, respectively), but were in poor disagreement with each other (κ = -0.01). EuroArray and Anyplex II had similar sensitivity to Linear Array for most high-risk genotypes, with slightly lower sensitivity for HPV 51 or 52. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  17. Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.

    Science.gov (United States)

    Lane, William J; Westhoff, Connie M; Gleadall, Nicholas S; Aguad, Maria; Smeland-Wagman, Robin; Vege, Sunitha; Simmons, Daimon P; Mah, Helen H; Lebo, Matthew S; Walter, Klaudia; Soranzo, Nicole; Di Angelantonio, Emanuele; Danesh, John; Roberts, David J; Watkins, Nick A; Ouwehand, Willem H; Butterworth, Adam S; Kaufman, Richard M; Rehm, Heidi L; Silberstein, Leslie E; Green, Robert C

    2018-06-01

    There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh-the most important blood groups-cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens. This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30 × depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15 × depth) with serological comparisons. We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99·5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99·8% concordant for the remaining 90 Med

  18. Microbial species delineation using whole genome sequences.

    Science.gov (United States)

    Varghese, Neha J; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T; Mavrommatis, Kostas; Kyrpides, Nikos C; Pati, Amrita

    2015-08-18

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Small sample whole-genome amplification

    Science.gov (United States)

    Hara, Christine; Nguyen, Christine; Wheeler, Elizabeth; Sorensen, Karen; Arroyo, Erin; Vrankovich, Greg; Christian, Allen

    2005-11-01

    Many challenges arise when trying to amplify and analyze human samples collected in the field due to limitations in sample quantity, and contamination of the starting material. Tests such as DNA fingerprinting and mitochondrial typing require a certain sample size and are carried out in large volume reactions; in cases where insufficient sample is present whole genome amplification (WGA) can be used. WGA allows very small quantities of DNA to be amplified in a way that enables subsequent DNA-based tests to be performed. A limiting step to WGA is sample preparation. To minimize the necessary sample size, we have developed two modifications of WGA: the first allows for an increase in amplified product from small, nanoscale, purified samples with the use of carrier DNA while the second is a single-step method for cleaning and amplifying samples all in one column. Conventional DNA cleanup involves binding the DNA to silica, washing away impurities, and then releasing the DNA for subsequent testing. We have eliminated losses associated with incomplete sample release, thereby decreasing the required amount of starting template for DNA testing. Both techniques address the limitations of sample size by providing ample copies of genomic samples. Carrier DNA, included in our WGA reactions, can be used when amplifying samples with the standard purification method, or can be used in conjunction with our single-step DNA purification technique to potentially further decrease the amount of starting sample necessary for future forensic DNA-based assays.

  20. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  1. Rapid whole genome sequencing and precision neonatology.

    Science.gov (United States)

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical to initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and speed of testing decreases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions for alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Bridging the gap from prenatal karyotyping to whole-genome array comparative genomic hybridization in Hong Kong: survey on knowledge and acceptance of health-care providers and pregnant women.

    Science.gov (United States)

    Cheng, Hiu Yee Heidi; Kan, Anita Sik-Yau; Hui, Pui Wah; Lee, Chin Peng; Tang, Mary Hoi Yin

    2017-12-01

    The use of array comparative genomic hybridization (aCGH) has been increasingly widespread. The challenge of integration of this technology into prenatal diagnosis was the interpretation of results and communicating findings of unclear clinical significance. This study assesses the knowledge and acceptance of prenatal aCGH in Hong Kong obstetricians and pregnant women. The aim is to identify the needs and gaps before implementing the replacement of karyotyping with aCGH. Questionnaires with aCGH information in the form of pamphlets were sent by post to obstetrics and gynecology doctors. For the pregnant women group, a video presentation, pamphlets on aCGH and a self-administered questionnaire were provided at the antenatal clinic. The perception of aCGH between doctors and pregnant women was similar. Doctors not choosing aCGH were more concerned about the difficulty in counseling of variants of unknown significance and adult-onset disease in pregnant women, whereas pregnant women not choosing aCGH were more concerned about the increased waiting time leading to increased anxiety. Prenatal aCGH is perceived as a better test by both doctors and patients. Counseling support, training, and better understanding and communication of findings of unclear clinical significance are necessary to improve doctor-patient experience.

  3. Analysis of phage Mu DNA transposition by whole-genome Escherichia coli tiling arrays reveals a complex relationship to distribution of target selection protein B, transcription and chromosome architectural elements.

    Science.gov (United States)

    Ge, Jun; Lou, Zheng; Cui, Hong; Shang, Lei; Harshey, Rasika M

    2011-09-01

    Of all known transposable elements, phage Mu exhibits the highest transposition efficiency and the lowest target specificity. In vitro, MuB protein is responsible for target choice. In this work, we provide a comprehensive assessment of the genome-wide distribution of MuB and its relationship to Mu target selection using high-resolution Escherichia coli tiling DNA arrays. We have also assessed how MuB binding and Mu transposition are influenced by chromosome-organizing elements such as AT-rich DNA signatures, or the binding of the nucleoid-associated protein Fis, or processes such as transcription. The results confirm and extend previous biochemical and lower resolution in vivo data. Despite the generally random nature of Mu transposition and MuB binding, there were hot and cold insertion sites and MuB binding sites in the genome, and differences between the hottest and coldest sites were large. The new data also suggest that MuB distribution and subsequent Mu integration is responsive to DNA sequences that contribute to the structural organization of the chromosome.

  4. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

    DEFF Research Database (Denmark)

    Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang

    2015-01-01

    . Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10); a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication...

  5. Whole-genome methylation caller designed for methyl- DNA ...

    African Journals Online (AJOL)

    etchie

    2013-02-20

    Feb 20, 2013 ... Our method uses a single-CpG-resolution, whole-genome methylation ... Key words: Methyl-DNA immunoprecipitation, next-generation sequencing, ...... methylation is prevalent in embryonic stem cells andmaybe mediated.

  6. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and sixDrosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families?perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  7. A New Single Nucleotide Polymorphism Database for Rainbow Trout Generated Through Whole Genome Resequencing

    Directory of Open Access Journals (Sweden)

    Guangtu Gao

    2018-04-01

    Full Text Available Single-nucleotide polymorphisms (SNPs are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss, SNP discovery has been previously done through sequencing of restriction-site associated DNA (RAD libraries, reduced representation libraries (RRL and RNA sequencing. Recently we have performed high coverage whole genome resequencing with 61 unrelated samples, representing a wide range of rainbow trout and steelhead populations, with 49 new samples added to 12 aquaculture samples from AquaGen (Norway that we previously used for SNP discovery. Of the 49 new samples, 11 were double-haploid lines from Washington State University (WSU and 38 represented wild and hatchery populations from a wide range of geographic distribution and with divergent migratory phenotypes. We then mapped the sequences to the new rainbow trout reference genome assembly (GCA_002163495.1 which is based on the Swanson YY doubled haploid line. Variant calling was conducted with FreeBayes and SAMtools mpileup, followed by filtering of SNPs based on quality score, sequence complexity, read depth on the locus, and number of genotyped samples. Results from the two variant calling programs were compared and genotypes of the double haploid samples were used for detecting and filtering putative paralogous sequence variants (PSVs and multi-sequence variants (MSVs. Overall, 30,302,087 SNPs were identified on the rainbow trout genome 29 chromosomes and 1,139,018 on unplaced scaffolds, with 4,042,723 SNPs having high minor allele frequency (MAF > 0.25. The average SNP density on the chromosomes was one SNP per 64 bp, or 15.6 SNPs per 1 kb. Results from the phylogenetic analysis that we conducted indicate that the SNP markers contain enough population-specific polymorphisms for recovering population relationships despite the small sample size used. Intra-Population polymorphism assessment revealed high level of polymorphism and

  8. The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants.

    Science.gov (United States)

    Reuter, Miriam S; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K C; Trost, Brett; Paton, Tara A; Pereira, Sergio L; Herbrick, Jo-Anne; Wintle, Richard F; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W L; Wang, Zhuozhi; Patel, Rohan V; Pellecchia, Giovanna; Wei, John; Strug, Lisa J; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M; Bassett, Anne S; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D; Stavropoulos, Dimitri J; Bowdin, Sarah; Hildebrandt, Matthew R; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M Stephen; Monfared, Nasim; Hosseini, S Mohsen; Joseph-George, Ann M; Keeley, Fred W; Cook, Ryan A; Fiume, Marc; Lee, Hin C; Marshall, Christian R; Davies, Jill; Hazell, Allison; Buchanan, Janet A; Szego, Michael J; Scherer, Stephen W

    2018-02-05

    The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set ( n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants - associated with cancer, cardiac or neurodegenerative phenotypes - remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. © 2018 Joule Inc. or its licensors.

  9. Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

    Science.gov (United States)

    Li, Miaoxin; Li, Jiang; Li, Mulin Jun; Pan, Zhicheng; Hsu, Jacob Shujui; Liu, Dajiang J; Zhan, Xiaowei; Wang, Junwen; Song, Youqiang; Sham, Pak Chung

    2017-05-19

    Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms, sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and achieved a speed of over 1000 times faster to calculate genotypic correlation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Whole genome sequence typing to investigate the Apophysomyces outbreak following a tornado in Joplin, Missouri, 2011.

    Science.gov (United States)

    Etienne, Kizee A; Gillece, John; Hilsabeck, Remy; Schupp, Jim M; Colman, Rebecca; Lockhart, Shawn R; Gade, Lalitha; Thompson, Elizabeth H; Sutton, Deanna A; Neblett-Fanfair, Robyn; Park, Benjamin J; Turabelidze, George; Keim, Paul; Brandt, Mary E; Deak, Eszter; Engelthaler, David M

    2012-01-01

    Case reports of Apophysomyces spp. in immunocompetent hosts have been a result of traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011 a tornado occurred in Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group; this indicated multiple genotypes from a single or multiple sources. Though no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces.

  11. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    Energy Technology Data Exchange (ETDEWEB)

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are of great importance and challenge due to the extent of evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.

  12. Whole genome shotgun sequencing of Indian strains of Streptococcus agalactiae

    Directory of Open Access Journals (Sweden)

    Balaji Veeraraghavan

    2017-12-01

    Full Text Available Group B streptococcus is known as a leading cause of neonatal infections in developing countries. The present study describes the whole genome shotgun sequences of four Group B Streptococcus (GBS isolates. Molecular data on clonality is lacking for GBS in India. The present genome report will add important information on the scarce genome data of GBS and will help in deriving comparative genome studies of GBS isolates at global level. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession numbers NHPL00000000 – NHPO00000000.

  13. Concept and design of a genome-wide association genotyping array tailored for transplantation-specific studies

    DEFF Research Database (Denmark)

    Li, Yun R.; van Setten, Jessica; Verma, Shefali S.

    2015-01-01

    genome-wide genotyping array, the 'TxArray', comprising approximately 782,000 markers with tailored content for deeper capture of variants across HLA, KIR, pharmacogenomic, and metabolic loci important in transplantation. To test concordance and genotyping quality, we genotyped 85 HapMap samples...

  14. Whole-genome single-nucleotide polymorphism (SNP marker discovery and association analysis with the eicosapentaenoic acid (EPA and docosahexaenoic acid (DHA content in Larimichthys crocea

    Directory of Open Access Journals (Sweden)

    Shijun Xiao

    2016-12-01

    Full Text Available Whole-genome single-nucleotide polymorphism (SNP markers are valuable genetic resources for the association and conservation studies. Genome-wide SNP development in many teleost species are still challenging because of the genome complexity and the cost of re-sequencing. Genotyping-By-Sequencing (GBS provided an efficient reduced representative method to squeeze cost for SNP detection; however, most of recent GBS applications were reported on plant organisms. In this work, we used an EcoRI-NlaIII based GBS protocol to teleost large yellow croaker, an important commercial fish in China and East-Asia, and reported the first whole-genome SNP development for the species. 69,845 high quality SNP markers that evenly distributed along genome were detected in at least 80% of 500 individuals. Nearly 95% randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. The association studies with the muscle eicosapentaenoic acid (EPA and docosahexaenoic acid (DHA content discovered 39 significant SNP markers, contributing as high up to ∼63% genetic variance that explained by all markers. Functional genes that involved in fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, PPT2 Gene, previously identified in the association study of the plasma n-3 and n-6 polyunsaturated fatty acid level in human, was re-discovered in large yellow croaker. Our study verified that EcoRI-NlaIII based GBS could produce quality SNP markers in a cost-efficient manner in teleost genome. The developed SNP markers and the EPA and DHA associated SNP loci provided invaluable resources for the population structure, conservation genetics and genomic selection of large yellow croaker and other fish organisms.

  15. A simple optimization can improve the performance of single feature polymorphism detection by Affymetrix expression arrays

    Directory of Open Access Journals (Sweden)

    Fujisawa Hironori

    2010-05-01

    Full Text Available Abstract Background High-density oligonucleotide arrays are effective tools for genotyping numerous loci simultaneously. In small genome species (genome size: Results We compared the single feature polymorphism (SFP detection performance of whole-genome and transcript hybridizations using the Affymetrix GeneChip® Rice Genome Array, using the rice cultivars with full genome sequence, japonica cultivar Nipponbare and indica cultivar 93-11. Both genomes were surveyed for all probe target sequences. Only completely matched 25-mer single copy probes of the Nipponbare genome were extracted, and SFPs between them and 93-11 sequences were predicted. We investigated optimum conditions for SFP detection in both whole genome and transcript hybridization using differences between perfect match and mismatch probe intensities of non-polymorphic targets, assuming that these differences are representative of those between mismatch and perfect targets. Several statistical methods of SFP detection by whole-genome hybridization were compared under the optimized conditions. Causes of false positives and negatives in SFP detection in both types of hybridization were investigated. Conclusions The optimizations allowed a more than 20% increase in true SFP detection in whole-genome hybridization and a large improvement of SFP detection performance in transcript hybridization. Significance analysis of the microarray for log-transformed raw intensities of PM probes gave the best performance in whole genome hybridization, and 22,936 true SFPs were detected with 23.58% false positives by whole genome hybridization. For transcript hybridization, stable SFP detection was achieved for highly expressed genes, and about 3,500 SFPs were detected at a high sensitivity (> 50% in both shoot and young panicle transcripts. High SFP detection performances of both genome and transcript hybridizations indicated that microarrays of a complex genome (e.g., of Oryza sativa can be

  16. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  17. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.; Michell, Craig; Apprill, Amy; Voolstra, Christian R.

    2014-01-01

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  18. Large SNP arrays for genotyping in crop plants

    Indian Academy of Sciences (India)

    Genotyping with large numbers of molecular markers is now an indispensable tool within plant genetics and breeding. Especially through the identification of large numbers of single nucleotide polymorphism (SNP) markers using the novel high-throughput sequencing technologies, it is now possible to reliably identify many ...

  19. Whole genome sequencing as the ultimate tool to diagnose tuberculosis

    Directory of Open Access Journals (Sweden)

    Dick van Soolingen

    2016-01-01

    Full Text Available In the past two decades, DNA techniques have been increasingly used in the laboratory diagnosis of tuberculosis (TB. The (sub species of the Mycobacterium tuberculosis complex are usually identified using reverse line blot techniques. The resistance is predicted by the detection of mutations in genes associated with resistance. Nevertheless, all cases are still subjected to cumbersome phenotypic resistance testing. The production of a strain-characteristic DNA fingerprint, to investigate the epidemiology of TB, is done by the 24-locus variable number tandem repeat (VNTR typing. However, most of the molecular techniques in the diagnosis of TB can eventually be replaced by whole genome sequencing (WGS. Many international TB reference laboratories are currently working on the introduction of WGS; however, standardization in the international context is lacking. The European Centre for Infectious Disease Prevention and Control in Stockholm, Sweden organizes a yearly round of quality control on VNTR typing and in 2015 for the first time also WGS. In this first proficiency study, only three out of eight international TB laboratories produced WGS results in line with those of the reference laboratory. The whole process of DNA isolation, purification, quantification, sequencing, and analysis/interpretation of data is still under development. In this presentation, many aspects will be covered that influence the quality and interpretation of WGS results. The turn-around-time, analysis, and utility of WGS will be discussed. Moreover, the experiences in the use of WGS in the molecular epidemiology of TB in The Netherlands are detailed. It can be concluded that many difficulties still have to be conquered. The state of the art is that bacteria still have to be cultured to have sufficient quality and quantity of DNA for succesful WGS. The quality of sequencing has improved significantly over the past 7 years, and the detection of mutations has, therefore

  20. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    Full Text Available The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  1. Whole genomes redefine the mutational landscape of pancreatic cancer

    OpenAIRE

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K.; Kassahn, Karin S.; Bailey, Peter; Johns, Amber L.; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C. J.; Robertson, Alan J.; Fadlullah, Muhammad Z. H.; Bruxner, Tim J. C.; Christ, Angelika N.

    2015-01-01

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (...

  2. Whole-Genome DNA Methylation Status Associated with Clinical PTSD Measures of OIF/OEF Veterans (Open Access)

    Science.gov (United States)

    2017-07-11

    OIF) veterans with PTSD and 51 age/ethnicity/ gender -matched combat-exposed PTSD-negative controls. Agilent whole-genome array detected ~ 5600...exclusion criteria were used19,20 to identify a training set comprising 48 male veterans with PTSD (PTSD+) and 51 age-/ethnicity-/ gender -matched controls...568 Doughten Drive, Fort Detrick, Frederick, MD 21702-5010, USA. E-mail: Rasha.Hammamieh1.civ@mail.mil 11These authors contributed equally to this

  3. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Directory of Open Access Journals (Sweden)

    David Koslicki

    Full Text Available With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.

  4. Whole-genome shotgun optical mapping of rhodospirillumrubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest,Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz,David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source of hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (Xba I, Nhe I, and Hind III) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the Hind III map acted as a scaffold for high resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a ''molecular cytogenetics'' approach to solving problems in genomic analysis.

  5. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, S. [Univ. Wisc.-Madison; Zhou, S. [Univ. Wisc.-Madison; Place, M. [Univ. Wisc.-Madison; Zhang, Y. [Univ. Wisc.-Madison; Briska, A. [Univ. Wisc.-Madison; Goldstein, S. [Univ. Wisc.-Madison; Churas, C. [Univ. Wisc.-Madison; Runnheim, R. [Univ. Wisc.-Madison; Forrest, D. [Univ. Wisc.-Madison; Lim, A. [Univ. Wisc.-Madison; Lapidus, A. [Univ. Wisc.-Madison; Han, C. S. [Univ. Wisc.-Madison; Roberts, G. P. [Univ. Wisc.-Madison; Schwartz, D. C. [Univ. Wisc.-Madison

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  6. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array

    DEFF Research Database (Denmark)

    Eeles, Rosalind A; Olama, Ali Amin Al; Benlloch, Sara

    2013-01-01

    Prostate cancer is the most frequently diagnosed cancer in males in developed countries. To identify common prostate cancer susceptibility alleles, we genotyped 211,155 SNPs on a custom Illumina array (iCOGS) in blood DNA from 25,074 prostate cancer cases and 24,272 controls from the internationa...

  7. Investigation of isoniazid and ethionamide cross-resistance by whole genome sequencing and association with poor treatment outcomes of multidrug-resistant tuberculosis patients in South Africa

    Directory of Open Access Journals (Sweden)

    L Malinga

    2016-01-01

    Conclusion: Baseline ETH molecular resistance before second-line treatment is a concern. Unfavorable treatment outcomes of patients with ethA, ethR, and inhA mutations highlight the importance of genotypic testing before initiation of treatment containing ETH. The clinical significance of whole genome analysis for early detection of mutations predictive of treatment failure needs further investigation.

  8. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    International Nuclear Information System (INIS)

    Fröhlich, Eleonore; Meindl, Claudia; Wagner, Karin; Leitinger, Gerd; Roblegg, Eva

    2014-01-01

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and for short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify other or additional targets than commonly used cell-based assays for NP action. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. - Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions regulated by microarray were confirmed by cell-based assay

  9. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    Energy Technology Data Exchange (ETDEWEB)

    Fröhlich, Eleonore, E-mail: eleonore.froehlich@medunigraz.at [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Meindl, Claudia; Wagner, Karin [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Leitinger, Gerd [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Institute for Cell Biology, Histology and Embryology, Medical University of Graz, Harrachgasse 21, 8010 Graz (Austria); Roblegg, Eva [Institute of Pharmaceutical Sciences, Department of Pharmaceutical Technology, Karl-Franzens-University of Graz, Universitätsplatz 1, 8010 Graz (Austria)

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and for short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify other or additional targets than commonly used cell-based assays for NP action. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. - Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions regulated by microarray were confirmed by cell-based assay.

  10. Screening of whole genome sequences identified high-impact variants for stallion fertility.

    Science.gov (United States)

    Schrimpf, Rahel; Gottschalk, Maren; Metzger, Julia; Martinsson, Gunilla; Sieme, Harald; Distl, Ottmar

    2016-04-14

    Stallion fertility is an economically important trait due to the increase of artificial insemination in horses. The availability of whole genome sequence data facilitates identification of rare high-impact variants contributing to stallion fertility. The aim of our study was to genotype rare high-impact variants retrieved from next-generation sequencing (NGS)-data of 11 horses in order to unravel harmful genetic variants in large samples of stallions. Gene ontology (GO) terms and search results from public databases were used to obtain a comprehensive list of human und mice genes predicted to participate in the regulation of male reproduction. The corresponding equine orthologous genes were searched in whole genome sequence data of seven stallions and four mares and filtered for high-impact genetic variants using SnpEFF, SIFT and Polyphen 2 software. All genetic variants with the missing homozygous mutant genotype were genotyped on 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. Mixed linear model analysis was employed for an association analysis with de-regressed estimated breeding values of the paternal component of the pregnancy rate per estrus (EBV-PAT). We screened next generation sequenced data of whole genomes from 11 horses for equine genetic variants in 1194 human and mice genes involved in male fertility and linked through common gene ontology (GO) with male reproductive processes. Variants were filtered for high-impact on protein structure and validated through SIFT and Polyphen 2. Only those genetic variants were followed up when the homozygote mutant genotype was missing in the detection sample comprising 11 horses. After this filtering process, 17 single nucleotide polymorphism (SNPs) were left. These SNPs were genotyped in 337 fertile stallions of 19 breeds using KASP genotyping assays or PCR-RFLP. An association analysis in 216 Hanoverian stallions revealed a significant association of the splice-site disruption variant

  11. The use of mycobacterial interspersed repetitive unit typing and whole genome sequencing to inform tuberculosis prevention and control activities.

    Science.gov (United States)

    Gilbert, Gwendolyn L; Sintchenko, Vitali

    2013-07-01

    Molecular strain typing of Mycobacterium tuberculosis has been possible for only about 20 years; it has significantly improved our understanding of the evolution and epidemiology of Mycobacterium tuberculosis and tuberculosis disease. Mycobacterial interspersed repetitive unit typing, based on 24 variable number tandem repeat unit loci, is highly discriminatory, relatively easy to perform and interpret and is currently the most widely used molecular typing system for tuberculosis surveillance. Nevertheless, clusters identified by mycobacterial interspersed repetitive unit typing sometimes cannot be confirmed or adequately defined by contact tracing and additional methods are needed. Recently, whole genome sequencing has been used to identify single nucleotide polymorphisms and other mutations, between genotypically indistinguishable isolates from the same cluster, to more accurately trace transmission pathways. Rapidly increasing speed and quality and reduced costs will soon make large scale whole genome sequencing feasible, combined with the use of sophisticated bioinformatics tools, for epidemiological surveillance of tuberculosis.

  12. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    Full Text Available The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  13. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Science.gov (United States)

    Alkan, Can; Ventura, Mario; Archidiacono, Nicoletta; Rocchi, Mariano; Sahinalp, S Cenk; Eichler, Evan E

    2007-09-01

    The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  14. Whole-genome sequencing of bloodstream Staphylococcus aureus isolates does not distinguish bacteraemia from endocarditis

    DEFF Research Database (Denmark)

    Lilje, Berit; Rasmussen, Rasmus Vedby; Dahl, Anders

    2017-01-01

    Most Staphylococcus aureus isolates can cause invasive disease given the right circumstances, but it is unknown if some isolates are more likely to cause severe infections than others. S. aureus bloodstream isolates from 120 patients with definite infective endocarditis and 121 with S. aureus...... bacteraemia without infective endocarditis underwent whole-genome sequencing. Genome-wide association analysis was performed using a variety of bioinformatics approaches including SNP analysis, accessory genome analysis and k-mer based analysis. Core and accessory genome analyses found no association...... with either of the two clinical groups. In this study, the genome sequences of S. aureus bloodstream isolates did not discriminate between bacteraemia and infective endocarditis. Based on our study and the current literature, it is not convincing that a specific S. aureus genotype is clearly associated...

  15. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits

    DEFF Research Database (Denmark)

    Voight, Benjamin F; Kang, Hyun Min; Ding, Jun

    2012-01-01

    Genome-wide association studies have identified hundreds of loci for type 2 diabetes, coronary artery disease and myocardial infarction, as well as for related traits such as body mass index, glucose and insulin levels, lipid levels, and blood pressure. These studies also have pointed to thousands...... dramatic cost efficiencies compared to designing single-trait follow-up reagents, and provides the opportunity to compare results across a range of related traits. The metabochip and similar custom genotyping arrays offer a powerful and cost-effective approach to follow-up large-scale genotyping...

  16. Whole genome re-sequencing reveals genome-wide variations among parental lines of 16 mapping populations in chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Thudi, Mahendar; Khan, Aamir W; Kumar, Vinay; Gaur, Pooran M; Katta, Krishnamohan; Garg, Vanika; Roorkiwal, Manish; Samineni, Srinivasan; Varshney, Rajeev K

    2016-01-27

    Chickpea (Cicer arietinum L.) is the second most important grain legume cultivated by resource poor farmers in South Asia and Sub-Saharan Africa. In order to harness the untapped genetic potential available for chickpea improvement, we re-sequenced 35 chickpea genotypes representing parental lines of 16 mapping populations segregating for abiotic (drought, heat, salinity), biotic stresses (Fusarium wilt, Ascochyta blight, Botrytis grey mould, Helicoverpa armigera) and nutritionally important (protein content) traits using whole genome re-sequencing approach. A total of 192.19 Gb data, generated on 35 genotypes of chickpea, comprising 973.13 million reads, with an average sequencing depth of ~10 X for each line. On an average 92.18 % reads from each genotype were aligned to the chickpea reference genome with 82.17 % coverage. A total of 2,058,566 unique single nucleotide polymorphisms (SNPs) and 292,588 Indels were detected while comparing with the reference chickpea genome. Highest number of SNPs were identified on the Ca4 pseudomolecule. In addition, copy number variations (CNVs) such as gene deletions and duplications were identified across the chickpea parental genotypes, which were minimum in PI 489777 (1 gene deletion) and maximum in JG 74 (1,497). A total of 164,856 line specific variations (144,888 SNPs and 19,968 Indels) with the highest percentage were identified in coding regions in ICC 1496 (21 %) followed by ICCV 97105 (12 %). Of 539 miscellaneous variations, 339, 138 and 62 were inter-chromosomal variations (CTX), intra-chromosomal variations (ITX) and inversions (INV) respectively. Genome-wide SNPs, Indels, CNVs, PAVs, and miscellaneous variations identified in different mapping populations are a valuable resource in genetic research and helpful in locating genes/genomic segments responsible for economically important traits. Further, the genome-wide variations identified in the present study can be used for developing high density SNP arrays for

  17. Assessment of the Roche Linear Array HPV Genotyping Test within the VALGENT framework.

    Science.gov (United States)

    Xu, Lan; Oštrbenk, Anja; Poljak, Mario; Arbyn, Marc

    2018-01-01

    Cervical cancer screening programs are switching from cytology-based screening to high-risk (hr) HPV testing. Only clinically validated tests should be used in clinical practice. To assess the clinical performance of the Roche Linear Array HPV genotyping test (Linear Array) within the VALGENT-3 framework. The VALGENT framework is designed for comprehensive comparison and clinical validation of HPV tests that have limited to extended genotyping capacity. The Linear Array enables type-specific detection of 37 HPV types. For the purpose of this study, Linear Array results were designated as positive only if one of the 13 hrHPV types also included in the Hybrid Capture 2 (HC2) was detected. The VALGENT-3 framework comprised 1600 samples obtained from Slovenian women (1300 sequential cases from routine cervical cancer screening enriched with 300 cytological abnormal samples). Sensitivity for cervical intraepithelial neoplasia of grade 2 or worse (CIN2+) (n=127) and specificity for Linear Array and for HC2 and non-inferiority of Linear Array relative to HC2 was checked. In addition, the prevalence of separate hrHPV types in the screening population, as well as the concordance for presence of HPV16, HPV18 and other hrHPV types between Linear Array and the Abbott RealTime High Risk HPV test (RealTime) were assessed. The clinical sensitivity and specificity for CIN2+ of the Linear Array in the total study population was 97.6% (95% CI, 93.3-99.5%) and 91.7% (95% CI, 90.0-93.2%), respectively. The relative sensitivity and specificity of Linear Array vs HC2 was 1.02 [95% CI, 0.98-1.05, (pLinear Array in the screening population was 10.5% (95% CI, 8.9-12.3%) with HPV16 and HPV18 detected in 2.3% and 0.9% of the samples, respectively. Excellent agreement for presence or absence of HPV16, HPV18 and other hrHPV between Linear Array and RealTime was observed. Linear Array showed similar sensitivity with higher specificity to detect CIN2+ compared to HC2. Detection of 13 hrHPV types

  18. Genome U-Plot: a whole genome visualization.

    Science.gov (United States)

    Gaitatzes, Athanasios; Johnson, Sarah H; Smadbeck, James B; Vasmatzis, George

    2018-05-15

    The ability to produce and analyze whole genome sequencing (WGS) data from samples with structural variations (SV) generated the need to visualize such abnormalities in simplified plots. Conventional two-dimensional representations of WGS data frequently use either circular or linear layouts. There are several diverse advantages regarding both these representations, but their major disadvantage is that they do not use the two-dimensional space very efficiently. We propose a layout, termed the Genome U-Plot, which spreads the chromosomes on a two-dimensional surface and essentially quadruples the spatial resolution. We present the Genome U-Plot for producing clear and intuitive graphs that allows researchers to generate novel insights and hypotheses by visualizing SVs such as deletions, amplifications, and chromoanagenesis events. The main features of the Genome U-Plot are its layered layout, its high spatial resolution and its improved aesthetic qualities. We compare conventional visualization schemas with the Genome U-Plot using visualization metrics such as number of line crossings and crossing angle resolution measures. Based on our metrics, we improve the readability of the resulting graph by at least 2-fold, making apparent important features and making it easy to identify important genomic changes. A whole genome visualization tool with high spatial resolution and improved aesthetic qualities. An implementation and documentation of the Genome U-Plot is publicly available at https://github.com/gaitat/GenomeUPlot. vasmatzis.george@mayo.edu. Supplementary data are available at Bioinformatics online.

  19. Vitis phylogenomics: hybridization intensities from a SNP array outperform genotype calls.

    Directory of Open Access Journals (Sweden)

    Allison J Miller

    Full Text Available Understanding relationships among species is a fundamental goal of evolutionary biology. Single nucleotide polymorphisms (SNPs identified through next generation sequencing and related technologies enable phylogeny reconstruction by providing unprecedented numbers of characters for analysis. One approach to SNP-based phylogeny reconstruction is to identify SNPs in a subset of individuals, and then to compile SNPs on an array that can be used to genotype additional samples at hundreds or thousands of sites simultaneously. Although powerful and efficient, this method is subject to ascertainment bias because applying variation discovered in a representative subset to a larger sample favors identification of SNPs with high minor allele frequencies and introduces bias against rare alleles. Here, we demonstrate that the use of hybridization intensity data, rather than genotype calls, reduces the effects of ascertainment bias. Whereas traditional SNP calls assess known variants based on diversity housed in the discovery panel, hybridization intensity data survey variation in the broader sample pool, regardless of whether those variants are present in the initial SNP discovery process. We apply SNP genotype and hybridization intensity data derived from the Vitis9kSNP array developed for grape to show the effects of ascertainment bias and to reconstruct evolutionary relationships among Vitis species. We demonstrate that phylogenies constructed using hybridization intensities suffer less from the distorting effects of ascertainment bias, and are thus more accurate than phylogenies based on genotype calls. Moreover, we reconstruct the phylogeny of the genus Vitis using hybridization data, show that North American subgenus Vitis species are monophyletic, and resolve several previously poorly known relationships among North American species. This study builds on earlier work that applied the Vitis9kSNP array to evolutionary questions within Vitis vinifera

  20. Development and Applications of a High Throughput Genotyping Tool for Polyploid Crops: Single Nucleotide Polymorphism (SNP Array

    Directory of Open Access Journals (Sweden)

    Qian You

    2018-02-01

    Full Text Available Polypoid species play significant roles in agriculture and food production. Many crop species are polyploid, such as potato, wheat, strawberry, and sugarcane. Genotyping has been a daunting task for genetic studies of polyploid crops, which lags far behind the diploid crop species. Single nucleotide polymorphism (SNP array is considered to be one of, high-throughput, relatively cost-efficient and automated genotyping approaches. However, there are significant challenges for SNP identification in complex, polyploid genomes, which has seriously slowed SNP discovery and array development in polyploid species. Ploidy is a significant factor impacting SNP qualities and validation rates of SNP markers in SNP arrays, which has been proven to be a very important tool for genetic studies and molecular breeding. In this review, we (1 discussed the pros and cons of SNP array in general for high throughput genotyping, (2 presented the challenges of and solutions to SNP calling in polyploid species, (3 summarized the SNP selection criteria and considerations of SNP array design for polyploid species, (4 illustrated SNP array applications in several different polyploid crop species, then (5 discussed challenges, available software, and their accuracy comparisons for genotype calling based on SNP array data in polyploids, and finally (6 provided a series of SNP array design and genotype calling recommendations. This review presents a complete overview of SNP array development and applications in polypoid crops, which will benefit the research in molecular breeding and genetics of crops with complex genomes.

  1. Advanced statistical tools for SNP arrays : signal calibration, copy number estimation and single array genotyping

    NARCIS (Netherlands)

    Rippe, Ralph Christian Alexander

    2012-01-01

    Fluorescence bias in in signals from individual SNP arrays can be calibrated using linear models. Given the data, the system of equations is very large, so a specialized symbolic algorithm was developed. These models are also used to illustrate that genomic waves do not exist, but are merely an

  2. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected...... with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index...... itself. Depending on the trait’s economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage...

  3. The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits.

    Directory of Open Access Journals (Sweden)

    Benjamin F Voight

    Full Text Available Genome-wide association studies have identified hundreds of loci for type 2 diabetes, coronary artery disease and myocardial infarction, as well as for related traits such as body mass index, glucose and insulin levels, lipid levels, and blood pressure. These studies also have pointed to thousands of loci with promising but not yet compelling association evidence. To establish association at additional loci and to characterize the genome-wide significant loci by fine-mapping, we designed the "Metabochip," a custom genotyping array that assays nearly 200,000 SNP markers. Here, we describe the Metabochip and its component SNP sets, evaluate its performance in capturing variation across the allele-frequency spectrum, describe solutions to methodological challenges commonly encountered in its analysis, and evaluate its performance as a platform for genotype imputation. The metabochip achieves dramatic cost efficiencies compared to designing single-trait follow-up reagents, and provides the opportunity to compare results across a range of related traits. The metabochip and similar custom genotyping arrays offer a powerful and cost-effective approach to follow-up large-scale genotyping and sequencing studies and advance our understanding of the genetic basis of complex human diseases and traits.

  4. SNP Discovery and Development of a High-Density Genotyping Array for Sunflower

    Science.gov (United States)

    Bachlava, Eleni; Taylor, Christopher A.; Tang, Shunxue; Bowers, John E.; Mandel, Jennifer R.; Burke, John M.; Knapp, Steven J.

    2012-01-01

    Recent advances in next-generation DNA sequencing technologies have made possible the development of high-throughput SNP genotyping platforms that allow for the simultaneous interrogation of thousands of single-nucleotide polymorphisms (SNPs). Such resources have the potential to facilitate the rapid development of high-density genetic maps, and to enable genome-wide association studies as well as molecular breeding approaches in a variety of taxa. Herein, we describe the development of a SNP genotyping resource for use in sunflower (Helianthus annuus L.). This work involved the development of a reference transcriptome assembly for sunflower, the discovery of thousands of high quality SNPs based on the generation and analysis of ca. 6 Gb of transcriptome re-sequencing data derived from multiple genotypes, the selection of 10,640 SNPs for inclusion in the genotyping array, and the use of the resulting array to screen a diverse panel of sunflower accessions as well as related wild species. The results of this work revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, greater than 95% of successful SNP assays revealed polymorphism, and more than 90% of these assays could be successfully transferred to related wild species. Analysis of the polymorphism data revealed patterns of genetic differentiation that were largely congruent with the evolutionary history of sunflower, though the large number of markers allowed for finer resolution than has previously been possible. PMID:22238659

  5. High-throughput single nucleotide polymorphism genotyping using nanofluidic Dynamic Arrays

    Directory of Open Access Journals (Sweden)

    Crenshaw Andrew

    2009-01-01

    Full Text Available Abstract Background Single nucleotide polymorphisms (SNPs have emerged as the genetic marker of choice for mapping disease loci and candidate gene association studies, because of their high density and relatively even distribution in the human genomes. There is a need for systems allowing medium multiplexing (ten to hundreds of SNPs with high throughput, which can efficiently and cost-effectively generate genotypes for a very large sample set (thousands of individuals. Methods that are flexible, fast, accurate and cost-effective are urgently needed. This is also important for those who work on high throughput genotyping in non-model systems where off-the-shelf assays are not available and a flexible platform is needed. Results We demonstrate the use of a nanofluidic Integrated Fluidic Circuit (IFC - based genotyping system for medium-throughput multiplexing known as the Dynamic Array, by genotyping 994 individual human DNA samples on 47 different SNP assays, using nanoliter volumes of reagents. Call rates of greater than 99.5% and call accuracies of greater than 99.8% were achieved from our study, which demonstrates that this is a formidable genotyping platform. The experimental set up is very simple, with a time-to-result for each sample of about 3 hours. Conclusion Our results demonstrate that the Dynamic Array is an excellent genotyping system for medium-throughput multiplexing (30-300 SNPs, which is simple to use and combines rapid throughput with excellent call rates, high concordance and low cost. The exceptional call rates and call accuracy obtained may be of particular interest to those working on validation and replication of genome- wide- association (GWA studies.

  6. Development and validation of a high density SNP genotyping array for Atlantic salmon (Salmo salar)

    Science.gov (United States)

    2014-01-01

    Background Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding programs. In Atlantic salmon (Salmo salar), these goals are currently hampered by the lack of a high-density SNP genotyping platform. Therefore, the aim of the study was to develop and test a dense Atlantic salmon SNP array. Results SNP discovery was performed using extensive deep sequencing of Reduced Representation (RR-Seq), Restriction site-Associated DNA (RAD-Seq) and mRNA (RNA-Seq) libraries derived from farmed and wild Atlantic salmon samples (n = 283) resulting in the discovery of > 400 K putative SNPs. An Affymetrix Axiom® myDesign Custom Array was created and tested on samples of animals of wild and farmed origin (n = 96) revealing a total of 132,033 polymorphic SNPs with high call rate, good cluster separation on the array and stable Mendelian inheritance in our sample. At least 38% of these SNPs are from transcribed genomic regions and therefore more likely to include functional variants. Linkage analysis utilising the lack of male recombination in salmonids allowed the mapping of 40,214 SNPs distributed across all 29 pairs of chromosomes, highlighting the extensive genome-wide coverage of the SNPs. An identity-by-state clustering analysis revealed that the array can clearly distinguish between fish of different origins, within and between farmed and wild populations. Finally, Y-chromosome-specific probes included on the array provide an accurate molecular genetic test for sex. Conclusions This manuscript describes the first high-density SNP genotyping array for Atlantic salmon. This array will be publicly available and is likely to be used as a platform for high-resolution genetics research into traits of evolutionary and economic importance in

  7. Whole genome sequencing: an efficient approach to ensuring food safety

    Science.gov (United States)

    Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.

    2017-09-01

    Whole genome sequencing is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and the traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. Also, the use of WGS can facilitate the understanding of contamination/colonization routes of foodborne pathogens within the food production environment, and can also afford efficient tracking of pathogens’ entry routes and distribution from farm-to-consumer. Tracking foodborne pathogens in the food processing-distribution-retail-consumer continuum is of the utmost importance for facilitation of outbreak investigations and rapid action in controlling/preventing foodborne outbreaks. Therefore, WGS likely will replace most of the numerous workflows used in public health laboratories to characterize foodborne pathogens into one consolidated, efficient workflow.

  8. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology.The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology.Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories.As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future.Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  9. Methyl-Analyzer--whole genome DNA methylation profiling.

    Science.gov (United States)

    Xin, Yurong; Ge, Yongchao; Haghighi, Fatemeh G

    2011-08-15

    Methyl-Analyzer is a python package that analyzes genome-wide DNA methylation data produced by the Methyl-MAPS (methylation mapping analysis by paired-end sequencing) method. Methyl-MAPS is an enzymatic-based method that uses both methylation-sensitive and -dependent enzymes covering >80% of CpG dinucleotides within mammalian genomes. It combines enzymatic-based approaches with high-throughput next-generation sequencing technology to provide whole genome DNA methylation profiles. Methyl-Analyzer processes and integrates sequencing reads from methylated and unmethylated compartments and estimates CpG methylation probabilities at single base resolution. Methyl-Analyzer is available at http://github.com/epigenomics/methylmaps. Sample dataset is available for download at http://epigenomicspub.columbia.edu/methylanalyzer_data.html. fgh3@columbia.edu Supplementary data are available at Bioinformatics online.

  10. Whole genome sequence phylogenetic analysis of four Mexican rabies viruses isolated from cattle.

    Science.gov (United States)

    Bárcenas-Reyes, I; Loza-Rubio, E; Cantó-Alarcón, G J; Luna-Cozar, J; Enríquez-Vázquez, A; Barrón-Rodríguez, R J; Milián-Suazo, F

    2017-08-01

    Phylogenetic analysis of the rabies virus in molecular epidemiology has been traditionally performed on partial sequences of the genome, such as the N, G, and P genes; however, that approach raises concerns about the discriminatory power compared to whole genome sequencing. In this study we characterized four strains of the rabies virus isolated from cattle in Querétaro, Mexico by comparing the whole genome sequence to that of strains from the American, European and Asian continents. Four cattle brain samples positive to rabies and characterized as AgV11, genotype 1, were used in the study. A cDNA sequence was generated by reverse transcription PCR (RT-PCR) using oligo dT. cDNA samples were sequenced in an Illumina NextSeq 500 platform. The phylogenetic analysis was performed with MEGA 6.0. Minimum evolution phylogenetic trees were constructed with the Neighbor-Joining method and bootstrapped with 1000 replicates. Three large and seven small clusters were formed with the 26 sequences used. The largest cluster grouped strains from different species in South America: Brazil, and the French Guyana. The second cluster grouped five strains from Mexico. A Mexican strain reported in a different study was highly related to our four strains, suggesting common source of infection. The phylogenetic analysis shows that the type of host is different for the different regions in the American Continent; rabies is more related to bats. It was concluded that the rabies virus in central Mexico is genetically stable and that it is transmitted by the vampire bat Desmodus rotundus. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Integrating Crop Growth Models with Whole Genome Prediction through Approximate Bayesian Computation.

    Directory of Open Access Journals (Sweden)

    Frank Technow

    Full Text Available Genomic selection, enabled by whole genome prediction (WGP methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (G×E, continues to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs attempt to represent the impact of functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of G×E and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC, a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with synthetic data sets. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising and novel approach to improving prediction accuracy for some of the most challenging scenarios in plant breeding and applied genetics.

  12. Diversity arrays technology (DArT) markers in apple for genetic linkage maps

    OpenAIRE

    Schouten, H.J.; Weg, van de, W.E.; Carling, J.; Khan, S.A.; McKay, S.J.; Kaauwen, van, M.P.W.

    2012-01-01

    Diversity Arrays Technology (DArT) provides a high-throughput whole-genome genotyping platform for the detection and scoring of hundreds of polymorphic loci without any need for prior sequence information. The work presented here details the development and performance of a DArT genotyping array for apple. This is the first paper on DArT in horticultural trees. Genetic mapping of DArT markers in two mapping populations and their integration with other marker types showed that DArT is a powerf...

  13. Diversity arrays technology (DArT) markers in apple for genetic linkage maps

    OpenAIRE

    Schouten, Henk J.; van de Weg, W. Eric; Carling, Jason; Khan, Sabaz Ali; McKay, Steven J.; van Kaauwen, Martijn P. W.; Wittenberg, Alexander H. J.; Koehorst-van Putten, Herma J. J.; Noordijk, Yolanda; Gao, Zhongshan; Rees, D. Jasper G.; Van Dyk, Maria M.; Jaccoud, Damian; Considine, Michael J.; Kilian, Andrzej

    2011-01-01

    Diversity Arrays Technology (DArT) provides a high-throughput whole-genome genotyping platform for the detection and scoring of hundreds of polymorphic loci without any need for prior sequence information. The work presented here details the development and performance of a DArT genotyping array for apple. This is the first paper on DArT in horticultural trees. Genetic mapping of DArT markers in two mapping populations and their integration with other marker types showed that DArT is a powerf...

  14. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  15. Imputation across genotyping arrays for genome-wide association studies: assessment of bias and a correction strategy.

    Science.gov (United States)

    Johnson, Eric O; Hancock, Dana B; Levy, Joshua L; Gaddis, Nathan C; Saccone, Nancy L; Bierut, Laura J; Page, Grier P

    2013-05-01

    A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2 % of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30 % of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.

  16. Exome genotyping arrays to identify rare and low frequency variants associated with epithelial ovarian cancer risk

    Science.gov (United States)

    Permuth, Jennifer B.; Pirie, Ailith; Ann Chen, Y.; Lin, Hui-Yi; Reid, Brett M.; Chen, Zhihua; Monteiro, Alvaro; Dennis, Joe; Mendoza-Fandino, Gustavo; Anton-Culver, Hoda; Bandera, Elisa V.; Bisogna, Maria; Brinton, Louise; Brooks-Wilson, Angela; Carney, Michael E.; Chenevix-Trench, Georgia; Cook, Linda S.; Cramer, Daniel W.; Cunningham, Julie M.; Cybulski, Cezary; D’Aloisio, Aimee A.; Anne Doherty, Jennifer; Earp, Madalene; Edwards, Robert P.; Fridley, Brooke L.; Gayther, Simon A.; Gentry-Maharaj, Aleksandra; Goodman, Marc T.; Gronwald, Jacek; Hogdall, Estrid; Iversen, Edwin S.; Jakubowska, Anna; Jensen, Allan; Karlan, Beth Y.; Kelemen, Linda E.; Kjaer, Suzanne K.; Kraft, Peter; Le, Nhu D.; Levine, Douglas A.; Lissowska, Jolanta; Lubinski, Jan; Matsuo, Keitaro; Menon, Usha; Modugno, Rosemary; Moysich, Kirsten B.; Nakanishi, Toru; Ness, Roberta B.; Olson, Sara; Orlow, Irene; Pearce, Celeste L.; Pejovic, Tanja; Poole, Elizabeth M.; Ramus, Susan J.; Anne Rossing, Mary; Sandler, Dale P.; Shu, Xiao-Ou; Song, Honglin; Taylor, Jack A.; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J.; Tworoger, Shelley S.; Webb, Penelope M.; Wentzensen, Nicolas; Wilkens, Lynne R.; Winham, Stacey; Woo, Yin-Ling; Wu, Anna H.; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Phelan, Catherine M.; Schildkraut, Joellen M.; Berchuck, Andrew; Goode, Ellen L.; Pharoah, Paul D. P.; Sellers, Thomas A.

    2016-01-01

    Rare and low frequency variants are not well covered in most germline genotyping arrays and are understudied in relation to epithelial ovarian cancer (EOC) risk. To address this gap, we used genotyping arrays targeting rarer protein-coding variation in 8,165 EOC cases and 11,619 controls from the international Ovarian Cancer Association Consortium (OCAC). Pooled association analyses were conducted at the variant and gene level for 98,543 variants directly genotyped through two exome genotyping projects. Only common variants that represent or are in strong linkage disequilibrium (LD) with previously-identified signals at established loci reached traditional thresholds for exome-wide significance (P  P≥5.0 ×10 − 7) were detected for rare and low-frequency variants at 16 novel loci. Four rare missense variants were identified (ACTBL2 rs73757391 (5q11.2), BTD rs200337373 (3p25.1), KRT13 rs150321809 (17q21.2) and MC2R rs104894658 (18p11.21)), but only MC2R rs104894668 had a large effect size (OR = 9.66). Genes most strongly associated with EOC risk included ACTBL2 (PAML = 3.23 × 10 − 5; PSKAT-o = 9.23 × 10 − 4) and KRT13 (PAML = 1.67 × 10 − 4; PSKAT-o = 1.07 × 10 − 5), reaffirming variant-level analysis. In summary, this large study identified several rare and low-frequency variants and genes that may contribute to EOC susceptibility, albeit with possible small effects. Future studies that integrate epidemiology, sequencing, and functional assays are needed to further unravel the unexplained heritability and biology of this disease. PMID:27378695

  17. Deep whole-genome sequencing of 90 Han Chinese genomes.

    Science.gov (United States)

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000

  18. Sequence analysis of the whole genomes of five African human G9 rotavirus strains.

    Science.gov (United States)

    Nyaga, Martin M; Jere, Khuzwayo C; Peenze, Ina; Mlera, Luwanika; van Dijk, Alberdina A; Seheri, Mapaseka L; Mphahlele, M Jeffrey

    2013-06-01

    The G9 rotaviruses are amongst the most common global rotavirus strains causing severe childhood diarrhoea. However, the whole genomes of only a few G9 rotaviruses have been fully sequenced and characterised of which only one G9P[6] and one G9P[8] are from Africa. We determined the consensus sequence of the whole genomes of five African human group A G9 rotavirus strains, four G9P[8] strains and one G9P[6] strain collected in Cameroon (central Africa), Kenya (eastern Africa), South Africa and Zimbabwe (southern Africa) in 1999, 2009 and 2010. Strain RVA/Human-wt/ZWE/MRC-DPRU1723/2009/G9P[8] from Zimbabwe, RVA/Human-wt/ZAF/MRC-DPRU4677/2010/G9P[8] from South Africa, RVA/Human-wt/CMR/1424/2009/G9P[8] from Cameroon and RVA/Human-wt/KEN/MRC-DPRU2427/2010/G9P[8] from Kenya were on a Wa-like genetic backbone and were genotyped as G9-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1. Strain RVA/Human-wt/ZAF/MRC-DPRU9317/1999/G9P[6] from South Africa was genotyped as G9-P[6]-I2-R2-C2-M2-A2-N1-T2-E2-H2. Rotavirus A strain MRC-DPRU9317 is the second G9 strain to be reported on a DS-1-like genetic backbone, the other being RVA/Human-wt/ZAF/GR10924/1999/G9P[6]. MRC-DPRU9317 was found to be a reassortant between DS-1-like (I2, R2, C2, M2, A2, T2, E2 and H2) and Wa-like (N1) genome segments. All the genome segments of the five strains grouped strictly according to their genotype Wa- or DS-1-like clusters. Within their respective genotypes, the genome segments of the three G9 study strains from southern Africa clustered most closely with rotaviruses from the same geographical origin and with those with the same G and P types. The highest nucleotide identity of genome segments of the study strains from eastern and central Africa regions on a Wa-like backbone was not limited to rotaviruses with G9P[8] genotypes only, they were also closely related to G12P[6], G8P[8], G1P[8] and G11P[25] rotaviruses, indicating a close inter-genotype relationship between the G9 and other rotavirus genotypes

  19. Development and validation of the Axiom(®) Apple480K SNP genotyping array.

    Science.gov (United States)

    Bianco, Luca; Cestaro, Alessandro; Linsmith, Gareth; Muranty, Hélène; Denancé, Caroline; Théron, Anthony; Poncet, Charles; Micheletti, Diego; Kerschbamer, Emanuela; Di Pierro, Erica A; Larger, Simone; Pindo, Massimo; Van de Weg, Eric; Davassi, Alessandro; Laurens, François; Velasco, Riccardo; Durel, Charles-Eric; Troggio, Michela

    2016-04-01

    Cultivated apple (Malus × domestica Borkh.) is one of the most important fruit crops in temperate regions, and has great economic and cultural value. The apple genome is highly heterozygous and has undergone a recent duplication which, combined with a rapid linkage disequilibrium decay, makes it difficult to perform genome-wide association (GWA) studies. Single nucleotide polymorphism arrays offer highly multiplexed assays at a relatively low cost per data point and can be a valid tool for the identification of the markers associated with traits of interest. Here, we describe the development and validation of a 487K SNP Affymetrix Axiom(®) genotyping array for apple and discuss its potential applications. The array has been built from the high-depth resequencing of 63 different cultivars covering most of the genetic diversity in cultivated apple. The SNPs were chosen by applying a focal points approach to enrich genic regions, but also to reach a uniform coverage of non-genic regions. A total of 1324 apple accessions, including the 92 progenies of two mapping populations, have been genotyped with the Axiom(®) Apple480K to assess the effectiveness of the array. A large majority of SNPs (359 994 or 74%) fell in the stringent class of poly high resolution polymorphisms. We also devised a filtering procedure to identify a subset of 275K very robust markers that can be safely used for germplasm surveys in apple. The Axiom(®) Apple480K has now been commercially released both for public and proprietary use and will likely be a reference tool for GWA studies in apple. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  20. Whole-genome landscape of pancreatic neuroendocrine tumours.

    Science.gov (United States)

    Scarpa, Aldo; Chang, David K; Nones, Katia; Corbo, Vincenzo; Patch, Ann-Marie; Bailey, Peter; Lawlor, Rita T; Johns, Amber L; Miller, David K; Mafficini, Andrea; Rusev, Borislav; Scardoni, Maria; Antonello, Davide; Barbi, Stefano; Sikora, Katarzyna O; Cingarlini, Sara; Vicentini, Caterina; McKay, Skye; Quinn, Michael C J; Bruxner, Timothy J C; Christ, Angelika N; Harliwong, Ivon; Idrisoglu, Senel; McLean, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wilson, Peter J; Anderson, Matthew J; Fink, J Lynn; Newell, Felicity; Waddell, Nick; Holmes, Oliver; Kazakoff, Stephen H; Leonard, Conrad; Wood, Scott; Xu, Qinying; Nagaraj, Shivashankar Hiriyur; Amato, Eliana; Dalai, Irene; Bersani, Samantha; Cataldo, Ivana; Dei Tos, Angelo P; Capelli, Paola; Davì, Maria Vittoria; Landoni, Luca; Malpaga, Anna; Miotto, Marco; Whitehall, Vicki L J; Leggett, Barbara A; Harris, Janelle L; Harris, Jonathan; Jones, Marc D; Humphris, Jeremy; Chantrill, Lorraine A; Chin, Venessa; Nagrial, Adnan M; Pajic, Marina; Scarlett, Christopher J; Pinho, Andreia; Rooman, Ilse; Toon, Christopher; Wu, Jianmin; Pinese, Mark; Cowley, Mark; Barbour, Andrew; Mawson, Amanda; Humphrey, Emily S; Colvin, Emily K; Chou, Angela; Lovell, Jessica A; Jamieson, Nigel B; Duthie, Fraser; Gingras, Marie-Claude; Fisher, William E; Dagg, Rebecca A; Lau, Loretta M S; Lee, Michael; Pickett, Hilda A; Reddel, Roger R; Samra, Jaswinder S; Kench, James G; Merrett, Neil D; Epari, Krishna; Nguyen, Nam Q; Zeps, Nikolajs; Falconi, Massimo; Simbolo, Michele; Butturini, Giovanni; Van Buren, George; Partelli, Stefano; Fassan, Matteo; Khanna, Kum Kum; Gill, Anthony J; Wheeler, David A; Gibbs, Richard A; Musgrove, Elizabeth A; Bassi, Claudio; Tortora, Giampaolo; Pederzoli, Paolo; Pearson, John V; Waddell, Nicola; Biankin, Andrew V; Grimmond, Sean M

    2017-03-02

    The diagnosis of pancreatic neuroendocrine tumours (PanNETs) is increasing owing to more sensitive detection methods, and this increase is creating challenges for clinical management. We performed whole-genome sequencing of 102 primary PanNETs and defined the genomic events that characterize their pathogenesis. Here we describe the mutational signatures they harbour, including a deficiency in G:C > T:A base excision repair due to inactivation of MUTYH, which encodes a DNA glycosylase. Clinically sporadic PanNETs contain a larger-than-expected proportion of germline mutations, including previously unreported mutations in the DNA repair genes MUTYH, CHEK2 and BRCA2. Together with mutations in MEN1 and VHL, these mutations occur in 17% of patients. Somatic mutations, including point mutations and gene fusions, were commonly found in genes involved in four main pathways: chromatin remodelling, DNA damage repair, activation of mTOR signalling (including previously undescribed EWSR1 gene fusions), and telomere maintenance. In addition, our gene expression analyses identified a subgroup of tumours associated with hypoxia and HIF signalling.

  1. Whole genomes redefine the mutational landscape of pancreatic cancer.

    Science.gov (United States)

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K; Kassahn, Karin S; Bailey, Peter; Johns, Amber L; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C J; Robertson, Alan J; Fadlullah, Muhammad Z H; Bruxner, Tim J C; Christ, Angelika N; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J; Fink, J Lynn; Holmes, Oliver; Kazakoff, Stephen H; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J; Lee, Hong C; Jones, Marc D; Nagrial, Adnan M; Humphris, Jeremy; Chantrill, Lorraine A; Chin, Venessa; Steinmann, Angela M; Mawson, Amanda; Humphrey, Emily S; Colvin, Emily K; Chou, Angela; Scarlett, Christopher J; Pinho, Andreia V; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S; Kench, James G; Pettitt, Jessica A; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B; Graham, Janet S; Niclou, Simone P; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A; Gill, Anthony J; Eshleman, James R; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A; Pearson, John V; Biankin, Andrew V; Grimmond, Sean M

    2015-02-26

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded.

  2. Current Developments in Prokaryotic Single Cell Whole Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Goudeau, Danielle; Nath, Nandita; Ciobanu, Doina; Cheng, Jan-Fang; Malmstrom, Rex

    2014-03-14

    Our approach to prokaryotic single-cell Whole Genome Amplification at the JGI continues to evolve. To increase both the quality and number of single-cell genomes produced, we explore all aspects of the process from cell sorting to sequencing. For example, we now utilize specialized reagents, acoustic liquid handling, and reduced reaction volumes eliminate non-target DNA contamination in WGA reactions. More specifically, we use a cleaner commercial WGA kit from Qiagen that employs a UV decontamination procedure initially developed at the JGI, and we use the Labcyte Echo for tip-less liquid transfer to set up 2uL reactions. Acoustic liquid handling also dramatically reduces reagent costs. In addition, we are exploring new cell lysis methods including treatment with Proteinase K, lysozyme, and other detergents, in order to complement standard alkaline lysis and allow for more efficient disruption of a wider range of cells. Incomplete lysis represents a major hurdle for WGA on some environmental samples, especially rhizosphere, peatland, and other soils. Finding effective lysis strategies that are also compatible with WGA is challenging, and we are currently assessing the impact of various strategies on genome recovery.

  3. Whole genomes redefine the mutational landscape of pancreatic cancer

    Science.gov (United States)

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K.; Kassahn, Karin S.; Bailey, Peter; Johns, Amber L.; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C. J.; Robertson, Alan J.; Fadlullah, Muhammad Z. H.; Bruxner, Tim J. C.; Christ, Angelika N.; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J.; Fink, J. Lynn; Holmes, Oliver; Kazakoff, Stephen H.; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J.; Lee, Hong C.; Jones, Marc D.; Nagrial, Adnan M.; Humphris, Jeremy; Chantrill, Lorraine A.; Chin, Venessa; Steinmann, Angela M.; Mawson, Amanda; Humphrey, Emily S.; Colvin, Emily K.; Chou, Angela; Scarlett, Christopher J.; Pinho, Andreia V.; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S.; Kench, James G.; Pettitt, Jessica A.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B.; Graham, Janet S.; Niclou, Simone P.; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A.; Gill, Anthony J.; Eshleman, James R.; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A.; Pearson, John V.; Biankin, Andrew V.; Grimmond, Sean M.

    2015-01-01

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded. PMID:25719666

  4. Review:Whole genome amplification in preimplantation genetic diagnosis

    Institute of Scientific and Technical Information of China (English)

    Ying-ming ZHENG; Ning WANG; Lei LI; Fan JIN

    2011-01-01

    Preimplantation genetic diagnosis(PGD)refers to a procedure for genetically analyzing embryos prior to implantation,improving the chance of conception for patients at high risk of transmitting specific inherited disorders.This method has been widely used for a large number of genetic disorders since the first successful application in the early 1990s.Polymerase chain reaction(PCR)and fluorescent in situ hybridization(FISH)are the two main methods in PGD,but there are some inevitable shortcomings limiting the scope of genetic diagnosis.Fortunately,different whole genome amplification(WGA)techniques have been developed to overcome these problems.Sufficient DNA can be amplified and multiple tasks which need abundant DNA can be performed.Moreover,WGA products can be analyzed as a template for multi-loci and multi-gene during the subsequent DNA analysis.In this review,we will focus on the currently available WGA techniques and their applications,as well as the new technical trends from WGA products.

  5. Genomic V exons from whole genome shotgun data in reptiles.

    Science.gov (United States)

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

  6. Differential retention of metabolic genes following whole-genome duplication.

    Science.gov (United States)

    Gout, Jean-François; Duret, Laurent; Kahn, Daniel

    2009-05-01

    Classical studies in Metabolic Control Theory have shown that metabolic fluxes usually exhibit little sensitivity to changes in individual enzyme activity, yet remain sensitive to global changes of all enzymes in a pathway. Therefore, little selective pressure is expected on the dosage or expression of individual metabolic genes, yet entire pathways should still be constrained. However, a direct estimate of this selective pressure had not been evaluated. Whole-genome duplications (WGDs) offer a good opportunity to address this question by analyzing the fates of metabolic genes during the massive gene losses that follow. Here, we take advantage of the successive rounds of WGD that occurred in the Paramecium lineage. We show that metabolic genes exhibit different gene retention patterns than nonmetabolic genes. Contrary to what was expected for individual genes, metabolic genes appeared more retained than other genes after the recent WGD, which was best explained by selection for gene expression operating on entire pathways. Metabolic genes also tend to be less retained when present at high copy number before WGD, contrary to other genes that show a positive correlation between gene retention and preduplication copy number. This is rationalized on the basis of the classical concave relationship relating metabolic fluxes with enzyme expression.

  7. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  8. Signatures of selection in tilapia revealed by whole genome resequencing.

    Science.gov (United States)

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that facilitate to accelerate genetic improvement. However, selection signatures, including artificial selection and natural selection, have only been identified at the whole genome level in several genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10-100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia.

  9. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    Francioli, Laurent C.; Menelaou, Andronild; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Settenl, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marscha, Tobias; Schonhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Ma-Tthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; Platteel, Mathieu; Swertz, Morris A.; Wijmenga, Cisca

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring

  10. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    htmlabstractWhole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  11. Exome genotyping arrays to identify rare and low frequency variants associated with epithelial ovarian cancer risk

    DEFF Research Database (Denmark)

    Permuth, Jennifer B; Pirie, Ailith; Ann Chen, Y

    2016-01-01

    Rare and low frequency variants are not well covered in most germline genotyping arrays and are understudied in relation to epithelial ovarian cancer (EOC) risk. To address this gap, we used genotyping arrays targeting rarer protein-coding variation in 8,165 EOC cases and 11,619 controls from...... that is in LD (r(2 )=( )0.90) with a previously identified 'best hit' (rs7651446) mapping to an intron of TIPARP. Suggestive associations (5.0 × 10 (-)  (5 )>( )P≥5.0 ×10 (-)  (7)) were detected for rare and low-frequency variants at 16 novel loci. Four rare missense variants were identified (ACTBL2 rs73757391.......67 × 10 (-)  (4); PSKAT-o = 1.07 × 10 (-)  (5)), reaffirming variant-level analysis. In summary, this large study identified several rare and low-frequency variants and genes that may contribute to EOC susceptibility, albeit with possible small effects. Future studies that integrate epidemiology...

  12. Prediction of Phenotypic Antimicrobial Resistance Profiles From Whole Genome Sequences of Non-typhoidal Salmonella enterica.

    Science.gov (United States)

    Neuert, Saskia; Nair, Satheesh; Day, Martin R; Doumith, Michel; Ashton, Philip M; Mellor, Kate C; Jenkins, Claire; Hopkins, Katie L; Woodford, Neil; de Pinna, Elizabeth; Godbole, Gauri; Dallman, Timothy J

    2018-01-01

    Surveillance of antimicrobial resistance (AMR) in non-typhoidal Salmonella enterica (NTS), is essential for monitoring transmission of resistance from the food chain to humans, and for establishing effective treatment protocols. We evaluated the prediction of phenotypic resistance in NTS from genotypic profiles derived from whole genome sequencing (WGS). Genes and chromosomal mutations responsible for phenotypic resistance were sought in WGS data from 3,491 NTS isolates received by Public Health England's Gastrointestinal Bacteria Reference Unit between April 2014 and March 2015. Inferred genotypic AMR profiles were compared with phenotypic susceptibilities determined for fifteen antimicrobials using EUCAST guidelines. Discrepancies between phenotypic and genotypic profiles for one or more antimicrobials were detected for 76 isolates (2.18%) although only 88/52,365 (0.17%) isolate/antimicrobial combinations were discordant. Of the discrepant results, the largest number were associated with streptomycin (67.05%, n = 59). Pan-susceptibility was observed in 2,190 isolates (62.73%). Overall, resistance to tetracyclines was most common (26.27% of isolates, n = 917) followed by sulphonamides (23.72%, n = 828) and ampicillin (21.43%, n = 748). Multidrug resistance (MDR), i.e., resistance to three or more antimicrobial classes, was detected in 848 isolates (24.29%) with resistance to ampicillin, streptomycin, sulphonamides and tetracyclines being the most common MDR profile ( n = 231; 27.24%). For isolates with this profile, all but one were S . Typhimurium and 94.81% ( n = 219) had the resistance determinants bla TEM-1, strA-strB, sul2 and tet (A). Extended-spectrum β-lactamase genes were identified in 41 isolates (1.17%) and multiple mutations in chromosomal genes associated with ciprofloxacin resistance in 82 isolates (2.35%). This study showed that WGS is suitable as a rapid means of determining AMR patterns of NTS for public health surveillance.

  13. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~;;10k genes. ~;;12percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~;;9percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also,>90percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. Weconclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  14. Whole-genome sequencing reveals a potential causal mutation for dwarfism in the Miniature Shetland pony.

    Science.gov (United States)

    Metzger, Julia; Gast, Alana Christina; Schrimpf, Rahel; Rau, Janina; Eikelberg, Deborah; Beineke, Andreas; Hellige, Maren; Distl, Ottmar

    2017-04-01

    The Miniature Shetland pony represents a horse breed with an extremely small body size. Clinical examination of a dwarf Miniature Shetland pony revealed a lowered size at the withers, malformed skull and brachygnathia superior. Computed tomography (CT) showed a shortened maxilla and a cleft of the hard and soft palate which protruded into the nasal passage leading to breathing difficulties. Pathological examination confirmed these findings but did not reveal histopathological signs of premature ossification in limbs or cranial sutures. Whole-genome sequencing of this dwarf Miniature Shetland pony and comparative sequence analysis using 26 reference equids from NCBI Sequence Read Archive revealed three probably damaging missense variants which could be exclusively found in the affected foal. Validation of these three missense mutations in 159 control horses from different horse breeds and five donkeys revealed only the aggrecan (ACAN)-associated g.94370258G>C variant as homozygous wild-type in all control samples. The dwarf Miniature Shetland pony had the homozygous mutant genotype C/C of the ACAN:g.94370258G>C variant and the normal parents were heterozygous G/C. An unaffected full sib and 3/5 unaffected half-sibs were heterozygous G/C for the ACAN:g.94370258G>C variant. In summary, we could demonstrate a dwarf phenotype in a miniature pony breed perfectly associated with a missense mutation within the ACAN gene.

  15. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    Directory of Open Access Journals (Sweden)

    Ruibang Luo

    2014-06-01

    Full Text Available This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels, BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads, or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted at its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  16. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    Science.gov (United States)

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted at its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  17. Using whole genome sequencing to study American foulbrood epidemiology in honeybees.

    Directory of Open Access Journals (Sweden)

    Joakim Ågren

    Full Text Available American foulbrood (AFB, caused by Paenibacillus larvae, is a devastating disease in honeybees. In most countries, the disease is controlled through compulsory burning of symptomatic colonies causing major economic losses in apiculture. The pathogen is endemic to honeybees world-wide and is readily transmitted via the movement of hive equipment or bees. Molecular epidemiology of AFB currently largely relies on placing isolates in one of four ERIC-genotypes. However, a more powerful alternative is multi-locus sequence typing (MLST using whole-genome sequencing (WGS, which allows for high-resolution studies of disease outbreaks. To evaluate WGS as a tool for AFB-epidemiology, we applied core genome MLST (cgMLST on isolates from a recent outbreak of AFB in Sweden. The high resolution of the cgMLST allowed different bacterial clones involved in the disease outbreak to be identified and to trace the source of infection. The source was found to be a beekeeper who had sold bees to two other beekeepers, proving the epidemiological link between them. No such conclusion could have been made using conventional MLST or ERIC-typing. This is the first time that WGS has been used to study the epidemiology of AFB. The results show that the technique is very powerful for high-resolution tracing of AFB-outbreaks.

  18. Living laboratory: whole-genome sequencing as a learning healthcare enterprise.

    Science.gov (United States)

    Angrist, M; Jamal, L

    2015-04-01

    With the proliferation of affordable large-scale human genomic data come profound and vexing questions about management of such data and their clinical uncertainty. These issues challenge the view that genomic research on human beings can (or should) be fully segregated from clinical genomics, either conceptually or practically. Here, we argue that the sharp distinction between clinical care and research is especially problematic in the context of large-scale genomic sequencing of people with suspected genetic conditions. Core goals of both enterprises (e.g. understanding genotype-phenotype relationships; generating an evidence base for genomic medicine) are more likely to be realized at a population scale if both those ordering and those undergoing sequencing for diagnostic reasons are routinely and longitudinally studied. Rather than relying on expensive and lengthy randomized clinical trials and meta-analyses, we propose leveraging nascent clinical-research hybrid frameworks into a broader, more permanent instantiation of exploratory medical sequencing. Such an investment could enlighten stakeholders about the real-life challenges posed by whole-genome sequencing, such as establishing the clinical actionability of genetic variants, returning 'off-target' results to families, developing effective service delivery models and monitoring long-term outcomes. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  19. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    Science.gov (United States)

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially ( P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is

  20. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array

    Science.gov (United States)

    Eeles, Rosalind A; Olama, Ali Amin Al; Benlloch, Sara; Saunders, Edward J; Leongamornlert, Daniel A; Tymrakiewicz, Malgorzata; Ghoussaini, Maya; Luccarini, Craig; Dennis, Joe; Jugurnauth-Little, Sarah; Dadaev, Tokhir; Neal, David E; Hamdy, Freddie C; Donovan, Jenny L; Muir, Ken; Giles, Graham G; Severi, Gianluca; Wiklund, Fredrik; Gronberg, Henrik; Haiman, Christopher A; Schumacher, Fredrick; Henderson, Brian; Le Marchand, Loic; Lindstrom, Sara; Kraft, Peter; Hunter, David J; Gapstur, Susan; Chanock, Stephen J; Berndt, Sonja I; Albanes, Demetrius; Andriole, Gerald; Schleutker, Johanna; Weischer, Maren; Canzian, Federico; Riboli, Elio; Key, Tim J; Travis, Ruth; Campa, Daniele; Ingles, Sue A; John, Esther M; Hayes, Richard B; Pharoah, Paul DP; Pashayan, Nora; Khaw, Kay-Tee; Stanford, Janet; Ostrander, Elaine A; Signorello, Lisa B; Thibodeau, Stephen N; Schaid, Dan; Maier, Christiane; Vogel, Walther; Kibel, Adam S; Cybulski, Cezary; Lubinski, Jan; Cannon-Albright; Brenner, Hermann; Park, Jong Y; Kaneva, Radka; Batra, Jyotsna; Spurdle, Amanda B; Clements, Judith A; Teixeira, Manuel R; Dicks, Ed; Lee, Andrew; Dunning, Alison; Baynes, Caroline; Conroy, Don; Maranian, Melanie J; Ahmed, Shahana; Govindasami, Koveela; Guy, Michelle; Wilkinson, Rosemary A; Sawyer, Emma J; Morgan, Angela; Dearnaley, David P; Horwich, Alan; Huddart, Robert A; Khoo, Vincent S; Parker, Christopher C; Van As, Nicholas J; Woodhouse, J; Thompson, Alan; Dudderidge, Tim; Ogden, Chris; Cooper, Colin; Lophatananon, Artitaya; Cox, Angela; Southey, Melissa; Hopper, John L; English, Dallas R; Aly, Markus; Adolfsson, Jan; Xu, Jiangfeng; Zheng, Siqun; Yeager, Meredith; Kaaks, Rudolf; Diver, W Ryan; Gaudet, Mia M; Stern, Mariana; Corral, Roman; Joshi, Amit D; Shahabi, Ahva; Wahlfors, Tiina; Tammela, Teuvo J; Auvinen, Anssi; Virtamo, Jarmo; Klarskov, Peter; Nordestgaard, Børge G; Røder, Andreas; Nielsen, Sune F; Bojesen, Stig E; Siddiq, Afshan; FitzGerald, Liesel; Kolb, Suzanne; Kwon, Erika; Karyadi, Danielle; Blot, William J; Zheng, Wei; Cai, Qiuyin; McDonnell, Shannon K; Rinckleb, Antje; Drake, Bettina; Colditz, Graham; Wokolorczyk, Dominika; Stephenson, Robert A; Teerlink, Craig; Muller, Heiko; Rothenbacher, Dietrich; Sellers, Thomas A; Lin, Hui-Yi; Slavov, Chavdar; Mitev, Vanio; Lose, Felicity; Srinivasan, Srilakshmi; Maia, Sofia; Paulo, Paula; Lange, Ethan; Cooney, Kathleen A; Antoniou, Antonis; Vincent, Daniel; Bacot, François; Tessier; Kote-Jarai, Zsofia; Easton, Douglas F

    2013-01-01

    Prostate cancer is the most frequently diagnosed cancer in males in developed countries. To identify common prostate cancer susceptibility alleles, we genotyped 211,155 SNPs on a custom Illumina array (iCOGS) in blood DNA from 25,074 prostate cancer cases and 24,272 controls from the international PRACTICAL Consortium. Twenty-three new prostate cancer susceptibility loci were identified at genome-wide significance (P < 5 × 10−8). More than 70 prostate cancer susceptibility loci, explaining ~30% of the familial risk for this disease, have now been identified. On the basis of combined risks conferred by the new and previously known risk loci, the top 1% of the risk distribution has a 4.7-fold higher risk than the average of the population being profiled. These results will facilitate population risk stratification for clinical studies. PMID:23535732

  1. Whole genome analysis of selected human and animal rotaviruses identified in Uganda from 2012 to 2014 reveals complex genome reassortment events between human, bovine, caprine and porcine strains.

    Science.gov (United States)

    Bwogi, Josephine; Jere, Khuzwayo C; Karamagi, Charles; Byarugaba, Denis K; Namuwulya, Prossy; Baliraine, Frederick N; Desselberger, Ulrich; Iturriza-Gomara, Miren

    2017-01-01

    Rotaviruses of species A (RVA) are a common cause of diarrhoea in children and the young of various other mammals and birds worldwide. To investigate possible interspecies transmission of RVAs, whole genomes of 18 human and 6 domestic animal RVA strains identified in Uganda between 2012 and 2014 were sequenced using the Illumina HiSeq platform. The backbone of the human RVA strains had either a Wa- or a DS-1-like genetic constellation. One human strain was a Wa-like mono-reassortant containing a DS-1-like VP2 gene of possible animal origin. All eleven genes of one bovine RVA strain were closely related to those of human RVAs. One caprine strain had a mixed genotype backbone, suggesting that it emerged from multiple reassortment events involving different host species. The porcine RVA strains had mixed genotype backbones with possible multiple reassortant events with strains of human and bovine origin.Overall, whole genome characterisation of rotaviruses found in domestic animals in Uganda strongly suggested the presence of human-to animal RVA transmission, with concomitant circulation of multi-reassortant strains potentially derived from complex interspecies transmission events. However, whole genome data from the human RVA strains causing moderate and severe diarrhoea in under-fives in Uganda indicated that they were primarily transmitted from person-to-person.

  2. Whole-genome-based Mycobacterium tuberculosis surveillance: a standardized, portable, and expandable approach.

    Science.gov (United States)

    Kohl, Thomas A; Diel, Roland; Harmsen, Dag; Rothgänger, Jörg; Walter, Karen Meywald; Merker, Matthias; Weniger, Thomas; Niemann, Stefan

    2014-07-01

    Whole-genome sequencing (WGS) allows for effective tracing of Mycobacterium tuberculosis complex (MTBC) (tuberculosis pathogens) transmission. However, it is difficult to standardize and, therefore, is not yet employed for interlaboratory prospective surveillance. To allow its widespread application, solutions for data standardization and storage in an easily expandable database are urgently needed. To address this question, we developed a core genome multilocus sequence typing (cgMLST) scheme for clinical MTBC isolates using the Ridom SeqSphere(+) software, which transfers the genome-wide single nucleotide polymorphism (SNP) diversity into an allele numbering system that is standardized, portable, and not computationally intensive. To test its performance, we performed WGS analysis of 26 isolates with identical IS6110 DNA fingerprints and spoligotyping patterns from a longitudinal outbreak in the federal state of Hamburg, Germany (notified between 2001 and 2010). The cgMLST approach (3,041 genes) discriminated the 26 strains with a resolution comparable to that of SNP-based WGS typing (one major cluster of 22 identical or closely related and four outlier isolates with at least 97 distinct SNPs or 63 allelic variants). Resulting tree topologies are highly congruent and grouped the isolates in both cases analogously. Our data show that SNP- and cgMLST-based WGS analyses facilitate high-resolution discrimination of longitudinal MTBC outbreaks. cgMLST allows for a meaningful epidemiological interpretation of the WGS genotyping data. It enables standardized WGS genotyping for epidemiological investigations, e.g., on the regional public health office level, and the creation of web-accessible databases for global TB surveillance with an integrated early warning system. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  3. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

    Directory of Open Access Journals (Sweden)

    Jonathan A Shortt

    2017-01-01

    Full Text Available In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies.We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample.This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species

  4. Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

    Science.gov (United States)

    Shortt, Jonathan A; Card, Daren C; Schield, Drew R; Liu, Yang; Zhong, Bo; Castoe, Todd A; Carlton, Elizabeth J; Pollock, David D

    2017-01-01

    In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for using modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other

  5. seXY: a tool for sex inference from genotype arrays.

    Science.gov (United States)

    Qian, David C; Busam, Jonathan A; Xiao, Xiangjun; O'Mara, Tracy A; Eeles, Rosalind A; Schumacher, Frederick R; Phelan, Catherine M; Amos, Christopher I

    2017-02-15

    Checking concordance between reported sex and genotype-inferred sex is a crucial quality control measure in genome-wide association studies (GWAS). However, limited insights exist regarding the true accuracy of software that infer sex from genotype array data. We present seXY, a logistic regression model trained on both X chromosome heterozygosity and Y chromosome missingness, that consistently demonstrated >99.5% sex inference accuracy in cross-validation for 889 males and 5,361 females enrolled in prostate cancer and ovarian cancer GWAS. Compared to PLINK, one of the most popular tools for sex inference in GWAS that assesses only X chromosome heterozygosity, seXY achieved marginally better male classification and 3% more accurate female classification. https://github.com/Christopher-Amos-Lab/seXY. Christopher.I.Amos@dartmouth.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  6. Whole-genome sequencing of a laboratory-evolved yeast strain

    Directory of Open Access Journals (Sweden)

    Dunham Maitreya J

    2010-02-01

    Full Text Available Abstract Background Experimental evolution of microbial populations provides a unique opportunity to study evolutionary adaptation in response to controlled selective pressures. However, until recently it has been difficult to identify the precise genetic changes underlying adaptation at a genome-wide scale. New DNA sequencing technologies now allow the genome of parental and evolved strains of microorganisms to be rapidly determined. Results We sequenced >93.5% of the genome of a laboratory-evolved strain of the yeast Saccharomyces cerevisiae and its ancestor at >28× depth. Both single nucleotide polymorphisms and copy number amplifications were found, with specific gains over array-based methodologies previously used to analyze these genomes. Applying a segmentation algorithm to quantify structural changes, we determined the approximate genomic boundaries of a 5× gene amplification. These boundaries guided the recovery of breakpoint sequences, which provide insights into the nature of a complex genomic rearrangement. Conclusions This study suggests that whole-genome sequencing can provide a rapid approach to uncover the genetic basis of evolutionary adaptations, with further applications in the study of laboratory selections and mutagenesis screens. In addition, we show how single-end, short read sequencing data can provide detailed information about structural rearrangements, and generate predictions about the genomic features and processes that underlie genome plasticity.

  7. SNP detection for massively parallel whole-genome resequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong

    2009-01-01

    -genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All...... of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.......25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered...

  8. New perspectives on microbial community distortion after whole-genome amplification

    Science.gov (United States)

    Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass, however potential selective biases during the amplification processes are poorly understood. Here, we describe the e...

  9. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification.

    NARCIS (Netherlands)

    Direito, S.; Zaura, E.; Little, M.; Ehrenfreund, P.; Roling, W.F.M.

    2014-01-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement

  10. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA; Vos, M. de; Louw, GE; Merwe, RG van der; Dippenaar, A.; Streicher, EM; Abdallah, AM; Sampson, SL; Victor, TC; Dolby, T.; Simpson, JA; Helden, PD van; Warren, RM; Pain, Arnab

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug

  11. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification

    NARCIS (Netherlands)

    Direito, S.O.L.; Zaura, E.; Little, M.; Ehrenfreund, P.; Röling, W.F.M.

    2014-01-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement

  12. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

    KAUST Repository

    Coll, Francesc; McNerney, Ruth; Preston, Mark D; Guerra-Assunç ã o, José Afonso; Warry, Andrew; Hill-Cawthorne, Grant A.; Mallard, Kim; Nair, Mridul; Miranda, Anabela; Alves, Adriana; Perdigã o, Joã o; Viveiros, Miguel; Portugal, Isabel; Hasan, Zahra; Hasan, Rumina; Glynn, Judith R; Martin, Nigel; Pain, Arnab; Clark, Taane G

    2015-01-01

    Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data

  13. Blood group genotyping: the power and limitations of the Hemo ID Panel and MassARRAY platform.

    Science.gov (United States)

    McBean, Rhiannon S; Hyland, Catherine A; Flower, Robert L

    2015-01-01

    Matrix-assisted laser desorption/ionization, time-of-flight mass spectrometry (MALDI-TOF MS), is a sensitive analytical method capable of resolving DNA fragments varying in mass by a single nucleotide. MALDI-TOF MS is applicable to blood group genotyping, as the majority of blood group antigens are encoded by single nucleotide polymorphisms. Blood group genotyping by MALDI-TOF MS can be performed using a panel (Hemo ID Blood Group Genotyping Panel, Agena Bioscience Inc., San Diego, CA) that is a set of genotyping assays that predict the phenotype for 101 antigens from 16 blood group systems. These assays involve three fundamental stages: multiplex target-specific polymerase chain reaction amplification, allele-specific single base primer extension, and MALDI-TOFMS analysis using the MassARRAY system. MALDI-TOF MS-based genotyping has many advantages over alternative methods including high throughput, high multiplex capability, flexibility and adaptability, and the high level of accuracy based on the direct detection method. Currently available platforms for MALDI-TOF MS-based genotyping are not without limitations, including high upfront instrumentation costs and the number of non-automated steps. The Hemo ID Blood Group Genotyping Panel, developed and optimized in a collaboration between the vendor and the Blood Transfusion Service of the Swiss Red Cross in Zurich, Switzerland, is not yet widely utilized, although several laboratories are currently evaluating the MassARRAY system for blood group genotyping. Based on the accuracy and other advantages offered by MALDITOF MS analysis, in the future, this method is likely to become widely adopted for blood group genotyping, in particular, for population screening.

  14. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    Directory of Open Access Journals (Sweden)

    Nedenia Bonvino Stafuzza

    Full Text Available Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose, Gyr, Girolando and Holstein (dairy production. A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated for each animal 10.7 to 16.4-fold genome coverage. A total of 27,441,279 single nucleotide variations (SNVs and 3,828,041 insertions/deletions (InDels were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis was performed and revealed that variants in the olfactory transduction pathway was over represented in all four cattle breeds, while the ECM-receptor interaction pathway was over represented in Girolando and Guzerat breeds, the ABC transporters pathway was over represented only in Holstein breed, and the metabolic pathways was over represented only in Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.

  15. Whole genome sequencing and evolutionary analysis of human respiratory syncytial virus A and B from Milwaukee, WI 1998-2010.

    Directory of Open Access Journals (Sweden)

    Cecilia Rebuffo-Scheer

    Full Text Available BACKGROUND: Respiratory Syncytial Virus (RSV is the leading cause of lower respiratory-tract infections in infants and young children worldwide. Despite this, only six complete genome sequences of original strains have been previously published, the most recent of which dates back 35 and 26 years for RSV group A and group B respectively. METHODOLOGY/PRINCIPAL FINDINGS: We present a semi-automated sequencing method allowing for the sequencing of four RSV whole genomes simultaneously. We were able to sequence the complete coding sequences of 13 RSV A and 4 RSV B strains from Milwaukee collected from 1998-2010. Another 12 RSV A and 5 RSV B strains sequenced in this study cover the majority of the genome. All RSV A and RSV B sequences were analyzed by neighbor-joining, maximum parsimony and Bayesian phylogeny methods. Genetic diversity was high among RSV A viruses in Milwaukee including the circulation of multiple genotypes (GA1, GA2, GA5, GA7 with GA2 persisting throughout the 13 years of the study. However, RSV B genomes showed little variation with all belonging to the BA genotype. For RSV A, the same evolutionary patterns and clades were seen consistently across the whole genome including all intergenic, coding, and non-coding regions sequences. CONCLUSIONS/SIGNIFICANCE: The sequencing strategy presented in this work allows for RSV A and B genomes to be sequenced simultaneously in two working days and with a low cost. We have significantly increased the amount of genomic data that is available for both RSV A and B, providing the basic molecular characteristics of RSV strains circulating in Milwaukee over the last 13 years. This information can be used for comparative analysis with strains circulating in other communities around the world which should also help with the development of new strategies for control of RSV, specifically vaccine development and improvement of RSV diagnostics.

  16. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery

    Directory of Open Access Journals (Sweden)

    Stothard Paul

    2011-11-01

    Full Text Available Abstract Background One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs, 24% of which were identified in both animals. Of the total SNPs found in Holstein, Black Angus, and in both animals, 81%, 81%, and 75% respectively are novel. In-depth annotations of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs. Ten

  17. Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition.

    Science.gov (United States)

    Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen

    2017-12-27

    Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability [Formula: see text] and no effect with probability (1 - [Formula: see text]). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP

  18. Strategies for Enriching Variant Coverage in Candidate Disease Loci on a Multiethnic Genotyping Array.

    Directory of Open Access Journals (Sweden)

    Stephanie A Bien

    Full Text Available Investigating genetic architecture of complex traits in ancestrally diverse populations is imperative to understand the etiology of disease. However, the current paucity of genetic research in people of African and Latin American ancestry, Hispanic and indigenous peoples in the United States is likely to exacerbate existing health disparities for many common diseases. The Population Architecture using Genomics and Epidemiology, Phase II (PAGE II, Study was initiated in 2013 by the National Human Genome Research Institute to expand our understanding of complex trait loci in ethnically diverse and well characterized study populations. To meet this goal, the Multi-Ethnic Genotyping Array (MEGA was designed to substantially improve fine-mapping and functional discovery by increasing variant coverage across multiple ethnicities at known loci for metabolic, cardiovascular, renal, inflammatory, anthropometric, and a variety of lifestyle traits. Studying the frequency distribution of clinically relevant mutations, putative risk alleles, and known functional variants across multiple populations will provide important insight into the genetic architecture of complex diseases and facilitate the discovery of novel, sometimes population-specific, disease associations. DNA samples from 51,650 self-identified African ancestry (17,328, Hispanic/Latino (22,379, Asian/Pacific Islander (8,640, and American Indian (653 and an additional 2,650 participants of either South Asian or European ancestry, and other reference panels have been genotyped on MEGA by PAGE II. MEGA was designed as a new resource for studying ancestrally diverse populations. Here, we describe the methodology for selecting trait-specific content for use in multi-ethnic populations and how enriching MEGA for this content may contribute to deeper biological understanding of the genetic etiology of complex disease.

  19. DNA fingerprinting of Mycobacterium tuberculosis: from phage typing to whole-genome sequencing.

    Science.gov (United States)

    Schürch, Anita C; van Soolingen, Dick

    2012-06-01

    Current typing methods for Mycobacterium tuberculosis complex evolved from simple phenotypic approaches like phage typing and drug susceptibility profiling to DNA-based strain typing methods, such as IS6110-restriction fragment length polymorphisms (RFLP) and variable number of tandem repeats (VNTR) typing. Examples of the usefulness of molecular typing are source case finding and epidemiological linkage of tuberculosis (TB) cases, international transmission of MDR/XDR-TB, the discrimination between endogenous reactivation and exogenous re-infection as a cause of relapses after curative treatment of tuberculosis, the evidence of multiple M. tuberculosis infections, and the disclosure of laboratory cross-contaminations. Simultaneously, phylogenetic analyses were developed based on single nucleotide polymorphisms (SNPs), genomic deletions usually referred to as regions of difference (RDs) and spoligotyping which served both strain typing and phylogenetic analysis. National and international initiatives that rely on the application of these typing methods have brought significant insight into the molecular epidemiology of tuberculosis. However, current DNA fingerprinting methods have important limitations. They can often not distinguish between genetically closely related strains and the turn-over of these markers is variable. Moreover, the suitability of most DNA typing methods for phylogenetic reconstruction is limited as they show a high propensity of convergent evolution or misinfer genetic distances. In order to fully explore the possibilities of genotyping in the molecular epidemiology of tuberculosis and to study the phylogeny of the causative bacteria reliably, the application of whole-genome sequencing (WGS) analysis for all M. tuberculosis isolates is the optimal, although currently still a costly solution. In the last years WGS for typing of pathogens has been explored and yielded important additional information on strain diversity in comparison to the

  20. Whole Genome Sequencing Increases Molecular Diagnostic Yield Compared with Current Diagnostic Testing for Inherited Retinal Disease.

    Science.gov (United States)

    Ellingford, Jamie M; Barton, Stephanie; Bhaskar, Sanjeev; Williams, Simon G; Sergouniotis, Panagiotis I; O'Sullivan, James; Lamb, Janine A; Perveen, Rahat; Hall, Georgina; Newman, William G; Bishop, Paul N; Roberts, Stephen A; Leach, Rick; Tearle, Rick; Bayliss, Stuart; Ramsden, Simon C; Nemeth, Andrea H; Black, Graeme C M

    2016-05-01

    To compare the efficacy of whole genome sequencing (WGS) with targeted next-generation sequencing (NGS) in the diagnosis of inherited retinal disease (IRD). Case series. A total of 562 patients diagnosed with IRD. We performed a direct comparative analysis of current molecular diagnostics with WGS. We retrospectively reviewed the findings from a diagnostic NGS DNA test for 562 patients with IRD. A subset of 46 of 562 patients (encompassing potential clinical outcomes of diagnostic analysis) also underwent WGS, and we compared mutation detection rates and molecular diagnostic yields. In addition, we compared the sensitivity and specificity of the 2 techniques to identify known single nucleotide variants (SNVs) using 6 control samples with publically available genotype data. Diagnostic yield of genomic testing. Across known disease-causing genes, targeted NGS and WGS achieved similar levels of sensitivity and specificity for SNV detection. However, WGS also identified 14 clinically relevant genetic variants through WGS that had not been identified by NGS diagnostic testing for the 46 individuals with IRD. These variants included large deletions and variants in noncoding regions of the genome. Identification of these variants confirmed a molecular diagnosis of IRD for 11 of the 33 individuals referred for WGS who had not obtained a molecular diagnosis through targeted NGS testing. Weighted estimates, accounting for population structure, suggest that WGS methods could result in an overall 29% (95% confidence interval, 15-45) uplift in diagnostic yield. We show that WGS methods can detect disease-causing genetic variants missed by current NGS diagnostic methodologies for IRD and thereby demonstrate the clinical utility and additional value of WGS. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  1. A whole genome analyses of genetic variants in two Kelantan Malay individuals.

    Science.gov (United States)

    Wan Juhari, Wan Khairunnisa; Md Tamrin, Nur Aida; Mat Daud, Mohd Hanif Ridzuan; Isa, Hatin Wan; Mohd Nasir, Nurfazreen; Maran, Sathiya; Abdul Rajab, Nur Shafawati; Ahmad Amin Noordin, Khairul Bariah; Nik Hassan, Nik Norliza; Tearle, Rick; Razali, Rozaimi; Merican, Amir Feisal; Zilfalil, Bin Alwi

    2014-12-01

    The sequencing of two members of the Royal Kelantan Malay family genomes will provide insights on the Kelantan Malay whole genome sequences. The two Kelantan Malay genomes were analyzed for the SNP markers associated with thalassemia and Helicobacter pylori infection. Helicobacter pylori infection was reported to be low prevalence in the north-east as compared to the west coast of the Peninsular Malaysia and beta-thalassemia was known to be one of the most common inherited and genetic disorder in Malaysia. By combining SNP information from literatures, GWAS study and NCBI ClinVar, 18 unique SNPs were selected for further analysis. From these 18 SNPs, 10 SNPs came from previous study of Helicobacter pylori infection among Malay patients, 6 SNPs were from NCBI ClinVar and 2 SNPs from GWAS studies. The analysis reveals that both Royal Kelantan Malay genomes shared all the 10 SNPs identified by Maran (Single Nucleotide Polymorphims (SNPs) genotypic profiling of Malay patients with and without Helicobacter pylori infection in Kelantan, 2011) and one SNP from GWAS study. In addition, the analysis also reveals that both Royal Kelantan Malay genomes shared 3 SNP markers; HBG1 (rs1061234), HBB (rs1609812) and BCL11A (rs766432) where all three markers were associated with beta-thalassemia. Our findings suggest that the Royal Kelantan Malays carry the SNPs which are associated with protection to Helicobacter pylori infection. In addition they also carry SNPs which are associated with beta-thalassemia. These findings are in line with the findings by other researchers who conducted studies on thalassemia and Helicobacter pylori infection in the non-royal Malay population.

  2. Evaluation of Bovine High-Density SNP Genotyping Array in Indigenous Dairy Cattle Breeds.

    Science.gov (United States)

    Dash, S; Singh, A; Bhatia, A K; Jayakumar, S; Sharma, A; Singh, S; Ganguly, I; Dixit, S P

    2018-04-03

    In total 52 samples of Sahiwal ( 19 ), Tharparkar ( 17 ), and Gir ( 16 ) were genotyped by using BovineHD SNP chip to analyze minor allele frequency (MAF), genetic diversity, and linkage disequilibrium among these cattle. The common SNPs of BovineHD and 54K SNP Chips were also extracted and evaluated for their performance. Only 40%-50% SNPs of these arrays was found informative for genetic analysis in these cattle breeds. The overall mean of MAF for SNPs of BovineHD SNPChip was 0.248 ± 0.006, 0.241 ± 0.007, and 0.242 ± 0.009 in Sahiwal, Tharparkar and Gir, respectively, while that for 54K SNPs was on lower side. The average Reynold's genetic distance between breeds ranged from 0.042 to 0.055 based on BovineHD Beadchip, and from 0.052 to 0.084 based on 54K SNP Chip. The estimates of genetic diversity based on HD and 54K chips were almost same and, hence, low density chip seems to be good enough to decipher genetic diversity of these cattle breeds. The linkage disequilibrium started decaying (r 2  < 0.2) at 140 kb inter-marker distance and, hence, a 20K low density customized SNP array from HD chip could be designed for genomic selection in these cattle else the 54K Bead Chip as such will be useful.

  3. Whole-Genome Expression Analysis of Human Mesenchymal Stromal Cells Exposed to Ultrasmooth Tantalum vs. Titanium Oxide Surfaces

    DEFF Research Database (Denmark)

    Stiehler, C.; Bunger, C.; Overall, R. W.

    2013-01-01

    to titanium (Ti) surface. The aim of this study was to extend the previous investigation of biocompatibility by monitoring temporal gene expression of MSCs on topographically comparable smooth Ta and Ti surfaces using whole-genome gene expression analysis. Total RNA samples from telomerase-immortalized human...... MSCs cultivated on plain sputter-coated surfaces of Ta or Ti for 1, 2, 4, and 8 days were hybridized to n = 16 U133 Plus 2.0 arrays (Affymetrix(A (R))). Functional annotation, cluster and pathway analyses were performed. The vast majority of genes were differentially regulated after 4 days...... of cultivation and genes upregulated by MSCs exposed to Ta and Ti were predominantly related to the processes of differentiation and transcription, respectively. Functional annotation analysis of the 1,000 temporally most significantly regulated genes suggests earlier cellular differentiation on Ta compared...

  4. Whole genome typing of the recently emerged Canadian serogroup W Neisseria meningitidis sequence type 11 clonal complex isolates associated with invasive meningococcal disease

    Directory of Open Access Journals (Sweden)

    Raymond S.W. Tsang

    2018-04-01

    Full Text Available Objectives: This study was performed to analyze the Canadian invasive serogroup W Neisseria meningitidis (MenW sequence type 11 (ST-11 clonal complex (CC isolates by whole genome typing and to compare Canadian isolates with similar isolates from elsewhere. Methods: Whole genome typing of 30 MenW ST-11 CC, 20 meningococcal group C (MenC ST-11 CC, and 31 MenW ST-22 CC isolates was performed on the Bacterial Isolate Genome Sequence database platform. Canadian MenW ST-11 CC isolates were compared with the 2000 MenW Hajj outbreak strain, as well as with MenW ST-11 CC from other countries. Results: Whole genome typing showed that the Canadian MenW ST-11 CC isolates were distinct from the traditional MenW ST-22 CC; they were not capsule-switched contemporary MenC strains that incorporated MenW capsules. While some recent MenW disease cases in Canada were caused by MenW ST-11 CC isolates showing relatedness to the 2000 MenW Hajj strain, many were non-Hajj isolates similar to current MenW ST-11 isolates found globally. Geographical and temporal variations in genotypes and surface protein antigen genes were found among the MenW ST-11 CC isolates. Conclusions: The current MenW ST-11 isolates did not arise by capsule switching from contemporary MenC ST-11 isolates. Both the Hajj-related and non-Hajj MenW ST-11 CC strains were associated with invasive meningococcal disease in Canada. Keywords: Neisseria meningitidis, Invasive meningococcal disease, Whole genome typing

  5. Whole Genome Sequence Analysis of Salmonella Typhi Isolated in Thailand before and after the Introduction of a National Immunization Program.

    Directory of Open Access Journals (Sweden)

    Zoe A Dyson

    2017-01-01

    Full Text Available Vaccines against Salmonella Typhi, the causative agent of typhoid fever, are commonly used by travellers, however, there are few examples of national immunization programs in endemic areas. There is therefore a paucity of data on the impact of typhoid immunization programs on localised populations of S. Typhi. Here we have used whole genome sequencing (WGS to characterise 44 historical bacterial isolates collected before and after a national typhoid immunization program that was implemented in Thailand in 1977 in response to a large outbreak; the program was highly effective in reducing typhoid case numbers. Thai isolates were highly diverse, including 10 distinct phylogenetic lineages or genotypes. Novel prophage and plasmids were also detected, including examples that were previously only reported in Shigella sonnei and Escherichia coli. The majority of S. Typhi genotypes observed prior to the immunization program were not observed following it. Post-vaccine era isolates were more closely related to S. Typhi isolated from neighbouring countries than to earlier Thai isolates, providing no evidence for the local persistence of endemic S. Typhi following the national immunization program. Rather, later cases of typhoid appeared to be caused by the occasional importation of common genotypes from neighbouring Vietnam, Laos, and Cambodia. These data show the value of WGS in understanding the impacts of vaccination on pathogen populations and provide support for the proposal that large-scale typhoid immunization programs in endemic areas could result in lasting local disease elimination, although larger prospective studies are needed to test this directly.

  6. Whole-genome DNA methylation status associated with clinical PTSD measures of OIF/OEF veterans

    Science.gov (United States)

    Hammamieh, R; Chakraborty, N; Gautam, A; Muhie, S; Yang, R; Donohue, D; Kumar, R; Daigle, B J; Zhang, Y; Amara, D A; Miller, S-A; Srinivasan, S; Flory, J; Yehuda, R; Petzold, L; Wolkowitz, O M; Mellon, S H; Hood, L; Doyle, F J; Marmar, C; Jett, M

    2017-01-01

    Emerging knowledge suggests that post-traumatic stress disorder (PTSD) pathophysiology is linked to the patients’ epigenetic changes, but comprehensive studies examining genome-wide methylation have not been performed. In this study, we examined genome-wide DNA methylation in peripheral whole blood in combat veterans with and without PTSD to ascertain differentially methylated probes. Discovery was initially made in a training sample comprising 48 male Operation Enduring Freedom (OEF)/Operation Iraqi Freedom (OIF) veterans with PTSD and 51 age/ethnicity/gender-matched combat-exposed PTSD-negative controls. Agilent whole-genome array detected ~5600 differentially methylated CpG islands (CpGI) annotated to ~2800 differently methylated genes (DMGs). The majority (84.5%) of these CpGIs were hypermethylated in the PTSD cases. Functional analysis was performed using the DMGs encoding the promoter-bound CpGIs to identify networks related to PTSD. The identified networks were further validated by an independent test set comprising 31 PTSD+/29 PTSD− veterans. Targeted bisulfite sequencing was also used to confirm the methylation status of 20 DMGs shown to be highly perturbed in the training set. To improve the statistical power and mitigate the assay bias and batch effects, a union set combining both training and test set was assayed using a different platform from Illumina. The pathways curated from this analysis confirmed 65% of the pool of pathways mined from training and test sets. The results highlight the importance of assay methodology and use of independent samples for discovery and validation of differentially methylated genes mined from whole blood. Nonetheless, the current study demonstrates that several important epigenetically altered networks may distinguish combat-exposed veterans with and without PTSD. PMID:28696412

  7. [Whole Genome Sequencing of Human mtDNA Based on Ion Torrent PGM™ Platform].

    Science.gov (United States)

    Cao, Y; Zou, K N; Huang, J P; Ma, K; Ping, Y

    2017-08-01

    To analyze and detect the whole genome sequence of human mitochondrial DNA (mtDNA) by Ion Torrent PGM™ platform and to study the differences of mtDNA sequence in different tissues. Samples were collected from 6 unrelated individuals by forensic postmortem examination, including chest blood, hair, costicartilage, nail, skeletal muscle and oral epithelium. Amplification of whole genome sequence of mtDNA was performed by 4 pairs of primer. Libraries were constructed with Ion Shear™ Plus Reagents kit and Ion Plus Fragment Library kit. Whole genome sequencing of mtDNA was performed using Ion Torrent PGM™ platform. Sanger sequencing was used to determine the heteroplasmy positions and the mutation positions on HVⅠ region. The whole genome sequence of mtDNA from all samples were amplified successfully. Six unrelated individuals belonged to 6 different haplotypes. Different tissues in one individual had heteroplasmy difference. The heteroplasmy positions and the mutation positions on HVⅠ region were verified by Sanger sequencing. After a consistency check by the Kappa method, it was found that the results of mtDNA sequence had a high consistency in different tissues. The testing method used in present study for sequencing the whole genome sequence of human mtDNA can detect the heteroplasmy difference in different tissues, which have good consistency. The results provide guidance for the further applications of mtDNA in forensic science. Copyright© by the Editorial Department of Journal of Forensic Medicine

  8. Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

    DEFF Research Database (Denmark)

    Zhan, Bujie; Fadista, João; Thomsen, Bo

    2011-01-01

    Background Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results We report the integration of the whole genome...... of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation...

  9. Whole Genome Sequencing Based Characterization of Extensively Drug-Resistant Mycobacterium tuberculosis Isolates from Pakistan

    KAUST Repository

    Ali, Asho; Hasan, Zahra; McNerney, Ruth; Mallard, Kim; Hill-Cawthorne, Grant A.; Coll, Francesc; Nair, Mridul; Pain, Arnab; Clark, Taane G.; Hasan, Rumina

    2015-01-01

    Improved molecular diagnostic methods for detection drug resistance in Mycobacterium tuberculosis (MTB) strains are required. Resistance to first- and second- line anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular genes. However, these SNPs can vary between MTB lineages therefore local data is required to describe different strain populations. We used whole genome sequencing (WGS) to characterize 37 extensively drug-resistant (XDR) MTB isolates from Pakistan and investigated 40 genes associated with drug resistance. Rifampicin resistance was attributable to SNPs in the rpoB hot-spot region. Isoniazid resistance was most commonly associated with the katG codon 315 (92%) mutation followed by inhA S94A (8%) however, one strain did not have SNPs in katG, inhA or oxyR-ahpC. All strains were pyrazimamide resistant but only 43% had pncA SNPs. Ethambutol resistant strains predominantly had embB codon 306 (62%) mutations, but additional SNPs at embB codons 406, 378 and 328 were also present. Fluoroquinolone resistance was associated with gyrA 91-94 codons in 81% of strains; four strains had only gyr B mutations, while others did not have SNPs in either gyrA or gyrB. Streptomycin resistant strains had mutations in ribosomal RNA genes; rpsL codon 43 (42%); rrs 500 region (16%), and gidB (34%) while six strains did not have mutations in any of these genes. Amikacin/kanamycin/capreomycin resistance was associated with SNPs in rrs at nt1401 (78%) and nt1484 (3%), except in seven (19%) strains. We estimate that if only the common hot-spot region targets of current commercial assays were used, the concordance between phenotypic and genotypic testing for these XDR strains would vary between rifampicin (100%), isoniazid (92%), flouroquinolones (81%), aminoglycoside (78%) and ethambutol (62%); while pncA sequencing would provide genotypic resistance in less than half the isolates. This work highlights the importance of expanded

  10. Whole Genome Sequencing Based Characterization of Extensively Drug-Resistant Mycobacterium tuberculosis Isolates from Pakistan

    KAUST Repository

    Ali, Asho

    2015-02-26

    Improved molecular diagnostic methods for detection drug resistance in Mycobacterium tuberculosis (MTB) strains are required. Resistance to first- and second- line anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular genes. However, these SNPs can vary between MTB lineages therefore local data is required to describe different strain populations. We used whole genome sequencing (WGS) to characterize 37 extensively drug-resistant (XDR) MTB isolates from Pakistan and investigated 40 genes associated with drug resistance. Rifampicin resistance was attributable to SNPs in the rpoB hot-spot region. Isoniazid resistance was most commonly associated with the katG codon 315 (92%) mutation followed by inhA S94A (8%) however, one strain did not have SNPs in katG, inhA or oxyR-ahpC. All strains were pyrazimamide resistant but only 43% had pncA SNPs. Ethambutol resistant strains predominantly had embB codon 306 (62%) mutations, but additional SNPs at embB codons 406, 378 and 328 were also present. Fluoroquinolone resistance was associated with gyrA 91-94 codons in 81% of strains; four strains had only gyr B mutations, while others did not have SNPs in either gyrA or gyrB. Streptomycin resistant strains had mutations in ribosomal RNA genes; rpsL codon 43 (42%); rrs 500 region (16%), and gidB (34%) while six strains did not have mutations in any of these genes. Amikacin/kanamycin/capreomycin resistance was associated with SNPs in rrs at nt1401 (78%) and nt1484 (3%), except in seven (19%) strains. We estimate that if only the common hot-spot region targets of current commercial assays were used, the concordance between phenotypic and genotypic testing for these XDR strains would vary between rifampicin (100%), isoniazid (92%), flouroquinolones (81%), aminoglycoside (78%) and ethambutol (62%); while pncA sequencing would provide genotypic resistance in less than half the isolates. This work highlights the importance of expanded

  11. Whole-genome resequencing reveals candidate mutations for pig prolificacy.

    Science.gov (United States)

    Li, Wen-Ting; Zhang, Meng-Meng; Li, Qi-Gang; Tang, Hui; Zhang, Li-Fan; Wang, Ke-Jun; Zhu, Mu-Zhen; Lu, Yun-Feng; Bao, Hai-Gang; Zhang, Yuan-Ming; Li, Qiu-Yan; Wu, Ke-Liang; Wu, Chang-Xin

    2017-12-20

    Changes in pig fertility have occurred as a result of domestication, but are not understood at the level of genetic variation. To identify variations potentially responsible for prolificacy, we sequenced the genomes of the highly prolific Taihu pig breed and four control breeds. Genes involved in embryogenesis and morphogenesis were targeted in the Taihu pig, consistent with the morphological differences observed between the Taihu pig and others during pregnancy. Additionally, excessive functional non-coding mutations have been specifically fixed or nearly fixed in the Taihu pig. We focused attention on an oestrogen response element (ERE) within the first intron of the bone morphogenetic protein receptor type-1B gene ( BMPR1B ) that overlaps with a known quantitative trait locus (QTL) for pig fecundity. Using 242 pigs from 30 different breeds, we confirmed that the genotype of the ERE was nearly fixed in the Taihu pig. ERE function was assessed by luciferase assays, examination of histological sections, chromatin immunoprecipitation, quantitative polymerase chain reactions, and western blots. The results suggest that the ERE may control pig prolificacy via the cis-regulation of BMPR1B expression. This study provides new insight into changes in reproductive performance and highlights the role of non-coding mutations in generating phenotypic diversity between breeds. © 2017 The Author(s).

  12. Whole-Genome de novo Sequencing Of Quail And Grey Partridge

    DEFF Research Database (Denmark)

    Holm, Lars-Erik; Panitz, Frank; Burt, Dave

    2011-01-01

    The development in sequencing methods has made it possible to perform whole genome de novo sequencing of species without large commercial interests. Within the EU-financed QUANTOMICS project (KBBE-2A-222664), we have performed de novo sequencing of quail (Coturnix coturnix) and grey partridge...... (Perdix perdix) on a Genome Analyzer GAII (Illumina) using paired-end sequencing. The amount of generated sequences amounts to 8 to 9 Gb for each species. The analysis and assembly of the generated sequences is ongoing. Access to the whole genome sequence from these two species will enable enhanced...... comparative studies towards the chicken genome and will aid in identifying evolutionarily conserved sequences within the Galliformes. The obtained sequences from quail and partridge represent a beginning of generating the whole genome sequence for these species. The continuation of establishing the genome...

  13. Whole-genome SNP association in the horse: identification of a deletion in myosin Va responsible for Lavender Foal Syndrome.

    Directory of Open Access Journals (Sweden)

    Samantha A Brooks

    2010-04-01

    Full Text Available Lavender Foal Syndrome (LFS is a lethal inherited disease of horses with a suspected autosomal recessive mode of inheritance. LFS has been primarily diagnosed in a subgroup of the Arabian breed, the Egyptian Arabian horse. The condition is characterized by multiple neurological abnormalities and a dilute coat color. Candidate genes based on comparative phenotypes in mice and humans include the ras-associated protein RAB27a (RAB27A and myosin Va (MYO5A. Here we report mapping of the locus responsible for LFS using a small set of 36 horses segregating for LFS. These horses were genotyped using a newly available single nucleotide polymorphism (SNP chip containing 56,402 discriminatory elements. The whole genome scan identified an associated region containing these two functional candidate genes. Exon sequencing of the MYO5A gene from an affected foal revealed a single base deletion in exon 30 that changes the reading frame and introduces a premature stop codon. A PCR-based Restriction Fragment Length Polymorphism (PCR-RFLP assay was designed and used to investigate the frequency of the mutant gene. All affected horses tested were homozygous for this mutation. Heterozygous carriers were detected in high frequency in families segregating for this trait, and the frequency of carriers in unrelated Egyptian Arabians was 10.3%. The mapping and discovery of the LFS mutation represents the first successful use of whole-genome SNP scanning in the horse for any trait. The RFLP assay can be used to assist breeders in avoiding carrier-to-carrier matings and thus in preventing the birth of affected foals.

  14. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    Science.gov (United States)

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species including primitive single cell algae to angiosperms with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotype) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. Present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches and provided a systematic description of soybean SWEET genes and identified putative candidates with probable roles in the reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will aid as an important resource for the soybean research

  15. Genotyping of 75 SNPs using arrays for individual identification in five population groups.

    Science.gov (United States)

    Hwa, Hsiao-Lin; Wu, Lawrence Shih Hsin; Lin, Chun-Yen; Huang, Tsun-Ying; Yin, Hsiang-I; Tseng, Li-Hui; Lee, James Chun-I

    2016-01-01

    Single nucleotide polymorphism (SNP) typing offers promise to forensic genetics. Various strategies and panels for analyzing SNP markers for individual identification have been published. However, the best panels with fewer identity SNPs for all major population groups are still under discussion. This study aimed to find more autosomal SNPs with high heterozygosity for individual identification among Asian populations. Ninety-six autosomal SNPs of 502 DNA samples from unrelated individuals of five population groups (208 Taiwanese Han, 83 Filipinos, 62 Thais, 69 Indonesians, and 80 individuals with European, Near Eastern, or South Asian ancestry) were analyzed using arrays in an initial screening, and 75 SNPs (group A, 46 newly selected SNPs; groups B, 29 SNPs based on a previous SNP panel) were selected for further statistical analyses. Some SNPs with high heterozygosity from Asian populations were identified. The combined random match probability of the best 40 and 45 SNPs was between 3.16 × 10(-17) and 7.75 × 10(-17) and between 2.33 × 10(-19) and 7.00 × 10(-19), respectively, in all five populations. These loci offer comparable power to short tandem repeats (STRs) for routine forensic profiling. In this study, we demonstrated the population genetic characteristics and forensic parameters of 75 SNPs with high heterozygosity from five population groups. This SNPs panel can provide valuable genotypic information and can be helpful in forensic casework for individual identification among these populations.

  16. Whole-Genome Sequences of Two Borrelia afzelii and Two Borrelia garinii Lyme Disease Agent Isolates

    Energy Technology Data Exchange (ETDEWEB)

    Casjens, S.R.; Dunn, J.; Mongodin, E. F.; Qiu, W.-G.; Luft, B. J.; Fraser-Liggett, C. M.; Schutzer, S. E.

    2011-12-01

    Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04.

  17. Rice-arsenate interactions in hydroponics: whole genome transcriptional analysis.

    Science.gov (United States)

    Norton, Gareth J; Lou-Hing, Daniel E; Meharg, Andrew A; Price, Adam H

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 muM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthatases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the BalaxAzucena mapping population.

  18. Rice–arsenate interactions in hydroponics: whole genome transcriptional analysis

    Science.gov (United States)

    Norton, Gareth J.; Lou-Hing, Daniel E.; Meharg, Andrew A.; Price, Adam H.

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthatases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala×Azucena mapping population. PMID:18453530

  19. A 34K SNP genotyping array for Populus trichocarpa: design, application to the study of natural populations and transferability to other Populus species.

    Science.gov (United States)

    Geraldes, A; Difazio, S P; Slavov, G T; Ranjan, P; Muchero, W; Hannemann, J; Gunter, L E; Wymore, A M; Grassa, C J; Farzaneh, N; Porth, I; McKown, A D; Skyba, O; Li, E; Fujita, M; Klápště, J; Martin, J; Schackwitz, W; Pennacchio, C; Rokhsar, D; Friedmann, M C; Wasteneys, G O; Guy, R D; El-Kassaby, Y A; Mansfield, S D; Cronk, Q C B; Ehlting, J; Douglas, C J; Tuskan, G A

    2013-03-01

    Genetic mapping of quantitative traits requires genotypic data for large numbers of markers in many individuals. For such studies, the use of large single nucleotide polymorphism (SNP) genotyping arrays still offers the most cost-effective solution. Herein we report on the design and performance of a SNP genotyping array for Populus trichocarpa (black cottonwood). This genotyping array was designed with SNPs pre-ascertained in 34 wild accessions covering most of the species latitudinal range. We adopted a candidate gene approach to the array design that resulted in the selection of 34 131 SNPs, the majority of which are located in, or within 2 kb of, 3543 candidate genes. A subset of the SNPs on the array (539) was selected based on patterns of variation among the SNP discovery accessions. We show that more than 95% of the loci produce high quality genotypes and that the genotyping error rate for these is likely below 2%. We demonstrate that even among small numbers of samples (n = 10) from local populations over 84% of loci are polymorphic. We also tested the applicability of the array to other species in the genus and found that the number of polymorphic loci decreases rapidly with genetic distance, with the largest numbers detected in other species in section Tacamahaca. Finally, we provide evidence for the utility of the array to address evolutionary questions such as intraspecific studies of genetic differentiation, species assignment and the detection of natural hybrids. © 2013 Blackwell Publishing Ltd.

  20. The Use of Non-Variant Sites to Improve the Clinical Assessment of Whole-Genome Sequence Data.

    Directory of Open Access Journals (Sweden)

    Alberto Ferrarini

    Full Text Available Genetic testing, which is now a routine part of clinical practice and disease management protocols, is often based on the assessment of small panels of variants or genes. On the other hand, continuous improvements in the speed and per-base costs of sequencing have now made whole exome sequencing (WES and whole genome sequencing (WGS viable strategies for targeted or complete genetic analysis, respectively. Standard WGS/WES data analytical workflows generally rely on calling of sequence variants respect to the reference genome sequence. However, the reference genome sequence contains a large number of sites represented by rare alleles, by known pathogenic alleles and by alleles strongly associated to disease by GWAS. It's thus critical, for clinical applications of WGS and WES, to interpret whether non-variant sites are homozygous for the reference allele or if the corresponding genotype cannot be reliably called. Here we show that an alternative analytical approach based on the analysis of both variant and non-variant sites from WGS data allows to genotype more than 92% of sites corresponding to known SNPs compared to 6% genotyped by standard variant analysis. These include homozygous reference sites of clinical interest, thus leading to a broad and comprehensive characterization of variation necessary to an accurate evaluation of disease risk. Altogether, our findings indicate that characterization of both variant and non-variant clinically informative sites in the genome is necessary to allow an accurate clinical assessment of a personal genome. Finally, we propose a highly efficient extended VCF (eVCF file format which allows to store genotype calls for sites of clinical interest while remaining compatible with current variant interpretation software.

  1. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    Science.gov (United States)

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential rule in understanding the regulation mechanisms and is an essential component of the genetic architecture of the gene expressions. However, interaction analysis of gene expressions remains fundamentally unexplored due to great computational challenges and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position level read count curves. A single number for measuring gene expression which is widely used for microarray measured gene expression analysis is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using the RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pair-wises SNPs, the FRGM takes a gene as a basic unit for epistasis analysis, which tests for the interaction of all possible pairs of genes and use all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genome Project. The numbers of pairs of significantly interacting genes after Bonferroni correction

  2. Whole Genome and Tandem Duplicate Retention facilitated Glucosinolate Pathway Diversification in the Mustard Family.

    NARCIS (Netherlands)

    Hofberger, J.A.; Lyons, E.; Edger, P.P.; Pires, J.C.; Schranz, M.E.

    2013-01-01

    Plants share a common history of successive whole genome duplication (WGD) events retaining genomic patterns of duplicate gene copies (ohnologs) organized in conserved syntenic blocks. Duplication was often proposed to affect the origin of novel traits during evolution. However, genetic evidence

  3. Whole-genome regression and prediction methods applied to plant and animal breeding

    NARCIS (Netherlands)

    Los Campos, De G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P.L.

    2013-01-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding, and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of

  4. Direct whole-genome sequencing of Plasmodium falciparum specimens from dried erythrocyte spots

    DEFF Research Database (Denmark)

    Nag, Sidsel; Kofoed, Poul Erik; Ursing, Johan

    2018-01-01

    -infected individuals living in rural areas, away from main infrastructure and the electrical grid. The aim of this study was to describe a low-tech procedure to sample P. falciparum specimens for direct whole genome sequencing (WGS), without use of electricity and cold-chain. Methods: Venous blood samples were...

  5. Bos taurus strain:dairy beef (cattle): 1000 Bull Genomes Run 2, Bovine Whole Genome Sequence

    NARCIS (Netherlands)

    Bouwman, A.C.; Daetwyler, H.D.; Chamberlain, Amanda J.; Ponce, Carla Hurtado; Sargolzaei, Mehdi; Schenkel, Flavio S.; Sahana, Goutam; Govignon-Gion, Armelle; Boitard, Simon; Dolezal, Marlies; Pausch, Hubert; Brøndum, Rasmus F.; Bowman, Phil J.; Thomsen, Bo; Guldbrandtsen, Bernt; Lund, Mogens S.; Servin, Bertrand; Garrick, Dorian J.; Reecy, James M.; Vilkki, Johanna; Bagnato, Alessandro; Wang, Min; Hoff, Jesse L.; Schnabel, Robert D.; Taylor, Jeremy F.; Vinkhuyzen, Anna A.E.; Panitz, Frank; Bendixen, Christian; Holm, Lars-Erik; Gredler, Birgit; Hozé, Chris; Boussaha, Mekki; Sanchez, Marie Pierre; Rocha, Dominique; Capitan, Aurelien; Tribout, Thierry; Barbat, Anne; Croiseau, Pascal; Drögemüller, Cord; Jagannathan, Vidhya; Vander Jagt, Christy; Crowley, John J.; Bieber, Anna; Purfield, Deirdre C.; Berry, Donagh P.; Emmerling, Reiner; Götz, Kay Uwe; Frischknecht, Mirjam; Russ, Ingolf; Sölkner, Johann; Tassell, van Curtis P.; Fries, Ruedi; Stothard, Paul; Veerkamp, R.F.; Boichard, Didier; Goddard, Mike E.; Hayes, Ben J.

    2014-01-01

    Whole genome sequence data (BAM format) of 234 bovine individuals aligned to UMD3.1. The aim of the study was to identify genetic variants (SNPs and indels) for downstream analysis such as imputation, GWAS, and detection of lethal recessives. Additional sequences for later 1000 bull genomes runs can

  6. A synergism between adaptive effects and evolvability drives whole genome duplication to fixation

    NARCIS (Netherlands)

    Cuypers, Thomas D; Hogeweg, Paulien; Hogeweg, P.

    Whole genome duplication has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes.

  7. Tolerance of Whole-Genome Doubling Propagates Chromosomal Instability and Accelerates Cancer Genome Evolution

    DEFF Research Database (Denmark)

    Dewhurst, Sally M.; McGranahan, Nicholas; Burrell, Rebecca A.

    2014-01-01

    The contribution of whole-genome doubling to chromosomal instability (CIN) and tumor evolution is unclear. We use long-term culture of isogenic tetraploid cells from a stable diploid colon cancer progenitor to investigate how a genome-doubling event affects genome stability over time. Rare cells...

  8. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    NARCIS (Netherlands)

    Yuen, Ryan K C; Merico, Daniele; Bookman, Matt; Howe, Jennifer L.; Thiruvahindrapuram, Bhooma; Patel, Rohan V.; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A.; Walker, Susan; Marshall, Christian R.; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L.; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J.; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R.; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J.; Wei, John; Xu, Lizhen; Tasse, Anne Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie Mackinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M.; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H.; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A.; Parr, Jeremy R.; Spence, Sarah J.; Vorstman, Jacob; Frey, Brendan J.; Robinson, James T.; Strug, Lisa J.; Fernandez, Bridget A.; Elsabbagh, Mayada; Carter, Melissa T.; Hallmayer, Joachim; Knoppers, Bartha M.; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H.; Glazer, David; Pletcher, Mathew T.; Scherer, Stephen W.

    2017-01-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information,

  9. Determination of Elizabethkingia Diversity by MALDI-TOF Mass Spectrometry and Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Eriksen, Helle Brander; Gumpert, Heidi; Faurholt, Cecilie Haase

    2017-01-01

    In a hospital-acquired infection with multidrug-resistant Elizabethkingia, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry and 16S rRNA gene analysis identified the pathogen as Elizabethkingia miricola. Whole-genome sequencing, genus-level core genome analysis, and in...

  10. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    NARCIS (Netherlands)

    I. Tachmazidou (Ioanna); Süveges, D. (Dániel); J. Min (Josine); G.R.S. Ritchie (Graham R.S.); Steinberg, J. (Julia); K. Walter (Klaudia); V. Iotchkova (Valentina); J.A. Schwartzentruber (Jeremy); J. Huang (Jian); Y. Memari (Yasin); McCarthy, S. (Shane); Crawford, A.A. (Andrew A.); C. Bombieri (Cristina); M. Cocca (Massimiliano); A.-E. Farmaki (Aliki-Eleni); T.R. Gaunt (Tom); P. Jousilahti (Pekka); M.N. Kooijman (Marjolein ); Lehne, B. (Benjamin); G. Malerba (Giovanni); S. Männistö (Satu); A. Matchan (Angela); M.C. Medina-Gomez (Carolina); S. Metrustry (Sarah); A. Nag (Abhishek); I. Ntalla (Ioanna); L. Paternoster (Lavinia); N.W. Rayner (Nigel William); C. Sala (Cinzia); W.R. Scott (William R.); H.A. Shihab (Hashem A.); L. Southam (Lorraine); B. St Pourcain (Beate); M. Traglia (Michela); K. Trajanoska (Katerina); Zaza, G. (Gialuigi); W. Zhang (Weihua); M.S. Artigas; Bansal, N. (Narinder); M. Benn (Marianne); Chen, Z. (Zhongsheng); P. Danecek (Petr); Lin, W.-Y. (Wei-Yu); A. Locke (Adam); J. Luan (Jian'An); A.K. Manning (Alisa); Mulas, A. (Antonella); C. Sidore (Carlo); A. Tybjaerg-Hansen; A. Varbo (Anette); M. Zoledziewska (Magdalena); C. Finan (Chris); Hatzikotoulas, K. (Konstantinos); A.E. Hendricks (Audrey E.); J.P. Kemp (John); A. Moayyeri (Alireza); Panoutsopoulou, K. (Kalliope); Szpak, M. (Michal); S.G. Wilson (Scott); M. Boehnke (Michael); F. Cucca (Francesco); Di Angelantonio, E. (Emanuele); C. Langenberg (Claudia); C.M. Lindgren (Cecilia M.); McCarthy, M.I. (Mark I.); A.P. Morris (Andrew); B.G. Nordestgaard (Børge); R.A. Scott (Robert); M.D. Tobin (Martin); N.J. Wareham (Nick); P.R. Burton (Paul); J.C. Chambers (John); Smith, G.D. (George Davey); G.V. Dedoussis (George); J.F. Felix (Janine); O.H. Franco (Oscar); Gambaro, G. (Giovanni); P. Gasparini (Paolo); C.J. Hammond (Christopher J.); A. Hofman (Albert); V.W.V. Jaddoe (Vincent); M.E. Kleber (Marcus); J.S. Kooner (Jaspal S.); M. Perola (Markus); C.L. Relton (Caroline); S.M. Ring (Susan); F. Rivadeneira Ramirez (Fernando); V. Salomaa (Veikko); T.D. Spector (Timothy); O. Stegle (Oliver); D. Toniolo (Daniela); A.G. Uitterlinden (André); I.E. Barroso (Inês); C.M.T. Greenwood (Celia); Perry, J.R.B. (John R.B.); Walker, B.R. (Brian R.); A.S. Butterworth (Adam); Y. Xue (Yali); R. Durbin (Richard); K.S. Small (Kerrin); N. Soranzo (Nicole); N.J. Timpson (Nicholas); E. Zeggini (Eleftheria)

    2016-01-01

    textabstractDeep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the

  11. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans).

    Science.gov (United States)

    Tran, Phuong N; Tan, Nicholas E H; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J; Dailey, Lucas K; Hudson, André O; Savka, Michael A

    2015-11-19

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy. Copyright © 2015 Tran et al.

  12. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans)

    OpenAIRE

    Tran, Phuong N.; Tan, Nicholas E. H.; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J.; Dailey, Lucas K.; Hudson, Andr? O.; Savka, Michael A.

    2015-01-01

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy.

  13. The effect of whole genome amplification on samples originating from more than one donor

    DEFF Research Database (Denmark)

    Thacker, C.R.; Balogh, M.K.; Børsting, Claus

    2006-01-01

    In this study, the GenomiPhi(TM) DNA Amplification Kit (Amersham Biosciences) was used to investigate the potential of whole genome amplification (WGA) when considering samples originating from more than one donor. DNA was extracted from blood samples, quantified and normalised before being mixed...

  14. Evolutionary insight from whole-genome sequencing of Pseudomonas aeruginosa from cystic fibrosis patients

    DEFF Research Database (Denmark)

    Marvig, Rasmus Lykke; Madsen Sommer, Lea Mette; Jelsbak, Lars

    2015-01-01

    is suggested to be due to the large genetic repertoire of P. aeruginosa and its ability to genetically adapt to the host environment. Here, we review the recent work that has applied whole-genome sequencing to understand P. aeruginosa population genomics, within-host microevolution and diversity, mutational...

  15. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  16. Whole genome analysis of Klebsiella pneumoniae T2-1-1 from human oral cavity

    Directory of Open Access Journals (Sweden)

    Kok-Gan Chan

    2016-03-01

    Full Text Available Klebsiella pneumoniae T2-1-1 was isolated from the human tongue debris and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession JAQL00000000. Keywords: Human tongue surface, Oral cavity, Oral bacteria, Virulence

  17. Challenges and opportunities for whole-genome sequencing–based surveillance of antibiotic resistance

    NARCIS (Netherlands)

    Schürch, Anita C.; van Schaik, Willem

    2017-01-01

    Infections caused by drug-resistant bacteria are increasingly reported across the planet, and drug-resistant bacteria are recognized to be a major threat to public health and modern medicine. In this review, we discuss how whole-genome sequencing (WGS)–based approaches can contribute to the

  18. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples this could further reduce diagnostic time and thereby improve control and treatment. A major bottle-neck is the availability of fast and reliable bioinformatics...

  19. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer

    NARCIS (Netherlands)

    Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi

    Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and

  20. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background Whole genome sequencing enables a high resolution view ofthe human genome and provides unique insights into genome structureat an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools while validatedalso include a number of

  1. Whole-genome sequence of the bacteriophage-sensitive strain Campylobacter jejuni NCTC12662

    DEFF Research Database (Denmark)

    Gencay, Yilmaz Emre; Sørensen, Martine C.H.; Brøndsted, Lone

    2017-01-01

    Campylobacter jejuni NCTC12662 has been the choice bacteriophage isolation strain due to its susceptibility to C. jejuni bacteriophages. This trait makes it a good candidate for studying bacteriophage-host interactions. We report here the whole-genome sequence of NCTC12662, allowing future...

  2. Targeting 160 candidate genes for blood pressure regulation with a genome-wide genotyping array.

    Directory of Open Access Journals (Sweden)

    Siim Sõber

    2009-06-01

    Full Text Available The outcome of Genome-Wide Association Studies (GWAS has challenged the field of blood pressure (BP genetics as previous candidate genes have not been among the top loci in these scans. We used Affymetrix 500K genotyping data of KORA S3 cohort (n = 1,644; Southern-Germany to address (i SNP coverage in 160 BP candidate genes; (ii the evidence for associations with BP traits in genome-wide and replication data, and haplotype analysis. In total, 160 gene regions (genic region+/-10 kb covered 2,411 SNPs across 11.4 Mb. Marker densities in genes varied from 0 (n = 11 to 0.6 SNPs/kb. On average 52.5% of the HAPMAP SNPs per gene were captured. No evidence for association with BP was obtained for 1,449 tested SNPs. Considerable associations (P50% of HAPMAP SNPs were tagged. In general, genes with higher marker density (>0.2 SNPs/kb revealed a better chance to reach close to significance associations. Although, none of the detected P-values remained significant after Bonferroni correction (P<0.05/2319, P<2.15 x 10(-5, the strength of some detected associations was close to this level: rs10889553 (LEPR and systolic BP (SBP (P = 4.5 x 10(-5 as well as rs10954174 (LEP and diastolic BP (DBP (P = 5.20 x 10(-5. In total, 12 markers in 7 genes (ADRA2A, LEP, LEPR, PTGER3, SLC2A1, SLC4A2, SLC8A1 revealed considerable association (P<10(-3 either with SBP, DBP, and/or hypertension (HYP. None of these were confirmed in replication samples (KORA S4, HYPEST, BRIGHT. However, supportive evidence for the association of rs10889553 (LEPR and rs11195419 (ADRA2A with BP was obtained in meta-analysis across samples stratified either by body mass index, smoking or alcohol consumption. Haplotype analysis highlighted LEPR and PTGER3. In conclusion, the lack of associations in BP candidate genes may be attributed to inadequate marker coverage on the genome-wide arrays, small phenotypic effects of the loci and/or complex interaction with life-style and metabolic parameters.

  3. Routine Whole-Genome Sequencing for Outbreak Investigations of Staphylococcus aureus in a National Reference Center

    Directory of Open Access Journals (Sweden)

    Geraldine Durand

    2018-03-01

    Full Text Available The French National Reference Center for Staphylococci currently uses DNA arrays and spa typing for the initial epidemiological characterization of Staphylococcus aureus strains. We here describe the use of whole-genome sequencing (WGS to investigate retrospectively four distinct and virulent S. aureus lineages [clonal complexes (CCs: CC1, CC5, CC8, CC30] involved in hospital and community outbreaks or sporadic infections in France. We used a WGS bioinformatics pipeline based on de novo assembly (reference-free approach, single nucleotide polymorphism analysis, and on the inclusion of epidemiological markers. We examined the phylogeographic diversity of the French dominant hospital-acquired CC8-MRSA (methicillin-resistant S. aureus Lyon clone through WGS analysis which did not demonstrate evidence of large-scale geographic clustering. We analyzed sporadic cases along with two outbreaks of a CC1-MSSA (methicillin-susceptible S. aureus clone containing the Panton–Valentine leukocidin (PVL and results showed that two sporadic cases were closely related. We investigated an outbreak of PVL-positive CC30-MSSA in a school environment and were able to reconstruct the transmission history between eight families. We explored different outbreaks among newborns due to the CC5-MRSA Geraldine clone and we found evidence of an unsuspected link between two otherwise distinct outbreaks. Here, WGS provides the resolving power to disprove transmission events indicated by conventional methods (same sequence type, spa type, toxin profile, and antibiotic resistance profile and, most importantly, WGS can reveal unsuspected transmission events. Therefore, WGS allows to better describe and understand outbreaks and (inter-national dissemination of S. aureus lineages. Our findings underscore the importance of adding WGS for (inter-national surveillance of infections caused by virulent clones of S. aureus but also substantiate the fact that technological optimization at

  4. Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success.

    Science.gov (United States)

    Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I

    2016-08-26

    Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be

  5. Genotyping of Single Nucleotide Polymorphisms in DNA Isolated from Serum Using Sequenom MassARRAY Technology.

    Directory of Open Access Journals (Sweden)

    Tess V Clendenen

    Full Text Available Large epidemiologic studies have the potential to make valuable contributions to the assessment of gene-environment interactions because they prospectively collected detailed exposure data. Some of these studies, however, have only serum or plasma samples as a low quantity source of DNA.We examined whether DNA isolated from serum can be used to reliably and accurately genotype single nucleotide polymorphisms (SNPs using Sequenom multiplex SNP genotyping technology. We genotyped 81 SNPs using samples from 158 participants in the NYU Women's Health Study. Each participant had DNA from serum and at least one paired DNA sample isolated from a high quality source of DNA, i.e. clots and/or cell precipitates, for comparison.We observed that 60 of the 81 SNPs (74% had high call frequencies (≥95% using DNA from serum, only slightly lower than the 85% of SNPs with high call frequencies in DNA from clots or cell precipitates. Of the 57 SNPs with high call frequencies for serum, clot, and cell precipitate DNA, 54 (95% had highly concordant (>98% genotype calls across all three sample types. High purity was not a critical factor to successful genotyping.Our results suggest that this multiplex SNP genotyping method can be used reliably on DNA from serum in large-scale epidemiologic studies.

  6. The first whole genome sequence and pathogenicity characterization of a fowl adenovirus 4 isolated from ducks associated with inclusion body hepatitis and hydropericardium syndrome.

    Science.gov (United States)

    Pan, Qing; Liu, Linlin; Wang, Yongqiang; Zhang, Yanping; Qi, Xiaole; Liu, Changjun; Gao, Yulong; Wang, Xiaomei; Cui, Hongyu

    2017-10-01

    In June 2015, an infectious disease with high prevalence causing severe hydropericardium syndrome (HPS) first appeared in duck farms of northeast China. The disease showed high morbidity of 35% and mortality of 15% in a commercial duck farm with 200,000 45-day-old ducks. One strain of hypervirulent fowl adenovirus serotype 4 was identified and designated as HLJDAd15. The whole genome of the duck isolate was sequenced and found to contain the same large deletions as a genotype that has become prevalent in chickens in China recently, indicating that this disease might be transmitted from chickens to ducks. The pathogenicity of HLJDAd15 was evaluated in SPF chickens and ducks. The results showed that chickens were more susceptible to this new genotype of fowl adenovirus, and it was more difficult to infect ducks than chickens with the duck origin virus. Thus, it appears that this severe HPS in ducks is far more likely to have been transmitted from chickens to ducks than from ducks to ducks. Therefore, transmission from chickens to ducks constitutes a threat to the duck farming industry, and this transmission route is a very important consideration for the prevention and control of the new genotype of fowl adenovirus. This is the first whole genome sequence of a FAdV-4 isolated from ducks, and this information is important for understanding the molecular characteristics and evolution of aviadenoviruses. The potential risks of infection with this new hypervirulent FAdV-4 genotype in chickens and ducks urgently require an effective vaccine.

  7. Efficient genome-wide genotyping strategies and data integration in crop plants.

    Science.gov (United States)

    Torkamaneh, Davoud; Boyle, Brian; Belzile, François

    2018-03-01

    Next-generation sequencing (NGS) has revolutionized plant and animal research by providing powerful genotyping methods. This review describes and discusses the advantages, challenges and, most importantly, solutions to facilitate data processing, the handling of missing data, and cross-platform data integration. Next-generation sequencing technologies provide powerful and flexible genotyping methods to plant breeders and researchers. These methods offer a wide range of applications from genome-wide analysis to routine screening with a high level of accuracy and reproducibility. Furthermore, they provide a straightforward workflow to identify, validate, and screen genetic variants in a short time with a low cost. NGS-based genotyping methods include whole-genome re-sequencing, SNP arrays, and reduced representation sequencing, which are widely applied in crops. The main challenges facing breeders and geneticists today is how to choose an appropriate genotyping method and how to integrate genotyping data sets obtained from various sources. Here, we review and discuss the advantages and challenges of several NGS methods for genome-wide genetic marker development and genotyping in crop plants. We also discuss how imputation methods can be used to both fill in missing data in genotypic data sets and to integrate data sets obtained using different genotyping tools. It is our hope that this synthetic view of genotyping methods will help geneticists and breeders to integrate these NGS-based methods in crop plant breeding and research.

  8. Whole genome typing of the recently emerged Canadian serogroup W Neisseria meningitidis sequence type 11 clonal complex isolates associated with invasive meningococcal disease.

    Science.gov (United States)

    Tsang, Raymond S W; Ahmad, Tauqeer; Tyler, Shaun; Lefebvre, Brigitte; Deeks, Shelley L; Gilca, Rodica; Hoang, Linda; Tyrrell, Gregory; Van Caeseele, Paul; Van Domselaar, Gary; Jamieson, Frances B

    2018-04-01

    This study was performed to analyze the Canadian invasive serogroup W Neisseria meningitidis (MenW) sequence type 11 (ST-11) clonal complex (CC) isolates by whole genome typing and to compare Canadian isolates with similar isolates from elsewhere. Whole genome typing of 30 MenW ST-11 CC, 20 meningococcal group C (MenC) ST-11 CC, and 31 MenW ST-22 CC isolates was performed on the Bacterial Isolate Genome Sequence database platform. Canadian MenW ST-11 CC isolates were compared with the 2000 MenW Hajj outbreak strain, as well as with MenW ST-11 CC from other countries. Whole genome typing showed that the Canadian MenW ST-11 CC isolates were distinct from the traditional MenW ST-22 CC; they were not capsule-switched contemporary MenC strains that incorporated MenW capsules. While some recent MenW disease cases in Canada were caused by MenW ST-11 CC isolates showing relatedness to the 2000 MenW Hajj strain, many were non-Hajj isolates similar to current MenW ST-11 isolates found globally. Geographical and temporal variations in genotypes and surface protein antigen genes were found among the MenW ST-11 CC isolates. The current MenW ST-11 isolates did not arise by capsule switching from contemporary MenC ST-11 isolates. Both the Hajj-related and non-Hajj MenW ST-11 CC strains were associated with invasive meningococcal disease in Canada. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  9. Whole genome sequence of Enterobacter ludwigii type strain EN-119T, isolated from clinical specimens.

    Science.gov (United States)

    Li, Gengmi; Hu, Zonghai; Zeng, Ping; Zhu, Bing; Wu, Lijuan

    2015-04-01

    Enterobacter ludwigii strain EN-119(T) is the type strain of E. ludwigii, which belongs to the E. cloacae complex (Ecc). This strain was first reported and nominated in 2005 and later been found in many hospitals. In this paper, the whole genome sequencing of this strain was carried out. The total genome size of EN-119(T) is 4952,770 bp with 4578 coding sequences, 88 tRNAs and 10 rRNAs. The genome sequence of EN-119(T) is the first whole genome sequence of E. ludwigii, which will further our understanding of Ecc. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  10. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology

    DEFF Research Database (Denmark)

    Hallin, Peter Fischer; Binnewies, Tim Terence; Ussery, David

    2008-01-01

    ://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence......The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced...... genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including...

  11. Whole genome sequencing of Mycobacterium tuberculosis SB24 isolated from Sabah, Malaysia

    Directory of Open Access Journals (Sweden)

    Noraini Philip

    2016-09-01

    Full Text Available Mycobacterium tuberculosis (M. tuberculosis is the causative agent of tuberculosis (TB that causes millions of death every year. We have sequenced the genome of M. tuberculosis isolated from cerebrospinal fluid (CSF of a patient diagnosed with tuberculous meningitis (TBM. The isolated strain was referred as M. tuberculosis SB24. Genomic DNA of the M. tuberculosis SB24 was extracted and subjected to whole genome sequencing using PacBio platform. The draft genome size of M. tuberculosis SB24 was determined to be 4,452,489 bp with a G + C content of 65.6%. The whole genome shotgun project has been deposited in NCBI SRA under the accession number SRP076503.

  12. Single Cell HLA Matching Feasibility by Whole Genomic Amplification and Nested PCR

    Institute of Scientific and Technical Information of China (English)

    Xiao-hong Li; Fang-yin Meng

    2004-01-01

    @@ PCR based single-cell DNA analysis has been widely used in forensic science, preimplantation genetic diagnosis and so on. However, the original sample cannot be efficiently retrieved following single cell PCR, consequently the amount of information gained is limited. HLA system is too sophisticated that it is very hard to complete HLA typing by single cell. A Taq polymerase-based method using random primers to amplify whole genome termed as whole genome amplification (WGA) has demonstrated to be a useful method in increasing the copies of minimum sample. We establish a technique in this study to amplify HLA-A and HLA-B loci at same time in a single cell using WGA.

  13. Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.

    Science.gov (United States)

    Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney

    2015-01-01

    We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.

  14. Determining the cause of recurrent Clostridium difficile infection using whole genome sequencing.

    Science.gov (United States)

    Sim, James Heng Chiak; Truong, Cynthia; Minot, Samuel S; Greenfield, Nick; Budvytiene, Indre; Lohith, Akshar; Anikst, Victoria; Pourmand, Nader; Banaei, Niaz

    2017-01-01

    Understanding the contribution of relapse and reinfection to recurrent Clostridium difficile infection (CDI) has implications for therapy and infection prevention, respectively. We used whole genome sequencing to determine the relation of C. difficile strains isolated from patients with recurrent CDI at an academic medical center in the United States. Thirty-five toxigenic C. difficile isolates from 16 patients with 19 recurrent CDI episodes with median time of 53.5days (range, 13-362) between episodes were whole genome sequenced on the Illumina MiSeq platform. In 84% (16) of recurrences, the cause of recurrence was relapse with prior strain of C. difficile. In 16% (3) of recurrent episodes, reinfection with a new strain of C. difficile was the cause. In conclusion, the majority of CDI recurrences at our institution were due to infection with the same strain rather than infection with a new strain. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Comparison of whole genome amplification techniques for human single cell exome sequencing.

    Science.gov (United States)

    Borgström, Erik; Paterlini, Marta; Mold, Jeff E; Frisen, Jonas; Lundeberg, Joakim

    2017-01-01

    Whole genome amplification (WGA) is currently a prerequisite for single cell whole genome or exome sequencing. Depending on the method used the rate of artifact formation, allelic dropout and sequence coverage over the genome may differ significantly. The largest difference between the evaluated protocols was observed when analyzing the target coverage and read depth distribution. These differences also had impact on the downstream variant calling. Conclusively, the products from the AMPLI1 and MALBAC kits were shown to be most similar to the bulk samples and are therefore recommended for WGA of single cells. In this study four commercial kits for WGA (AMPLI1, MALBAC, Repli-G and PicoPlex) were used to amplify human single cells. The WGA products were exome sequenced together with non-amplified bulk samples from the same source. The resulting data was evaluated in terms of genomic coverage, allelic dropout and SNP calling.

  16. How could disclosing incidental information from whole-genome sequencing affect patient behavior?

    Science.gov (United States)

    Christensen, Kurt D; Green, Robert C

    2013-06-01

    In this article, we argue that disclosure of incidental findings from whole-genome sequencing has the potential to motivate individuals to change health behaviors through psychological mechanisms that differ from typical risk assessment interventions. Their ability to do so, however, is likely to be highly contingent upon the nature of the incidental findings and how they are disclosed, the context of the disclosure and the characteristics of the patient. Moreover, clinicians need to be aware that behavioral responses may occur in unanticipated ways. This article argues for commentators and policy makers to take a cautious but optimistic perspective while empirical evidence is collected through ongoing research involving whole-genome sequencing and the disclosure of incidental information.

  17. Evaluation of a Stenotrophomonas maltophilia bacteremia cluster in hematopoietic stem cell transplantation recipients using whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Stefanie Kampmeier

    2017-11-01

    Full Text Available Abstract Background Stenotrophomonas maltophilia ubiquitously occurs in the hospital environment. This opportunistic pathogen can cause severe infections in immunocompromised hosts such as hematopoietic stem cell transplantation (HSCT recipients. Between February and July 2016, a cluster of four patients on the HSCT unit suffered from S. maltophilia bloodstream infections (BSI. Methods For epidemiological investigation we retrospectively identified the colonization status of patients admitted to the ward during this time period and performed environmental monitoring of shower heads, shower outlets, washbasins and toilets in patient rooms. We tested antibiotic susceptibility of detected S. maltophilia isolates. Environmental and blood culture samples were subjected to whole genome sequence (WGS-based typing. Results Of four patients with S. maltophlilia BSI, three were found to be colonized previously. In addition, retrospective investigations revealed two patients being colonized in anal swab samples but not infected. Environmental monitoring revealed one shower outlet contaminated with S. maltophilia. Antibiotic susceptibility testing of seven S. maltophlia strains resulted in two trimethoprim/sulfamethoxazole resistant and five susceptible isolates, however, not excluding an outbreak scenario. WGS-based typing did not result in any close genotypic relationship among the patients’ isolates. In contrast, one environmental isolate from a shower outlet was closely related to a single patient’s isolate. Conclusion WGS-based typing successfully refuted an outbreak of S. maltophilia on a HSCT ward but uncoverd that sanitary installations can be an actual source of S. maltophilia transmissions.

  18. The need for high-quality whole-genome sequence databases in microbial forensics.

    Science.gov (United States)

    Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats

    2013-09-01

    Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.

  19. Demographic history and biologically relevant genetic variation of Native Mexicans inferred from whole-genome sequencing

    OpenAIRE

    Romero-Hidalgo, Sandra; Ochoa-Leyva, Adrián; Garcíarrubio, Alejandro; Acuña-Alonzo, Victor; Antúnez-Argüelles, Erika; Balcazar-Quintero, Martha; Barquera-Lozano, Rodrigo; Carnevale, Alessandra; Cornejo-Granados, Fernanda; Fernández-López, Juan Carlos; García-Herrera, Rodrigo; García-Ortíz, Humberto; Granados-Silvestre, Ángeles; Granados, Julio; Guerrero-Romero, Fernando

    2017-01-01

    Understanding the genetic structure of Native American populations is important to clarify their diversity, demographic history, and to identify genetic factors relevant for biomedical traits. Here, we show a demographic history reconstruction from 12 Native American whole genomes belonging to six distinct ethnic groups representing the three main described genetic clusters of Mexico (Northern, Southern, and Maya). Effective population size estimates of all Native American groups remained bel...

  20. A synergism between adaptive effects and evolvability drives whole genome duplication to fixation

    OpenAIRE

    Cuypers, Thomas D; Hogeweg, Paulien; Hogeweg, P.

    2014-01-01

    Whole genome duplication has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes. This pattern has been explained by a neutral process of subfunctionalization and more recently, dosage balance selection. However, much about the relationship between environmental change, WGD and ada...

  1. A synergism between adaptive effects and evolvability drives whole genome duplication to fixation.

    OpenAIRE

    Thomas D Cuypers; Paulien Hogeweg

    2014-01-01

    Whole genome duplication has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes. This pattern has been explained by a neutral process of subfunctionalization and more recently, dosage balance selection. However, much about the relationship between environmental change, WGD and ada...

  2. Whole-genome sequencing reveals mutational landscape underlying phenotypic differences between two widespread Chinese cattle breeds

    OpenAIRE

    Xu, Yao; Jiang, Yu; Shi, Tao; Cai, Hanfang; Lan, Xianyong; Zhao, Xin; Plath, Martin; Chen, Hong

    2017-01-01

    Whole-genome sequencing provides a powerful tool to obtain more genetic variability that could produce a range of benefits for cattle breeding industry. Nanyang (Bos indicus) and Qinchuan (Bos taurus) are two important Chinese indigenous cattle breeds with distinct phenotypes. To identify the genetic characteristics responsible for variation in phenotypes between the two breeds, in the present study, we for the first time sequenced the genomes of four Nanyang and four Qinchuan cattle with 10 ...

  3. Whole-Genome Sequence of Chlamydia abortus Strain GN6 Isolated from Aborted Yak Fetus

    OpenAIRE

    Li, Zhaocai; Cai, Jinshan; Cao, Xiaoan; Lou, Zhongzi; Chao, Yilin; Kan, Wei; Zhou, Jizhang

    2017-01-01

    ABSTRACT The obligate intracellular Gram-negative bacterium Chlamydia abortus is one of the causative agents of abortion and fetal loss in sheep, goats, and cattle in many countries. It also affects the reproductivity of yaks (Bos grunniens). This study reports the whole-genome sequence of Chlamydia abortus strain GN6, which was isolated from aborted yak fetus in Qinghai-Tibetan Plateau, China.

  4. Whole-Genome Sequence of Chlamydia abortus Strain GN6 Isolated from Aborted Yak Fetus.

    Science.gov (United States)

    Li, Zhaocai; Cai, Jinshan; Cao, Xiaoan; Lou, Zhongzi; Chao, Yilin; Kan, Wei; Zhou, Jizhang

    2017-08-31

    The obligate intracellular Gram-negative bacterium Chlamydia abortus is one of the causative agents of abortion and fetal loss in sheep, goats, and cattle in many countries. It also affects the reproductivity of yaks ( Bos grunniens ). This study reports the whole-genome sequence of Chlamydia abortus strain GN6, which was isolated from aborted yak fetus in Qinghai-Tibetan Plateau, China. Copyright © 2017 Li et al.

  5. The Future of Whole-Genome Sequencing for Public Health and the Clinic

    OpenAIRE

    Allard, Marc W.

    2016-01-01

    An American Society for Microbiology (ASM) conference titled the Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiological Investigation of Pathogens provided a venue for discussing how technologies surrounding whole-genome sequencing (WGS) are advancing microbiology. Several applications in microbial taxonomy, microbial forensics, and genomics for public health pathogen surveillance were presented at the meeting and are reviewed. All of...

  6. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian

    2015-01-01

    methods, focusing particularly on variations detection. Low-coverage whole-genome sequencing revealed that DOP-PCR had the highest duplication ratio, but an even read distribution and the best reproducibility and accuracy for detection of copy-number variations (CNVs). However, MDA had significantly...... performance using SCRS amplified by different WGA methods. It will guide researchers to determine which WGA method is best suited to individual experimental needs at single-cell level....

  7. Whole-genome sequence of the orchid anthracnose pathogen Colletotrichum orchidophilum.

    Science.gov (United States)

    Baroncelli, Riccardo; Sukno, Serenella; Sarrocco, Sabrina; Cafà, Giovanni; Le Floch, Gaetan; Thon, Michael R

    2018-04-12

    Colletotrichum orchidophilum is a plant pathogenic fungus infecting a wide range of plant species belonging to the family Orchidaceae. Besides its economic impact, C. orchidophilum has been used in recent years in evolutionary studies as it represents the closest related species to the C. acutatum species complex. Here we present the first draft whole-genome sequence of C. orchidophilum IMI 309357, providing a resource for future research on anthracnose of Orchidaceae and other hosts.

  8. Whole-Genome Sequencing in Microbial Forensic Analysis of Gamma-Irradiated Microbial Materials.

    Science.gov (United States)

    Broomall, Stacey M; Ait Ichou, Mohamed; Krepps, Michael D; Johnsky, Lauren A; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; Betters, Janet L; Redmond, Brady W; Rivers, Bryan A; Liem, Alvin T; Hill, Jessica M; Fochler, Edward T; Roth, Pierce A; Rosenzweig, C Nicole; Skowronski, Evan W; Gibbons, Henry S

    2016-01-15

    Effective microbial forensic analysis of materials used in a potential biological attack requires robust methods of morphological and genetic characterization of the attack materials in order to enable the attribution of the materials to potential sources and to exclude other potential sources. The genetic homogeneity and potential intersample variability of many of the category A to C bioterrorism agents offer a particular challenge to the generation of attributive signatures, potentially requiring whole-genome or proteomic approaches to be utilized. Currently, irradiation of mail is standard practice at several government facilities judged to be at particularly high risk. Thus, initial forensic signatures would need to be recovered from inactivated (nonviable) material. In the study described in this report, we determined the effects of high-dose gamma irradiation on forensic markers of bacterial biothreat agent surrogate organisms with a particular emphasis on the suitability of genomic DNA (gDNA) recovered from such sources as a template for whole-genome analysis. While irradiation of spores and vegetative cells affected the retention of Gram and spore stains and sheared gDNA into small fragments, we found that irradiated material could be utilized to generate accurate whole-genome sequence data on the Illumina and Roche 454 sequencing platforms. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  9. Gene discovery by chemical mutagenesis and whole-genome sequencing in Dictyostelium.

    Science.gov (United States)

    Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž; Shaulsky, Gad

    2016-09-01

    Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. © 2016 Li et al.; Published by Cold Spring Harbor Laboratory Press.

  10. Rapid Identification of Potential Drugs for Diabetic Nephropathy Using Whole-Genome Expression Profiles of Glomeruli

    Directory of Open Access Journals (Sweden)

    Jingsong Shi

    2016-01-01

    Full Text Available Objective. To investigate potential drugs for diabetic nephropathy (DN using whole-genome expression profiles and the Connectivity Map (CMAP. Methodology. Eighteen Chinese Han DN patients and six normal controls were included in this study. Whole-genome expression profiles of microdissected glomeruli were measured using the Affymetrix human U133 plus 2.0 chip. Differentially expressed genes (DEGs between late stage and early stage DN samples and the CMAP database were used to identify potential drugs for DN using bioinformatics methods. Results. (1 A total of 1065 DEGs (FDR 1.5 were found in late stage DN patients compared with early stage DN patients. (2 Piperlongumine, 15d-PGJ2 (15-delta prostaglandin J2, vorinostat, and trichostatin A were predicted to be the most promising potential drugs for DN, acting as NF-κB inhibitors, histone deacetylase inhibitors (HDACIs, PI3K pathway inhibitors, or PPARγ agonists, respectively. Conclusion. Using whole-genome expression profiles and the CMAP database, we rapidly predicted potential DN drugs, and therapeutic potential was confirmed by previously published studies. Animal experiments and clinical trials are needed to confirm both the safety and efficacy of these drugs in the treatment of DN.

  11. Whole Genome Sequence Analysis Using JSpecies Tool Establishes Clonal Relationships between Listeria monocytogenes Strains from Epidemiologically Unrelated Listeriosis Outbreaks.

    Directory of Open Access Journals (Sweden)

    Laurel S Burall

    Full Text Available In an effort to build a comprehensive genomic approach to food safety challenges, the FDA has implemented a whole genome sequencing effort, GenomeTrakr, which involves the sequencing and analysis of genomes of foodborne pathogens. As a part of this effort, we routinely sequence whole genomes of Listeria monocytogenes (Lm isolates associated with human listeriosis outbreaks, as well as those isolated through other sources. To rapidly establish genetic relatedness of these genomes, we evaluated tetranucleotide frequency analysis via the JSpecies program to provide a cursory analysis of strain relatedness. The JSpecies tetranucleotide (tetra analysis plots standardized (z-score tetramer word frequencies of two strains against each other and uses linear regression analysis to determine similarity (r2. This tool was able to validate the close relationships between outbreak related strains from four different outbreaks. Included in this study was the analysis of Lm strains isolated during the recent caramel apple outbreak and stone fruit incident in 2014. We identified that many of the isolates from these two outbreaks shared a common 4b variant (4bV serotype, also designated as IVb-v1, using a qPCR protocol developed in our laboratory. The 4bV serotype is characterized by the presence of a 6.3 Kb DNA segment normally found in serotype 1/2a, 3a, 1/2c and 3c strains but not in serotype 4b or 1/2b strains. We decided to compare these strains at a genomic level using the JSpecies Tetra tool. Specifically, we compared several 4bV and 4b isolates and identified a high level of similarity between the stone fruit and apple 4bV strains, but not the 4b strains co-identified in the caramel apple outbreak or other 4b or 4bV strains in our collection. This finding was further substantiated by a SNP-based analysis. Additionally, we were able to identify close relatedness between isolates from clinical cases from 1993-1994 and a single case from 2011 as well as

  12. Genotype-phenotype analysis of recombinant chromosome 4 syndrome: an array-CGH study and literature review.

    Science.gov (United States)

    Hemmat, Morteza; Hemmat, Omid; Anguiano, Arturo; Boyar, Fatih Z; El Naggar, Mohammed; Wang, Jia-Chi; Wang, Borris T; Sahoo, Trilochan; Owen, Renius; Haddadin, Mary

    2013-05-02

    Recombinant chromosome 4, a rare constitutional rearrangement arising from pericentric inversion, comprises a duplicated segment of 4p13~p15→4pter and a deleted segment of 4q35→4qter. To date, 10 cases of recombinant chromosome 4 have been reported. We describe the second case in which array-CGH was used to characterize recombinant chromosome 4 syndrome. The patient was a one-year old boy with consistent clinical features. Conventional cytogenetics and FISH documented a recombinant chromosome 4, derived from a paternal pericentric inversion, leading to partial trisomy 4p and partial monosomy of 4q. Array-CGH, performed to further characterize the rearranged chromosome 4 and delineate the breakpoints, documented a small (4.36 Mb) 4q35.1 terminal deletion and a large (23.81 Mb) 4p15.1 terminal duplication. Genotype-phenotype analysis of 10 previously reported cases and the present case indicated relatively consistent clinical features and breakpoints. This consistency was more evident in our case and another characterized by array-CGH, where both showed the common breakpoints of p15.1 and q35.1. A genotype-phenotype correlation study between rec(4), dup(4p), and del(4q) syndromes revealed that urogenital and cardiac defects are probably due to the deletion of 4q whereas the other clinical features are likely due to 4p duplication. Our findings support that the clinical features of patients with rec(4) are relatively consistent and specific to the regions of duplication or deletion. Recombinant chromosome 4 syndrome thus appears to be a discrete entity that can be suspected on the basis of clinical features or specific deleted and duplicated chromosomal regions.

  13. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Lai-Ping Wong

    2014-05-01

    Full Text Available South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP. The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP. SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  14. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    Science.gov (United States)

    Wong, Lai-Ping; Lai, Jason Kuan-Han; Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2014-05-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  15. A new sieving matrix for DNA sequencing, genotyping and mutation detection and high-throughput genotyping with a 96-capillary array system

    Energy Technology Data Exchange (ETDEWEB)

    Gao, David [Iowa State Univ., Ames, IA (United States)

    1999-11-08

    Capillary electrophoresis has been widely accepted as a fast separation technique in DNA analysis. In this dissertation, a new sieving matrix is described for DNA analysis, especially DNA sequencing, genetic typing and mutation detection. A high-throughput 96 capillary array electrophoresis system was also demonstrated for simultaneous multiple genotyping. The authors first evaluated the influence of different capillary coatings on the performance of DNA sequencing. A bare capillary was compared with a DB-wax, an FC-coated and a polyvinylpyrrolidone dynamically coated capillary with PEO as sieving matrix. It was found that covalently-coated capillaries had no better performance than bare capillaries while PVP coating provided excellent and reproducible results. The authors also developed a new sieving Matrix for DNA separation based on commercially available poly(vinylpyrrolidone) (PVP). This sieving matrix has a very low viscosity and an excellent self-coating effect. Successful separations were achieved in uncoated capillaries. Sequencing of M13mp18 showed good resolution up to 500 bases in treated PVP solution. Temperature gradient capillary electrophoresis and PVP solution was applied to mutation detection. A heteroduplex sample and a homoduplex reference were injected during a pair of continuous runs. A temperature gradient of 10 C with a ramp of 0.7 C/min was swept throughout the capillary. Detection was accomplished by laser induced fluorescence detection. Mutation detection was performed by comparing the pattern changes between the homoduplex and the heteroduplex samples. High throughput, high detection rate and easy operation were achieved in this system. They further demonstrated fast and reliable genotyping based on CTTv STR system by multiple-capillary array electrophoresis. The PCR products from individuals were mixed with pooled allelic ladder as an absolute standard and coinjected with a 96-vial tray. Simultaneous one-color laser-induced fluorescence

  16. Whole-genome analysis of herbicide-tolerant mutant rice generated by Agrobacterium-mediated gene targeting.

    Science.gov (United States)

    Endo, Masaki; Kumagai, Masahiko; Motoyama, Ritsuko; Sasaki-Yamagata, Harumi; Mori-Hosokawa, Satomi; Hamada, Masao; Kanamori, Hiroyuki; Nagamura, Yoshiaki; Katayose, Yuichi; Itoh, Takeshi; Toki, Seiichi

    2015-01-01

    Gene targeting (GT) is a technique used to modify endogenous genes in target genomes precisely via homologous recombination (HR). Although GT plants are produced using genetic transformation techniques, if the difference between the endogenous and the modified gene is limited to point mutations, GT crops can be considered equivalent to non-genetically modified mutant crops generated by conventional mutagenesis techniques. However, it is difficult to guarantee the non-incorporation of DNA fragments from Agrobacterium in GT plants created by Agrobacterium-mediated GT despite screening with conventional Southern blot and/or PCR techniques. Here, we report a comprehensive analysis of herbicide-tolerant rice plants generated by inducing point mutations in the rice ALS gene via Agrobacterium-mediated GT. We performed genome comparative genomic hybridization (CGH) array analysis and whole-genome sequencing to evaluate the molecular composition of GT rice plants. Thus far, no integration of Agrobacterium-derived DNA fragments has been detected in GT rice plants. However, >1,000 single nucleotide polymorphisms (SNPs) and insertion/deletion (InDels) were found in GT plants. Among these mutations, 20-100 variants might have some effect on expression levels and/or protein function. Information about additive mutations should be useful in clearing out unwanted mutations by backcrossing. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  17. Whole Genome Analyses of a Well-Differentiated Liposarcoma Reveals Novel SYT1 and DDR2 Rearrangements

    Science.gov (United States)

    Egan, Jan B.; Barrett, Michael T.; Champion, Mia D.; Middha, Sumit; Lenkiewicz, Elizabeth; Evers, Lisa; Francis, Princy; Schmidt, Jessica; Shi, Chang-Xin; Van Wier, Scott; Badar, Sandra; Ahmann, Gregory; Kortuem, K. Martin; Boczek, Nicole J.; Fonseca, Rafael; Craig, David W.; Carpten, John D.; Borad, Mitesh J.; Stewart, A. Keith

    2014-01-01

    Liposarcoma is the most common soft tissue sarcoma, but little is known about the genomic basis of this disease. Given the low cell content of this tumor type, we utilized flow cytometry to isolate the diploid normal and aneuploid tumor populations from a well-differentiated liposarcoma prior to array comparative genomic hybridization and whole genome sequencing. This work revealed massive highly focal amplifications throughout the aneuploid tumor genome including MDM2, a gene that has previously been found to be amplified in well-differentiated liposarcoma. Structural analysis revealed massive rearrangement of chromosome 12 and 11 gene fusions, some of which may be part of double minute chromosomes commonly present in well-differentiated liposarcoma. We identified a hotspot of genomic instability localized to a region of chromosome 12 that includes a highly conserved, putative L1 retrotransposon element, LOC100507498 which resides within a gene cluster (NAV3, SYT1, PAWR) where 6 of the 11 fusion events occurred. Interestingly, a potential gene fusion was also identified in amplified DDR2, which is a potential therapeutic target of kinase inhibitors such as dastinib, that are not routinely used in the treatment of patients with liposarcoma. Furthermore, 7 somatic, damaging single nucleotide variants have also been identified, including D125N in the PTPRQ protein. In conclusion, this work is the first to report the entire genome of a well-differentiated liposarcoma with novel chromosomal rearrangements associated with amplification of therapeutically targetable genes such as MDM2 and DDR2. PMID:24505276

  18. Whole genome analyses of a well-differentiated liposarcoma reveals novel SYT1 and DDR2 rearrangements.

    Directory of Open Access Journals (Sweden)

    Jan B Egan

    Full Text Available Liposarcoma is the most common soft tissue sarcoma, but little is known about the genomic basis of this disease. Given the low cell content of this tumor type, we utilized flow cytometry to isolate the diploid normal and aneuploid tumor populations from a well-differentiated liposarcoma prior to array comparative genomic hybridization and whole genome sequencing. This work revealed massive highly focal amplifications throughout the aneuploid tumor genome including MDM2, a gene that has previously been found to be amplified in well-differentiated liposarcoma. Structural analysis revealed massive rearrangement of chromosome 12 and 11 gene fusions, some of which may be part of double minute chromosomes commonly present in well-differentiated liposarcoma. We identified a hotspot of genomic instability localized to a region of chromosome 12 that includes a highly conserved, putative L1 retrotransposon element, LOC100507498 which resides within a gene cluster (NAV3, SYT1, PAWR where 6 of the 11 fusion events occurred. Interestingly, a potential gene fusion was also identified in amplified DDR2, which is a potential therapeutic target of kinase inhibitors such as dastinib, that are not routinely used in the treatment of patients with liposarcoma. Furthermore, 7 somatic, damaging single nucleotide variants have also been identified, including D125N in the PTPRQ protein. In conclusion, this work is the first to report the entire genome of a well-differentiated liposarcoma with novel chromosomal rearrangements associated with amplification of therapeutically targetable genes such as MDM2 and DDR2.

  19. Human papillomavirus genotyping using an automated film-based chip array.

    Science.gov (United States)

    Erali, Maria; Pattison, David C; Wittwer, Carl T; Petti, Cathy A

    2009-09-01

    The INFINITI HPV-QUAD assay is a commercially available genotyping platform for human papillomavirus (HPV) that uses multiplex PCR, followed by automated processing for primer extension, hybridization, and detection. The analytical performance of the HPV-QUAD assay was evaluated using liquid cervical cytology specimens, and the results were compared with those results obtained using the digene High-Risk HPV hc2 Test (HC2). The specimen types included Surepath and PreservCyt transport media, as well as residual SurePath and HC2 transport media from the HC2 assay. The overall concordance of positive and negative results following the resolution of indeterminate and intermediate results was 83% among the 197 specimens tested. HC2 positive (+) and HPV-QUAD negative (-) results were noted in 24 specimens that were shown by real-time PCR and sequence analysis to contain no HPV, HPV types that were cross-reactive in the HC2 assay, or low virus levels. Conversely, HC2 (-) and HPV-QUAD (+) results were noted in four specimens and were subsequently attributed to cross-contamination. The most common HPV types to be identified in this study were HPV16, HPV18, HPV52/58, and HPV39/56. We show that the HPV-QUAD assay is a user friendly, automated system for the identification of distinct HPV genotypes. Based on its analytical performance, future studies with this platform are warranted to assess its clinical utility for HPV detection and genotyping.

  20. Supplementary Material for: Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody; Coll, Francesc; McNerney, Ruth; Ascher, David; Pires, Douglas; Furnham, Nick; Coeck, Nele; Hill-Cawthorne, Grant; Nair, Mridul; Mallard, Kim; Ramsay, Andrew; Campino, Susana; Hibberd, Martin; Pain, Arnab; Rigouts, Leen; Clark, Taane

    2016-01-01

    Abstract Background Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure

  1. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus

    OpenAIRE

    Lee, Yookyung; Lim, Sooyeon; Rhee, Moon-Soo; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-01-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus collected in Yeong-dong, Chuncheongbuk-do, South Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000. Keywords: Insect, Larval gut, Whole genome shot-gun sequencing

  2. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus

    Directory of Open Access Journals (Sweden)

    Yookyung Lee

    2016-03-01

    Full Text Available Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus collected in Yeong-dong, Chuncheongbuk-do, South Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000. Keywords: Insect, Larval gut, Whole genome shot-gun sequencing

  3. Whole genome sequencing of a rare rotavirus from archived stool sample demonstrates independent zoonotic origin of human G8P[14] strains in Hungary.

    Science.gov (United States)

    Marton, Szilvia; Dóró, Renáta; Fehér, Enikő; Forró, Barbara; Ihász, Katalin; Varga-Kugler, Renáta; Farkas, Szilvia L; Bányai, Krisztián

    2017-01-02

    Genotype P[14] rotaviruses in humans are thought to be zoonotic strains originating from bovine or ovine host species. Over the past 30 years only few genotype P[14] strains were identified in Hungary totalinghuman rotaviruses whose genotype had been determined. In this study we report the genome sequence and phylogenetic analysis of a human genotype G8P[14] strain, RVA/Human-wt/HUN/182-02/2001/G8P[14]. The whole genome constellation (G8-P[14]-I2-R2-C2-M2-A11-N2-T6-E2-H3) of this strain was shared with another Hungarian zoonotic G8P[14] strain, RVA/Human-wt/HUN/BP1062/2004/G8P[14], although phylogenetic analyses revealed the two rotaviruses likely had different progenitors. Overall, our findings indicate that human G8P[14] rotavirus detected in Hungary in the past originated from independent zoonotic events. Further studies are needed to assess the public health risk associated with infections by various animal rotavirus strains. Copyright © 2016. Published by Elsevier B.V.

  4. Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between Rosaceous subfamilies.

    Science.gov (United States)

    Jung, Sook; Cestaro, Alessandro; Troggio, Michela; Main, Dorrie; Zheng, Ping; Cho, Ilhyung; Folta, Kevin M; Sosinski, Bryon; Abbott, Albert; Celton, Jean-Marc; Arús, Pere; Shulaev, Vladimir; Verde, Ignazio; Morgante, Michele; Rokhsar, Daniel; Velasco, Riccardo; Sargent, Daniel James

    2012-04-04

    Rosaceae include numerous economically important and morphologically diverse species. Comparative mapping between the member species in Rosaceae have indicated some level of synteny. Recently the whole genome of three crop species, peach, apple and strawberry, which belong to different genera of the Rosaceae family, have been sequenced, allowing in-depth comparison of these genomes. Our analysis using the whole genome sequences of peach, apple and strawberry identified 1399 orthologous regions between the three genomes, with a mean length of around 100 kb. Each peach chromosome showed major orthology mostly to one strawberry chromosome, but to more than two apple chromosomes, suggesting that the apple genome went through more chromosomal fissions in addition to the whole genome duplication after the divergence of the three genera. However, the distribution of contiguous ancestral regions, identified using the multiple genome rearrangements and ancestors (MGRA) algorithm, suggested that the Fragaria genome went through a greater number of small scale rearrangements compared to the other genomes since they diverged from a common ancestor. Using the contiguous ancestral regions, we reconstructed a hypothetical ancestral genome for the Rosaceae 7 composed of nine chromosomes and propose the evolutionary steps from the ancestral genome to the extant Fragaria, Prunus and Malus genomes. Our analysis shows that different modes of evolution may have played major roles in different subfamilies of Rosaceae. The hypothetical ancestral genome of Rosaceae and the evolutionary steps that lead to three different lineages of Rosaceae will facilitate our understanding of plant genome evolution as well as have a practical impact on knowledge transfer among member species of Rosaceae.

  5. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.

    Science.gov (United States)

    Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C

    2012-09-11

    Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  6. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic

    Directory of Open Access Journals (Sweden)

    Sealfon Rachel

    2012-09-01

    Full Text Available Abstract Background Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x; four of the seven isolates were previously sequenced. Results Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961, 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  7. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.

    Directory of Open Access Journals (Sweden)

    Elizabeth M Driebe

    Full Text Available Staphylococcus aureus is an important clinical pathogen worldwide and understanding this organism's phylogeny and, in particular, the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade with clear groupings of the major clonal complexes of CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages were different from each other. While the CC8 lineage rate was similar to previous studies, the CC5 lineage was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss.

  8. High-precision, whole-genome sequencing of laboratory strains facilitates genetic studies.

    Directory of Open Access Journals (Sweden)

    Anjana Srivatsan

    2008-08-01

    Full Text Available Whole-genome sequencing is a powerful technique for obtaining the reference sequence information of multiple organisms. Its use can be dramatically expanded to rapidly identify genomic variations, which can be linked with phenotypes to obtain biological insights. We explored these potential applications using the emerging next-generation sequencing platform Solexa Genome Analyzer, and the well-characterized model bacterium Bacillus subtilis. Combining sequencing with experimental verification, we first improved the accuracy of the published sequence of the B. subtilis reference strain 168, then obtained sequences of multiple related laboratory strains and different isolates of each strain. This provides a framework for comparing the divergence between different laboratory strains and between their individual isolates. We also demonstrated the power of Solexa sequencing by using its results to predict a defect in the citrate signal transduction pathway of a common laboratory strain, which we verified experimentally. Finally, we examined the molecular nature of spontaneously generated mutations that suppress the growth defect caused by deletion of the stringent response mediator relA. Using whole-genome sequencing, we rapidly mapped these suppressor mutations to two small homologs of relA. Interestingly, stable suppressor strains had mutations in both genes, with each mutation alone partially relieving the relA growth defect. This supports an intriguing three-locus interaction module that is not easily identifiable through traditional suppressor mapping. We conclude that whole-genome sequencing can drastically accelerate the identification of suppressor mutations and complex genetic interactions, and it can be applied as a standard tool to investigate the genetic traits of model organisms.

  9. Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between Rosaceous subfamilies

    Directory of Open Access Journals (Sweden)

    Jung Sook

    2012-04-01

    Full Text Available Abstract Background Rosaceae include numerous economically important and morphologically diverse species. Comparative mapping between the member species in Rosaceae have indicated some level of synteny. Recently the whole genome of three crop species, peach, apple and strawberry, which belong to different genera of the Rosaceae family, have been sequenced, allowing in-depth comparison of these genomes. Results Our analysis using the whole genome sequences of peach, apple and strawberry identified 1399 orthologous regions between the three genomes, with a mean length of around 100 kb. Each peach chromosome showed major orthology mostly to one strawberry chromosome, but to more than two apple chromosomes, suggesting that the apple genome went through more chromosomal fissions in addition to the whole genome duplication after the divergence of the three genera. However, the distribution of contiguous ancestral regions, identified using the multiple genome rearrangements and ancestors (MGRA algorithm, suggested that the Fragaria genome went through a greater number of small scale rearrangements compared to the other genomes since they diverged from a common ancestor. Using the contiguous ancestral regions, we reconstructed a hypothetical ancestral genome for the Rosaceae 7 composed of nine chromosomes and propose the evolutionary steps from the ancestral genome to the extant Fragaria, Prunus and Malus genomes. Conclusion Our analysis shows that different modes of evolution may have played major roles in different subfamilies of Rosaceae. The hypothetical ancestral genome of Rosaceae and the evolutionary steps that lead to three different lineages of Rosaceae will facilitate our understanding of plant genome evolution as well as have a practical impact on knowledge transfer among member species of Rosaceae.

  10. Whole genome sequences and annotation of Micrococcus luteus SUBG006, a novel phytopathogen of mango.

    Science.gov (United States)

    Rakhashiya, Purvi M; Patel, Pooja P; Thaker, Vrinda S

    2015-12-01

    Actinobaceria, Micrococcus luteus SUBG006 was isolated from infected leaves of Mangifera indica L. vr. Nylon in Rajkot, (22.30°N, 70.78°E), Gujarat, India. The genome size is 3.86 Mb with G + C content of 69.80% and contains 112 rRNA sequences (5S, 16S and 23S). The whole genome sequencing has been deposited in DDBJ/EMBL/GenBank under the accession number JOKP00000000.

  11. A Danish Salmonella Bareilly outbreak investigated by the use of whole genome sequencing

    DEFF Research Database (Denmark)

    Torpdahl, M.; Kiil, K.; Litrup, E.

    2013-01-01

    with several band changes and others are defined by one PFGE profile thereby excluding closely related profiles. We decided to investigate whether whole genome sequencing (WGS) could resolve this issue and be useful in outbreak investigations. Several analyses were performed, including a SNP tree based...... on the core genome, MLST profiles and detection of phages in the genome. The human cluster and the broiler isolates belonged to the same ST, but the isolates were divided into two groups, 9 SNPs apart, according to an MP phylogeny. When using PHAST, we found that two phage regions were a 100% similar...

  12. Whole-genome sequencing for identification of the source in hospital-acquired Legionnaires' disease

    DEFF Research Database (Denmark)

    Rosendahl Madsen, A M; Holm, A; Jensen, T G

    2017-01-01

    Acquisition of Legionnaires' disease is a serious complication of hospitalization. Rapid determination of whether or not the infection is caused by strains of Legionella pneumophila in the hospital environment is crucial to avoid further cases. This study investigated the use of whole-genome sequ......Acquisition of Legionnaires' disease is a serious complication of hospitalization. Rapid determination of whether or not the infection is caused by strains of Legionella pneumophila in the hospital environment is crucial to avoid further cases. This study investigated the use of whole...

  13. Isolation and whole-genome sequencing of a Crimean-Congo hemorrhagic fever virus strain, Greece.

    Science.gov (United States)

    Papa, Anna; Papadopoulou, Elpida; Tsioka, Katerina; Kontana, Anastasia; Pappa, Styliani; Melidou, Ageliki; Giadinis, Nektarios D

    2018-03-01

    Crimean-Congo hemorrhagic fever virus (CCHFV) was isolated from a pool of two adult Rhipicephalus bursa ticks removed from a goat in 2015 in Greece. The strain clusters into lineage Europe 2 representing the second available whole-genome sequenced isolate of this lineage. CCHFV IgG antibodies were detected in 8 of 19 goats of the farm. Currently CCHFV is not associated with disease in mammals other than humans. Studies in animal models are needed to investigate the pathogenicity level of lineage Europe 2 and compare it with that of other lineages. Copyright © 2018 Elsevier GmbH. All rights reserved.

  14. Association analysis of whole genome sequencing data accounting for longitudinal and family designs.

    Science.gov (United States)

    Hu, Yijuan; Hui, Qin; Sun, Yan V

    2014-01-01

    Using the whole genome sequencing data and the simulated longitudinal phenotypes for 849 pedigree-based individuals from Genetic Analysis Workshop 18, we investigated various approaches to detecting the association of rare and common variants with blood pressure traits. We compared three strategies for longitudinal data: (a) using the baseline measurement only, (b) using the average from multiple visits, and (c) using all individual measurements. We also compared the power of using all of the pedigree-based data and the unrelated subset. The analyses were performed without knowledge of the underlying simulating model.

  15. Reflections on the cost of "low-cost" whole genome sequencing: framing the health policy debate.

    Directory of Open Access Journals (Sweden)

    Timothy Caulfield

    2013-11-01

    Full Text Available The cost of whole genome sequencing is dropping rapidly. There has been a great deal of enthusiasm about the potential for this technological advance to transform clinical care. Given the interest and significant investment in genomics, this seems an ideal time to consider what the evidence tells us about potential benefits and harms, particularly in the context of health care policy. The scale and pace of adoption of this powerful new technology should be driven by clinical need, clinical evidence, and a commitment to put patients at the centre of health care policy.

  16. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin

    2013-01-01

    and genetic improvement were identified.CONCLUSIONS:Given the uniqueness of the soybean germplasm sequenced, this study drew a clear picture of human-mediated evolution of the soybean genomes. The genomic resources and information provided by this study would also facilitate the discovery of genes......BACKGROUND:Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re...

  17. Recommendations to address the difficulties encountered when determining linezolid resistance from whole genome sequencing data.

    Science.gov (United States)

    Beukers, Alicia G; Hasman, Henrik; Hegstad, Kristin; van Hal, Sebastiaan J

    2018-05-29

    Mutations associated with linezolid resistance within the V domain of 23S rRNA are annotated using an Escherichia coli numbering system. The 23S rRNA gene varies in length, nucleotide sequence and copy number between bacterial species. Consequently, this numbering system is not intuitive and can lead to confusion when locating mutation sites using whole genome sequencing data. Using the mutation G2576T as an example, we demonstrate the difficulties associated with using the E. coli numbering system. © Crown copyright 2018.

  18. Diversity and Genome Analysis of Australian and Global Oilseed Brassica napus L. Germplasm Using Transcriptomics and Whole Genome Re-sequencing

    Directory of Open Access Journals (Sweden)

    M. Michelle Malmberg

    2018-04-01

    Full Text Available Intensive breeding of Brassica napus has resulted in relatively low diversity, such that B. napus would benefit from germplasm improvement schemes that sustain diversity. As such, samples representative of global germplasm pools need to be assessed for existing population structure, diversity and linkage disequilibrium (LD. Complexity reduction genotyping-by-sequencing (GBS methods, including GBS-transcriptomics (GBS-t, enable cost-effective screening of a large number of samples, while whole genome re-sequencing (WGR delivers the ability to generate large numbers of unbiased genomic single nucleotide polymorphisms (SNPs, and identify structural variants (SVs. Furthermore, the development of genomic tools based on whole genomes representative of global oilseed diversity and orientated by the reference genome has substantial industry relevance and will be highly beneficial for canola breeding. As recent studies have focused on European and Chinese varieties, a global diversity panel as well as a substantial number of Australian spring types were included in this study. Focusing on industry relevance, 633 varieties were initially genotyped using GBS-t to examine population structure using 61,037 SNPs. Subsequently, 149 samples representative of global diversity were selected for WGR and both data sets used for a side-by-side evaluation of diversity and LD. The WGR data was further used to develop genomic resources consisting of a list of 4,029,750 high-confidence SNPs annotated using SnpEff, and SVs in the form of 10,976 deletions and 2,556 insertions. These resources form the basis of a reliable and repeatable system allowing greater integration between canola genomics studies, with a strong focus on breeding germplasm and industry applicability.

  19. Whole genome sequencing of clinical strains of Mycobacterium tuberculosis from Mumbai, India: A potential tool for determining drug-resistance and strain lineage.

    Science.gov (United States)

    Chatterjee, Anirvan; Nilgiriwala, Kayzad; Saranath, Dhananjaya; Rodrigues, Camilla; Mistry, Nerges

    2017-12-01

    Amplification of drug resistance in Mycobacterium tuberculosis (M.tb) and its transmission are significant barriers in controlling tuberculosis (TB) globally. Diagnostic inaccuracies and delays impede appropriate drug administration, which exacerbates primary and secondary drug resistance. Increasing affordability of whole genome sequencing (WGS) and exhaustive cataloguing of drug resistance mutations is poised to revolutionise TB diagnostics and facilitate personalized drug therapy. However, application of WGS for diagnostics in high endemic areas is yet to be demonstrated. We report WGS of 74 clinical TB isolates from Mumbai, India, characterising genotypic drug resistance to first- and second-line anti-TB drugs. A concordance analysis between phenotypic and genotypic drug susceptibility of a subset of 29 isolates and the sensitivity of resistance prediction to the 4 drugs was calculated, viz. isoniazid-100%, rifampicin-100%, ethambutol-100% and streptomycin-85%. The whole genome based phylogeny showed almost equal proportion of East Asian (27/74) and Central Asian (25/74) strains. Interestingly we also found a clonal group of 9 isolates, of which 7 patients were found to be from the same geographical location and accessed the same health post. This provides the first evidence of epidemiological linkage for tracking TB transmission in India, an approach which has the potential to significantly improve chances of End-TB goals. Finally, the use of Mykrobe Predictor, as a standalone drug resistance and strain typing tool, requiring just few minutes to analyse raw WGS data into tabulated results, implies the rapid clinical applicability of WGS based TB diagnosis. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Population and Whole Genome Sequence Based Characterization of Invasive Group A Streptococci Recovered in the United States during 2015

    Directory of Open Access Journals (Sweden)

    Sopio Chochua

    2017-09-01

    Full Text Available Group A streptococci (GAS are genetically diverse. Determination of strain features can reveal associations with disease and resistance and assist in vaccine formulation. We employed whole-genome sequence (WGS-based characterization of 1,454 invasive GAS isolates recovered in 2015 by Active Bacterial Core Surveillance and performed conventional antimicrobial susceptibility testing. Predictions were made for genotype, GAS carbohydrate, antimicrobial resistance, surface proteins (M family, fibronectin binding, T, R28, secreted virulence proteins (Sda1, Sic, exotoxins, hyaluronate capsule, and an upregulated nga operon (encodes NADase and streptolysin O promoter (Pnga3. Sixty-four M protein gene (emm types were identified among 69 clonal complexes (CCs, including one CC of Streptococcus dysgalactiae subsp. equisimilis. emm types predicted the presence or absence of active sof determinants and were segregated into sof-positive or sof-negative genetic complexes. Only one “emm type switch” between strains was apparent. sof-negative strains showed a propensity to cause infections in the first quarter of the year, while sof+ strain infections were more likely in summer. Of 1,454 isolates, 808 (55.6% were Pnga3 positive and 637 (78.9% were accounted for by types emm1, emm89, and emm12. Theoretical coverage of a 30-valent M vaccine combined with an M-related protein (Mrp vaccine encompassed 98% of the isolates. WGS data predicted that 15.3, 13.8, 12.7, and 0.6% of the isolates were nonsusceptible to tetracycline, erythromycin plus clindamycin, erythromycin, and fluoroquinolones, respectively, with only 19 discordant phenotypic results. Close phylogenetic clustering of emm59 isolates was consistent with recent regional emergence. This study revealed strain traits informative for GAS disease incidence tracking, outbreak detection, vaccine strategy, and antimicrobial therapy.

  1. Whole-genome expression analyses of type 2 diabetes in human skin reveal altered immune function and burden of infection.

    Science.gov (United States)

    Wu, Chun; Chen, Xiaopan; Shu, Jing; Lee, Chun-Ting

    2017-05-23

    Skin disorders are among most common complications associated with type 2 diabetes mellitus (T2DM). Although T2DM patients are known to have increased risk of infections and other T2DM-related skin disorders, their molecular mechanisms are largely unknown. This study aims to identify dysregulated genes and gene networks that are associated with T2DM in human skin. We compared the expression profiles of 56,318 transcribed genes on 74 T2DM cases and 148 gender- age-, and race-matched non-diabetes controls from the Genotype-Tissue Expression (GTEx) database. RNA-Sequencing data indicates that diabetic skin is characterized by increased expression of genes that are related to immune responses (CCL20, CXCL9, CXCL10, CXCL11, CXCL13, and CCL18), JAK/STAT signaling pathway (JAK3, STAT1, and STAT2), tumor necrosis factor superfamily (TNFSF10 and TNFSF15), and infectious disease pathways (OAS1, OAS2, OAS3, and IFIH1). Genes in cell adhesion molecules pathway (NCAM1 and L1CAM) and collagen family (PCOLCE2 and COL9A3) are downregulated, suggesting structural changes in the skin of T2DM. For the first time, to the best of our knowledge, this pioneer analytic study reports comprehensive unbiased gene expression changes and dysregulated pathways in the non-diseased skin of T2DM patients. This comprehensive understanding derived from whole-genome expression profiles could advance our knowledge in determining molecular targets for the prevention and treatment of T2DM-associated skin disorders.

  2. Highly efficient PCR assay to discriminate allelic DNA methylation status using whole genome amplification

    Directory of Open Access Journals (Sweden)

    Ito Takashi

    2011-06-01

    Full Text Available Abstract Background We previously developed a simple method termed HpaII-McrBC PCR (HM-PCR to discriminate allelic methylation status of the genomic sites of interest, and successfully applied it to a comprehensive analysis of CpG islands (CGIs on human chromosome 21q. However, HM-PCR requires 200 ng of genomic DNA to examine one target site, thereby precluding its application to such samples that are limited in quantity. Findings We developed HpaII-McrBC whole-genome-amplification PCR (HM-WGA-PCR that uses whole-genome-amplified DNA as the template. HM-WGA-PCR uses only 1/100th the genomic template material required for HM-PCR. Indeed, we successfully analyzed 147 CGIs by HM-WGA-PCR using only ~300 ng of DNA, whereas previous HM-PCR study had required ~30 μg. Furthermore, we confirmed that allelic methylation status revealed by HM-WGA-PCR is identical to that by HM-PCR in every case of the 147 CGIs tested, proving high consistency between the two methods. Conclusions HM-WGA-PCR would serve as a reliable alternative to HM-PCR in the analysis of allelic methylation status when the quantity of DNA available is limited.

  3. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Full Text Available Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs that had opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample was in the range of 39.29-49.36 millions. Approximately 188 genes were differentially expressed in adipose tissue and were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, metabolism of fatty acids, etinol, caffeine and arachidonic acid and immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  4. Rapid identification of lettuce seed germination mutants by bulked segregant analysis and whole genome sequencing.

    Science.gov (United States)

    Huo, Heqiang; Henry, Isabelle M; Coppoolse, Eric R; Verhoef-Post, Miriam; Schut, Johan W; de Rooij, Han; Vogelaar, Aat; Joosen, Ronny V L; Woudenberg, Leo; Comai, Luca; Bradford, Kent J

    2016-11-01

    Lettuce (Lactuca sativa) seeds exhibit thermoinhibition, or failure to complete germination when imbibed at warm temperatures. Chemical mutagenesis was employed to develop lettuce lines that exhibit germination thermotolerance. Two independent thermotolerant lettuce seed mutant lines, TG01 and TG10, were generated through ethyl methanesulfonate mutagenesis. Genetic and physiological analyses indicated that these two mutations were allelic and recessive. To identify the causal gene(s), we applied bulked segregant analysis by whole genome sequencing. For each mutant, bulked DNA samples of segregating thermotolerant (mutant) seeds were sequenced and analyzed for homozygous single-nucleotide polymorphisms. Two independent candidate mutations were identified at different physical positions in the zeaxanthin epoxidase gene (ABSCISIC ACID DEFICIENT 1/ZEAXANTHIN EPOXIDASE, or ABA1/ZEP) in TG01 and TG10. The mutation in TG01 caused an amino acid replacement, whereas the mutation in TG10 resulted in alternative mRNA splicing. Endogenous abscisic acid contents were reduced in both mutants, and expression of the ABA1 gene from wild-type lettuce under its own promoter fully complemented the TG01 mutant. Conventional genetic mapping confirmed that the causal mutations were located near the ZEP/ABA1 gene, but the bulked segregant whole genome sequencing approach more efficiently identified the specific gene responsible for the phenotype. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  5. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin

    2016-08-01

    Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next generation sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification.

    Science.gov (United States)

    Direito, Susana O L; Zaura, Egija; Little, Miranda; Ehrenfreund, Pascale; Röling, Wilfred F M

    2014-03-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement amplification (MDA)] and one new primer-free method [primase-based whole genome amplification (pWGA)] were compared using a polymerase chain reaction (PCR)-based method as control. Pyrosequencing of an environmental sample and principal component analysis revealed that MDA impacted community profiles more strongly than pWGA and indicated that this related to species GC content, although an influence of DNA integrity could not be excluded. Subsequently, biases by species GC content, DNA integrity and fragment size were separately analysed using defined mixtures of DNA from various species. We found significantly less amplification of species with the highest GC content for MDA-based templates and, to a lesser extent, for pWGA. DNA fragmentation also interfered severely: species with more fragmented DNA were less amplified with MDA and pWGA. pWGA was unable to amplify low molecular weight DNA (microbial communities in low-biomass environments and for currently planned astrobiological missions to Mars. © 2013 Society for Applied Microbiology and John Wiley & Sons Ltd.

  7. Whole genome duplication affects evolvability of flowering time in an autotetraploid plant.

    Directory of Open Access Journals (Sweden)

    Sara L Martin

    Full Text Available Whole genome duplications have occurred recurrently throughout the evolutionary history of eukaryotes. The resulting genetic and phenotypic changes can influence physiological and ecological responses to the environment; however, the impact of genome copy number on evolvability has rarely been examined experimentally. Here, we evaluate the effect of genome duplication on the ability to respond to selection for early flowering time in lines drawn from naturally occurring diploid and autotetraploid populations of the plant Chamerion angustifolium (fireweed. We contrast this with the result of four generations of selection on synthesized neoautotetraploids, whose genic variability is similar to diploids but genome copy number is similar to autotetraploids. In addition, we examine correlated responses to selection in all three groups. Diploid and both extant tetraploid and neoautotetraploid lines responded to selection with significant reductions in time to flowering. Evolvability, measured as realized heritability, was significantly lower in extant tetraploids (^b(T =  0.31 than diploids (^b(T =  0.40. Neotetraploids exhibited the highest evolutionary response (^b(T  =  0.55. The rapid shift in flowering time in neotetraploids was associated with an increase in phenotypic variability across generations, but not with change in genome size or phenotypic correlations among traits. Our results suggest that whole genome duplications, without hybridization, may initially alter evolutionary rate, and that the dynamic nature of neoautopolyploids may contribute to the prevalence of polyploidy throughout eukaryotes.

  8. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

  9. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    Science.gov (United States)

    Robins-Browne, Roy M.; Holt, Kathryn E.; Ingle, Danielle J.; Hocking, Dianna M.; Yang, Ji; Tauschek, Marija

    2016-01-01

    The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E.coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes, should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods. PMID:27917373

  10. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts.

    Directory of Open Access Journals (Sweden)

    Yi-Cheng Guo

    Full Text Available Dekkera yeasts have often been considered as alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya. However, the Dekkera yeasts, which were separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD but still occupy a niche similar to S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in the transcription and translation process exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified that 12 vacuolar H+-ATPase (V-ATPase function genes that were under positive selection, which assists in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights to understand the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage.

  11. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    Science.gov (United States)

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales isolated between April and August 2014 were analysed. Single linkage single nucleotide polymorphism thresholds at the 10, 5 and 0 level were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following; eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Kernel-based whole-genome prediction of complex traits: a review.

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  13. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota eMorota

    2014-10-01

    Full Text Available Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways, thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  14. Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not.

    Science.gov (United States)

    Hedge, Jessica; Wilson, Daniel J

    2014-11-25

    Phylogenetic inference in bacterial genomics is fundamental to understanding problems such as population history, antimicrobial resistance, and transmission dynamics. The field has been plagued by an apparent state of contradiction since the distorting effects of recombination on phylogeny were discovered more than a decade ago. Researchers persist with detailed phylogenetic analyses while simultaneously acknowledging that recombination seriously misleads inference of population dynamics and selection. Here we resolve this paradox by showing that phylogenetic tree topologies based on whole genomes robustly reconstruct the clonal frame topology but that branch lengths are badly skewed. Surprisingly, removing recombining sites can exacerbate branch length distortion caused by recombination. Phylogenetic tree reconstruction is a popular approach for understanding the relatedness of bacteria in a population from differences in their genome sequences. However, bacteria frequently exchange regions of their genomes by a process called homologous recombination, which violates a fundamental assumption of phylogenetic methods. Since many researchers continue to use phylogenetics for recombining bacteria, it is important to understand how recombination affects the conclusions drawn from these analyses. We find that whole-genome sequences afford great accuracy in reconstructing evolutionary relationships despite concerns surrounding the presence of recombination, but the branch lengths of the phylogenetic tree are indeed badly distorted. Surprisingly, methods to reduce the impact of recombination on branch lengths can exacerbate the problem. Copyright © 2014 Hedge and Wilson.

  15. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  16. Impact of antenatal glucocorticosteroids on whole-genome expression in preterm babies.

    Science.gov (United States)

    Saugstad, Ola Didrik; Kwinta, Przemko; Wollen, Embjørg Julianne; Bik-Multanowski, Mirosław; Madetko-Talowska, Anna; Jagła, Mateusz; Tomasik, Tomasz; Pietrzyk, Jacek Józef

    2013-04-01

    To study the impact that using antenatal steroid to treat threatened preterm delivery has on whole-genome expression. A prospective whole-genome expression study was carried out on 50 newborn infants, delivered before 32 weeks gestation, who had been exposed to antenatal steroids, including 40 who had received a full antenatal steroid course. Seventy infants not exposed to antenatal steroids formed the control group. Microarray analyses were performed five and 28 days after delivery, and the results were validated by real-time PCR. The study was conducted between September 2008 and November 2010. Twenty thousand six hundred and ninety-three genes were studied in the infants' leucocytes. Thirteen were differentially expressed 5 days after delivery, but there were no differences at day 28. Four genes related to cancer or inflammation were up-regulated. Nine genes were down-regulated: six were Y-linked and associated with malignancies, graft-versus-host disease, male infertility and cell differentiation and three were associated with pre-eclampsia, oxidative stress and chloride/bicarbonate exchange. Seven gene pathways were up-regulated at day five and only one at day 28. These were associated with cell growth, cell cycle regulation, metabolism and apoptosis. Antenatal steroid therapy affects a limited number of genes and gene pathways in leucocytes in preterm babies at day five of life. The effect is short-lived, but long-term effects cannot be ruled out. ©2013 The Author(s)/Acta Paediatrica ©2013 Foundation Acta Paediatrica.

  17. Whole genome analysis of linezolid resistance in Streptococcus pneumoniae reveals resistance and compensatory mutations

    Directory of Open Access Journals (Sweden)

    Légaré Danielle

    2011-10-01

    Full Text Available Abstract Background Several mutations were present in the genome of Streptococcus pneumoniae linezolid-resistant strains but the role of several of these mutations had not been experimentally tested. To analyze the role of these mutations, we reconstituted resistance by serial whole genome transformation of a novel resistant isolate into two strains with sensitive background. We sequenced the parent mutant and two independent transformants exhibiting similar minimum inhibitory concentration to linezolid. Results Comparative genomic analyses revealed that transformants acquired G2576T transversions in every gene copy of 23S rRNA and that the number of altered copies correlated with the level of linezolid resistance and cross-resistance to florfenicol and chloramphenicol. One of the transformants also acquired a mutation present in the parent mutant leading to the overexpression of an ABC transporter (spr1021. The acquisition of these mutations conferred a fitness cost however, which was further enhanced by the acquisition of a mutation in a RNA methyltransferase implicated in resistance. Interestingly, the fitness of the transformants could be restored in part by the acquisition of altered copies of the L3 and L16 ribosomal proteins and by mutations leading to the overexpression of the spr1887 ABC transporter that were present in the original linezolid-resistant mutant. Conclusions Our results demonstrate the usefulness of whole genome approaches at detecting major determinants of resistance as well as compensatory mutations that alleviate the fitness cost associated with resistance.

  18. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    Science.gov (United States)

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution1,2. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes3,4. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  19. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available

  20. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity

  1. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Abstract Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic

  2. Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data.

    Science.gov (United States)

    Nishito, Yukari; Osana, Yasunori; Hachiya, Tsuyoshi; Popendorf, Kris; Toyoda, Atsushi; Fujiyama, Asao; Itaya, Mitsuhiro; Sakakibara, Yasubumi

    2010-04-16

    Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for gamma-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacked a polyketide synthesis operon, disrupted the plipastatin production operon, and possesses previously unidentified transposases. The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks

  3. Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data

    Directory of Open Access Journals (Sweden)

    Fujiyama Asao

    2010-04-01

    Full Text Available Abstract Background Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. Results We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for γ-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacked a polyketide synthesis operon, disrupted the plipastatin production operon, and possesses previously unidentified transposases. Conclusions The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B

  4. Characterization of a Wheat Breeders' Array suitable for high-throughput SNP genotyping of global accessions of hexaploid bread wheat (Triticum aestivum).

    Science.gov (United States)

    Allen, Alexandra M; Winfield, Mark O; Burridge, Amanda J; Downie, Rowena C; Benbow, Harriet R; Barker, Gary L A; Wilkinson, Paul A; Coghill, Jane; Waterfall, Christy; Davassi, Alessandro; Scopes, Geoff; Pirani, Ali; Webster, Teresa; Brew, Fiona; Bloor, Claire; Griffiths, Simon; Bentley, Alison R; Alda, Mark; Jack, Peter; Phillips, Andrew L; Edwards, Keith J

    2017-03-01

    Targeted selection and inbreeding have resulted in a lack of genetic diversity in elite hexaploid bread wheat accessions. Reduced diversity can be a limiting factor in the breeding of high yielding varieties and crucially can mean reduced resilience in the face of changing climate and resource pressures. Recent technological advances have enabled the development of molecular markers for use in the assessment and utilization of genetic diversity in hexaploid wheat. Starting with a large collection of 819 571 previously characterized wheat markers, here we describe the identification of 35 143 single nucleotide polymorphism-based markers, which are highly suited to the genotyping of elite hexaploid wheat accessions. To assess their suitability, the markers have been validated using a commercial high-density Affymetrix Axiom ® genotyping array (the Wheat Breeders' Array), in a high-throughput 384 microplate configuration, to characterize a diverse global collection of wheat accessions including landraces and elite lines derived from commercial breeding communities. We demonstrate that the Wheat Breeders' Array is also suitable for generating high-density genetic maps of previously uncharacterized populations and for characterizing novel genetic diversity produced by mutagenesis. To facilitate the use of the array by the wheat community, the markers, the associated sequence and the genotype information have been made available through the interactive web site 'CerealsDB'. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  5. Significance of functional disease-causal/susceptible variants identified by whole-genome analyses for the understanding of human diseases.

    Science.gov (United States)

    Hitomi, Yuki; Tokunaga, Katsushi

    2017-01-01

    Human genome variation may cause differences in traits and disease risks. Disease-causal/susceptible genes and variants for both common and rare diseases can be detected by comprehensive whole-genome analyses, such as whole-genome sequencing (WGS), using next-generation sequencing (NGS) technology and genome-wide association studies (GWAS). Here, in addition to the application of an NGS as a whole-genome analysis method, we summarize approaches for the identification of functional disease-causal/susceptible variants from abundant genetic variants in the human genome and methods for evaluating their functional effects in human diseases, using an NGS and in silico and in vitro functional analyses. We also discuss the clinical applications of the functional disease causal/susceptible variants to personalized medicine.

  6. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob

    2016-01-01

    Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can...... be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject...... we analysed a neonatal DBS sample and corresponding adult whole-blood (WB) reference sample. Different DNA sample types were prepared for each of the subjects. Pilot 1: wgaDNA of 2x3.2mm neonatal DBSs (DBS_2x3.2) and raw DNA extract of the WB reference sample (WB_ref). Pilot 2: DBS_2x3.2, WB...

  7. Whole-Genome Scans Provide Evidence of Adaptive Evolution in Malawian Plasmodium falciparum Isolates

    DEFF Research Database (Denmark)

    Ocholla, Harold; Preston, Mark D; Mipando, Mwapatsa

    2014-01-01

    BACKGROUND:  Selection by host immunity and antimalarial drugs has driven extensive adaptive evolution in Plasmodium falciparum and continues to produce ever-changing landscapes of genetic variation. METHODS:  We performed whole-genome sequencing of 69 P. falciparum isolates from Malawi and used......, an area of high malaria transmission. Allele frequency-based tests provided evidence of recent population growth in Malawi and detected potential targets of host immunity and candidate vaccine antigens. Comparison of the sequence variation between isolates from Malawi and those from 5 geographically...... dispersed countries (Kenya, Burkina Faso, Mali, Cambodia, and Thailand) detected population genetic differences between Africa and Asia, within Southeast Asia, and within Africa. Haplotype-based tests of selection to sequence data from all 6 populations identified signals of directional selection at known...

  8. Whole-genome analyses resolve early branches in the tree of life of modern birds

    DEFF Research Database (Denmark)

    Sicheritz-Pontén, Thomas; Li, Cai; Li, Bo

    2014-01-01

    To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister...... or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator...... and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high...

  9. Evaluation of whole genome sequencing for outbreak detection of Salmonella enterica

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Nielsen, Eva M.; Kaas, Rolf Sommer

    2014-01-01

    Salmonella enterica is a common cause of minor and large food borne outbreaks. To achieve successful and nearly ‘real-time’ monitoring and identification of outbreaks, reliable sub-typing is essential. Whole genome sequencing (WGS) shows great promises for using as a routine epidemiological typing....... Enteritidis and 5 S. Derby were also sequenced and used for comparison. A number of different bioinformatics approaches were applied on the data; including pan-genome tree, k-mer tree, nucleotide difference tree and SNP tree. The outcome of each approach was evaluated in relation to the association...... of the isolates to specific outbreaks. The pan-genome tree clustered 65% of the S. Typhimurium isolates according to the pre-defined epidemiology, the k-mer tree 88%, the nucleotide difference tree 100% and the SNP tree 100% of the strains within S. Typhimurium. The resulting outcome of the four phylogenetic...

  10. A strategic stakeholder approach for addressing further analysis requests in whole genome sequencing research.

    Science.gov (United States)

    Thornock, Bradley Steven O

    2016-01-01

    Whole genome sequencing (WGS) can be a cost-effective and efficient means of diagnosis for some children, but it also raises a number of ethical concerns. One such concern is how researchers derive and communicate results from WGS, including future requests for further analysis of stored sequences. The purpose of this paper is to think about what is at stake, and for whom, in any solution that is developed to deal with such requests. To accomplish this task, this paper will utilize stakeholder theory, a common method used in business ethics. Several scenarios that connect stakeholder concerns and WGS will also posited and analyzed. This paper concludes by developing criteria composed of a series of questions that researchers can answer in order to more effectively address requests for further analysis of stored sequences.

  11. Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding

    Science.gov (United States)

    de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.

    2013-01-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228

  12. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation

    DEFF Research Database (Denmark)

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography their contribution to panda population...... dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two...... panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years....

  13. Mapping genomic features to functional traits through microbial whole genome sequences.

    Science.gov (United States)

    Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott

    2014-01-01

    Recently, the utility of trait-based approaches for microbial communities has been identified. Increasing availability of whole genome sequences provide the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link the genomic features with functional traits. Genes from bacteria genomes belonging to different functional traits were grouped to Cluster of Orthologs (COGs), and were used as features. Then, TF-IDF technique from the text mining domain was applied to transform the data to accommodate the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.

  14. Small homologous blocks in phytophthora genomes do not point to an ancient whole-genome duplication.

    Science.gov (United States)

    van Hooff, Jolien J E; Snel, Berend; Seidl, Michael F

    2014-05-01

    Genomes of the plant-pathogenic genus Phytophthora are characterized by small duplicated blocks consisting of two consecutive genes (2HOM blocks) and by an elevated abundance of similarly aged gene duplicates. Both properties, in particular the presence of 2HOM blocks, have been attributed to a whole-genome duplication (WGD) at the last common ancestor of Phytophthora. However, large intraspecies synteny-compelling evidence for a WGD-has not been detected. Here, we revisited the WGD hypothesis by deducing the age of 2HOM blocks. Two independent timing methods reveal that the majority of 2HOM blocks arose after divergence of the Phytophthora lineages. In addition, a large proportion of the 2HOM block copies colocalize on the same scaffold. Therefore, the presence of 2HOM blocks does not support a WGD at the last common ancestor of Phytophthora. Thus, genome evolution of Phytophthora is likely driven by alternative mechanisms, such as bursts of transposon activity.

  15. Whole-genome and Transcriptome Sequencing of Prostate Cancer Identify New Genetic Alterations Driving Disease Progression

    DEFF Research Database (Denmark)

    Ren, Shancheng; Wei, Gong-Hong; Liu, Dongbing

    2018-01-01

    BACKGROUND: Global disparities in prostate cancer (PCa) incidence highlight the urgent need to identify genomic abnormalities in prostate tumors in different ethnic populations including Asian men. OBJECTIVE: To systematically explore the genomic complexity and define disease-driven genetic......-scale and comprehensive genomic data of prostate cancer from Asian population. Identification of these genetic alterations may help advance prostate cancer diagnosis, prognosis, and treatment....... alterations in PCa. DESIGN, SETTING, AND PARTICIPANTS: The study sequenced whole-genome and transcriptome of tumor-benign paired tissues from 65 treatment-naive Chinese PCa patients. Subsequent targeted deep sequencing of 293 PCa-relevant genes was performed in another cohort of 145 prostate tumors. OUTCOME...

  16. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast

    DEFF Research Database (Denmark)

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L.

    2015-01-01

    There is an increasing demand for biotech-based production of recombinant proteins for use as pharmaceuticals in the food and feed industry and in industrial applications. Yeast Saccharomyces cerevisiae is among preferred cell factories for recombinant protein production, and there is increasing...... interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering for construction of improved strains. Here we used high-throughput microfluidics for the screening of yeast libraries, generated by UV...... mutagenesis. Several screening and sorting rounds resulted in the selection of eight yeast clones with significantly improved secretion of recombinant a-amylase. Efficient secretion was genetically stable in the selected clones. We performed whole-genome sequencing of the eight clones and identified 330...

  17. Clinical decision support for whole genome sequence information leveraging a service-oriented architecture: a prototype.

    Science.gov (United States)

    Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time.

  18. Real time application of whole genome sequencing for outbreak investigation - What is an achievable turnaround time?

    Science.gov (United States)

    McGann, Patrick; Bunin, Jessica L; Snesrud, Erik; Singh, Seema; Maybank, Rosslyn; Ong, Ana C; Kwak, Yoon I; Seronello, Scott; Clifford, Robert J; Hinkle, Mary; Yamada, Stephen; Barnhill, Jason; Lesho, Emil

    2016-07-01

    Whole genome sequencing (WGS) is increasingly employed in clinical settings, though few assessments of turnaround times (TAT) have been performed in real-time. In this study, WGS was used to investigate an unfolding outbreak of vancomycin resistant Enterococcus faecium (VRE) among 3 patients in the ICU of a tertiary care hospital. Including overnight culturing, a TAT of just 48.5 h for a comprehensive report was achievable using an Illumina Miseq benchtop sequencer. WGS revealed that isolates from patient 2 and 3 differed from that of patient 1 by a single nucleotide polymorphism (SNP), indicating nosocomial transmission. However, the unparalleled resolution provided by WGS suggested that nosocomial transmission involved two separate events from patient 1 to patient 2 and 3, and not a linear transmission suspected by the time line. Rapid TAT's are achievable using WGS in the clinical setting and can provide an unprecedented level of resolution for outbreak investigations. Published by Elsevier Inc.

  19. Whole-genome sequencing of a malignant granular cell tumor with metabolic response to pazopanib

    Science.gov (United States)

    Wei, Lei; Liu, Song; Conroy, Jeffrey; Wang, Jianmin; Papanicolau-Sengos, Antonios; Glenn, Sean T.; Murakami, Mitsuko; Liu, Lu; Hu, Qiang; Conroy, Jacob; Miles, Kiersten Marie; Nowak, David E.; Liu, Biao; Qin, Maochun; Bshara, Wiam; Omilian, Angela R.; Head, Karen; Bianchi, Michael; Burgher, Blake; Darlak, Christopher; Kane, John; Merzianu, Mihai; Cheney, Richard; Fabiano, Andrew; Salerno, Kilian; Talati, Chetasi; Khushalani, Nikhil I.; Trump, Donald L.; Johnson, Candace S.; Morrison, Carl D.

    2015-01-01

    Granular cell tumors are an uncommon soft tissue neoplasm. Malignant granular cell tumors comprise T transitions, particularly when immediately preceded by a 5′ G. A loss-of-function mutation was detected in a newly recognized tumor suppressor candidate, BRD7. No mutations were found in known targets of pazopanib. However, we identified a receptor tyrosine kinase pathway mutation in GFRA2 that warrants further evaluation. To the best of our knowledge, this is only the second reported case of a malignant granular cell tumor exhibiting a response to pazopanib, and the first whole-genome sequencing of this uncommon tumor type. The findings provide insight into the genetic basis of malignant granular cell tumors and identify potential targets for further investigation. PMID:27148567

  20. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

    KAUST Repository

    Coll, Francesc

    2015-05-27

    Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data complexity has restricted their clinical application. A library (1,325 mutations) predictive of DR for 15 anti-tuberculosis drugs was compiled and validated for 11 of them using genomic-phenotypic data from 792 strains. A rapid online ‘TB-Profiler’ tool was developed to report DR and strain-type profiles directly from raw sequences. Using our DR mutation library, in silico diagnostic accuracy was superior to some commercial diagnostics and alternative databases. The library will facilitate sequence-based drug-susceptibility testing.

  1. Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database.

    Science.gov (United States)

    Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth

    2016-08-01

    The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  2. Epidemiological links between tuberculosis cases identified twice as efficiently by whole genome sequencing than conventional molecular typing: A population-based study.

    Science.gov (United States)

    Jajou, Rana; de Neeling, Albert; van Hunen, Rianne; de Vries, Gerard; Schimmel, Henrieke; Mulder, Arnout; Anthony, Richard; van der Hoek, Wim; van Soolingen, Dick

    2018-01-01

    Patients with Mycobacterium tuberculosis isolates sharing identical DNA fingerprint patterns can be epidemiologically linked. However, municipal health services in the Netherlands are able to confirm an epidemiological link in only around 23% of the patients with isolates clustered by the conventional variable number of tandem repeat (VNTR) genotyping. This research aims to investigate whether whole genome sequencing (WGS) is a more reliable predictor of epidemiological links between tuberculosis patients than VNTR genotyping. VNTR genotyping and WGS were performed in parallel on all Mycobacterium tuberculosis complex isolates received at the Netherlands National Institute for Public Health and the Environment in 2016. Isolates were clustered by VNTR when they shared identical 24-loci VNTR patterns; isolates were assigned to a WGS cluster when the pair-wise genetic distance was ≤ 12 single nucleotide polymorphisms (SNPs). Cluster investigation was performed by municipal health services on all isolates clustered by VNTR in 2016. The proportion of epidemiological links identified among patients clustered by either method was calculated. In total, 535 isolates were genotyped, of which 25% (134/535) were clustered by VNTR and 14% (76/535) by WGS; the concordance between both typing methods was 86%. The proportion of epidemiological links among WGS clustered cases (57%) was twice as common than among VNTR clustered cases (31%). When WGS was applied, the number of clustered isolates was halved, while all epidemiologically linked cases remained clustered. WGS is therefore a more reliable tool to predict epidemiological links between tuberculosis cases than VNTR genotyping and will allow more efficient transmission tracing, as epidemiological investigations based on false clustering can be avoided.

  3. Unlocking the diversity of genebanks: whole-genome marker analysis of Swiss bread wheat and spelt

    KAUST Repository

    Mü ller, Thomas; Schierscher-Viret, Beate; Fossati, Dario; Brabant, Cé cile; Schori, Arnold; Keller, Beat; Krattinger, Simon G.

    2017-01-01

    Genebanks play a pivotal role in preserving the genetic diversity present among old landraces and wild progenitors of modern crops and they represent sources of agriculturally important genes that were lost during domestication and in modern breeding. However, undesirable genes that negatively affect crop performance are often co-introduced when landraces and wild crop progenitors are crossed with elite cultivars, which often limit the use of genebank material in modern breeding programs. A detailed genetic characterization is an important prerequisite to solve this problem and to make genebank material more accessible to breeding. Here, we genotyped 502 bread wheat and 293 spelt accessions held in the Swiss National Genebank using a 15K wheat SNP array. The material included both spring and winter wheats and consisted of old landraces and modern cultivars. Genome- and sub-genome-wide analyses revealed that spelt and bread wheat form two distinct gene pools. In addition, we identified bread wheat landraces that were genetically distinct from modern cultivars. Such accessions were possibly missed in the early Swiss wheat breeding program and are promising targets for the identification of novel genes. The genetic information obtained in this study is appropriate to perform genome-wide association studies, which will facilitate the identification and transfer of agriculturally important genes from the genebank into modern cultivars through marker-assisted selection.

  4. Unlocking the diversity of genebanks: whole-genome marker analysis of Swiss bread wheat and spelt

    KAUST Repository

    Müller, Thomas

    2017-11-04

    Genebanks play a pivotal role in preserving the genetic diversity present among old landraces and wild progenitors of modern crops and they represent sources of agriculturally important genes that were lost during domestication and in modern breeding. However, undesirable genes that negatively affect crop performance are often co-introduced when landraces and wild crop progenitors are crossed with elite cultivars, which often limit the use of genebank material in modern breeding programs. A detailed genetic characterization is an important prerequisite to solve this problem and to make genebank material more accessible to breeding. Here, we genotyped 502 bread wheat and 293 spelt accessions held in the Swiss National Genebank using a 15K wheat SNP array. The material included both spring and winter wheats and consisted of old landraces and modern cultivars. Genome- and sub-genome-wide analyses revealed that spelt and bread wheat form two distinct gene pools. In addition, we identified bread wheat landraces that were genetically distinct from modern cultivars. Such accessions were possibly missed in the early Swiss wheat breeding program and are promising targets for the identification of novel genes. The genetic information obtained in this study is appropriate to perform genome-wide association studies, which will facilitate the identification and transfer of agriculturally important genes from the genebank into modern cultivars through marker-assisted selection.

  5. Array-based genotyping and genetic dissimilarity analysis of a set of maize inbred lines belonging to different heterotic groups

    Directory of Open Access Journals (Sweden)

    Jambrović Antun

    2014-01-01

    Full Text Available Here we describe the results of the detailed array-based genotyping obtained by using the Illumina MaizeSNP50 BeadChip of eleven inbred lines belonging to different heterotic groups relevant for maize breeding in Southeast Europe - European Corn Belt. The objectives of this study were to assess the utility of the MaizeSNP50 BeadChip platform by determining its descriptive power and to assess genetic dissimilarity of the inbred lines. The distribution of the SNPs was found not completely uniform among chromosomes, but average call rate was very high (97.9% and number of polymorphic loci was 33200 out of 50074 SNPs with known mapping position indicating descriptive power of the MaizeSNP50 BeadChip. The dendrogram obtained from UPGMA cluster analysis as well as principal component analysis (PCA confirmed pedigree information, undoubtedly distinguishing lines according to their background in two population varieties of Reid Yellow Dent and Lancaster Sure Crop. Dissimilarity analysis showed that all of the inbred lines could be distinguished from each other. Whereas cluster analysis did not definitely differentiate Mo17 and Ohio inbred lines, PCA revealed clear genetic differences between them. The studied inbred lines were confirmed to be genetically diverse, representing a large proportion of the genetic variation occurring in two maize heterotic groups.

  6. A whole genome association study to detect additive and dominant single nucleotide polymorphisms for growth and carcass traits in Korean native cattle, Hanwoo

    Directory of Open Access Journals (Sweden)

    Yi Li

    2017-01-01

    Full Text Available Objective A whole genome association study was conducted to identify single nucleotide polymorphisms (SNPs with additive and dominant effects for growth and carcass traits in Korean native cattle, Hanwoo. Methods The data set comprised 61 sires and their 486 Hanwoo steers that were born between spring of 2005 and fall of 2007. The steers were genotyped with the 35,968 SNPs that were embedded in the Illumina bovine SNP 50K beadchip and six growth and carcass quality traits were measured for the steers. A series of lack-of-fit tests between the models was applied to classify gene expression pattern as additive or dominant. Results A total of 18 (0, 15 (3, 12 (8, 15 (18, 11 (7, and 21 (1 SNPs were detected at the 5% chromosome (genome - wise level for weaning weight (WWT, yearling weight (YWT, carcass weight (CWT, backfat thickness (BFT, longissimus dorsi muscle area (LMA and marbling score, respectively. Among the significant 129 SNPs, 56 SNPs had additive effects, 20 SNPs dominance effects, and 53 SNPs both additive and dominance effects, suggesting that dominance inheritance mode be considered in genetic improvement for growth and carcass quality in Hanwoo. The significant SNPs were located at 33 quantitative trait locus (QTL regions on 18 Bos Taurus chromosomes (i.e. BTA 3, 4, 5, 6, 7, 9, 11, 12, 13, 14, 16, 17, 18, 20, 23, 26, 28, and 29 were detected. There is strong evidence that BTA14 is the key chromosome affecting CWT. Also, BTA20 is the key chromosome for almost all traits measured (WWT, YWT, LMA. Conclusion The application of various additive and dominance SNP models enabled better characterization of SNP inheritance mode for growth and carcass quality traits in Hanwoo, and many of the detected SNPs or QTL had dominance effects, suggesting that dominance be considered for the whole-genome SNPs data and implementation of successive molecular breeding schemes in Hanwoo.

  7. Environmental whole-genome amplification to access microbial populations in contaminated sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, Carl B [Diversa Corporation; Wyborski, Denise L. [Diversa Corporation; Garcia, Joseph A. [Diversa Corporation; Podar, Mircea [ORNL; Chen, Wenqiong [Diversa Corporation; Chang, Sherman H. [Diversa Corporation; Chang, Hwai W. [Diversa Corporation; Watson, David B [ORNL; Brodie, Eoin L. [Lawrence Berkeley National Laboratory (LBNL); Hazen, Terry [Lawrence Berkeley National Laboratory (LBNL); Keller, Martin [ORNL

    2006-05-01

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using {phi}29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2% genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small-subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9% of the sequences had significant similarities to known proteins, and 'clusters of orthologous groups' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  8. Phylogenetics and differentiation of Salmonella Newport lineages by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Guojie Cao

    Full Text Available Salmonella Newport has ranked in the top three Salmonella serotypes associated with foodborne outbreaks from 1995 to 2011 in the United States. In the current study, we selected 26 S. Newport strains isolated from diverse sources and geographic locations and then conducted 454 shotgun pyrosequencing procedures to obtain 16-24 × coverage of high quality draft genomes for each strain. Comparative genomic analysis of 28 S. Newport strains (including 2 reference genomes and 15 outgroup genomes identified more than 140,000 informative SNPs. A resulting phylogenetic tree consisted of four sublineages and indicated that S. Newport had a clear geographic structure. Strains from Asia were divergent from those from the Americas. Our findings demonstrated that analysis using whole genome sequencing data resulted in a more accurate picture of phylogeny compared to that using single genes or small sets of genes. We selected loci around the mutS gene of S. Newport to differentiate distinct lineages, including those between invH and mutS genes at the 3' end of Salmonella Pathogenicity Island 1 (SPI-1, ste fimbrial operon, and Clustered, Regularly Interspaced, Short Palindromic Repeats (CRISPR associated-proteins (cas. These genes in the outgroup genomes held high similarity with either S. Newport Lineage II or III at the same loci. S. Newport Lineages II and III have different evolutionary histories in this region and our data demonstrated genetic flow and homologous recombination events around mutS. The findings suggested that S. Newport Lineages II and III diverged early in the serotype evolution and have evolved largely independently. Moreover, we identified genes that could delineate sublineages within the phylogenetic tree and that could be used as potential biomarkers for trace-back investigations during outbreaks. Thus, whole genome sequencing data enabled us to better understand the genetic background of pathogenicity and evolutionary history of S

  9. Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

    Directory of Open Access Journals (Sweden)

    Abiyad eBaig

    2015-11-01

    Full Text Available Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but their genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of core genome from 793 to only 397 genes. Divergence was clear if phylogenetic analysis was performed on reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN and cpn60 did not show divergence for all isolates, suggesting recombination between some divergent isolates with normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six out of nine divergent isolates possessed a S. suis-like capsule region with variation in capsular gene sequences but the remaining three did not have a discrete capsule locus. The majority (40/70, of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of S. suis species but the phenotypic similarities and the large amount of gene exchange with normal S. suis gives insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates is warranted to understand the diversity of the species.

  10. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.

    Science.gov (United States)

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-02-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.

  11. Practical issues in implementing whole-genome-sequencing in routine diagnostic microbiology.

    Science.gov (United States)

    Rossen, J W A; Friedrich, A W; Moran-Gilad, J

    2018-04-01

    Next generation sequencing (NGS) is increasingly being used in clinical microbiology. Like every new technology adopted in microbiology, the integration of NGS into clinical and routine workflows must be carefully managed. To review the practical aspects of implementing bacterial whole genome sequencing (WGS) in routine diagnostic laboratories. Review of the literature and expert opinion. In this review, we discuss when and how to integrate whole genome sequencing (WGS) in the routine workflow of the clinical laboratory. In addition, as the microbiology laboratories have to adhere to various national and international regulations and criteria for their accreditation, we deliberate on quality control issues for using WGS in microbiology, including the importance of proficiency testing. Furthermore, the current and future place of this technology in the diagnostic hierarchy of microbiology is described as well as the necessity of maintaining backwards compatibility with already established methods. Finally, we speculate on the question of whether WGS can entirely replace routine microbiology in the future and the tension between the fact that most sequencers are designed to process multiple samples in parallel whereas for optimal diagnosis a one-by-one processing of the samples is preferred. Special reference is made to the cost and turnaround time of WGS in diagnostic laboratories. Further development is required to improve the workflow for WGS, in particular to shorten the turnaround time, reduce costs, and streamline downstream data analyses. Only when these processes reach maturity will reliance on WGS for routine patient management and infection control management become feasible, enabling the transformation of clinical microbiology into a genome-based and personalized diagnostic field. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  12. Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs)

    Science.gov (United States)

    Sims, Gregory E.; Kim, Sung-Hou

    2011-01-01

    A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenic distances. We present two phylogenies that accentuate different aspects of E. coli/Shigella genomic evolution: (i) one based on the compositions of all possible features of length l = 24 (∼8.4 million features), which are likely to reveal the phenetic grouping and relationship among the organisms and (ii) the other based on the compositions of core features with low frequency and low variability (∼0.56 million features), which account for ∼69% of all commonly shared features among 38 taxa examined and are likely to have genome-wide lineal evolutionary signal. Shigella appears as a single clade when all possible features are used without filtering of noncore features. However, results using core features show that Shigella consists of at least two distantly related subclades, implying that the subclades evolved into a single clade because of a high degree of convergence influenced by mobile genetic elements and niche adaptation. In both FFP trees, the basal group of the E. coli/Shigella phylogeny is the B2 phylogroup, which contains primarily uropathogenic strains, suggesting that the E. coli/Shigella ancestor was likely a facultative or opportunistic pathogen. The extant commensal strains diverged relatively late and appear to be the result of reductive evolution of genomes. We also identify clade distinguishing features and their associated genomic regions within each phylogroup. Such features may provide useful information for understanding evolution of the groups and for quick diagnostic identification of each phylogroup. PMID:21536867

  13. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks.

    Science.gov (United States)

    Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S K; Mammel, Mark K; Tarr, Phillip I; Eppinger, Mark

    2016-01-01

    Multi isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North America outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulse-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and

  14. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie,E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using ?29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and ''clusters of orthologous groups'' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  15. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    Science.gov (United States)

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.

  16. The Whole-Genome and Transcriptome of the Manila Clam (Ruditapes philippinarum).

    Science.gov (United States)

    Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong

    2017-06-01

    The manila clam, Ruditapes philippinarum, is an important bivalve species in worldwide aquaculture including Korea. The aquaculture production of R. philippinarum is under threat from diverse environmental factors including viruses, microorganisms, parasites, and water conditions with subsequently declining production. In spite of its importance as a marine resource, the reference genome of R. philippinarum for comprehensive genetic studies is largely unexplored. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide the basic data for advanced studies in selective breeding and disease control in order to obtain successful aquaculture systems. An approximately 2.56 Gb high quality whole-genome was assembled with various library construction methods. A total of 108,034 protein coding gene models were predicted and repetitive elements including simple sequence repeats and noncoding RNAs were identified to further understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with the bivalve marine invertebrates uncover that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis with three different tissues in order to support genome annotation and then identified 41,275 transcripts which were annotated. The R. philippinarum genome resource will markedly advance a wide range of potential genetic studies, a reference genome for comparative analysis of bivalve species and unraveling mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. Automation of the linear array HPV genotyping test and its application for routine typing of human papillomaviruses in cervical specimens of women without cytological abnormalities in Switzerland.

    Science.gov (United States)

    Dobec, Marinko; Bannwart, Fridolin; Kaeppeli, Franz; Cassinotti, Pascal

    2009-05-01

    There is a need for reliable, automated high throughput HPV detection and genotyping methods for pre- and post-prophylactic vaccine intervention analyses. To optimize the linear array (LA) HPV genotyping test (Roche Diagnostics, Rotkreuz) in regard to possible automation steps for the routine laboratory diagnosis of HPV infections and to analyze the HPV genotype distribution in cervical specimens of women without cytological abnormalities in Switzerland. 680 cervical cell specimens with normal cytology, obtained from women undergoing routine cervical screening by liquid-based Pap smear, were analyzed by the LA HPV genotyping test for HPV-DNA. The automation of the LA HPV genotyping test resulted in a total hands-on time reduction of 255 min (from 480 to 225 min; 53%). Any of 37 HPV genotypes were detected in 117 (17.2%) and high-risk (HR) HPV in 55 (8.1%) of 680 women with normal cytology. The highest prevalence of any HPV (28.1%) and HR-HPV (15.1%) was observed in age-group 21-30 and showed a continuous decrease in older age-groups. The most common HR-HPV genotypes were HPV-16 (12%), HPV-31 (9.4%), HPV-52 (6%), HPV-51 (5.1%), HPV-45 (4.3%), HPV-58 (4.3%) and HPV-59 (4.3%). The optimization and automation of the LA HPV genotyping test makes it suited for high throughput HPV detection and typing. The epidemiological data provides information about distribution of HPV genotypes in women without cytological abnormalities in Switzerland and may be important for determining the future impact of vaccines and potential changes in the country's epidemiological HPV profile.

  18. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Bellod Cisneros, Jose Luis

    2016-01-01

    and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https://cge.cbs.dtu.dk/services...... and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services...

  19. Whole-genome typing and characterization of blaVIM19-harbouring ST383 Klebsiella pneumoniae by PFGE, whole-genome mapping and WGS.

    Science.gov (United States)

    Sabirova, Julia S; Xavier, Basil Britto; Coppens, Jasmine; Zarkotou, Olympia; Lammens, Christine; Janssens, Lore; Burggrave, Ronald; Wagner, Trevor; Goossens, Herman; Malhotra-Kumar, Surbhi

    2016-06-01

    We utilized whole-genome mapping (WGM) and WGS to characterize 12 clinical carbapenem-resistant Klebsiella pneumoniae strains (TGH1-TGH12). All strains were screened for carbapenemase genes by PCR, and typed by MLST, PFGE (XbaI) and WGM (AflII) (OpGen, USA). WGS (Illumina) was performed on TGH8 and TGH10. Reads were de novo assembled and annotated [SPAdes, Rapid Annotation Subsystem Technology (RAST)]. Contigs were aligned directly, and after in silico AflII restriction, with corresponding WGMs (MapSolver, OpGen; BioNumerics, Applied Maths). All 12 strains were ST383. Of the 12 strains, 11 were carbapenem resistant, 7 harboured blaKPC-2 and 11 harboured blaVIM-19. Varying the parameters for assigning WGM clusters showed that these were comparable to STs and to the eight PFGE types or subtypes (difference of three or more bands). A 95% similarity coefficient assigned all 12 WGMs to a single cluster, whereas a 99% similarity coefficient (or ≥10 unmatched-fragment difference) assigned the 12 WGMs to eight (sub)clusters. Based on a difference of three or more bands between PFGE profiles, the Simpson's diversity indices (SDIs) of WGM (0.94, Jackknife pseudo-values CI: 0.883-0.996) and PFGE (0.93, Jackknife pseudo-values CI: 0.828-1.000) were similar (P = 0.649). However, the discriminatory power of WGM was significantly higher (SDI: 0.94, Jackknife pseudo-values CI: 0.883-0.996) than that of PFGE profiles typed on a difference of seven or more bands (SDI: 0.53, Jackknife pseudo-values CI: 0.212-0.849) (P = 0.007). This study demonstrates the application of WGM to understanding the epidemiology of hospital-associated K. pneumoniae. Utilizing a combination of WGM and WGS, we also present here the first longitudinal genomic characterization of the highly dynamic carbapenem-resistant ST383 K. pneumoniae clone that is rapidly gaining importance in Europe. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial

  20. Evaluation of SNP Data from the Malus Infinium Array Identifies Challenges for Genetic Analysis of Complex Genomes of Polyploid Origin.

    Directory of Open Access Journals (Sweden)

    Michela Troggio

    Full Text Available High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432, but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the 'Golden Delicious' genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies.

  1. Efficient oligonucleotide probe selection for pan-genomic tiling arrays

    Directory of Open Access Journals (Sweden)

    Zhang Wei

    2009-09-01

    Full Text Available Abstract Background Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome. Results This paper presents a new probe selection algorithm (PanArray that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multi-alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome. The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. Conclusion PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. It is capable of fully tiling all genomes of a species on

  2. Single-nucleotide polymorphism discovery in Leptographium longiclavatum, a mountain pine beetle-associated symbiotic fungus, using whole-genome resequencing.

    Science.gov (United States)

    Ojeda, Dario I; Dhillon, Braham; Tsui, Clement K M; Hamelin, Richard C

    2014-03-01

    Single-nucleotide polymorphisms (SNPs) are rapidly becoming the standard markers in population genomics studies; however, their use in nonmodel organisms is limited due to the lack of cost-effective approaches to uncover genome-wide variation, and the large number of individuals needed in the screening process to reduce ascertainment bias. To discover SNPs for population genomics studies in the fungal symbionts of the mountain pine beetle (MPB), we developed a road map to discover SNPs and to produce a genotyping platform. We undertook a whole-genome sequencing approach of Leptographium longiclavatum in combination with available genomics resources of another MPB symbiont, Grosmannia clavigera. We sequenced 71 individuals pooled into four groups using the Illumina sequencing technology. We generated between 27 and 30 million reads of 75 bp that resulted in a total of 1, 181 contigs longer than 2 kb and an assembled genome size of 28.9 Mb (N50 = 48 kb, average depth = 125x). A total of 9052 proteins were annotated, and between 9531 and 17,266 SNPs were identified in the four pools. A subset of 206 genes (containing 574 SNPs, 11% false positives) was used to develop a genotyping platform for this species. Using this roadmap, we developed a genotyping assay with a total of 147 SNPs located in 121 genes using the Illumina(®) Sequenom iPLEX Gold. Our preliminary genotyping (success rate = 85%) of 304 individuals from 36 populations supports the utility of this approach for population genomics studies in other MPB fungal symbionts and other fungal nonmodel species. © 2013 John Wiley & Sons Ltd.

  3. Whole genome sequencing reveals complex evolution patterns of multidrug-resistant Mycobacterium tuberculosis Beijing strains in patients.

    Directory of Open Access Journals (Sweden)

    Matthias Merker

    Full Text Available Multidrug-resistant (MDR Mycobacterium tuberculosis complex (MTBC strains represent a major threat for tuberculosis (TB control. Treatment of MDR-TB patients is long and less effective, resulting in a significant number of treatment failures. The development of further resistances leads to extensively drug-resistant (XDR variants. However, data on the individual reasons for treatment failure, e.g. an induced mutational burst, and on the evolution of bacteria in the patient are only sparsely available. To address this question, we investigated the intra-patient evolution of serial MTBC isolates obtained from three MDR-TB patients undergoing longitudinal treatment, finally leading to XDR-TB. Sequential isolates displayed identical IS6110 fingerprint patterns, suggesting the absence of exogenous re-infection. We utilized whole genome sequencing (WGS to screen for variations in three isolates from Patient A and four isolates from Patient B and C, respectively. Acquired polymorphisms were subsequently validated in up to 15 serial isolates by Sanger sequencing. We determined eight (Patient A and nine (Patient B polymorphisms, which occurred in a stepwise manner during the course of the therapy and were linked to resistance or a potential compensatory mechanism. For both patients, our analysis revealed the long-term co-existence of clonal subpopulations that displayed different drug resistance allele combinations. Out of these, the most resistant clone was fixed in the population. In contrast, baseline and follow-up isolates of Patient C were distinguished each by eleven unique polymorphisms, indicating an exogenous re-infection with an XDR strain not detected by IS6110 RFLP typing. Our study demonstrates that intra-patient microevolution of MDR-MTBC strains under longitudinal treatment is more complex than previously anticipated. However, a mutator phenotype was not detected. The presence of different subpopulations might confound phenotypic and

  4. Whole genome sequencing-based characterization of extensively drug resistant (XDR) strains of Mycobacterium tuberculosis from Pakistan

    KAUST Repository

    Hasan, Zahra; Ali, Asho; McNerney, Ruth; Mallard, Kim; Hill-Cawthorne, Grant A.; Coll, Francesc; Nair, Mridul; Pain, Arnab; Clark, Taane G.; Hasan, Rumina

    2015-01-01

    Objectives: The global increase in drug resistance in Mycobacterium tuberculosis (MTB) strains increases the focus on improved molecular diagnostics for MTB. Extensively drug-resistant (XDR) - TB is caused by MTB strains resistant to rifampicin, isoniazid, fluoroquinolone and aminoglycoside antibiotics. Resistance to anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs), in particular MTB genes. However, there is regional variation between MTB lineages and the SNPs associated with resistance. Therefore, there is a need to identify common resistance conferring SNPs so that effective molecular-based diagnostic tests for MTB can be developed. This study investigated used whole genome sequencing (WGS) to characterize 37 XDR MTB isolates from Pakistan and investigated SNPs related to drug resistance. Methods: XDR-TB strains were selected. DNA was extracted from MTB strains, and samples underwent WGS with 76-base-paired end fragment sizes using Illumina paired end HiSeq2000 technology. Raw sequence data were mapped uniquely to H37Rv reference genome. The mappings allowed SNPs and small indels to be called using SAMtools/BCFtools. Results: This study found that in all XDR strains, rifampicin resistance was attributable to SNPs in the rpoB RDR region. Isoniazid resistance-associated mutations were primarily related to katG codon 315 followed by inhA S94A. Fluoroquinolone resistance was attributable to gyrA 91-94 codons in most strains, while one did not have SNPs in either gyrA or gyrB. Aminoglycoside resistance was mostly associated with SNPs in rrs, except in 6 strains. Ethambutol resistant strains had embB codon 306 mutations, but many strains did not have this present. The SNPs were compared with those present in commercial assays such as LiPA Hain MDRTBsl, and the sensitivity of the assays for these strains was evaluated. Conclusions: If common drug resistance associated with SNPs evaluated the concordance between phenotypic and

  5. Whole genome sequencing-based characterization of extensively drug resistant (XDR) strains of Mycobacterium tuberculosis from Pakistan

    KAUST Repository

    Hasan, Zahra

    2015-03-01

    Objectives: The global increase in drug resistance in Mycobacterium tuberculosis (MTB) strains increases the focus on improved molecular diagnostics for MTB. Extensively drug-resistant (XDR) - TB is caused by MTB strains resistant to rifampicin, isoniazid, fluoroquinolone and aminoglycoside antibiotics. Resistance to anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs), in particular MTB genes. However, there is regional variation between MTB lineages and the SNPs associated with resistance. Therefore, there is a need to identify common resistance conferring SNPs so that effective molecular-based diagnostic tests for MTB can be developed. This study investigated used whole genome sequencing (WGS) to characterize 37 XDR MTB isolates from Pakistan and investigated SNPs related to drug resistance. Methods: XDR-TB strains were selected. DNA was extracted from MTB strains, and samples underwent WGS with 76-base-paired end fragment sizes using Illumina paired end HiSeq2000 technology. Raw sequence data were mapped uniquely to H37Rv reference genome. The mappings allowed SNPs and small indels to be called using SAMtools/BCFtools. Results: This study found that in all XDR strains, rifampicin resistance was attributable to SNPs in the rpoB RDR region. Isoniazid resistance-associated mutations were primarily related to katG codon 315 followed by inhA S94A. Fluoroquinolone resistance was attributable to gyrA 91-94 codons in most strains, while one did not have SNPs in either gyrA or gyrB. Aminoglycoside resistance was mostly associated with SNPs in rrs, except in 6 strains. Ethambutol resistant strains had embB codon 306 mutations, but many strains did not have this present. The SNPs were compared with those present in commercial assays such as LiPA Hain MDRTBsl, and the sensitivity of the assays for these strains was evaluated. Conclusions: If common drug resistance associated with SNPs evaluated the concordance between phenotypic and

  6. Development of novel InDel markers and genetic diversity in Chenopodium quinoa through whole-genome re-sequencing.

    Science.gov (United States)

    Zhang, Tifu; Gu, Minfeng; Liu, Yuhe; Lv, Yuanda; Zhou, Ling; Lu, Haiyan; Liang, Shuaiqiang; Bao, Huabin; Zhao, Han

    2017-09-05

    Quinoa (Chenopodium quinoa Willd.) is a balanced nutritional crop, but its breeding improvement has been limited by the lack of information on its genetics and genomics. Therefore, it is necessary to obtain knowledge on genomic variation, population structure, and genetic diversity and to develop novel Insertion/Deletion (InDel) markers for quinoa by whole-genome re-sequencing. We re-sequenced 11 quinoa accessions and obtained a coverage depth between approximately 7× to 23× the quinoa genome. Based on the 1453-megabase (Mb) assembly from the reference accession Riobamba, 8,441,022 filtered bi-allelic single nucleotide polymorphisms (SNPs) and 842,783 filtered InDels were identified, with an estimated SNP and InDel density of 5.81 and 0.58 per kilobase (kb). From the genomic InDel variations, 85 dimorphic InDel markers were newly developed and validated. Together with the 62 simple sequence repeat (SSR) markers reported, a total of 147 markers were used for genotyping the 129 quinoa accessions. Molecular grouping analysis showed classification into two major groups, the Andean highland (composed of the northern and southern highland subgroups) and Chilean coastal, based on combined STRUCTURE, phylogenetic tree and PCA (Principle Component Analysis) analyses. Further analysis of the genetic diversity exhibited a decreasing tendency from the Chilean coast group to the Andean highland group, and the gene flow between subgroups was more frequent than that between the two subgroups and the Chilean coastal group. The majority of the variations (approximately 70%) were found through an analysis of molecular variation (AMOVA) due to the diversity between the groups. This was congruent with the observation of a highly significant F ST value (0.705) between the groups, demonstrating significant genetic differentiation between the Andean highland type of quinoa and the Chilean coastal type. Moreover, a core set of 16 quinoa germplasms that capture all 362 alleles was

  7. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections.

    Directory of Open Access Journals (Sweden)

    Pimlapas Leekitcharoenphon

    Full Text Available Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections.

  8. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Thorup Nielsen, Mette

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely...

  9. Direct DNA Extraction from Mycobacterium tuberculosis Frozen Stocks as a Reculture-Independent Approach to Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Zallet, J; Lillebaek, T

    2015-01-01

    Culturing before DNA extraction represents a major time-consuming step in whole-genome sequencing of slow-growing bacteria, such as Mycobacterium tuberculosis. We report a workflow to extract DNA from frozen isolates without reculturing. Prepared libraries and sequence data were comparable...... with results from recultured aliquots of the same stocks....

  10. A Site Specific Model And Analysis Of The Neutral Somatic Mutation Rate In Whole-Genome Cancer Data

    DEFF Research Database (Denmark)

    Bertl, Johanna; Guo, Qianyun; Rasmussen, Malene Juul

    2017-01-01

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation ra...

  11. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland.

    Science.gov (United States)

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten; Lefort, François

    2016-10-06

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered as a potential and valuable resource of biological control agents and biofertilizers for agriculture. Copyright © 2016 Crovadore et al.

  12. Whole-genome amplified DNA from stored dried blood spots is reliable in high resolution melting curve and sequencing analysis

    DEFF Research Database (Denmark)

    Winkel, Bo G; Hollegaard, Mads V; Olesen, Morten S

    2011-01-01

    BACKGROUND: The use of dried blood spots (DBS) samples in genomic workup has been limited by the relative low amounts of genomic DNA (gDNA) they contain. It remains to be proven that whole genome amplified DNA (wgaDNA) from stored DBS samples, constitutes a reliable alternative to gDNA.We wanted...

  13. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  14. Whole-genome pyrosequencing of an epidemic multidrug-resistant Acinetobacter baumannii strain belonging to the European clone II group

    DEFF Research Database (Denmark)

    Iacono, M.; Villa, L.; Fortini, D.

    2008-01-01

    The whole-genome sequence of an epidemic, multidrug-resistant Acinetobacter baumannii strain (strain ACICU) belonging to the European clone II group and carrying the plasmid-mediated bla(OXA-58) carbapenem resistance gene was determined. The A. baumannii ACICU genome was compared with the genomes...

  15. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus.

    Science.gov (United States)

    Lee, Yookyung; Lim, Sooyeon; Rhee, Moon-Soo; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-03-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus collected in Yeong-dong, Chuncheongbuk-do, South Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000.

  16. Comparing Whole-Genome Sequencing with Sanger Sequencing for spa Typing of Methicillin-Resistant Staphylococcus aureus

    DEFF Research Database (Denmark)

    Bartels, Mette Damkjaer; Petersen, Andreas; Worning, Peder

    2014-01-01

    spa typing of methicillin-resistant Staphylococcus aureus (MRSA) has traditionally been done by PCR amplification and Sanger sequencing of the spa repeat region. At Hvidovre Hospital, Denmark, whole-genome sequencing (WGS) of all MRSA isolates has been performed routinely since January 2013, and ...

  17. Whole-genome sequencing reveals the mechanisms for evolution of streptomycin resistance in Lactobacillus plantarum.

    Science.gov (United States)

    Zhang, Fuxin; Gao, Jiayuan; Wang, Bini; Huo, Dongxue; Wang, Zhaoxia; Zhang, Jiachao; Shao, Yuyu

    2018-04-01

    In this research, we investigated the evolution of streptomycin resistance in Lactobacillus plantarum ATCC14917, which was passaged in medium containing a gradually increasing concentration of streptomycin. After 25 d, the minimum inhibitory concentration (MIC) of L. plantarum ATCC14917 had reached 131,072 µg/mL, which was 8,192-fold higher than the MIC of the original parent isolate. The highly resistant L. plantarum ATCC14917 isolate was then passaged in antibiotic-free medium to determine the stability of resistance. The MIC value of the L. plantarum ATCC14917 isolate decreased to 2,048 µg/mL after 35 d but remained constant thereafter, indicating that resistance was irreversible even in the absence of selection pressure. Whole-genome sequencing of parent isolates, control isolates, and isolates following passage was used to study the resistance mechanism of L. plantarum ATCC14917 to streptomycin and adaptation in the presence and absence of selection pressure. Five mutated genes (single nucleotide polymorphisms and structural variants) were verified in highly resistant L. plantarum ATCC14917 isolates, which were related to ribosomal protein S12, LPXTG-motif cell wall anchor domain protein, LrgA family protein, Ser/Thr phosphatase family protein, and a hypothetical protein that may correlate with resistance to streptomycin. After passage in streptomycin-free medium, only the mutant gene encoding ribosomal protein S12 remained; the other 4 mutant genes had reverted to the wild type as found in the parent isolate. Although the MIC value of L. plantarum ATCC14917 was reduced in the absence of selection pressure, it remained 128-fold higher than the MIC value of the parent isolate, indicating that ribosomal protein S12 may play an important role in streptomycin resistance. Using the mobile elements database, we demonstrated that streptomycin resistance-related genes in L. plantarum ATCC14917 were not located on mobile elements. This research offers a way of

  18. Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples.

    Directory of Open Access Journals (Sweden)

    Craig April

    2009-12-01

    Full Text Available We have developed a gene expression assay (Whole-Genome DASL, capable of generating whole-genome gene expression profiles from degraded samples such as formalin-fixed, paraffin-embedded (FFPE specimens.We demonstrated a similar level of sensitivity in gene detection between matched fresh-frozen (FF and FFPE samples, with the number and overlap of probes detected in the FFPE samples being approximately 88% and 95% of that in the corresponding FF samples, respectively; 74% of the differentially expressed probes overlapped between the FF and FFPE pairs. The WG-DASL assay is also able to detect 1.3-1.5 and 1.5-2 -fold changes in intact and FFPE samples, respectively. The dynamic range for the assay is approximately 3 logs. Comparing the WG-DASL assay with an in vitro transcription-based labeling method yielded fold-change correlations of R(2 approximately 0.83, while fold-change comparisons with quantitative RT-PCR assays yielded R(2 approximately 0.86 and R(2 approximately 0.55 for intact and FFPE samples, respectively. Additionally, the WG-DASL assay yielded high self-correlations (R(2>0.98 with low intact RNA inputs ranging from 1 ng to 100 ng; reproducible expression profiles were also obtained with 250 pg total RNA (R(2 approximately 0.92, with approximately 71% of the probes detected in 100 ng total RNA also detected at the 250 pg level. When FFPE samples were assayed, 1 ng total RNA yielded self-correlations of R(2 approximately 0.80, while still maintaining a correlation of R(2 approximately 0.75 with standard FFPE inputs (200 ng.Taken together, these results show that WG-DASL assay provides a reliable platform for genome-wide expression profiling in archived materials. It also possesses utility within clinical settings where only limited quantities of samples may be available (e.g. microdissected material or when minimally invasive procedures are performed (e.g. biopsied specimens.

  19. Genotyping using whole-genome sequencing is a realistic alternative to surveillance based on phenotypic antimicrobial susceptibility testing

    DEFF Research Database (Denmark)

    Zankari, Ea; Hasman, Henrik; Kaas, Rolf Sommer

    2013-01-01

    -genome sequencing (WGS) may soon be within reach even for routine surveillance and clinical diagnostics. The aim of this study was to evaluate WGS as a routine tool for surveillance of antimicrobial resistance compared with current phenotypic procedures. Methods: Antimicrobial susceptibility tests were performed...... to the categorizing of isolates as resistant and 2569 as susceptible. Seven cases of disagreement between tested and predicted susceptibility were observed, six of which were related to spectinomycin resistance in Escherichia coli. Correlation between MLST type and resistance profiles was only observed in Salmonella...

  20. Digital Droplet Multiple Displacement Amplification (ddMDA for Whole Genome Sequencing of Limited DNA Samples.

    Directory of Open Access Journals (Sweden)

    Minsoung Rhee

    Full Text Available Multiple displacement amplification (MDA is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent of ~3/1000 of an E. coli genome per droplet, ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology.

  1. Whole genome sequencing reveals a de novo SHANK3 mutation in familial autism spectrum disorder.

    Directory of Open Access Journals (Sweden)

    Sergio I Nemirovsky

    Full Text Available Clinical genomics promise to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD. Here we present three siblings with ASD where we evaluated the usefulness of Whole Genome Sequencing (WGS for the diagnostic approach to ASD.We identified a family segregating ASD in three siblings with an unidentified cause. We performed WGS in the three probands and used a state-of-the-art comprehensive bioinformatic analysis pipeline and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents.Three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy with negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in the exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6.We reported an infrequent form of familial ASD where WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder.

  2. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Margaret Staton

    Full Text Available Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence.

  3. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic

    Directory of Open Access Journals (Sweden)

    Samantha B. Foley

    2015-01-01

    Full Text Available Despite the potential of whole-genome sequencing (WGS to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176 and those without (n = 82. Initial analysis of potentially pathogenic variants (PPVs, defined as nonsynonymous variants with allele frequency < 1% in ESP6500 in 163 clinically-relevant genes suggested that WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS. Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve accuracy of interpretation of rare variants. While loss-of-function (LoF variants represented only a small fraction of PPVs, WGS identified additional cancer risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after expanding our analysis to 3209 ClinVar genes. These data illustrate how WGS can be used to improve our ability to discover patients' cancer genetic risks.

  4. Whole-genome analysis of a patient with early-stage small-cell lung cancer.

    Science.gov (United States)

    Han, J-Y; Lee, Y-S; Kim, B C; Lee, G K; Lee, S; Kim, E-H; Kim, H-M; Bhak, J

    2014-12-01

    We performed whole-genome sequencing (WGS) of a case of early-stage small-cell lung cancer (SCLC) to analyze the genomic features. WGS revealed a lot of single-nucleotide variations (SNVs), small insertion/deletions and chromosomal abnormality. Chromosomes 4p, 5q, 13q, 15q, 17p and 22q contained many block deletions. Especially, copy loss was observed in tumor suppressor genes RB1 and TP53, and copy gain in oncogene hTERT. Somatic mutations were found in TP53 and CREBBP. Novel nonsynonymous (ns) SNVs in C6ORF103 and SLC5A4 genes were also found. Sanger sequencing of the SLC5A4 gene in 23 independent SCLC samples showed another nsSNV in the SLC5A4 gene, indicating that nsSNVs in the SLC5A4 gene are recurrent in SCLC. WGS of an early-stage SCLC identified novel recurrent mutations and validated known variations, including copy number variations. These findings provide insight into the genomic landscape contributing to SCLC development.

  5. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    Science.gov (United States)

    Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-01-01

    We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform, and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertion/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p=6×10−4). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302

  6. Supersize me: how whole-genome sequencing and big data are transforming epidemiology.

    Science.gov (United States)

    Kao, Rowland R; Haydon, Daniel T; Lycett, Samantha J; Murcia, Pablo R

    2014-05-01

    In epidemiology, the identification of 'who infected whom' allows us to quantify key characteristics such as incubation periods, heterogeneity in transmission rates, duration of infectiousness, and the existence of high-risk groups. Although invaluable, the existence of many plausible infection pathways makes this difficult, and epidemiological contact tracing either uncertain, logistically prohibitive, or both. The recent advent of next-generation sequencing technology allows the identification of traceable differences in the pathogen genome that are transforming our ability to understand high-resolution disease transmission, sometimes even down to the host-to-host scale. We review recent examples of the use of pathogen whole-genome sequencing for the purpose of forensic tracing of transmission pathways, focusing on the particular problems where evolutionary dynamics must be supplemented by epidemiological information on the most likely timing of events as well as possible transmission pathways. We also discuss potential pitfalls in the over-interpretation of these data, and highlight the manner in which a confluence of this technology with sophisticated mathematical and statistical approaches has the potential to produce a paradigm shift in our understanding of infectious disease transmission and control. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. Mechanisms of Linezolid Resistance among Coagulase-Negative Staphylococci Determined by Whole-Genome Sequencing

    Science.gov (United States)

    Tewhey, Ryan; Gu, Bing; Kelesidis, Theodoros; Charlton, Carmen; Bobenchik, April; Hindler, Janet; Schork, Nicholas J.

    2014-01-01

    ABSTRACT Linezolid resistance is uncommon among staphylococci, but approximately 2% of clinical isolates of coagulase-negative staphylococci (CoNS) may exhibit resistance to linezolid (MIC, ≥8 µg/ml). We performed whole-genome sequencing (WGS) to characterize the resistance mechanisms and genetic backgrounds of 28 linezolid-resistant CoNS (21 Staphylococcus epidermidis isolates and 7 Staphylococcus haemolyticus isolates) obtained from blood cultures at a large teaching health system in California between 2007 and 2012. The following well-characterized mutations associated with linezolid resistance were identified in the 23S rRNA: G2576U, G2447U, and U2504A, along with the mutation C2534U. Mutations in the L3 and L4 riboproteins, at sites previously associated with linezolid resistance, were also identified in 20 isolates. The majority of isolates harbored more than one mutation in the 23S rRNA and L3 and L4 genes. In addition, the cfr methylase gene was found in almost half (48%) of S. epidermidis isolates. cfr had been only rarely identified in staphylococci in the United States prior to this study. Isolates of the same sequence type were identified with unique mutations associated with linezolid resistance, suggesting independent acquisition of linezolid resistance in each isolate. PMID:24915435

  8. A synergism between adaptive effects and evolvability drives whole genome duplication to fixation.

    Science.gov (United States)

    Cuypers, Thomas D; Hogeweg, Paulien

    2014-04-01

    Whole genome duplication has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes. This pattern has been explained by a neutral process of subfunctionalization and more recently, dosage balance selection. However, much about the relationship between environmental change, WGD and adaptation remains unknown. Here, we study the duplicate retention pattern postWGD, by letting virtual cells adapt to environmental changes. The virtual cells have structured genomes that encode a regulatory network and simple metabolism. Populations are under selection for homeostasis and evolve by point mutations, small indels and WGD. After populations had initially adapted fully to fluctuating resource conditions re-adaptation to a broad range of novel environments was studied by tracking mutations in the line of descent. WGD was established in a minority (≈30%) of lineages, yet, these were significantly more successful at re-adaptation. Unexpectedly, WGD lineages conserved more seemingly redundant genes, yet had higher per gene mutation rates. While WGD duplicates of all functional classes were significantly over-retained compared to a model of neutral losses, duplicate retention was clearly biased towards highly connected TFs. Importantly, no subfunctionalization occurred in conserved pairs, strongly suggesting that dosage balance shaped retention. Meanwhile, singles diverged significantly. WGD, therefore, is a powerful mechanism to cope with environmental change, allowing conservation of a core machinery, while adapting the peripheral network to accommodate change.

  9. Prokaryotic Phylogenies Inferred from Whole-Genome Sequence and Annotation Data

    Directory of Open Access Journals (Sweden)

    Wei Du

    2013-01-01

    Full Text Available Phylogenetic trees are used to represent the evolutionary relationship among various groups of species. In this paper, a novel method for inferring prokaryotic phylogenies using multiple genomic information is proposed. The method is called CGCPhy and based on the distance matrix of orthologous gene clusters between whole-genome pairs. CGCPhy comprises four main steps. First, orthologous genes are determined by sequence similarity, genomic function, and genomic structure information. Second, genes involving potential HGT events are eliminated, since such genes are considered to be the highly conserved genes across different species and the genes located on fragments with abnormal genome barcode. Third, we calculate the distance of the orthologous gene clusters between each genome pair in terms of the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is employed to construct phylogenetic trees across different species. CGCPhy has been examined on different datasets from 617 complete single-chromosome prokaryotic genomes and achieved applicative accuracies on different species sets in agreement with Bergey's taxonomy in quartet topologies. Simulation results show that CGCPhy achieves high average accuracy and has a low standard deviation on different datasets, so it has an applicative potential for phylogenetic analysis.

  10. Molecular analysis of single oocyst of Eimeria by whole genome amplification (WGA) based nested PCR.

    Science.gov (United States)

    Wang, Yunzhou; Tao, Geru; Cui, Yujuan; Lv, Qiyao; Xie, Li; Li, Yuan; Suo, Xun; Qin, Yinghe; Xiao, Lihua; Liu, Xianyong

    2014-09-01

    PCR-based molecular tools are widely used for the identification and characterization of protozoa. Here we report the molecular analysis of Eimeria species using combined methods of whole genome amplification (WGA) and nested PCR. Single oocyst of Eimeria stiedai or Eimeriamedia was directly used for random amplification of the genomic DNA with either primer extension preamplification (PEP) or multiple displacement amplification (MDA), and then the WGA product was used as template in nested PCR with species-specific primers for ITS-1, 18S rDNA and 23S rDNA of E. stiedai and E. media. WGA-based PCR was successful for the amplification of these genes from single oocyst. For the species identification of single oocyst isolated from mixed E. stiedai or E. media, the results from WGA-based PCR were exactly in accordance with those from morphological identification, suggesting the availability of this method in molecular analysis of eimerian parasites at the single oocyst level. WGA-based PCR method can also be applied for the identification and genetic characterization of other protists. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.

    Science.gov (United States)

    McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael

    2014-08-01

    Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the Guanine+Cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species implicate this process acts as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event. Copyright © 2014 by the Genetics Society of America.

  12. Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data.

    Science.gov (United States)

    Al-Nakeeb, Kosai; Petersen, Thomas Nordahl; Sicheritz-Pontén, Thomas

    2017-11-21

    Whole-genome sequencing (WGS) projects provide short read nucleotide sequences from nuclear and possibly organelle DNA depending on the source of origin. Mitochondrial DNA is present in animals and fungi, while plants contain DNA from both mitochondria and chloroplasts. Current techniques for separating organelle reads from nuclear reads in WGS data require full reference or partial seed sequences for assembling. Norgal (de Novo ORGAneLle extractor) avoids this requirement by identifying a high frequency subset of k-mers that are predominantly of mitochondrial origin and performing a de novo assembly on a subset of reads that contains these k-mers. The method was applied to WGS data from a panda, brown algae seaweed, butterfly and filamentous fungus. We were able to extract full circular mitochondrial genomes and obtained sequence identities to the reference sequences in the range from 98.5 to 99.5%. We also assembled the chloroplasts of grape vines and cucumbers using Norgal together with seed-based de novo assemblers. Norgal is a pipeline that can extract and assemble full or partial mitochondrial and chloroplast genomes from WGS short reads without prior knowledge. The program is available at: https://bitbucket.org/kosaidtu/norgal .

  13. Publisher Correction: Whole genome sequencing in psychiatric disorders: the WGSPD consortium.

    Science.gov (United States)

    Sanders, Stephan J; Neale, Benjamin M; Huang, Hailiang; Werling, Donna M; An, Joon-Yong; Dong, Shan; Abecasis, Goncalo; Arguello, P Alexander; Blangero, John; Boehnke, Michael; Daly, Mark J; Eggan, Kevin; Geschwind, Daniel H; Glahn, David C; Goldstein, David B; Gur, Raquel E; Handsaker, Robert E; McCarroll, Steven A; Ophoff, Roel A; Palotie, Aarno; Pato, Carlos N; Sabatti, Chiara; State, Matthew W; Willsey, A Jeremy; Hyman, Steven E; Addington, Anjene M; Lehner, Thomas; Freimer, Nelson B

    2018-03-16

    In the version of this article initially published, the consortium authorship and corresponding authors were not presented correctly. In the PDF and print versions, the Whole Genome Sequencing for Psychiatric Disorders (WGSPD) consortium was missing from the author list at the beginning of the paper, where it should have appeared as the seventh author; it was present in the author list at the end of the paper, but the footnote directing readers to the Supplementary Note for a list of members was missing. In the HTML version, the consortium was listed as the last author instead of as the seventh, and the line directing readers to the Supplementary Note for a list of members appeared at the end of the paper under Author Information but not in association with the consortium name itself. Also, this line stated that both member names and affiliations could be found in the Supplementary Note; in fact, only names are given. In all versions of the paper, the corresponding author symbols were attached to A. Jeremy Willsey, Steven E. Hyman, Anjene M. Addington and Thomas Lehner; they should have been attached, respectively, to Steven E. Hyman, Anjene M. Addington, Thomas Lehner and Nelson B. Freimer. As a result of this shift, the respective contact links in the HTML version did not lead to the indicated individuals. The errors have been corrected in the HTML and PDF versions of the article.

  14. Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications.

    Science.gov (United States)

    Tank, David C; Eastman, Jonathan M; Pennell, Matthew W; Soltis, Pamela S; Soltis, Douglas E; Hinchliff, Cody E; Brown, Joseph W; Sessa, Emily B; Harmon, Luke J

    2015-07-01

    Our growing understanding of the plant tree of life provides a novel opportunity to uncover the major drivers of angiosperm diversity. Using a time-calibrated phylogeny, we characterized hot and cold spots of lineage diversification across the angiosperm tree of life by modeling evolutionary diversification using stepwise AIC (MEDUSA). We also tested the whole-genome duplication (WGD) radiation lag-time model, which postulates that increases in diversification tend to lag behind established WGD events. Diversification rates have been incredibly heterogeneous throughout the evolutionary history of angiosperms and reveal a pattern of 'nested radiations' - increases in net diversification nested within other radiations. This pattern in turn generates a negative relationship between clade age and diversity across both families and orders. We suggest that stochastically changing diversification rates across the phylogeny explain these patterns. Finally, we demonstrate significant statistical support for the WGD radiation lag-time model. Across angiosperms, nested shifts in diversification led to an overall increasing rate of net diversification and declining relative extinction rates through time. These diversification shifts are only rarely perfectly associated with WGD events, but commonly follow them after a lag period. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  15. Whole-genome transcriptional analysis of heavy metal stresses inCaulobacter crescentus

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Ping; Brodie, Eoin L.; Suzuki, Yohey; McAdams, Harley H.; Andersen, Gary L.

    2005-09-21

    The bacterium Caulobacter crescentus and related stalkbacterial species are known for their distinctive ability to live in lownutrient environments, a characteristic of most heavy metal contaminatedsites. Caulobacter crescentus is a model organism for studying cell cycleregulation with well developed genetics. We have identified the pathwaysresponding to heavy metal toxicity in C. crescentus to provide insightsfor possible application of Caulobacter to environmental restoration. Weexposed C. crescentus cells to four heavy metals (chromium, cadmium,selenium and uranium) and analyzed genome wide transcriptional activitiespost exposure using a Affymetrix GeneChip microarray. C. crescentusshowed surprisingly high tolerance to uranium, a possible mechanism forwhich may be formation of extracellular calcium-uranium-phosphateprecipitates. The principal response to these metals was protectionagainst oxidative stress (up-regulation of manganese-dependent superoxidedismutase, sodA). Glutathione S-transferase, thioredoxin, glutaredoxinsand DNA repair enzymes responded most strongly to cadmium and chromate.The cadmium and chromium stress response also focused on reducing theintracellular metal concentration, with multiple efflux pumps employed toremove cadmium while a sulfate transporter was down-regulated to reducenon-specific uptake of chromium. Membrane proteins were also up-regulatedin response to most of the metals tested. A two-component signaltransduction system involved in the uranium response was identified.Several differentially regulated transcripts from regions previously notknown to encode proteins were identified, demonstrating the advantage ofevaluating the transcriptome using whole genome microarrays.

  16. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  17. DNA-based identification of spices: DNA isolation, whole genome amplification, and polymerase chain reaction.

    Science.gov (United States)

    Focke, Felix; Haase, Ilka; Fischer, Markus

    2011-01-26

    Usually spices are identified morphologically using simple methods like magnifying glasses or microscopic instruments. On the other hand, molecular biological methods like the polymerase chain reaction (PCR) enable an accurate and specific detection also in complex matrices. Generally, the origins of spices are plants with diverse genetic backgrounds and relationships. The processing methods used for the production of spices are complex and individual. Consequently, the development of a reliable DNA-based method for spice analysis is a challenging intention. However, once established, this method will be easily adapted to less difficult food matrices. In the current study, several alternative methods for the isolation of DNA from spices have been developed and evaluated in detail with regard to (i) its purity (photometric), (ii) yield (fluorimetric methods), and (iii) its amplifiability (PCR). Whole genome amplification methods were used to preamplify isolates to improve the ratio between amplifiable DNA and inhibiting substances. Specific primer sets were designed, and the PCR conditions were optimized to detect 18 spices selectively. Assays of self-made spice mixtures were performed to proof the applicability of the developed methods.

  18. Whole-Genome Analyses of Korean Native and Holstein Cattle Breeds by Massively Parallel Sequencing

    Science.gov (United States)

    Stothard, Paul; Chung, Won-Hyong; Jeon, Heoyn-Jeong; Miller, Stephen P.; Choi, So-Young; Lee, Jeong-Koo; Yang, Bokyoung; Lee, Kyung-Tai; Han, Kwang-Jin; Kim, Hyeong-Cheol; Jeong, Dongkee; Oh, Jae-Don; Kim, Namshin; Kim, Tae-Hun; Lee, Hak-Kyo; Lee, Sung-Jin

    2014-01-01

    A main goal of cattle genomics is to identify DNA differences that account for variations in economically important traits. In this study, we performed whole-genome analyses of three important cattle breeds in Korea—Hanwoo, Jeju Heugu, and Korean Holstein—using the Illumina HiSeq 2000 sequencing platform. We achieved 25.5-, 29.6-, and 29.5-fold coverage of the Hanwoo, Jeju Heugu, and Korean Holstein genomes, respectively, and identified a total of 10.4 million single nucleotide polymorphisms (SNPs), of which 54.12% were found to be novel. We also detected 1,063,267 insertions–deletions (InDels) across the genomes (78.92% novel). Annotations of the datasets identified a total of 31,503 nonsynonymous SNPs and 859 frameshift InDels that could affect phenotypic variations in traits of interest. Furthermore, genome-wide copy number variation regions (CNVRs) were detected by comparing the Hanwoo, Jeju Heugu, and previously published Chikso genomes against that of Korean Holstein. A total of 992, 284, and 1881 CNVRs, respectively, were detected throughout the genome. Moreover, 53, 65, 45, and 82 putative regions of homozygosity (ROH) were identified in Hanwoo, Jeju Heugu, Chikso, and Korean Holstein respectively. The results of this study provide a valuable foundation for further investigations to dissect the molecular mechanisms underlying variation in economically important traits in cattle and to develop genetic markers for use in cattle breeding. PMID:24992012

  19. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    Science.gov (United States)

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is the first metagenomic web service that is capable of processing SOLiD color-space reads, to authors' knowledge. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.

  20. Improved acid tolerance of Lactobacillus pentosus by error-prone whole genome amplification.

    Science.gov (United States)

    Ye, Lidan; Zhao, Hua; Li, Zhi; Wu, Jin Chuan

    2013-05-01

    Acid tolerance of Lactobacillus pentosus ATCC 8041 was improved by error-prone amplification of its genomic DNA using random primers and Taq DNA polymerase. The resulting amplification products were transferred into wild-type L. pentosus by electroporation and the transformants were screened for growth on low-pH agar plates. After only one round of mutation, one mutant (MT3) was identified that was able to completely consume 20 g/L of glucose to produce lactic acid at a yield of 95% in 1L MRS medium at pH 3.8 within 36 h, whereas no growth or lactic acid production was observed for the wild-type strain under the same conditions. The acid tolerance of mutant MT3 remained genetically stable for at least 25 subcultures. Therefore, the error-prone whole genome amplification technique is a very powerful tool for improving phenotypes of this lactic acid bacterium and may also be applicable for other microorganisms. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. Single Cell Analysis of Dystrophin and SRY Gene by Using Whole Genome Amplification

    Institute of Scientific and Technical Information of China (English)

    徐晨明; 金帆; 黄荷凤; 陶冶; 叶英辉

    2001-01-01

    Objective To develop a reliable and sensitive method for detection of sex and multiloci of Duchenne muscular dystrophy (DMD) gene in single cell Materials & methods Whole genome of single cell were amplified by using 15-base random primers (primer extension preamplification, PEP), then a small aliquot of PEP product were analyzed by using locus-specific nest PCR amplification. The procedure was evaluated by detection dystrophin exons 8, 17, 19, 44, 45, 48 and human testis-determining gene (SRY)in single lymphocytes from known sources and single blastomeres from the couples with no family history of DMD.Results The amplification efficiency rate of six dystrophin exons from single lymphocytes and single blastomeres were 97. 2% (175/180) and 100% (60/60) respectively.Results of SRY showed that 100% (15/15) amplification in single male-derived lymphocytes and 0% (0/15) amplification in single female-derived lymphocytes. Conclusion The technique of single cell PEP-nest PCR for dystrophin exons 8, 17,19, 44, 45, 48 and SRY is highly specifc. PEP-nest PCR is suitable for Preimplantation genetic diagnosis (PGD) of DMD at single cell level.

  2. A quantitative comparison of single-cell whole genome amplification methods.

    Directory of Open Access Journals (Sweden)

    Charles F A de Bourcy

    Full Text Available Single-cell sequencing is emerging as an important tool for studies of genomic heterogeneity. Whole genome amplification (WGA is a key step in single-cell sequencing workflows and a multitude of methods have been introduced. Here, we compare three state-of-the-art methods on both bulk and single-cell samples of E. coli DNA: Multiple Displacement Amplification (MDA, Multiple Annealing and Looping Based Amplification Cycles (MALBAC, and the PicoPLEX single-cell WGA kit (NEB-WGA. We considered the effects of reaction gain on coverage uniformity, error rates and the level of background contamination. We compared the suitability of the different WGA methods for the detection of copy-number variations, for the detection of single-nucleotide polymorphisms and for de-novo genome assembly. No single method performed best across all criteria and significant differences in characteristics were observed; the choice of which amplifier to use will depend strongly on the details of the type of question being asked in any given experiment.

  3. Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

    Science.gov (United States)

    Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

    2017-07-12

    Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.

  4. A Proposed Clinical Decision Support Architecture Capable of Supporting Whole Genome Sequence Information

    Directory of Open Access Journals (Sweden)

    Brandon M. Welch

    2014-04-01

    Full Text Available Whole genome sequence (WGS information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR. A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1 each component of the architecture; (2 the interaction of the components; and (3 how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine.

  5. A proposed clinical decision support architecture capable of supporting whole genome sequence information.

    Science.gov (United States)

    Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen; Kawamoto, Kensaku

    2014-04-04

    Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine.

  6. Whole-genome sequencing identifies recurrent somatic NOTCH2 mutations in splenic marginal zone lymphoma.

    Science.gov (United States)

    Kiel, Mark J; Velusamy, Thirunavukkarasu; Betz, Bryan L; Zhao, Lili; Weigelin, Helmut G; Chiang, Mark Y; Huebner-Chan, David R; Bailey, Nathanael G; Yang, David T; Bhagat, Govind; Miranda, Roberto N; Bahler, David W; Medeiros, L Jeffrey; Lim, Megan S; Elenitoba-Johnson, Kojo S J

    2012-08-27

    Splenic marginal zone lymphoma (SMZL), the most common primary lymphoma of spleen, is poorly understood at the genetic level. In this study, using whole-genome DNA sequencing (WGS) and confirmation by Sanger sequencing, we observed mutations identified in several genes not previously known to be recurrently altered in SMZL. In particular, we identified recurrent somatic gain-of-function mutations in NOTCH2, a gene encoding a protein required for marginal zone B cell development, in 25 of 99 (∼25%) cases of SMZL and in 1 of 19 (∼5%) cases of nonsplenic MZLs. These mutations clustered near the C-terminal proline/glutamate/serine/threonine (PEST)-rich domain, resulting in protein truncation or, rarely, were nonsynonymous substitutions affecting the extracellular heterodimerization domain (HD). NOTCH2 mutations were not present in other B cell lymphomas and leukemias, such as chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL; n = 15), mantle cell lymphoma (MCL; n = 15), low-grade follicular lymphoma (FL; n = 44), hairy cell leukemia (HCL; n = 15), and reactive lymphoid hyperplasia (n = 14). NOTCH2 mutations were associated with adverse clinical outcomes (relapse, histological transformation, and/or death) among SMZL patients (P = 0.002). These results suggest that NOTCH2 mutations play a role in the pathogenesis and progression of SMZL and are associated with a poor prognosis.

  7. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  8. Two Rounds of Whole Genome Duplication in the AncestralVertebrate

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  9. Whole Genome Sequence Analysis of Pig Respiratory Bacterial Pathogens with Elevated Minimum Inhibitory Concentrations for Macrolides.

    Science.gov (United States)

    Dayao, Denise Ann Estarez; Seddon, Jennifer M; Gibson, Justine S; Blackall, Patrick J; Turni, Conny

    2016-10-01

    Macrolides are often used to treat and control bacterial pathogens causing respiratory disease in pigs. This study analyzed the whole genome sequences of one clinical isolate of Actinobacillus pleuropneumoniae, Haemophilus parasuis, Pasteurella multocida, and Bordetella bronchiseptica, all isolated from Australian pigs to identify the mechanism underlying the elevated minimum inhibitory concentrations (MICs) for erythromycin, tilmicosin, or tulathromycin. The H. parasuis assembled genome had a nucleotide transition at position 2059 (A to G) in the six copies of the 23S rRNA gene. This mutation has previously been associated with macrolide resistance but this is the first reported mechanism associated with elevated macrolide MICs in H. parasuis. There was no known macrolide resistance mechanism identified in the other three bacterial genomes. However, strA and sul2, aminoglycoside and sulfonamide resistance genes, respectively, were detected in one contiguous sequence (contig 1) of A. pleuropneumoniae assembled genome. This contig was identical to plasmids previously identified in Pasteurellaceae. This study has provided one possible explanation of elevated MICs to macrolides in H. parasuis. Further studies are necessary to clarify the mechanism causing the unexplained macrolide resistance in other Australian pig respiratory pathogens including the role of efflux systems, which were detected in all analyzed genomes.

  10. A human genome-wide library of local phylogeny predictions for whole-genome inference problems

    Directory of Open Access Journals (Sweden)

    Schwartz Russell

    2008-08-01

    Full Text Available Abstract Background Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.

  11. The "most wanted" taxa from the human microbiome for whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Anthony A Fodor

    Full Text Available The goal of the Human Microbiome Project (HMP is to generate a comprehensive catalog of human-associated microorganisms including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets to several reference 16S collections to create a 'most wanted' list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, few novel taxa are represented by these OTUs and many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the 'most wanted', and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the 'most wanted' organisms are poorly represented among culture collections suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.

  12. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it is based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most of GP models used linear methods that ignore effects of interaction among genes and effects of higher order nonlinearities. Deep belief network (DBN), one of the architectural in deep learning methods, is able to model data in high level of abstraction that involves nonlinearities effects of the data. This study implemented DBN for developing a GP model utilizing whole-genome Single Nucleotide Polymorphisms (SNPs) as data for training and testing. The case study was a set of traits in maize. The maize dataset was acquisitioned from CIMMYT’s (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, DBN is outperformed than other methods, kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), best linear unbiased predictor (BLUP), in case allegedly non-additive traits. DBN achieves correlation of 0.579 within -1 to 1 range.

  13. The house spider genome reveals an ancient whole-genome duplication during arachnid evolution.

    Science.gov (United States)

    Schwager, Evelyn E; Sharma, Prashant P; Clarke, Thomas; Leite, Daniel J; Wierschin, Torsten; Pechmann, Matthias; Akiyama-Oda, Yasuko; Esposito, Lauren; Bechsgaard, Jesper; Bilde, Trine; Buffry, Alexandra D; Chao, Hsu; Dinh, Huyen; Doddapaneni, HarshaVardhan; Dugan, Shannon; Eibner, Cornelius; Extavour, Cassandra G; Funch, Peter; Garb, Jessica; Gonzalez, Luis B; Gonzalez, Vanessa L; Griffiths-Jones, Sam; Han, Yi; Hayashi, Cheryl; Hilbrant, Maarten; Hughes, Daniel S T; Janssen, Ralf; Lee, Sandra L; Maeso, Ignacio; Murali, Shwetha C; Muzny, Donna M; Nunes da Fonseca, Rodrigo; Paese, Christian L B; Qu, Jiaxin; Ronshaugen, Matthew; Schomburg, Christoph; Schönauer, Anna; Stollewerk, Angelika; Torres-Oliva, Montserrat; Turetzek, Natascha; Vanthournout, Bram; Werren, John H; Wolff, Carsten; Worley, Kim C; Bucher, Gregor; Gibbs, Richard A; Coddington, Jonathan; Oda, Hiroki; Stanke, Mario; Ayoub, Nadia A; Prpic, Nikola-Michael; Flot, Jean-François; Posnien, Nico; Richards, Stephen; McGregor, Alistair P

    2017-07-31

    The duplication of genes can occur through various mechanisms and is thought to make a major contribution to the evolutionary diversification of organisms. There is increasing evidence for a large-scale duplication of genes in some chelicerate lineages including two rounds of whole genome duplication (WGD) in horseshoe crabs. To investigate this further, we sequenced and analyzed the genome of the common house spider Parasteatoda tepidariorum. We found pervasive duplication of both coding and non-coding genes in this spider, including two clusters of Hox genes. Analysis of synteny conservation across the P. tepidariorum genome suggests that there has been an ancient WGD in spiders. Comparison with the genomes of other chelicerates, including that of the newly sequenced bark scorpion Centruroides sculpturatus, suggests that this event occurred in the common ancestor of spiders and scorpions, and is probably independent of the WGDs in horseshoe crabs. Furthermore, characterization of the sequence and expression of the Hox paralogs in P. tepidariorum suggests that many have been subject to neo-functionalization and/or sub-functionalization since their duplication. Our results reveal that spiders and scorpions are likely the descendants of a polyploid ancestor that lived more than 450 MYA. Given the extensive morphological diversity and ecological adaptations found among these animals, rivaling those of vertebrates, our study of the ancient WGD event in Arachnopulmonata provides a new comparative platform to explore common and divergent evolutionary outcomes of polyploidization events across eukaryotes.

  14. Whole genome duplication of intra- and inter-chromosomes in the tomato genome.

    Science.gov (United States)

    Song, Chi; Guo, Juan; Sun, Wei; Wang, Ying

    2012-07-20

    Whole genome duplication (WGD) events have been proven to occur in the evolutionary history of most angiosperms. Tomato is considered a model species of the Solanaceae family. In this study, we describe the details of the evolutionary process of the tomato genome by detecting collinearity blocks and dating the WGD events on the tree of life by combining two different methods: synonymous substitution rates (Ks) and phylogenetic trees. In total, 593 collinearity blocks were discovered out of 12 pseudo-chromosomes constructed. It was evident that chromosome 2 had experienced an intra-chromosomal duplication event. Major inter-chromosomal duplication occurred among all the pseudo-chromosome. We calculated the Ks value of these collinearity blocks. Two peaks of Ks distribution were found, corresponding to two WGD events occurring approximately 36-82 million years ago (MYA) and 148-205 MYA. Additionally, the results of phylogenetic trees suggested that the more recent WGD event may have occurred after the divergence of the rosid-asterid clade, but before the major diversification in Solanaceae. The older WGD event was shown to have occurred before the divergence of the rosid-asterid clade and after the divergence of rice-Arabidopsis (monocot-dicot). Copyright © 2012. Published by Elsevier Ltd.

  15. DArT whole genome profiling provides insights on the evolution and taxonomy of edible Banana (Musa spp.)

    Czech Academy of Sciences Publication Activity Database

    Sardos, J.; Perrier, X.; Doležel, Jaroslav; Hřibová, Eva; Christelová, Pavla; Van den Houwe, I.; Kilian, A.; Roux, N.

    2016-01-01

    Roč. 118, č. 7 (2016), s. 1269-1278 ISSN 0305-7364 R&D Projects: GA MŠk(CZ) LO1204; GA MŠk LG15017 Institutional support: RVO:61389030 Keywords : multilocus genotype data * arrays technology dart * genetic diversity * population-structure * balbisiana colla * acuminata colla * markers * identification * aflp * domestication * Musa acuminata * Musa balbisiana * Musa spp. * banana * DArT * domestication * taxonomy * classification * domestication Subject RIV: EB - Genetics ; Molecular Biology OBOR OECD: Plant sciences, botany Impact factor: 4.041, year: 2016

  16. Prediction of expected years of life using whole-genome markers.

    Directory of Open Access Journals (Sweden)

    Gustavo de los Campos

    Full Text Available Genetic factors are believed to account for 25% of the interindividual differences in Years of Life (YL among humans. However, the genetic loci that have thus far been found to be associated with YL explain a very small proportion of the expected genetic variation in this trait, perhaps reflecting the complexity of the trait and the limitations of traditional association studies when applied to traits affected by a large number of small-effect genes. Using data from the Framingham Heart Study and statistical methods borrowed largely from the field of animal genetics (whole-genome prediction, WGP, we developed a WGP model for the study of YL and evaluated the extent to which thousands of genetic variants across the genome examined simultaneously can be used to predict interindividual differences in YL. We find that a sizable proportion of differences in YL--which were unexplained by age at entry, sex, smoking and BMI--can be accounted for and predicted using WGP methods. The contribution of genomic information to prediction accuracy was even higher than that of smoking and body mass index (BMI combined; two predictors that are considered among the most important life-shortening factors. We evaluated the impacts of familial relationships and population structure (as described by the first two marker-derived principal components and concluded that in our dataset population structure explained partially, but not fully the gains in prediction accuracy obtained with WGP. Further inspection of prediction accuracies by age at death indicated that most of the gains in predictive ability achieved with WGP were due to the increased accuracy of prediction of early mortality, perhaps reflecting the ability of WGP to capture differences in genetic risk to deadly diseases such as cancer, which are most often responsible for early mortality in our sample.

  17. Whole-genome copy number variation analysis in anophthalmia and microphthalmia.

    Science.gov (United States)

    Schilter, K F; Reis, L M; Schneider, A; Bardakjian, T M; Abdul-Rahman, O; Kozel, B A; Zimmerman, H H; Broeckel, U; Semina, E V

    2013-11-01

    Anophthalmia/microphthalmia (A/M) represent severe developmental ocular malformations. Currently, mutations in known genes explain less than 40% of A/M cases. We performed whole-genome copy number variation analysis in 60 patients affected with isolated or syndromic A/M. Pathogenic deletions of 3q26 (SOX2) were identified in four independent patients with syndromic microphthalmia. Other variants of interest included regions with a known role in human disease (likely pathogenic) as well as novel rearrangements (uncertain significance). A 2.2-Mb duplication of 3q29 in a patient with non-syndromic anophthalmia and an 877-kb duplication of 11p13 (PAX6) and a 1.4-Mb deletion of 17q11.2 (NF1) in two independent probands with syndromic microphthalmia and other ocular defects were identified; while ocular anomalies have been previously associated with 3q29 duplications, PAX6 duplications, and NF1 mutations in some cases, the ocular phenotypes observed here are more severe than previously reported. Three novel regions of possible interest included a 2q14.2 duplication which cosegregated with microphthalmia/microcornea and congenital cataracts in one family, and 2q21 and 15q26 duplications in two additional cases; each of these regions contains genes that are active during vertebrate ocular development. Overall, this study identified causative copy number mutations and regions with a possible role in ocular disease in 17% of A/M cases. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  18. Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases.

    Directory of Open Access Journals (Sweden)

    Mariane M A Stefani

    2017-06-01

    Full Text Available Since leprosy is both treated and controlled by multidrug therapy (MDT it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin.DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR. Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico.In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable.This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission.

  19. Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data.

    Science.gov (United States)

    Olson, Nathan D; Zook, Justin M; Morrow, Jayne B; Lin, Nancy J

    2017-01-01

    High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus , Escherichia , and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.

  20. Selective whole genome amplification for resequencing target microbial species from complex natural samples.

    Science.gov (United States)

    Leichty, Aaron R; Brisson, Dustin

    2014-10-01

    Population genomic analyses have demonstrated power to address major questions in evolutionary and molecular microbiology. Collecting populations of genomes is hindered in many microbial species by the absence of a cost effective and practical method to collect ample quantities of sufficiently pure genomic DNA for next-generation sequencing. Here we present a simple method to amplify genomes of a target microbial species present in a complex, natural sample. The selective whole genome amplification (SWGA) technique amplifies target genomes using nucleotide sequence motifs that are common in the target microbe genome, but rare in the background genomes, to prime the highly processive phi29 polymerase. SWGA thus selectively amplifies the target genome from samples in which it originally represented a minor fraction of the total DNA. The post-SWGA samples are enriched in target genomic DNA, which are ideal for population resequencing. We demonstrate the efficacy of SWGA using both laboratory-prepared mixtures of cultured microbes as well as a natural host-microbe association. Targeted amplification of Borrelia burgdorferi mixed with Escherichia coli at genome ratios of 1:2000 resulted in >10(5)-fold amplification of the target genomes with genomic extracts from Wolbachia pipientis-infected Drosophila melanogaster resulted in up to 70% of high-throughput resequencing reads mapping to the W. pipientis genome. By contrast, 2-9% of sequencing reads were derived from W. pipientis without prior amplification. The SWGA technique results in high sequencing coverage at a fraction of the sequencing effort, thus allowing population genomic studies at affordable costs. Copyright © 2014 by the Genetics Society of America.

  1. Utility of Whole-Genome Sequencing in Characterizing Acinetobacter Epidemiology and Analyzing Hospital Outbreaks

    Science.gov (United States)

    Fitzpatrick, Margaret A.; Hauser, Alan R.

    2015-01-01

    Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence type (MLST) analysis, and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts. PMID:26699703

  2. Whole genome analysis of Leptospira licerasiae provides insight into leptospiral evolution and pathogenicity.

    Directory of Open Access Journals (Sweden)

    Jessica N Ricaldi

    Full Text Available The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835 provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggests that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010(T and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprised of a 6-gene cluster that is unexpectedly short compared with L. interrogans in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT. Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for

  3. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes.

    Science.gov (United States)

    Mao, Qing; Ciotlos, Serban; Zhang, Rebecca Yu; Ball, Madeleine P; Chin, Robert; Carnevali, Paolo; Barua, Nina; Nguyen, Staci; Agarwal, Misha R; Clegg, Tom; Connelly, Abram; Vandewege, Ward; Zaranek, Alexander Wait; Estep, Preston W; Church, George M; Drmanac, Radoje; Peters, Brock A

    2016-10-11

    Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced. A stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics' Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics' standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.

  4. Impact of whole-genome duplication events on diversification rates in angiosperms.

    Science.gov (United States)

    Landis, Jacob B; Soltis, Douglas E; Li, Zheng; Marx, Hannah E; Barker, Michael S; Tank, David C; Soltis, Pamela S

    2018-03-01

    Polyploidy or whole-genome duplication (WGD) pervades the evolutionary history of angiosperms. Despite extensive progress in our understanding of WGD, the role of these events in promoting diversification is still not well understood. We seek to clarify the possible association between WGD and diversification rates in flowering plants. Using a previously published phylogeny spanning all land plants (31,749 tips) and WGD events inferred from analyses of the 1000 Plants (1KP) transcriptome data, we analyzed the association of WGDs and diversification rates following numerous WGD events across the angiosperms. We used a stepwise AIC approach (MEDUSA), a Bayesian mixture model approach (BAMM), and state-dependent diversification analyses (MuSSE) to investigate patterns of diversification. Sister-clade comparisons were used to investigate species richness after WGDs. Based on the density of 1KP taxon sampling, 106 WGDs were unambiguously placed on the angiosperm phylogeny. We identified 334-530 shifts in diversification rates. We found that 61 WGD events were tightly linked to changes in diversification rates, and state-dependent diversification analyses indicated higher speciation rates for subsequent rounds of WGD. Additionally, 70 of 99 WGD events showed an increase in species richness compared to the sister clade. Forty-six of the 106 WGDs analyzed appear to be closely associated with upshifts in the rate of diversification in angiosperms. Shifts in diversification do not appear more likely than random within a four-node lag phase following a WGD; however, younger WGD events are more likely to be followed by an upshift in diversification than older WGD events. © 2018 Botanical Society of America.

  5. Discovery of Gene Sources for Economic Traits in Hanwoo by Whole-genome Resequencing

    Directory of Open Access Journals (Sweden)

    Younhee Shin

    2016-09-01

    Full Text Available Hanwoo, a Korean native cattle (Bos taurus coreana, has great economic value due to high meat quality. Also, the breed has genetic variations that are associated with production traits such as health, disease resistance, reproduction, growth as well as carcass quality. In this study, next generation sequencing technologies and the availability of an appropriate reference genome were applied to discover a large amount of single nucleotide polymorphisms (SNPs in ten Hanwoo bulls. Analysis of whole-genome resequencing generated a total of 26.5 Gb data, of which 594,716,859 and 592,990,750 reads covered 98.73% and 93.79% of the bovine reference genomes of UMD 3.1 and Btau 4.6.1, respectively. In total, 2,473,884 and 2,402,997 putative SNPs were discovered, of which 1,095,922 (44.3% and 982,674 (40.9% novel SNPs were discovered against UMD3.1 and Btau 4.6.1, respectively. Among the SNPs, the 46,301 (UMD 3.1 and 28,613 SNPs (Btau 4.6.1 that were identified as Hanwoo-specific SNPs were included in the functional genes that may be involved in the mechanisms of milk production, tenderness, juiciness, marbling of Hanwoo beef and yellow hair. Most of the Hanwoo-specific SNPs were identified in the promoter region, suggesting that the SNPs influence differential expression of the regulated genes relative to the relevant traits. In particular, the non-synonymous (ns SNPs found in CORIN, which is a negative regulator of Agouti, might be a causal variant to determine yellow hair of Hanwoo. Our results will provide abundant genetic sources of variation to characterize Hanwoo genetics and for subsequent breeding.

  6. A synergism between adaptive effects and evolvability drives whole genome duplication to fixation.

    Directory of Open Access Journals (Sweden)

    Thomas D Cuypers

    2014-04-01

    Full Text Available Whole genome duplication has shaped eukaryotic evolutionary history and has been associated with drastic environmental change and species radiation. While the most common fate of WGD duplicates is a return to single copy, retained duplicates have been found enriched for highly interacting genes. This pattern has been explained by a neutral process of subfunctionalization and more recently, dosage balance selection. However, much about the relationship between environmental change, WGD and adaptation remains unknown. Here, we study the duplicate retention pattern postWGD, by letting virtual cells adapt to environmental changes. The virtual cells have structured genomes that encode a regulatory network and simple metabolism. Populations are under selection for homeostasis and evolve by point mutations, small indels and WGD. After populations had initially adapted fully to fluctuating resource conditions re-adaptation to a broad range of novel environments was studied by tracking mutations in the line of descent. WGD was established in a minority (≈30% of lineages, yet, these were significantly more successful at re-adaptation. Unexpectedly, WGD lineages conserved more seemingly redundant genes, yet had higher per gene mutation rates. While WGD duplicates of all functional classes were significantly over-retained compared to a model of neutral losses, duplicate retention was clearly biased towards highly connected TFs. Importantly, no subfunctionalization occurred in conserved pairs, strongly suggesting that dosage balance shaped retention. Meanwhile, singles diverged significantly. WGD, therefore, is a powerful mechanism to cope with environmental change, allowing conservation of a core machinery, while adapting the peripheral network to accommodate change.

  7. Quantification of trace-level DNA by real-time whole genome amplification.

    Science.gov (United States)

    Kang, Min-Jung; Yu, Hannah; Kim, Sook-Kyung; Park, Sang-Ryoul; Yang, Inchul

    2011-01-01

    Quantification of trace amounts of DNA is a challenge in analytical applications where the concentration of a target DNA is very low or only limited amounts of samples are available for analysis. PCR-based methods including real-time PCR are highly sensitive and widely used for quantification of low-level DNA samples. However, ordinary PCR methods require at least one copy of a specific gene sequence for amplification and may not work for a sub-genomic amount of DNA. We suggest a real-time whole genome amplification method adopting the degenerate oligonucleotide primed PCR (DOP-PCR) for quantification of sub-genomic amounts of DNA. This approach enabled quantification of sub-picogram amounts of DNA independently of their sequences. When the method was applied to the human placental DNA of which amount was accurately determined by inductively coupled plasma-optical emission spectroscopy (ICP-OES), an accurate and stable quantification capability for DNA samples ranging from 80 fg to 8 ng was obtained. In blind tests of laboratory-prepared DNA samples, measurement accuracies of 7.4%, -2.1%, and -13.9% with analytical precisions around 15% were achieved for 400-pg, 4-pg, and 400-fg DNA samples, respectively. A similar quantification capability was also observed for other DNA species from calf, E. coli, and lambda phage. Therefore, when provided with an appropriate standard DNA, the suggested real-time DOP-PCR method can be used as a universal method for quantification of trace amounts of DNA.

  8. Germline contamination and leakage in whole genome somatic single nucleotide variant detection.

    Science.gov (United States)

    Sendorek, Dorota H; Caloian, Cristian; Ellrott, Kyle; Bare, J Christopher; Yamaguchi, Takafumi N; Ewing, Adam D; Houlahan, Kathleen E; Norman, Thea C; Margolin, Adam A; Stuart, Joshua M; Boutros, Paul C

    2018-01-31

    The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNVs) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.

  9. Comparison of Control of Clostridium difficile Infection in Six English Hospitals Using Whole-Genome Sequencing.

    Science.gov (United States)

    Eyre, David W; Fawley, Warren N; Rajgopal, Anu; Settle, Christopher; Mortimer, Kalani; Goldenberg, Simon D; Dawson, Susan; Crook, Derrick W; Peto, Tim E A; Walker, A Sarah; Wilcox, Mark H

    2017-08-01

    Variation in Clostridium difficile infection (CDI) rates between healthcare institutions suggests overall incidence could be reduced if the lowest rates could be achieved more widely. We used whole-genome sequencing (WGS) of consecutive C. difficile isolates from 6 English hospitals over 1 year (2013-14) to compare infection control performance. Fecal samples with a positive initial screen for C. difficile were sequenced. Within each hospital, we estimated the proportion of cases plausibly acquired from previous cases. Overall, 851/971 (87.6%) sequenced samples contained toxin genes, and 451 (46.4%) were fecal-toxin-positive. Of 652 potentially toxigenic isolates >90-days after the study started, 128 (20%, 95% confidence interval [CI] 17-23%) were genetically linked (within ≤2 single nucleotide polymorphisms) to a prior patient's isolate from the previous 90 days. Hospital 2 had the fewest linked isolates, 7/105 (7%, 3-13%), hospital 1, 9/70 (13%, 6-23%), and hospitals 3-6 had similar proportions of linked isolates (22-26%) (P ≤ .002 comparing hospital-2 vs 3-6). Results were similar adjusting for locally circulating ribotypes. Adjusting for hospital, ribotype-027 had the highest proportion of linked isolates (57%, 95% CI 29-81%). Fecal-toxin-positive and toxin-negative patients were similarly likely to be a potential transmission donor, OR = 1.01 (0.68-1.49). There was no association between the estimated proportion of linked cases and testing rates. WGS can be used as a novel surveillance tool to identify varying rates of C. difficile transmission between institutions and therefore to allow targeted efforts to reduce CDI incidence. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America.

  10. Parents perspectives on whole genome sequencing for their children: qualified enthusiasm?

    Science.gov (United States)

    Anderson, J A; Meyn, M S; Shuman, C; Zlotnik Shaul, R; Mantella, L E; Szego, M J; Bowdin, S; Monfared, N; Hayeems, R Z

    2017-08-01

    To better understand the consequences of returning whole genome sequencing (WGS) results in paediatrics and facilitate its evidence-based clinical implementation, we studied parents' experiences with WGS and their preferences for the return of adult-onset secondary variants (SVs)-medically actionable genomic variants unrelated to their child's current medical condition that predict adult-onset disease. We conducted qualitative interviews with parents whose children were undergoing WGS as part of the SickKids Genome Clinic, a research project that studies the impact of clinical WGS on patients, families, and the healthcare system. Interviews probed parents' experience with and motivation for WGS as well as their preferences related to SVs. Interviews were analysed thematically. Of 83 invited, 23 parents from 18 families participated. These parents supported WGS as a diagnostic test, perceiving clear intrinsic and instrumental value. However, many parents were ambivalent about receiving SVs, conveying a sense of self-imposed obligation to take on the 'weight' of knowing their child's SVs, however unpleasant. Some parents chose to learn about adult-onset SVs for their child but not for themselves. Despite general enthusiasm for WGS as a diagnostic test, many parents felt a duty to learn adult-onset SVs. Analogous to 'inflicted insight', we call this phenomenon 'inflicted ought'. Importantly, not all parents of children undergoing WGS view the best interests of their child in relational terms, thereby challenging an underlying justification for current ACMG guidelines for reporting incidental secondary findings from whole exome and WGS. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  11. Identification of Escherichia coli and Shigella Species from Whole-Genome Sequences.

    Science.gov (United States)

    Chattaway, Marie A; Schaefer, Ulf; Tewolde, Rediat; Dallman, Timothy J; Jenkins, Claire

    2017-02-01

    Escherichia coli and Shigella species are closely related and genetically constitute the same species. Differentiating between these two pathogens and accurately identifying the four species of Shigella are therefore challenging. The organism-specific bioinformatics whole-genome sequencing (WGS) typing pipelines at Public Health England are dependent on the initial identification of the bacterial species by use of a kmer-based approach. Of the 1,982 Escherichia coli and Shigella sp. isolates analyzed in this study, 1,957 (98.4%) had concordant results by both traditional biochemistry and serology (TB&S) and the kmer identification (ID) derived from the WGS data. Of the 25 mismatches identified, 10 were enteroinvasive E. coli isolates that were misidentified as Shigella flexneri or S. boydii by the kmer ID, and 8 were S. flexneri isolates misidentified by TB&S as S. boydii due to nonfunctional S. flexneri O antigen biosynthesis genes. Analysis of the population structure based on multilocus sequence typing (MLST) data derived from the WGS data showed that the remaining discrepant results belonged to clonal complex 288 (CC288), comprising both S. boydii and S. dysenteriae strains. Mismatches between the TB&S and kmer ID results were explained by the close phylogenetic relationship between the two species and were resolved with reference to the MLST data. Shigella can be differentiated from E. coli and accurately identified to the species level by use of kmer comparisons and MLST. Analysis of the WGS data provided explanations for the discordant results between TB&S and WGS data, revealed the true phylogenetic relationships between different species of Shigella, and identified emerging pathoadapted lineages. © Crown copyright 2017.

  12. Advanced Whole-Genome Sequencing and Analysis of Fetal Genomes from Amniotic Fluid.

    Science.gov (United States)

    Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Shi, Quan; Peters, Erin E; Gulbahce, Natali; Li, Zhenyu; Chen, Fang; Drmanac, Radoje; Peters, Brock A

    2018-04-01

    Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 ( CHD8 ) and LDL receptor-related protein 1 ( LRP1 ), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures. © 2018 American Association for Clinical Chemistry.

  13. Microbiota present in cystic fibrosis lungs as revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Philippe M Hauser

    Full Text Available Determination of the precise composition and variation of microbiota in cystic fibrosis lungs is crucial since chronic inflammation due to microorganisms leads to lung damage and ultimately, death. However, this constitutes a major technical challenge. Culturing of microorganisms does not provide a complete representation of a microbiota, even when using culturomics (high-throughput culture. So far, only PCR-based metagenomics have been investigated. However, these methods are biased towards certain microbial groups, and suffer from uncertain quantification of the different microbial domains. We have explored whole genome sequencing (WGS using the Illumina high-throughput technology applied directly to DNA extracted from sputa obtained from two cystic fibrosis patients. To detect all microorganism groups, we used four procedures for DNA extraction, each with a different lysis protocol. We avoided biases due to whole DNA amplification thanks to the high efficiency of current Illumina technology. Phylogenomic classification of the reads by three different methods produced similar results. Our results suggest that WGS provides, in a single analysis, a better qualitative and quantitative assessment of microbiota compositions than cultures and PCRs. WGS identified a high quantity of Haemophilus spp. (patient 1 or Staphylococcus spp. plus Streptococcus spp. (patient 2 together with low amounts of anaerobic (Veillonella, Prevotella, Fusobacterium and aerobic bacteria (Gemella, Moraxella, Granulicatella. WGS suggested that fungal members represented very low proportions of the microbiota, which were detected by cultures and PCRs because of their selectivity. The future increase of reads' sizes and decrease in cost should ensure the usefulness of WGS for the characterisation of microbiota.

  14. Whole genome grey and white matter DNA methylation profiles in dorsolateral prefrontal cortex.

    Science.gov (United States)

    Sanchez-Mut, Jose Vicente; Heyn, Holger; Vidal, Enrique; Delgado-Morales, Raúl; Moran, Sebastian; Sayols, Sergi; Sandoval, Juan; Ferrer, Isidre; Esteller, Manel; Gräff, Johannes

    2017-06-01

    The brain's neocortex is anatomically organized into grey and white matter, which are mainly composed by neuronal and glial cells, respectively. The neocortex can be further divided in different Brodmann areas according to their cytoarchitectural organization, which are associated with distinct cortical functions. There is increasing evidence that brain development and function are governed by epigenetic processes, yet their contribution to the functional organization of the neocortex remains incompletely understood. Herein, we determined the DNA methylation patterns of grey and white matter of dorsolateral prefrontal cortex (Brodmann area 9), an important region for higher cognitive skills that is particularly affected in various neurological diseases. For avoiding interindividual differences, we analyzed white and grey matter from the same donor using whole genome bisulfite sequencing, and for validating their biological significance, we used Infinium HumanMethylation450 BeadChip and pyrosequencing in ten and twenty independent samples, respectively. The combination of these analysis indicated robust grey-white matter differences in DNA methylation. What is more, cell type-specific markers were enriched among the most differentially methylated genes. Interestingly, we also found an outstanding number of grey-white matter differentially methylated genes that have previously been associated with Alzheimer's, Parkinson's, and Huntington's disease, as well as Multiple and Amyotrophic lateral sclerosis. The data presented here thus constitute an important resource for future studies not only to gain insight into brain regional as well as grey and white matter differences, but also to unmask epigenetic alterations that might underlie neurological and neurodegenerative diseases. © 2017 Wiley Periodicals, Inc.

  15. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    Science.gov (United States)

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies. © 2014 The Authors The Plant Journal © 2014 John Wiley & Sons Ltd.

  16. Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases

    Science.gov (United States)

    Bührer-Sékula, Samira; Benjak, Andrej; Loiseau, Chloé; Singh, Pushpendra; Pontes, Maria A. A.; Gonçalves, Heitor S.; Hungria, Emerith M.; Busso, Philippe; Piton, Jérémie; Silveira, Maria I. S.; Cruz, Rossilene; Schetinni, Antônio; Costa, Maurício B.; Virmond, Marcos C. L.; Diorio, Suzana M.; Dias-Baptista, Ida M. F.; Rosa, Patricia S.; Matsuoka, Masanori; Penna, Maria L. F.; Cole, Stewart T.; Penna, Gerson O.

    2017-01-01

    Background Since leprosy is both treated and controlled by multidrug therapy (MDT) it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin. Methodology DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico. Principal findings In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable. Conclusions/Significance This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission. PMID:28617800

  17. Rapid genotyping with DNA micro-arrays for high-density linkage mapping and QTL mapping in common buckwheat (Fagopyrum esculentum Moench)

    Science.gov (United States)

    Yabe, Shiori; Hara, Takashi; Ueno, Mariko; Enoki, Hiroyuki; Kimura, Tatsuro; Nishimura, Satoru; Yasui, Yasuo; Ohsawa, Ryo; Iwata, Hiroyoshi

    2014-01-01

    For genetic studies and genomics-assisted breeding, particularly of minor crops, a genotyping system that does not require a priori genomic information is preferable. Here, we demonstrated the potential of a novel array-based genotyping system for the rapid construction of high-density linkage map and quantitative trait loci (QTL) mapping. By using the system, we successfully constructed an accurate, high-density linkage map for common buckwheat (Fagopyrum esculentum Moench); the map was composed of 756 loci and included 8,884 markers. The number of linkage groups converged to eight, which is the basic number of chromosomes in common buckwheat. The sizes of the linkage groups of the P1 and P2 maps were 773.8 and 800.4 cM, respectively. The average interval between adjacent loci was 2.13 cM. The linkage map constructed here will be useful for the analysis of other common buckwheat populations. We also performed QTL mapping for main stem length and detected four QTL. It took 37 days to process 178 samples from DNA extraction to genotyping, indicating the system enables genotyping of genome-wide markers for a few hundred buckwheat plants before the plants mature. The novel system will be useful for genomics-assisted breeding in minor crops without a priori genomic information. PMID:25914583

  18. Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea.

    Science.gov (United States)

    Han, Joon-Hee; Chon, Jae-Kyung; Ahn, Jong-Hwa; Choi, Ik-Young; Lee, Yong-Hwan; Kim, Kyoung Su

    2016-06-01

    Colletotrichum acutatum is a destructive fungal pathogen which causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from the KC05 strain was used for the whole genome sequencing using a PacBio sequencer and the MiSeq system. The KC05 genome was determined to be 52,190,760 bp in size with a G + C content of 51.73% in 27 scaffolds and to contain 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of the KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.

  19. Whole genome amplification of Chelex-extracted DNA from a single mite: a method for studying genetics of the predatory mite Phytoseiulus persimilis.

    Science.gov (United States)

    Konakandla, Bhanu; Park, Yoonseong; Margolies, David

    2006-01-01

    We developed and optimized a method using Chelex DNA extraction followed by whole genome amplification (WGA) to overcome problems conducting molecular genetic studies due to the limited amount of DNA obtainable from individual small organisms such as predatory mites. The DNA from a single mite, Phytoseiulus persimilis Athias-Henrot (Acari: Phytoseiidae), isolated in Chelex suspension was subjected to WGA. More than 1000-fold amplification of the DNA was achieved using as little as 0.03 ng genomic DNA template. The DNA obtained by the WGA was used for polymerase chain reaction followed by direct sequencing. From WGA DNA, nuclear DNA intergenic spacers ITS1 and ITS2 and a mitochondrial DNA 12S marker were tested in three different geographical populations of the predatory mite: California, the Netherlands, and Sicily. We found a total of four different alleles of the 12S in the Sicilian population, but no polymorphism was identified in the ITS marker. The combination of Chelex DNA extraction and WGA is thus shown to be a simple and robust technique for examining molecular markers for multiple loci by using individual mites. We conclude that the methods, Chelex extraction of DNA followed by WGA, provide a large quantity of DNA template that can be used for multiple PCR reactions useful for genetic studies requiring the genotypes of individual mites.

  20. Mining whole genomes and transcriptomes of Jatropha (Jatropha curcas) and Castor bean (Ricinus communis) for NBS-LRR genes and defense response associated transcription factors.

    Science.gov (United States)

    Sood, Archit; Jaiswal, Varun; Chanumolu, Sree Krishna; Malhotra, Nikhil; Pal, Tarun; Chauhan, Rajinder Singh

    2014-11-01

    Jatropha (Jatropha curcas L.) and Castor bean (Ricinus communis) are oilseed crops of family Euphorbiaceae with the potential of producing high quality biodiesel and having industrial value. Both the bioenergy plants are becoming susceptible to various biotic stresses directly affecting the oil quality and content. No report exists as of today on analysis of Nucleotide Binding Site-Leucine Rich Repeat (NBS-LRR) gene repertoire and defense response transcription factors in both the plant species. In silico analysis of whole genomes and transcriptomes identified 47 new NBS-LRR genes in both the species and 122 and 318 defense response related transcription factors in Jatropha and Castor bean, respectively. The identified NBS-LRR genes and defense response transcription factors were mapped onto the respective genomes. Common and unique NBS-LRR genes and defense related transcription factors were identified in both the plant species. All NBS-LRR genes in both the species were characterized into Toll/interleukin-1 receptor NBS-LRRs (TNLs) and coiled-coil NBS-LRRs (CNLs), position on contigs, gene clusters and motifs and domains distribution. Transcript abundance or expression values were measured for all NBS-LRR genes and defense response transcription factors, suggesting their functional role. The current study provides a repertoire of NBS-LRR genes and transcription factors which can be used in not only dissecting the molecular basis of disease resistance phenotype but also in developing disease resistant genotypes in Jatropha and Castor bean through transgenic or molecular breeding approaches.

  1. A Whole Genome Association Study to Detect Single Nucleotide Polymorphisms for Blood Components (Immunity in a Cross between Korean Native Pig and Yorkshire

    Directory of Open Access Journals (Sweden)

    Y.-M. Lee

    2012-12-01

    Full Text Available The purpose of this study was to detect significant SNPs for blood components that were related to immunity using high single nucleotide polymorphism (SNP density panels in a Korean native pig (KNP×Yorkshire (YK cross population. A reciprocal design of KNP×YK produced 249 F2 individuals that were genotyped for a total of 46,865 available SNPs in the Illumina porcine 60K beadchip. To perform whole genome association analysis (WGA, phenotypes were regressed on each SNP under a simple linear regression model after adjustment for sex and slaughter age. To set up a significance threshold, 0.1% point-wise p value from F distribution was used for each SNP test. Among the significant SNPs for a trait, the best set of SNP markers were determined using a stepwise regression procedure with the rates of inclusion and exclusion of each SNP out of the model at 0.001 level. A total of 54 SNPs were detected; 10, 6, 4, 4, 5, 4, 5, 10, and 6 SNPs for neutrophil, lymphocyte, monocyte, eosinophil, basophil, atypical lymph, immunoglobulin, insulin, and insulin-like growth factor-I, respectively. Each set of significant SNPs per trait explained 24 to 42% of phenotypic variance. Several pleiotropic SNPs were detected on SSCs 4, 13, 14 and 15.

  2. Whole genome characterization of a novel porcine reproductive and respiratory syndrome virus 1 isolate: Genetic evidence for recombination between Amervac vaccine and circulating strains in mainland China.

    Science.gov (United States)

    Chen, Nanhua; Liu, Qiaorong; Qiao, Mingming; Deng, Xiaoyu; Chen, Xizhao; Sun, Ming

    2017-10-01

    Genotype 1 porcine reproductive and respiratory syndrome virus (PRRSV 1) have been continuously isolated in China in recent years. Complete genome sequences of these isolates are important to investigate the prevalence and evolution of Chinese PRRSV 1. Herein, we describe the isolation of a novel PRRSV 1 isolate, denominated HLJB1, in the Heilongjiang province of China. Complete genome sequencing of HLJB1 showed that it shares 90.66% and 58.21% nucleotide identities with PRRSV 1 and 2 prototypic strains Lelystad virus and ATCC VR-2332, respectively. HLJB1 has a unique 5-amino-acid insertion in nsp2, which has never been described in other PRRSV 1 isolates. Whole genome-based phylogenetic analysis revealed that all Chinese PRRSV 1 isolates are clustered in pan-European subtype 1 and can be divided into four subgroups. HLJB1 resides in the subgroup of BJEU06-1-like isolates but is also closely related to the Amervac-like isolates. Additionally, recombination analyses suggested that HLJB1 is a recombinant from the Amervac vaccine and the BJEU06-1 isolate. To our best knowledge, our results provide the first genetic evidence for recombination between Amervac vaccine and circulating strains. These findings are also beneficial for studying the origin and evolution of PRRSV 1 in China. Copyright © 2017. Published by Elsevier B.V.

  3. Descriptive Epidemiology and Whole Genome Sequencing Analysis for an Outbreak of Bovine Tuberculosis in Beef Cattle and White-Tailed Deer in Northwestern Minnesota.

    Directory of Open Access Journals (Sweden)

    Linda Glaser

    Full Text Available Bovine tuberculosis (bTB was discovered in a Minnesota cow through routine slaughter surveillance in 2005 and the resulting epidemiological investigation led to the discovery of infection in both cattle and white-tailed deer in the state. From 2005 through 2009, a total of 12 beef cattle herds and 27 free-ranging white-tailed deer (Odocoileus virginianus were found infected in a small geographic region of northwestern Minnesota. Genotyping of isolates determined both cattle and deer shared the same strain of bTB, and it was similar to types found in cattle in the southwestern United States and Mexico. Whole genomic sequencing confirmed the introduction of this infection into Minnesota was recent, with little genetic divergence. Aggressive surveillance and management efforts in both cattle and deer continued from 2010-2012; no additional infections were discovered. Over 10,000 deer were tested and 705 whole herd cattle tests performed in the investigation of this outbreak.

  4. Optical Whole-Genome Restriction Mapping as a Tool for Rapidly Distinguishing and Identifying Bacterial Contaminants in Clinical Samples

    Science.gov (United States)

    2015-08-01

    Article 3. DATES COVERED (From – To) Oct 2011 – Aug 2012 4. TITLE AND SUBTITLE Optical Whole-Genome Restriction Mapping as a Tool for Rapidly...multiple bacteria could be uniquely identified within mixtures. In the first set of experiments, three unique organisms ( Bacillus subtilis subsp. globigii...be useful in monitoring nosocomial outbreaks in neonatal and intensive care wards, or even as an initial screen for antibiotic resistant strains

  5. Identification and Whole Genome Sequencing of the First Case of Kosakonia radicincitans Causing a Human Bloodstream Infection

    OpenAIRE

    Bhatti, Micah D.; Kalia, Awdhesh; Sahasrabhojane, Pranoti; Kim, Jiwoong; Greenberg, David E.; Shelburne, Samuel A.

    2017-01-01

    The taxonomy of Enterobacter species is rapidly changing. Herein we report a bloodstream infection isolate originally identified as Enterobacter cloacae by Vitek2 methodology that we found to be Kosakonia radicincitans using genetic means. Comparative whole genome sequencing of our isolate and other published Kosakonia genomes revealed these organisms lack the AmpC β-lactamase present on the chromosome of Enterobacter sp. A fimbriae operon primarily found in Escherichia coli O157:H7 isolates ...

  6. Whole-Genome Sequences of Two Carbapenem-Resistant Klebsiella quasipneumoniae Strains Isolated from a Tertiary Hospital in Johor, Malaysia.

    Science.gov (United States)

    Gan, Han Ming; Rajasekaram, Ganeswrie; Eng, Wilhelm Wei Han; Kaniappan, Priyatharisni; Dhanoa, Amreeta

    2017-08-10

    We report the whole-genome sequences of two carbapenem-resistant clinical isolates of Klebsiella quasipneumoniae subsp. similipneumoniae obtained from two different patients. Both strains contained three different extended-spectrum β-lactamase genes and showed strikingly high pairwise average nucleotide identity of 99.99% despite being isolated 3 years apart from the same hospital. Copyright © 2017 Gan et al.

  7. Identification of antimicrobial resistance genes in multidrug-resistant clinical Bacteroides fragilis isolates by whole genome shotgun sequencing

    DEFF Research Database (Denmark)

    Sydenham, Thomas Vognbjerg; Sóki, József; Hasman, Henrik

    2015-01-01

    Bacteroides fragilis constitutes the most frequent anaerobic bacterium causing bacteremia in humans. The genetic background for antimicrobial resistance in B. fragilis is diverse with some genes requiring insertion sequence (IS) elements inserted upstream for increased expression. To evaluate whole...... genome shotgun sequencing as a method for predicting antimicrobial resistance properties, one meropenem resistant and five multidrug-resistant blood culture isolates were sequenced and antimicrobial resistance genes and IS elements identified using ResFinder 2.1 (http...

  8. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli.

    OpenAIRE

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf Sommer; Nielsen, Eva M.; Aarestrup, Frank Møller

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-prod...

  9. Real-Time Whole-Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli

    OpenAIRE

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole; Hasman, Henrik; Kaas, Rolf S.; Nielsen, Eva M.; Aarestrup, Frank M.

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming cheaper, it has huge potential in both diagnostics and routine surveillance. The aim of this study was to perform a real-time evaluation of WGS for routine typing and surveillance of verocytotoxin-prod...

  10. Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges.

    Science.gov (United States)

    Cai, Binghuang; Li, Biao; Kiga, Nikki; Thusberg, Janita; Bergquist, Timothy; Chen, Yun-Ching; Niknafs, Noushin; Carter, Hannah; Tokheim, Collin; Beleva-Guthrie, Violeta; Douville, Christopher; Bhattacharya, Rohit; Yeo, Hui Ting Grace; Fan, Jean; Sengupta, Sohini; Kim, Dewey; Cline, Melissa; Turner, Tychele; Diekhans, Mark; Zaucha, Jan; Pal, Lipika R; Cao, Chen; Yu, Chen-Hsin; Yin, Yizhou; Carraro, Marco; Giollo, Manuel; Ferrari, Carlo; Leonardi, Emanuela; Tosatto, Silvio C E; Bobe, Jason; Ball, Madeleine; Hoskins, Roger A; Repo, Susanna; Church, George; Brenner, Steven E; Moult, John; Gough, Julian; Stanke, Mario; Karchin, Rachel; Mooney, Sean D

    2017-09-01

    The advent of next-generation sequencing has dramatically decreased the cost for whole-genome sequencing and increased the viability for its application in research and clinical care. The Personal Genome Project (PGP) provides unrestricted access to genomes of individuals and their associated phenotypes. This resource enabled the Critical Assessment of Genome Interpretation (CAGI) to create a community challenge to assess the bioinformatics community's ability to predict traits from whole genomes. In the CAGI PGP challenge, researchers were asked to predict whether an individual had a particular trait or profile based on their whole genome. Several approaches were used to assess submissions, including ROC AUC (area under receiver operating characteristic curve), probability rankings, the number of correct predictions, and statistical significance simulations. Overall, we found that prediction of individual traits is difficult, relying on a strong knowledge of trait frequency within the general population, whereas matching genomes to trait profiles relies heavily upon a small number of common traits including ancestry, blood type, and eye color. When a rare genetic disorder is present, profiles can be matched when one or more pathogenic variants are identified. Prediction accuracy has improved substantially over the last 6 years due to improved methodology and a better understanding of features. © 2017 Wiley Periodicals, Inc.

  11. WGSSAT: A High-Throughput Computational Pipeline for Mining and Annotation of SSR Markers From Whole Genomes.

    Science.gov (United States)

    Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, Naresh S; Jena, Joy K; Kushwaha, Basdeo

    2018-03-16

    Mining and characterization of Simple Sequence Repeat (SSR) markers from whole genomes provide valuable information about biological significance of SSR distribution and also facilitate development of markers for genetic analysis. Whole genome sequencing (WGS)-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline developed using Java Netbeans and Perl scripts which facilitates in simplifying the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, noncoding RNA (ncRNA), core genes, repeats and SSRs from whole genomes followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic, and core gene region) along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes, and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139 057 SSR, out of which 113 703 SSR primer pairs were uniquely amplified in silico onto a T. rubripes (fugu) genome. Out of 113 703 mined SSRs, 81 463 were from coding region (including 4286 exonic and 77 177 intronic), 7 from RNA, 267 from core genes of fugu, whereas 105 641 SSR and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT is tested under Ubuntu Linux. The source code, documentation, user manual, example dataset and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr.

  12. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants

    Science.gov (United States)

    2014-01-01

    Background Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers relatively short reads. Results We here provide a proof of principle that whole HIV-1 genomes can be reliably reconstructed from short reads, and use this to study the selection of immune escape mutations at the level of whole genome haplotypes. Using realistically simulated HIV-1 populations, we demonstrate that reconstruction of complete genome haplotypes is feasible with high fidelity. We do not reconstruct all genetically distinct genomes, but each reconstructed haplotype represents one or more of the quasispecies in the HIV-1 population. We then reconstruct 30 whole genome haplotypes from published short sequence reads sampled longitudinally from a single HIV-1 infected patient. We confirm the reliability of the reconstruction by validating our predicted haplotype genes with single genome amplification sequences, and by comparing haplotype frequencies with observed epitope escape frequencies. Conclusions Phylogenetic analysis shows that the HIV-1 population undergoes selection driven evolution, with successive replacement of the viral population by novel dominant strains. We demonstrate that immune escape mutants evolve in a dependent manner with various mutations hitchhiking along with others. As a consequence of this clonal interference, selection coefficients have to be estimated for complete haplotypes and not for individual immune escapes. PMID:24996694

  13. Whole genome detection of rotavirus mixed infections in human, porcine and bovine samples co-infected with various rotavirus strains collected from sub-Saharan Africa.

    Science.gov (United States)

    Nyaga, Martin M; Jere, Khuzwayo C; Esona, Mathew D; Seheri, Mapaseka L; Stucker, Karla M; Halpin, Rebecca A; Akopov, Asmik; Stockwell, Timothy B; Peenze, Ina; Diop, Amadou; Ndiaye, Kader; Boula, Angeline; Maphalala, Gugu; Berejena, Chipo; Mwenda, Jason M; Steele, A Duncan; Wentworth, David E; Mphahlele, M Jeffrey

    2015-04-01

    Group A rotaviruses (RVA) are among the main global causes of severe diarrhea in children under the age of 5years. Strain diversity, mixed infections and untypeable RVA strains are frequently reported in Africa. We analysed rotavirus-positive human stool samples (n=13) obtained from hospitalised children under the age of 5years who presented with acute gastroenteritis at sentinel hospital sites in six African countries, as well as bovine and porcine stool samples (n=1 each), to gain insights into rotavirus diversity and evolution. Polyacrylamide gel electrophoresis (PAGE) analysis and genotyping with G-(VP7) and P-specific (VP4) typing primers suggested that 13 of the 15 samples contained more than 11 segments and/or mixed G/P genotypes. Full-length amplicons for each segment were generated using RVA-specific primers and sequenced using the Ion Torrent and/or Illumina MiSeq next-generation sequencing platforms. Sequencing detected at least one segment in each sample for which duplicate sequences, often having distinct genotypes, existed. This supported and extended the PAGE and RT-PCR genotyping findings that suggested these samples were collected from individuals that had mixed rotavirus infections. The study reports the first porcine (MRC-DPRU1567) and bovine (MRC-DPRU3010) mixed infections. We also report a unique genome segment 9 (VP7), whose G9 genotype belongs to lineage VI and clusters with porcine reference strains. Previously, African G9 strains have all been in lineage III. Furthermore, additional RVA segments isolated from humans have a clear evolutionary relationship with porcine, bovine and ovine rotavirus sequences, indicating relatively recent interspecies transmission and reassortment. Thus, multiple RVA strains from sub-Saharan Africa are infecting mammalian hosts with unpredictable variations in their gene segment combinations. Whole-genome sequence analyses of mixed RVA strains underscore the considerable diversity of rotavirus sequences and

  14. High Resolution Typing by Whole Genome Mapping Enables Discrimination of LA-MRSA (CC398) Strains and Identification of Transmission Events

    Science.gov (United States)

    Bosch, Thijs; Verkade, Erwin; van Luit, Martijn; Pot, Bruno; Vauterin, Paul; Burggrave, Ronald; Savelkoul, Paul; Kluytmans, Jan; Schouls, Leo

    2013-01-01

    After its emergence in 2003, a livestock-associated (LA-)MRSA clade (CC398) has caused an impressive increase in the number of isolates submitted for the Dutch national MRSA surveillance and now comprises 40% of all isolates. The currently used molecular typing techniques have limited discriminatory power for this MRSA clade, which hampers studies on the origin and transmission routes. Recently, a new molecular analysis technique named whole genome mapping was introduced. This method creates high-resolution, ordered whole genome restriction maps that may have potential for strain typing. In this study, we assessed and validated the capability of whole genome mapping to differentiate LA-MRSA isolates. Multiple validation experiments showed that whole genome mapping produced highly reproducible results. Assessment of the technique on two well-documented MRSA outbreaks showed that whole genome mapping was able to confirm one outbreak, but revealed major differences between the maps of a second, indicating that not all isolates belonged to this outbreak. Whole genome mapping of LA-MRSA isolates that were epidemiologically unlinked provided a much higher discriminatory power than spa-typing or MLVA. In contrast, maps created from LA-MRSA isolates obtained during a proven LA-MRSA outbreak were nearly indistinguishable showing that transmission of LA-MRSA can be detected by whole genome mapping. Finally, whole genome maps of LA-MRSA isolates originating from two unrelated veterinarians and their household members showed that veterinarians may carry and transmit different LA-MRSA strains at the same time. No such conclusions could be drawn based spa-typing and MLVA. Although PFGE seems to be suitable for molecular typing of LA-MRSA, WGM provides a much higher discriminatory power. Furthermore, whole genome mapping can provide a comparison with other maps within 2 days after the bacterial culture is received, making it suitable to investigate transmission events and

  15. A comparison between genotyping-by-sequencing and array-based scoring of SNPs for genomic prediction accuracy in winter wheat.

    Science.gov (United States)

    Elbasyoni, Ibrahim S; Lorenz, A J; Guttieri, M; Frels, K; Baenziger, P S; Poland, J; Akhunov, E

    2018-05-01

    The utilization of DNA molecular markers in plant breeding to maximize selection response via marker-assisted selection (MAS) and genomic selection (GS) has revolutionized plant breeding. A key factor affecting GS applicability is the choice of molecular marker platform. Genotyping-by-sequencing scored SNPs (GBS-scored SNPs) provides a large number of markers, albeit with high rates of missing data. Array scored SNPs are of high quality, but the cost per sample is substantially higher. The objectives of this study were 1) compare GBS-scored SNPs, and array scored SNPs for genomic selection applications, and 2) compare estimates of genomic kinship and population structure calculated using the two marker platforms. SNPs were compared in a diversity panel consisting of 299 hard winter wheat (Triticum aestivum L.) accessions that were part of a multi-year, multi-environments association mapping study. The panel was phenotyped in Ithaca, Nebraska for heading date, plant height, days to physiological maturity and grain yield in 2012 and 2013. The panel was genotyped using GBS-scored SNPs, and array scored SNPs. Results indicate that GBS-scored SNPs is comparable to or better than Array-scored SNPs for genomic prediction application. Both platforms identified the same genetic patterns in the panel where 90% of the lines were classified to common genetic groups. Overall, we concluded that GBS-scored SNPs have the potential to be the marker platform of choice for genetic diversity and genomic selection in winter wheat. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. 1 H MR spectroscopy in cervical carcinoma using external phase array body coil at 3.0 Tesla: Prediction of poor prognostic human papillomavirus genotypes.

    Science.gov (United States)

    Lin, Gigin; Lai, Chyong-Huey; Tsai, Shang-Yueh; Lin, Yu-Chun; Huang, Yu-Ting; Wu, Ren-Chin; Yang, Lan-Yan; Lu, Hsin-Ying; Chao, Angel; Wang, Chiun-Chieh; Ng, Koon-Kwan; Ng, Shu-Hang; Chou, Hung-Hsueh; Yen, Tzu-Chen; Hung, Ji-Hong

    2017-03-01

    To assess the clinical value of proton ( 1 H) MR spectroscopy in cervical carcinomas, in the prediction of poor prognostic human papillomavirus (HPV) genotypes as well as persistent disease following concurrent chemoradiotherapy (CCRT). 1 H MR spectroscopy using external phase array coil was performed in 52 consecutive cervical cancer patients at 3 Tesla (T). Poor prognostic HPV genotypes (alpha-7 species or absence of HPV infection) and persistent cervical carcinoma after CCRT were recorded. Statistical significance was calculated with the Mann-Whitney two-sided nonparametric test and areas under the receiver operating characteristics curve (AUC) analysis. A 4.3-fold (P = 0.032) increased level of methyl resonance at 0.9 ppm was found in the poor prognostic HPV genotypes, mainly attributed to the presence of HPV18, with a sensitivity of 75%, a specificity of 81%, and an AUC of 0.76. Poor prognostic HPV genotypes were more frequently observed in patients with adeno-/adenosquamous carcinoma (Chi-square, P < 0.0001). In prediction of the four patients with persistent disease after CCRT, elevated methyl resonance demonstrated a sensitivity of 100%, a specificity of 74%, and an AUC of 0.82. 1 H MR spectroscopy at 3T can be used to depict the elevated lipid resonance levels in cervical carcinomas, as well as help to predict the poor prognostic HPV genotypes and persistent disease following CCRT. Further large studies with longer follow up times are warranted to validate our initial findings. 1 J. Magn. Reson. Imaging 2017;45:899-907. © 2016 International Society for Magnetic Resonance in Medicine.

  17. Development and validation of concurrent preimplantation genetic diagnosis for single gene disorders and comprehensive chromosomal aneuploidy screening without whole genome amplification.

    Science.gov (United States)

    Zimmerman, Rebekah S; Jalas, Chaim; Tao, Xin; Fedick, Anastasia M; Kim, Julia G; Pepe, Russell J; Northrop, Lesley E; Scott, Richard T; Treff, Nathan R

    2016-02-01

    To develop a novel and robust protocol for multifactorial preimplantation genetic testing of trophectoderm biopsies using quantitative polymerase chain reaction (qPCR). Prospective and blinded. Not applicable. Couples indicated for preimplantation genetic diagnosis (PGD). None. Allele dropout (ADO) and failed amplification rate, genotyping consistency, chromosome screening success rate, and clinical outcomes of qPCR-based screening. The ADO frequency on a single cell from a fibroblast cell line was 1.64% (18/1,096). When two or more cells were tested, the ADO frequency dropped to 0.02% (1/4,426). The rate of amplification failure was 1.38% (55/4,000) overall, with 2.5% (20/800) for single cells and 1.09% (35/3,200) for samples that had two or more cells. Among 152 embryos tested in 17 cases by qPCR-based PGD and CCS, 100% were successfully given a diagnosis, with 0% ADO or amplification failure. Genotyping consistency with reference laboratory results was >99%. Another 304 embryos from 43 cases were included in the clinical application of qPCR-based PGD and CCS, for which 99.7% (303/304) of the embryos were given a definitive diagnosis, with only 0.3% (1/304) having an inconclusive result owing to recombination. In patients receiving a transfer with follow-up, the pregnancy rate was 82% (27/33). This study demonstrates that the use of qPCR for PGD testing delivers consistent and more reliable results than existing methods and that single gene disorder PGD can be run concurrently with CCS without the need for additional embryo biopsy or whole genome amplification. Copyright © 2016 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.

  18. Whole-genome characterization of Uruguayan strains of avian infectious bronchitis virus reveals extensive recombination between the two major South American lineages.

    Science.gov (United States)

    Marandino, Ana; Tomás, Gonzalo; Panzera, Yanina; Greif, Gonzalo; Parodi-Talice, Adriana; Hernández, Martín; Techera, Claudia; Hernández, Diego; Pérez, Ruben

    2017-10-01

    Infectious bronchitis virus (Gammacoronavirus, Coronaviridae) is a genetically variable RNA virus that causes one of the most persistent respiratory diseases in poultry. The virus is classified in genotypes and lineages with different epidemiological relevance. Two lineages of the GI genotype (11 and 16) have been widely circulating for decades in South America. GI-11 is an exclusive South American lineage while the GI-16 lineage is distributed in Asia, Europe and South America. Here, we obtained the whole genome of two Uruguayan strains of the GI-11 and GI-16 lineages using Illumina high-throughput sequencing. The strains here sequenced are the first obtained in South America for the infectious bronchitis virus and provide new insights into the origin, spreading and evolution of viral variants. The complete genome of the GI-11 and GI-16 strains have 27,621 and 27,638 nucleotides, respectively, and possess the same genomic organization. Phylogenetic incongruence analysis reveals that both strains have a mosaic genome that arose by recombination between Euro Asiatic strains of the GI-16 lineage and ancestral South American GI-11 viruses. The recombination occurred in South America and produced two viral variants that have retained the full-length S1 sequences of the parental lineages but are extremely similar in the rest of their genomes. These recombinant virus have been extraordinary successful, persisting in the continent for several years with a notorious wide geographic distribution. Our findings reveal a singular viral dynamics and emphasize the importance of complete genomic characterization to understand the emergence and evolutionary history of viral variants. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Whole-Genome Sequence Analysis of Antimicrobial Resistance Genes in Streptococcus uberis and Streptococcus dysgalactiae Isolates from Canadian Dairy Herds

    Directory of Open Access Journals (Sweden)

    Julián Reyes Vélez

    2017-05-01

    Full Text Available The objectives of this study are to determine the occurrence of antimicrobial resistance (AMR genes using whole-genome sequence (WGS of Streptococcus uberis (S. uberis and Streptococcus dysgalactiae (S. dysgalactiae isolates, recovered from dairy cows in the Canadian Maritime Provinces. A secondary objective included the exploration of the association between phenotypic AMR and the genomic characteristics (genome size, guanine–cytosine content, and occurrence of unique gene sequences. Initially, 91 isolates were sequenced, and of these isolates, 89 were assembled. Furthermore, 16 isolates were excluded due to larger than expected genomic sizes (>2.3 bp × 1,000 bp. In the final analysis, 73 were used with complete WGS and minimum inhibitory concentration records, which were part of the previous phenotypic AMR study, representing 18 dairy herds from the Maritime region of Canada (1. A total of 23 unique AMR gene sequences were found in the bacterial genomes, with a mean number of 8.1 (minimum: 5; maximum: 13 per genome. Overall, there were 10 AMR genes [ANT(6, TEM-127, TEM-163, TEM-89, TEM-95, Linb, Lnub, Ermb, Ermc, and TetS] present only in S. uberis genomes and 2 genes unique (EF-TU and TEM-71 to the S. dysgalactiae genomes; 11 AMR genes [APH(3′, TEM-1, TEM-136, TEM-157, TEM-47, TetM, bl2b, gyrA, parE, phoP, and rpoB] were found in both bacterial species. Two-way tabulations showed association between the phenotypic susceptibility to lincosamides and the presence of linB (P = 0.002 and lnuB (P < 0.001 genes and the between the presence of tetM (P = 0.015 and tetS (P = 0.064 genes and phenotypic resistance to tetracyclines only for the S. uberis isolates. The logistic model showed that the odds of resistance (to any of the phenotypically tested antimicrobials was 4.35 times higher when there were >11 AMR genes present in the genome, compared with <7 AMR genes (P < 0.001. The odds of resistance was lower for S

  20. Whole genome transcript profiling from fingerstick blood samples: a comparison and feasibility study

    Directory of Open Access Journals (Sweden)

    Williams Adam R

    2009-12-01

    Full Text Available Abstract Background Whole genome gene expression profiling has revolutionized research in the past decade especially with the advent of microarrays. Recently, there have been significant improvements in whole blood RNA isolation techniques which, through stabilization of RNA at the time of sample collection, avoid bias and artifacts introduced during sample handling. Despite these improvements, current human whole blood RNA stabilization/isolation kits are limited by the requirement of a venous blood sample of at least 2.5 mL. While fingerstick blood collection has been used for many different assays, there has yet to be a kit developed to isolate high quality RNA for use in gene expression studies from such small human samples. The clinical and field testing advantages of obtaining reliable and reproducible gene expression data from a fingerstick are many; it is less invasive, time saving, more mobile, and eliminates the need of a trained phlebotomist. Furthermore, this method could also be employed in small animal studies, i.e. mice, where larger sample collections often require sacrificing the animal. In this study, we offer a rapid and simple method to extract sufficient amounts of high quality total RNA from approximately 70 μl of whole blood collected via a fingerstick using a modified protocol of the commercially available Qiagen PAXgene RNA Blood Kit. Results From two sets of fingerstick collections, about 70 uL whole blood collected via finger lancet and capillary tube, we recovered an average of 252.6 ng total RNA with an average RIN of 9.3. The post-amplification yields for 50 ng of total RNA averaged at 7.0 ug cDNA. The cDNA hybridized to Affymetrix HG-U133 Plus 2.0 GeneChips had an average % Present call of 52.5%. Both fingerstick collections were highly correlated with r2 values ranging from 0.94 to 0.97. Similarly both fingerstick collections were highly correlated to the venous collection with r2 values ranging from 0.88 to 0

  1. Whole-genome sequencing reveals mutational landscape underlying phenotypic differences between two widespread Chinese cattle breeds.

    Directory of Open Access Journals (Sweden)

    Yao Xu

    Full Text Available Whole-genome sequencing provides a powerful tool to obtain more genetic variability that could produce a range of benefits for cattle breeding industry. Nanyang (Bos indicus and Qinchuan (Bos taurus are two important Chinese indigenous cattle breeds with distinct phenotypes. To identify the genetic characteristics responsible for variation in phenotypes between the two breeds, in the present study, we for the first time sequenced the genomes of four Nanyang and four Qinchuan cattle with 10 to 12 fold on average of 97.86% and 98.98% coverage of genomes, respectively. Comparison with the Bos_taurus_UMD_3.1 reference assembly yielded 9,010,096 SNPs for Nanyang, and 6,965,062 for Qinchuan cattle, 51% and 29% of which were novel SNPs, respectively. A total of 154,934 and 115,032 small indels (1 to 3 bp were found in the Nanyang and Qinchuan genomes, respectively. The SNP and indel distribution revealed that Nanyang showed a genetically high diversity as compared to Qinchuan cattle. Furthermore, a total of 2,907 putative cases of copy number variation (CNV were identified by aligning Nanyang to Qinchuan genome, 783 of which (27% encompassed the coding regions of 495 functional genes. The gene ontology (GO analysis revealed that many CNV genes were enriched in the immune system and environment adaptability. Among several CNV genes related to lipid transport and fat metabolism, Lepin receptor gene (LEPR overlapping with CNV_1815 showed remarkably higher copy number in Qinchuan than Nanyang (log2 (ratio = -2.34988; P value = 1.53E-102. Further qPCR and association analysis investigated that the copy number of the LEPR gene presented positive correlations with transcriptional expression and phenotypic traits, suggesting the LEPR CNV may contribute to the higher fat deposition in muscles of Qinchuan cattle. Our findings provide evidence that the distinct phenotypes of Nanyang and Qinchuan breeds may be due to the different genetic variations including SNPs

  2. The MedSeq Project: a randomized trial of integrating whole genome sequencing into clinical medicine.

    Science.gov (United States)

    Vassy, Jason L; Lautenbach, Denise M; McLaughlin, Heather M; Kong, Sek Won; Christensen, Kurt D; Krier, Joel; Kohane, Isaac S; Feuerman, Lindsay Z; Blumenthal-Barby, Jennifer; Roberts, J Scott; Lehmann, Lisa Soleymani; Ho, Carolyn Y; Ubel, Peter A; MacRae, Calum A; Seidman, Christine E; Murray, Michael F; McGuire, Amy L; Rehm, Heidi L; Green, Robert C

    2014-03-20

    Whole genome sequencing (WGS) is already being used in certain clinical and research settings, but its impact on patient well-being, health-care utilization, and clinical decision-making remains largely unstudied. It is also unknown how best to communicate sequencing results to physicians and patients to improve health. We describe the design of the MedSeq Project: the first randomized trials of WGS in clinical care. This pair of randomized controlled trials compares WGS to standard of care in two clinical contexts: (a) disease-specific genomic medicine in a cardiomyopathy clinic and (b) general genomic medicine in primary care. We are recruiting 8 to 12 cardiologists, 8 to 12 primary care physicians, and approximately 200 of their patients. Patient participants in both the cardiology and primary care trials are randomly assigned to receive a family history assessment with or without WGS. Our laboratory delivers a genome report to physician participants that balances the needs to enhance understandability of genomic information and to convey its complexity. We provide an educational curriculum for physician participants and offer them a hotline to genetics professionals for guidance in interpreting and managing their patients' genome reports. Using varied data sources, including surveys, semi-structured interviews, and review of clinical data, we measure the attitudes, behaviors and outcomes of physician and patient participants at multiple time points before and after the disclosure of these results. The impact of emerging sequencing technologies on patient care is unclear. We have designed a process of interpreting WGS results and delivering them to physicians in a way that anticipates how we envision genomic medicine will evolve in the near future. That is, our WGS report provides clinically relevant information while communicating the complexity and uncertainty of WGS results to physicians and, through physicians, to their patients. This project will not only

  3. Whole genome duplications and expansion of the vertebrate GATA transcription factor gene family

    Directory of Open Access Journals (Sweden)

    Bowerman Bruce

    2009-08-01

    Full Text Available Abstract Background GATA transcription factors influence many developmental processes, including the specification of embryonic germ layers. The GATA gene family has significantly expanded in many animal lineages: whereas diverse cnidarians have only one GATA transcription factor, six GATA genes have been identified in many vertebrates, five in many insects, and eleven to thirteen in Caenorhabditis nematodes. All bilaterian animal genomes have at least one member each of two classes, GATA123 and GATA456. Results We have identified one GATA123 gene and one GATA456 gene from the genomic sequence of two invertebrate deuterostomes, a cephalochordate (Branchiostoma floridae and a hemichordate (Saccoglossus kowalevskii. We also have confirmed the presence of six GATA genes in all vertebrate genomes, as well as additional GATA genes in teleost fish. Analyses of conserved sequence motifs and of changes to the exon-intron structure, and molecular phylogenetic analyses of these deuterostome GATA genes support their origin from two ancestral deuterostome genes, one GATA 123 and one GATA456. Comparison of the conserved genomic organization across vertebrates identified eighteen paralogous gene families linked to multiple vertebrate GATA genes (GATA paralogons, providing the strongest evidence yet for expansion of vertebrate GATA gene families via genome duplication events. Conclusion From our analysis, we infer the evolutionary birth order and relationships among vertebrate GATA transcription factors, and define their expansion via multiple rounds of whole genome duplication events. As the genomes of four independent invertebrate deuterostome lineages contain single copy GATA123 and GATA456 genes, we infer that the 0R (pre-genome duplication invertebrate deuterostome ancestor also had two GATA genes, one of each class. Synteny analyses identify duplications of paralogous chromosomal regions (paralogons, from single ancestral vertebrate GATA123 and GATA456

  4. In vivo capsular switch in Streptococcus pneumoniae--analysis by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Fen Z Hu

    Full Text Available Two multidrug resistant strains of Streptococcus pneumoniae - SV35-T23 (capsular type 23F and SV36-T3 (capsular type 3 were recovered from the nasopharynx of two adult patients during an outbreak of pneumococcal disease in a New York hospital in 1996. Both strains belonged to the pandemic lineage PMEN1 but they differed strikingly in virulence when tested in the mouse model of IP infection: as few as 1000 CFU of SV36 killed all mice within 24 hours after inoculation while SV35-T23 was avirulent.Whole genome sequencing (WGS of the two isolates was performed (i to test if these two isolates belonging to the same clonal type and recovered from an identical epidemiological scenario only differed in their capsular genes? and (ii to test if the vast difference in virulence between the strains was mostly - or exclusively - due to the type III capsule. WGS demonstrated extensive differences between the two isolates including over 2500 single nucleotide polymorphisms in core genes and also differences in 36 genetic determinants: 25 of which were unique to SV35-T23 and 11 unique to strain SV36-T3. Nineteen of these differences were capsular genes and 9 bacteriocin genes.Using genetic transformation in the laboratory, the capsular region of SV35-T23 was replaced by the type 3 capsular genes from SV36-T3 to generate the recombinant SV35-T3* which was as virulent as the parental strain SV36-T3* in the murine model and the type 3 capsule was the major virulence factor in the chinchilla model as well. On the other hand, a careful comparison of strains SV36-T3 and the laboratory constructed SV35-T3* in the chinchilla model suggested that some additional determinants present in SV36 but not in the laboratory recombinant may also contribute to the progression of middle ear disease. The nature of this determinants remains to be identified.

  5. Public attitudes in Japan toward participation in whole genome sequencing studies.

    Science.gov (United States)

    Okita, Taketoshi; Ohashi, Noriko; Kabata, Daijiro; Shintani, Ayumi; Kato, Kazuto

    2018-04-13

    Recent innovations in gene analysis technology have allowed for rapid and inexpensive sequencing of entire genomes. Thus, both conducting a study using whole genome sequencing (WGS) in a large population and the clinical application of research findings from such studies are currently feasible. However, to promote WGS studies, understanding and voluntary participation by the general public is needed. Therefore, it is essential to investigate the general public's attitude toward and understanding of WGS studies. The primary goal of our research is to investigate these issues and to discover how they relate to research participation in WGS studies. A survey of awareness regarding WGS and studies using WGS was conducted with a sample of 2000 or more participants using a self-administered questionnaire posted on the Internet between February 20 and 21, 2015. Prior to the survey, we briefly explained WGS and WGS study-related issues to the respondents in order to provide them with the minimum knowledge required to answer the questionnaire. We then conducted an analysis, including cross-classification. For the question regarding interest in WGS, 46.6% of participants responded "Yes." 70.7% of all respondents said that they were interested in some kinds of findings that could be obtained from WGS studies. Regarding participation in WGS studies, 29.0% were interested in participating. The demographic factors significantly related to attitudes toward research participation were age, level of education, and employment status. The results also suggest that concerns about WGS have a positive effect on people's willingness to participate. Furthermore, it was shown that for people who were not interested in their gene-related information, concerns about WGS negatively impacted their willingness to participate. However, for people who were interested in their gene-related information, their concerns might not have impacted their willingness to participate. This research has shown

  6. Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo)

    Science.gov (United States)

    2012-01-01

    Background The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world’s poultry meat production. Genetic improvement is attributed largely to selective breeding programs that rely on highly heritable phenotypic traits, such as body size and breast muscle development. Commercial breeding with small effective population sizes and epistasis can result in loss of genetic diversity, which in turn can lead to reduced individual fitness and reduced response to selection. The presence of genomic diversity in domestic livestock species therefore, is of great importance and a prerequisite for rapid and accurate genetic improvement of selected breeds in various environments, as well as to facilitate rapid adaptation to potential changes in breeding goals. Genomic selection requires a large number of genetic markers such as e.g. single nucleotide polymorphisms (SNPs) the most abundant source of genetic variation within the genome. Results Alignment of next generation sequencing data of 32 individual turkeys from different populations was used for the discovery of 5.49 million SNPs, which subsequently were used for the analysis of genetic diversity among the different populations. All of the commercial lines branched from a single node relative to the heritage varieties and the South Mexican turkey population. Heterozygosity of all individuals from the different turkey populations ranged from 0.17-2.73 SNPs/Kb, while heterozygosity of populations ranged from 0.73-1.64 SNPs/Kb. The average frequency of heterozygous SNPs in individual turkeys was 1.07 SNPs/Kb. Five genomic regions with very low nucleotide variation were identified in domestic turkeys that showed state of fixation towards alleles different than wild alleles. Conclusion The turkey genome is much less diverse with a relatively low frequency of heterozygous SNPs as compared to other livestock species like chicken and pig. The whole genome SNP discovery

  7. Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo

    Directory of Open Access Journals (Sweden)

    Aslam Muhammad L

    2012-08-01

    whole genome SNP discovery study in turkey resulted in the detection of 5.49 million putative SNPs compared to the reference genome. All commercial lines appear to share a common origin. Presence of different alleles/haplotypes in the SM population highlights that specific haplotypes have been selected in the modern domesticated turkey.

  8. [Detection of a fetus with paternally derived 2q37.3 microdeletion and 20p13p12.2 microduplication using whole genome microarray technology].

    Science.gov (United States)

    Zhang, Lin; Ren, Meihong; Song, Guining; Liu, Xuexia; Wang, Jianliu; Zhang, Xiaohong

    2016-12-10

    To perform prenatal diagnosis for a fetus with multiple malformations. The fetus was subjected to routine karyotyping and whole genome microarray analysis. The parents were subjected to high-resolution chromosome analysis. Fetal ultrasound at 28+4 weeks has indicated intrauterine growth restriction, left kidney agenesis, right kidney dysplasia, ventricular septal defect, and polyhydramnios. Chromosomal analysis showed that the fetus has a karyotype of 46,XY,der(2),der(20), t(2;20)(q37.3;p12.2), t(5;15) (q12.2;q25) pat. SNP array analysis confirmed that the fetus has a 5.283 Mb deletion at 2q37.3 and a 11.641 Mb duplication at 20p13p12.2. High-resolution chromosome analysis suggested that the father has a karyotype of 46,XY,t(2;20)(q37.3;p12.2),t(5;15)(q12.2;q25), while the mother has a normal karyotype. The abnormal phenotype of the fetus may be attributed to a 2q37.3 microdeletion and a 20p13p12.2 microduplication. The father has carried a complex translocation involving four chromosomes. To increase the chance for successful pregnancy, genetic diagnosis and/or assisted reproductive technology are warranted.

  9. Comparison of analytical and clinical performance of CLART HPV2 genotyping assay to Linear Array and Hybrid Capture 2

    DEFF Research Database (Denmark)

    Ejegod, Ditte Møller; Rebolj, Matejka; Bonde, Jesper

    2015-01-01

    to the Danish nation-wide Pathology Data Bank. For comparison of CLART and LA in terms of genotype detection, we calculated κ-coefficients, and proportions of overall and positive agreement. For comparison of CIN detection between CLART, LA, and HC2, we calculated the relative sensitivity and specificity......), and Hybrid Capture 2 (HC2) using samples stored in SurePath. METHODS: Residual material from 401 routine samples from women with abnormal cytology was tested by CLART, LA, and HC2 (ClinicalTrial.gov: NCT01671462, Ethical Committee approval: H-2012-070). Histological outcomes were ascertained by linkage...... for high-grade CIN. RESULTS: The κ-coefficient for agreement in detection of genotypes 16, 18, 31, 33, 35, and 51 was ≥0.90 (overall agreement: 98-99%, positive agreement: 84-95%). The values were slightly lower, but still in the "substantial" range for genotypes 39, 45, 52, 56, 58, 59, and several low...

  10. Performance of the digene LQ, RH and PS HPVs genotyping systems on clinical samples and comparison with HC2 and PCR-based Linear Array.

    Science.gov (United States)

    Godínez, Jose M; Tous, Sara; Baixeras, Nuria; Moreno-Crespi, Judith; Alejo, María; Lejeune, Marylène; Bravo, Ignacio G; Bosch, F Xavier; de Sanjosé, Silvia

    2011-11-18

    Certain Human Papillomaviruses (HPVs) are the infectious agents involved in cervical cancer development. Detection of HPVs DNA is part of the cervical cancer screening protocols and HPVs genotyping has been proposed for its inclusion in these preventive programs. The aim of this study was to evaluate three novel genotyping tests, namely Qiagen LQ, RH and PS, in clinical samples with and without abnormalities. For this, 305 cervical samples were processed and the results of the evaluated techniques were compared with those obtained in the HPVs diagnostic process in our lab, by using HC2 and Linear Array (LA) technologies. The concordances and kappa statistics (k) for each technique compared with HC2 were 98.69% (k = 0.94) for LQ, 98.03% (k = 0.91) for RH and 91.80% (k = 0.82) for PS. There was a very good agreement in HPVs type-specific concordance for the most prevalent types HPV16 (kappa range = 0.83-0.90), HPV18 (k.r.= 0.74-0.80) and HPV45 (k.r.= 0.82-0.90). The three tests showed an overall good concordance for HPVs detection when compared with HR-HC2 system. LQ and RH rendered lower detection rate for multiple infections than LA genotyping. However, our understanding of the clinical significance of multiple HPVs infections is still incomplete and therefore the relevance of the lower ability to detect multiple infections needs to be evaluated.

  11. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle

    DEFF Research Database (Denmark)

    Daetwyler, Hans D; Capitan, Aurélien; Pausch, Hubert

    2014-01-01

    The 1000 bull genomes project supports the goal of accelerating the rates of genetic gain in domestic cattle while at the same time considering animal health and welfare by providing the annotated sequence variants and genotypes of key ancestor bulls. In the first phase of the 1000 bull genomes p...

  12. Whole genome sequencing and assembly of Eukaryotic microbes isolated from ISS environmental surface Kirovograd region soil Chernobyl Nuclear Power Plant and Chernobyl Exclusion Zone

    Data.gov (United States)

    National Aeronautics and Space Administration — The whole-genome sequences of eight fungal strains that were selected for exposure to microgravity at the International Space Station are presented here. These...

  13. Microarray analysis of serum mRNA in patients with head and neck squamous cell carcinoma at whole-genome scale

    Czech Academy of Sciences Publication Activity Database

    Čapková, M.; Šáchová, Jana; Strnad, Hynek; Kolář, Michal; Hroudová, Miluše; Chovanec, M.; Čada, Z.; Štefl, M.; Valach, J.; Kastner, J.; Smetana, K. Jr.; Plzák, J.

    -, April 23 (2014) ISSN 2314-6141 R&D Projects: GA MZd(CZ) NT13488 Institutional support: RVO:68378050 Keywords : Microarray Analysis * Head and Neck Squamous Cell Carcinoma * whole-genome scale Subject RIV: EB - Genetics ; Molecular Biology

  14. Whole genome mRNA transcriptomics analysis reveals different modes of action of the diarrheic shellfish poisons okadaic acid and dinophysis toxin-1 versus azaspiracid-1 in Caco-2 cells.

    Science.gov (United States)

    Bodero, Marcia; Hoogenboom, Ron L A P; Bovee, Toine F H; Portier, Liza; de Haan, Laura; Peijnenburg, Ad; Hendriksen, Peter J M

    2018-02-01

    A study with DNA microarrays was performed to investigate the effects of two diarrhetic and one azaspiracid shellfish poison, okadaic acid (OA), dinophysistoxin-1 (DTX-1) and azaspiracid-1 (AZA-1) respectively, on the whole-genome mRNA expression of undifferentiated intestinal Caco-2 cells. Previously, the most responding genes were used to develop a dedicated array tube test to screen shellfish samples on the presence of these toxins. In the present study the whole genome mRNA expression was analyzed in order to reveal modes of action and obtain hints on potential biomarkers suitable to be used in alternative bioassays. Effects on key genes in the most affected pathways and processes were confirmed by qPCR. OA and DTX-1 induced almost identical effects on mRNA expression, which strongly indicates that OA and DTX-1induce similar toxic effects. Biological interpretation of the microarray data indicates that both compounds induce hypoxia related pathways/processes, the unfolded protein response (UPR) and endoplasmic reticulum (ER) stress. The gene expression profile of AZA-1 is different and shows increased mRNA expression of genes involved in cholesterol synthesis and glycolysis, suggesting a different mode of action for this toxin. Future studies should reveal whether identified pathways provide suitable biomarkers for rapid detection of DSPs in shellfish. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.

  15. Effects of a diet high in monounsaturated fat and a full Mediterranean diet on PBMC whole genome gene expression and plasma proteins

    OpenAIRE

    Dijk, van, Susan; Feskens, Edith; Bos, M.B.; Groot, de, Lisette; Vries, de, Jeanne; Muller, Michael; Afman, Lydia

    2012-01-01

    This study aimed to identify the effects of replacement of saturated fat (SFA) by monunsaturated fat (MUFA) in a western-type diet and the effects of a full Mediterranean (MED) diet on whole genome PBMC gene expression and plasma protein profiles. Abdominally overweight subjects were randomized to a 8 wk completely controlled SFA-rich diet, a SFA-by-MUFA-replaced diet (MUFA diet) or a MED diet. Concentrations of 124 plasma proteins and PBMCs whole genome transcriptional profiles were assessed...

  16. Utility of the pooling approach as applied to whole genome association scans with high-density Affymetrix microarrays

    Directory of Open Access Journals (Sweden)

    Gray Joanna

    2010-11-01

    Full Text Available Abstract Background We report an attempt to extend the previously successful approach of combining SNP (single nucleotide polymorphism microarrays and DNA pooling (SNP-MaP employing high-density microarrays. Whereas earlier studies employed a range of Affymetrix SNP microarrays comprising from 10 K to 500 K SNPs, this most recent investigation used the 6.0 chip which displays 906,600 SNP probes and 946,000 probes for the interrogation of CNVs (copy number variations. The genotyping assay using the Affymetrix SNP 6.0 array is highly demanding on sample quality due to the small feature size, low redundancy, and lack of mismatch probes. Findings In the first study published so far using this microarray on pooled DNA, we found that pooled cheek swab DNA could not accurately predict real allele frequencies of the samples that comprised the pools. In contrast, the allele frequency estimates using blood DNA pools were reasonable, although inferior compared to those obtained with previously employed Affymetrix microarrays. However, it might be possible to improve performance by developing improved analysis methods. Conclusions Despite the decreasing costs of genome-wide individual genotyping, the pooling approach may have applications in very large-scale case-control association studies. In such cases, our study suggests that high-quality DNA preparations and lower density platforms should be preferred.

  17. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Directory of Open Access Journals (Sweden)

    Mark R Wilson

    Full Text Available Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We used whole genome sequencing (WGS to Salmonella subspecies enterica serotype Tennessee (S. Tennessee to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana, which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs, suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that isolates from several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts

  18. The Qatar genome project: translation of whole-genome sequencing into clinical practice.

    Science.gov (United States)

    Zayed, Hatem

    2016-10-01

    Qatar Genome Project was launched in 2013 with the intent to sequence the genome of each Qatari citizen in an effort to protect Qataris from the high rate of indigenous genetic diseases by allowing the mapping of disease-causing variants/rare variants and establishing a Qatari reference genome. Indeed, this project is expected to have numerous global benefits because the elevated homogeneity of the Qatari population, that will make Qatar an excellent genetic laboratory that will generate a wealth of data that will allow us to make sense of the genotype-phenotype correlations of many diseases, especially the complex multifactorial diseases, and will pave the way for changing the traditional medical practice of looking first at the phenotype rather than the genotype. © 2016 John Wiley & Sons Ltd.

  19. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer

    Science.gov (United States)

    Morrison, Carl D.; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M.; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R.; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H.; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C.; Johnson, Candace S.; Trump, Donald L.

    2014-01-01

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as “stitchers,” to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication–licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspertate as a potential new therapeutic target in bladder cancer. PMID:24469795

  20. Rapid and Easy In Silico Serotyping of Escherichia coli Isolates by Use of Whole-Genome Sequencing Data

    DEFF Research Database (Denmark)

    Joensen, Katrine Grimstrup; Tetzschner, Anna M. M.; Iguchi, Atsushi

    2015-01-01

    typing and surveillance. The aim of this study was to establish a valid and publicly available tool for WGS-based in silico serotyping of E. coli applicable for routine typing and surveillance. A FASTA database of specific O-antigen processing system genes for O typing and flagellin genes for H typing...... tool. SerotypeFinder was evaluated on 682 E. coli genomes, 108 of which were sequenced for this study, where both the whole genome and the serotype were available. In total, 601 and 509 isolates were included for O and H typing, respectively. The O-antigen genes wzx, wzy, wzm, and wzt and the flagellin...

  1. Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods.

    Science.gov (United States)

    Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik; Pedersen, Anders Gorm; Aarestrup, Frank Møller; Lund, Ole

    2017-01-05

    Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different online available methods that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leafs, even though some of them are in fact ancestral. We therefore devised a method for post processing the inferred trees by collapsing short branches (thus relocating some leafs to internal nodes), and also present two new measures of tree similarity that takes into account the identity of both internal and leaf nodes. Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the

  2. Analysis of complete genome sequences of G9P[19] rotavirus strains from human and piglet with diarrhea provides evidence for whole-genome interspecies transmission of nonreassorted porcine rotavirus.

    Science.gov (United States)

    Yodmeeklin, Arpaporn; Khamrin, Pattara; Chuchaona, Watchaporn; Kumthip, Kattareeya; Kongkaew, Aphisek; Vachirachewin, Ratchaya; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

    2017-01-01

    Whole genomes of G9P[19] human (RVA/Human-wt/THA/CMH-S070-13/2013/G9P[19]) and porcine (RVA/Pig-wt/THA/CMP-015-12/2012/G9P[19]) rotaviruses concurrently detected in the same geographical area in northern Thailand were sequenced and analyzed for their genetic relationships using bioinformatic tools. The complete genome sequence of human rotavirus RVA/Human-wt/THA/CMH-S070-13/2013/G9P[19] was most closely related to those of porcine rotavirus RVA/Pig-wt/THA/CMP-015-12/2012/G9P[19] and to those of porcine-like human and porcine rotaviruses reference strains than to those of human rotavirus reference strains. The genotype constellation of G9P[19] detected in human and piglet were identical and displayed as the G9-P[19]-I5-R1-C1-M1-A8-N1-T1-E1-H1 genotypes with the nucleotide sequence identities of VP7, VP4, VP6, VP1, VP2, VP3, NSP1, NSP2, NSP3, NSP4, and NSP5 at 99.0%, 99.5%, 93.2%, 97.7%, 97.7%, 85.6%, 89.5%, 93.2%, 92.9%, 94.0%, and 98.1%, respectively. The findings indicate that human rotavirus strain RVA/Human-wt/THA/CMH-S070-13/2013/G9P[19] containing the genome segments of porcine genetic backbone is most likely a human rotavirus of porcine origin. Our data provide an evidence of interspecies transmission and whole-genome transmission of nonreassorted G9P[19] porcine RVA to human occurring in nature in northern Thailand. Copyright © 2016. Published by Elsevier B.V.

  3. Chemical and radiation mutagenesis: Induction and detection by whole genome sequencing

    Science.gov (United States)

    Brachypodium distachyon has emerged as an effective model system to address fundamental questions in grass biology. With its small sequenced genome, short generation time and rapidly expanding array of genetic tools B. distachyon is an ideal system to elucidate the molecular basis of important trai...

  4. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data.

    Science.gov (United States)

    Pesesky, Mitchell W; Hussain, Tahir; Wallace, Meghan; Patel, Sanket; Andleeb, Saadia; Burnham, Carey-Ann D; Dantas, Gautam

    2016-01-01

    The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitates initial use of empiric (frequently broad-spectrum) antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0 and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. Novel variants of known resistance factors and

  5. Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data

    Directory of Open Access Journals (Sweden)

    Mitchell Pesesky

    2016-11-01

    Full Text Available The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitate initial use of empiric (frequently broad-spectrum antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0% and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. Novel variants of known resistance

  6. In Vitro vs In Silico Detected SNPs for the Development of a Genotyping Array: What Can We Learn from a Non-Model Species?

    Science.gov (United States)

    Lepoittevin, Camille; Frigerio, Jean-Marc; Garnier-Géré, Pauline; Salin, Franck; Cervera, María-Teresa; Vornam, Barbara; Harvengt, Luc; Plomion, Christophe

    2010-01-01

    Background There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs) to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait.), a conifer characterized by a huge genome size (∼23.8 Gb/C). Methodology/Principal Findings A 384-SNPs GoldenGate genotyping array was built from i/ 184 SNPs originally detected in a set of 40 re-sequenced candidate genes (in vitro SNPs), chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium and ii/ 200 SNPs screened from ESTs (in silico SNPs) selected based on the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of SNP flanking sequences. The global success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs) of 51% was achieved. In vitro SNPs showed significantly higher genotyping-success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively). The reproducibility was 100%, and the genotyping error rate very low (0.54%, dropping down to 0.06% when removing four SNPs showing elevated error rates). Conclusions/Significance This study demonstrates that ESTs provide a resource for SNP identification in non-model species, which do not require any additional bench work and little bio-informatics analysis. However, the time and cost benefits of in silico SNPs are counterbalanced by a lower conversion rate than in vitro SNPs. This drawback is acceptable for population-based experiments, but could be dramatic in experiments involving samples from narrow genetic backgrounds. In addition, we showed that both the visual inspection of genotyping clusters and the estimation of a per SNP error rate should help identify markers that are not suitable to the GoldenGate technology in species

  7. In vitro vs in silico detected SNPs for the development of a genotyping array: what can we learn from a non-model species?

    Directory of Open Access Journals (Sweden)

    Camille Lepoittevin

    2010-06-01

    Full Text Available There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait., a conifer characterized by a huge genome size ( approximately 23.8 Gb/C.A 384-SNPs GoldenGate genotyping array was built from i/ 184 SNPs originally detected in a set of 40 re-sequenced candidate genes (in vitro SNPs, chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium and ii/ 200 SNPs screened from ESTs (in silico SNPs selected based on the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of SNP flanking sequences. The global success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs of 51% was achieved. In vitro SNPs showed significantly higher genotyping-success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively. The reproducibility was 100%, and the genotyping error rate very low (0.54%, dropping down to 0.06% when removing four SNPs showing elevated error rates.This study demonstrates that ESTs provide a resource for SNP identification in non-model species, which do not require any additional bench work and little bio-informatics analysis. However, the time and cost benefits of in silico SNPs are counterbalanced by a lower conversion rate than in vitro SNPs. This drawback is acceptable for population-based experiments, but could be dramatic in experiments involving samples from narrow genetic backgrounds. In addition, we showed that both the visual inspection of genotyping clusters and the estimation of a per SNP error rate should help identify markers that are not suitable to the GoldenGate technology in species characterized by a large and complex genome.

  8. Whole-genome in-silico subtractive hybridization (WISH - using massive sequencing for the identification of unique and repetitive sex-specific sequences: the example of Schistosoma mansoni

    Directory of Open Access Journals (Sweden)

    Parrinello Hugues

    2010-06-01

    Full Text Available Abstract Background Emerging methods of massive sequencing that allow for rapid re-sequencing of entire genomes at comparably low cost are changing the way biological questions are addressed in many domains. Here we propose a novel method to compare two genomes (genome-to-genome comparison. We used this method to identify sex-specific sequences of the human blood fluke Schistosoma mansoni. Results Genomic DNA was extracted from male and female (heterogametic S. mansoni adults and sequenced with a Genome Analyzer (Illumina. Sequences are available at the NCBI sequence read archive http://www.ncbi.nlm.nih.gov/Traces/sra/ under study accession number SRA012151.6. Sequencing reads were aligned to the genome, and a pseudogenome composed of known repeats. Straightforward comparative bioinformatics analysis was performed to compare male and female schistosome genomes and identify female-specific sequences. We found that the S. mansoni female W chromosome contains only few specific unique sequences (950 Kb i.e. about 0.2% of the genome. The majority of W-specific sequences are repeats (10.5 Mb i.e. about 2.5% of the genome. Arbitrarily selected W-specific sequences were confirmed by PCR. Primers designed for unique and repetitive sequences allowed to reliably identify the sex of both larval and adult stages of the parasite. Conclusion Our genome-to-genome comparison method that we call "whole-genome in-silico subtractive hybridization" (WISH allows for rapid identification of sequences that are specific for a certain genotype (e.g. the heterogametic sex. It can in principle be used for the detection of any sequence differences between isolates (e.g. strains, pathovars or even closely related species.

  9. A Whole Genome Association Study on Meat Quality Traits Using High Density SNP Chips in a Cross between Korean Native Pig and Landrace

    Directory of Open Access Journals (Sweden)

    K.-T Lee

    2012-11-01

    Full Text Available A whole genome association (WGA study was performed to detect significant polymorphisms for meat quality traits in an F2 cross population (N = 478 that were generated with Korean native pig sires and Landrace dams in National Livestock Research Institute, Songwhan, Korea. The animals were genotyped using Illumina porcine 60k SNP beadchips, in which a set of 46,865 SNPs were available for the WGA analyses on ten carcass quality traits; live weight, crude protein, crude lipids, crude ash, water holding capacity, drip loss, shear force, CIE L, CIE a and CIE b. Phenotypes were regressed on additive and dominance effects for each SNP using a simple linear regression model, after adjusting for sex, sire and slaughter stage as fixed effects. With the significant SNPs for each trait (p<0.001, a stepwise regression procedure was applied to determine the best set of SNPs with the additive and/or dominance effects. A total of 106 SNPs, or quantitative trait loci (QTL were detected, and about 32 to 66% of the total phenotypic variation was explained by the significant SNPs for each trait. The QTL were identified in most porcine chromosomes (SSCs, in which majority of the QTL were detected in SSCs 1, 2, 12, 13, 14 and 16. Several QTL clusters were identified on SSCs 12, 16 and 17, and a cluster of QTL influencing crude protein, crude lipid, drip loss, shear force, CIE a and CIE b were located between 20 and 29 Mb of SSC12. A pleiotropic QTL for drip loss, CIE L and CIE b was also detected on SSC16. These QTL need to be validated in commercial pig populations for genetic improvement in meat quality via marker-assisted selection.

  10. Whole-genome analyses of DS-1-like human G2P[4] and G8P[4] rotavirus strains from Eastern, Western and Southern Africa.

    Science.gov (United States)

    Nyaga, Martin M; Stucker, Karla M; Esona, Mathew D; Jere, Khuzwayo C; Mwinyi, Bakari; Shonhai, Annie; Tsolenyanu, Enyonam; Mulindwa, Augustine; Chibumbya, Julia N; Adolfine, Hokororo; Halpin, Rebecca A; Roy, Sunando; Stockwell, Timothy B; Berejena, Chipo; Seheri, Mapaseka L; Mwenda, Jason M; Steele, A Duncan; Wentworth, David E; Mphahlele, M Jeffrey

    2014-10-01

    Group A rotaviruses (RVAs) with distinct G and P genotype combinations have been reported globally. We report the genome composition and possible origin of seven G8P[4] and five G2P[4] human RVA strains based on the genetic evolution of all 11 genome segments at the nucleotide level. Twelve RVA ELISA positive stool samples collected in the representative countries of Eastern, Southern and West Africa during the 2007-2012 surveillance seasons were subjected to sequencing using the Ion Torrent PGM and Illumina MiSeq platforms. A reference-based assembly was performed using CLC Bio's clc_ref_assemble_long program, and full-genome consensus sequences were obtained. With the exception of the neutralising antigen, VP7, all study strains exhibited the DS-1-like genome constellation (P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2) and clustered phylogenetically with reference strains having a DS-1-like genetic backbone. Comparison of the nucleotide and amino acid sequences with selected global cognate genome segments revealed nucleotide and amino acid sequence identities of 81.7-100 % and 90.6-100 %, respectively, with NSP4 gene segment showing the most diversity among the strains. Bayesian analyses of all gene sequences to estimate the time of divergence of the lineage indicated that divergence times ranged from 16 to 44 years, except for the NSP4 gene where the lineage seemed to arise in the more distant past at an estimated 203 years ago. However, the long-term effects of changes found within the NSP4 genome segment should be further explored, and thus we recommend continued whole-genome analyses from larger sample sets to determine the evolutionary mechanisms of the DS-1-like strains collected in Africa.

  11. A SNP Based Linkage Map of the Arctic Charr (Salvelinus alpinus Genome Provides Insights into the Diploidization Process After Whole Genome Duplication

    Directory of Open Access Journals (Sweden)

    Cameron M. Nugent

    2017-02-01

    Full Text Available Diploidization, which follows whole genome duplication events, does not occur evenly across the genome. In salmonid fishes, certain pairs of homeologous chromosomes preserve tetraploid loci in higher frequencies toward the telomeres due to residual tetrasomic inheritance. Research suggests this occurs only in homeologous pairs where one chromosome arm has undergone a fusion event. We present a linkage map for Arctic charr (Salvelinus alpinus, a salmonid species with relatively fewer chromosome fusions. Genotype by sequencing identified 19,418 SNPs, and a linkage map consisting of 4508 markers was constructed from a subset of high quality SNPs and microsatellite markers that were used to anchor the new map to previous versions. Both male- and female-specific linkage maps contained the expected number of 39 linkage groups. The chromosome type associated with each linkage group was determined, and 10 stable metacentric chromosomes were identified, along with a chromosome polymorphism involving the sex chromosome AC04. Two instances of a weak form of pseudolinkage were detected in the telomeric regions of homeologous chromosome arms in both female and male linkage maps. Chromosome arm homologies within the Atlantic salmon (Salmo salar and rainbow trout (Oncorhynchus mykiss genomes were determined. Paralogous sequence variants (PSVs were identified, and their comparative BLASTn hit locations showed that duplicate markers exist in higher numbers on seven pairs of homeologous arms, previously identified as preserving tetrasomy in salmonid species. Homeologous arm pairs where neither arm has been part of a fusion event in Arctic charr had fewer PSVs, suggesting faster diploidization rates in these regions.

  12. Investigating Drought Tolerance in Chickpea Using Genome-Wide Association Mapping and Genomic Selection Based on Whole-Genome Resequencing Data.

    Science.gov (United States)

    Li, Yongle; Ruperao, Pradeep; Batley, Jacqueline; Edwards, David; Khan, Tanveer; Colmer, Timothy D; Pang, Jiayin; Siddique, Kadambot H M; Sutton, Tim

    2018-01-01

    Drought tolerance is a complex trait that involves numerous genes. Identifying key causal genes or linked molecular markers can facilitate the fast development of drought tolerant varieties. Using a whole-genome resequencing approach, we sequenced 132 chickpea varieties and advanced breeding lines and found more than 144,000 single nucleotide polymorphisms (SNPs). We measured 13 yield and yield-related traits in three drought-prone environments of Western Australia. The genotypic effects were significant for all traits, and many traits showed highly significant correlations, ranging from 0.83 between grain yield and biomass to -0.67 between seed weight and seed emergence rate. To identify candidate genes, the SNP and trait data were incorporated into the SUPER genome-wide association study (GWAS) model, a modified version of the linear mixed model. We found that several SNPs from auxin-related genes, including auxin efflux carrier protein (PIN3), p-glycoprotein, and nodulin MtN21/EamA-like transporter, were significantly associated with yield and yield-related traits under drought-prone environments. We identified four genetic regions containing SNPs significantly associated with several different traits, which was an indication of pleiotropic effects. We also investigated the possibility of incorporating the GWAS results into a genomic selection (GS) model, which is another approach to deal with complex traits. Compared to using all SNPs, application of the GS model using subsets of SNPs significantly associated with the traits under investigation increased the prediction accuracies of three yield and yield-related traits by more than twofold. This has important implication for implementing GS in plant breeding programs.

  13. Investigating Drought Tolerance in Chickpea Using Genome-Wide Association Mapping and Genomic Selection Based on Whole-Genome Resequencing Data

    Directory of Open Access Journals (Sweden)

    Yongle Li

    2018-02-01

    Full Text Available Drought tolerance is a complex trait that involves numerous genes. Identifying key causal genes or linked molecular markers can facilitate the fast development of drought tolerant varieties. Using a whole-genome resequencing approach, we sequenced 132 chickpea varieties and advanced breeding lines and found more than 144,000 single nucleotide polymorphisms (SNPs. We measured 13 yield and yield-related traits in three drought-prone environments of Western Australia. The genotypic effects were significant for all traits, and many traits showed highly significant correlations, ranging from 0.83 between grain yield and biomass to -0.67 between seed weight and seed emergence rate. To identify candidate genes, the SNP and trait data were incorporated into the SUPER genome-wide association study (GWAS model, a modified version of the linear mixed model. We found that several SNPs from auxin-related genes, including auxin efflux carrier protein (PIN3, p-glycoprotein, and nodulin MtN21/EamA-like transporter, were significantly associated with yield and yield-related traits under drought-prone environments. We identified four genetic regions containing SNPs significantly associated with several different traits, which was an indication of pleiotropic effects. We also investigated the possibility of incorporating the GWAS results into a genomic selection (GS model, which is another approach to deal with complex traits. Compared to using all SNPs, application of the GS model using subsets of SNPs significantly associated with the traits under investigation increased the prediction accuracies of three yield and yield-related traits by more than twofold. This has important implication for implementing GS in plant breeding programs.

  14. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Alexander C Outhred

    Full Text Available Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways.We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants.Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade.Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster.

  15. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV-related hepatocellular carcinomas (HCCs and their matched controls. Comparison of whole genome sequence (WGS and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3, and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  16. Using beta-binomial regression for high-precision differential methylation analysis in multifactor whole-genome bisulfite sequencing experiments

    Science.gov (United States)

    2014-01-01

    Background Whole-genome bisulfite sequencing currently provides the highest-precision view of the epigenome, with quantitative information about populations of cells down to single nucleotide resolution. Several studies have demonstrated the value of this precision: meaningful features that correlate strongly with biological functions can be found associated with only a few CpG sites. Understanding the role of DNA methylation, and more broadly the role of DNA accessibility, requires that methylation differences between populations of cells are identified with extreme precision and in complex experimental designs. Results In this work we investigated the use of beta-binomial regression as a general approach for modeling whole-genome bisulfite data to identify differentially methylated sites and genomic intervals. Conclusions The regression-based analysis can handle medium- and large-scale experiments where it becomes critical to accurately model variation in methylation levels between replicates and account for influence of various experimental factors like cell types or batch effects. PMID:24962134

  17. KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses.

    Science.gov (United States)

    Kim, Jungeun; Weber, Jessica A; Jho, Sungwoong; Jang, Jinho; Jun, JeHoon; Cho, Yun Sung; Kim, Hak-Min; Kim, Hyunho; Kim, Yumi; Chung, OkSung; Kim, Chang Geun; Lee, HyeJin; Kim, Byung Chul; Han, Kyudong; Koh, InSong; Chae, Kyun Shik; Lee, Semin; Edwards, Jeremy S; Bhak, Jong

    2018-04-04

    High-coverage whole-genome sequencing data of a single ethnicity can provide a useful catalogue of population-specific genetic variations, and provides a critical resource that can be used to more accurately identify pathogenic genetic variants. We report a comprehensive analysis of the Korean population, and present the Korean National Standard Reference Variome (KoVariome). As a part of the Korean Personal Genome Project (KPGP), we constructed the KoVariome database using 5.5 terabases of whole genome sequence data from 50 healthy Korean individuals in order to characterize the benign ethnicity-relevant genetic variation present in the Korean population. In total, KoVariome includes 12.7M single-nucleotide variants (SNVs), 1.7M short insertions and deletions (indels), 4K structural variations (SVs), and 3.6K copy number variations (CNVs). Among them, 2.4M (19%) SNVs and 0.4M (24%) indels were identified as novel. We also discovered selective enrichment of 3.8M SNVs and 0.5M indels in Korean individuals, which were used to filter out 1,271 coding-SNVs not originally removed from the 1,000 Genomes Project when prioritizing disease-causing variants. KoVariome health records were used to identify novel disease-causing variants in the Korean population, demonstrating the value of high-quality ethnic variation databases for the accurate interpretation of individual genomes and the precise characterization of genetic variations.

  18. Identification of somatic mutations in postmortem human brains by whole genome sequencing and their implications for psychiatric disorders.

    Science.gov (United States)

    Nishioka, Masaki; Bundo, Miki; Ueda, Junko; Katsuoka, Fumiki; Sato, Yukuto; Kuroki, Yoko; Ishii, Takao; Ukai, Wataru; Murayama, Shigeo; Hashimoto, Eri; Nagasaki, Masao; Yasuda, Jun; Kasai, Kiyoto; Kato, Tadafumi; Iwamoto, Kazuya

    2018-04-01

    Somatic mutations in the human brain are hypothesized to contribute to the functional diversity of brain cells as well as the pathophysiology of neuropsychiatric diseases. However, there are still few reports on somatic mutations in non-neoplastic human brain tissues. This study attempted to unveil the landscape of somatic mutations in the human brain. We explored the landscape of somatic mutations in human brain tissues derived from three individuals with no neuropsychiatric diseases by whole-genome deep sequencing at a depth of around 100. The candidate mutations underwent multi-layered filtering, and were validated by ultra-deep target amplicon sequencing at a depth of around 200 000. Thirty-one somatic mutations were identified in the human brain, demonstrating the utility of whole-genome sequencing of bulk brain tissue. The mutations were enriched in neuron-expressed genes, and two-thirds of the identified somatic single nucleotide variants in the brain tissues were cytosine-to-thymine transitions, half of which were in CpG dinucleotides. Our developed filtering and validation approaches will be useful to identify somatic mutations in the human brain. The vulnerability of neuron-expressed genes to mutational events suggests their potential relevance to neuropsychiatric diseases. © 2017 The Authors. Psychiatry and Clinical Neurosciences published by John Wiley & Sons Australia, Ltd on behalf of Japanese Society of Psychiatry and Neurology.

  19. Whole genome sequencing of Mycobacterium bovis to obtain molecular fingerprints in human and cattle isolates from Baja California, Mexico.

    Science.gov (United States)

    Sandoval-Azuara, Sarai Estrella; Muñiz-Salazar, Raquel; Perea-Jacobo, Ricardo; Robbe-Austerman, Suelee; Perera-Ortiz, Alejandro; López-Valencia, Gilberto; Bravo, Doris M; Sanchez-Flores, Alejandro; Miranda-Guzmán, Daniela; Flores-López, Carlos Alberto; Zenteno-Cuevas, Roberto; Laniado-Laborín, Rafael; de la Cruz, Fabiola Lafarga; Stuber, Tod P

    2017-10-01

    To determine genetic diversity by comparing the whole genome sequences of cattle and human Mycobacterium bovis isolates from Baja California. A whole genome sequencing strategy was used to obtain the molecular fingerprints of 172 isolates of M. bovis obtained from Baja California, Mexico; 155 isolates were from cattle and 17 isolates were from humans. Spoligotypes were characterized in silico and single nucleotide polymorphism (SNP) differences between the isolates were evaluated. A total of 12 M. bovis spoligotype patterns were identified in cattle and humans. Two predominant spoligotypes patterns were seen in both cattle and humans: SB0145 and SB1040. The SB0145 spoligotype represented 59% of cattle isolates (n=91) and 65% of human isolates (n=11), while the SB1040 spoligotype represented 30% of cattle isolates (n=47) and 30% of human isolates (n=5). When evaluating SNP differences, the human isolates were intimately intertwined with the cattle isolates. All isolates from humans had spoligotype patterns that matched those observed in the cattle isolates, and all human isolates shared common ancestors with cattle in Baja California based on SNP analysis. This suggests that most human tuberculosis caused by M. bovis in Baja California is derived from M. bovis circulating in Baja California cattle. These results reinforce the importance of bovine tuberculosis surveillance and control in this region. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.

  20. Comparing whole-genome sequencing with Sanger sequencing for spa typing of methicillin-resistant Staphylococcus aureus.

    Science.gov (United States)

    Bartels, Mette Damkjær; Petersen, Andreas; Worning, Peder; Nielsen, Jesper Boye; Larner-Svensson, Hanna; Johansen, Helle Krogh; Andersen, Leif Percival; Jarløv, Jens Otto; Boye, Kit; Larsen, Anders Rhod; Westh, Henrik

    2014-12-01

    spa typing of methicillin-resistant Staphylococcus aureus (MRSA) has traditionally been done by PCR amplification and Sanger sequencing of the spa repeat region. At Hvidovre Hospital, Denmark, whole-genome sequencing (WGS) of all MRSA isolates has been performed routinely since January 2013, and an in-house analysis pipeline determines the spa types. Due to national surveillance, all MRSA isolates are sent to Statens Serum Institut, where the spa type is determined by PCR and Sanger sequencing. The purpose of this study was to evaluate the reliability of the spa types obtained by 150-bp paired-end Illumina WGS. MRSA isolates from new MRSA patients in 2013 (n = 699) in the capital region of Denmark were included. We found a 97% agreement between spa types obtained by the two methods. All isolates achieved a spa type by both methods. Nineteen isolates differed in spa types by the two methods, in most cases due to the lack of 24-bp repeats in the whole-genome-sequenced isolates. These related but incorrect spa types should have no consequence in outbreak investigations, since all epidemiologically linked isolates, regardless of spa type, will be included in the single nucleotide polymorphism (SNP) analysis. This will reveal the close relatedness of the spa types. In conclusion, our data show that WGS is a reliable method to determine the spa type of MRSA. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  1. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    Full Text Available Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ∼2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 (0.230±0.012 for starvation resistance (startle response. The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.

  2. Use of allele-specific FAIRE to determine functional regulatory polymorphism using large-scale genotyping arrays.

    Directory of Open Access Journals (Sweden)

    Andrew J P Smith

    Full Text Available Following the widespread use of genome-wide association studies (GWAS, focus is turning towards identification of causal variants rather than simply genetic markers of diseases and traits. As a step towards a high-throughput method to identify genome-wide, non-coding, functional regulatory variants, we describe the technique of allele-specific FAIRE, utilising large-scale genotyping technology (FAIRE-gen to determine allelic effects on chromatin accessibility and regulatory potential. FAIRE-gen was explored using lymphoblastoid cells and the 50,000 SNP Illumina CVD BeadChip. The technique identified an allele-specific regulatory polymorphism within NR1H3 (coding for LXR-α, rs7120118, coinciding with a previously GWAS-identified SNP for HDL-C levels. This finding was confirmed using FAIRE-gen with the 200,000 SNP Illumina Metabochip and verified with the established method of TaqMan allelic discrimination. Examination of this SNP in two prospective Caucasian cohorts comprising 15,000 individuals confirmed the association with HDL-C levels (combined beta = 0.016; p = 0.0006, and analysis of gene expression identified an allelic association with LXR-α expression in heart tissue. Using increasingly comprehensive genotyping chips and distinct tissues for examination, FAIRE-gen has the potential to aid the identification of many causal SNPs associated with disease from GWAS.

  3. Performance of the digene LQ, RH and PS HPVs genotyping systems on clinical samples and comparison with HC2 and PCR-based Linear Array

    Directory of Open Access Journals (Sweden)

    Godínez Jose M

    2011-11-01

    Full Text Available Abstract Background Certain Human Papillomaviruses (HPVs are the infectious agents involved in cervical cancer development. Detection of HPVs DNA is part of the cervical cancer screening protocols and HPVs genotyping has been proposed for its inclusion in these preventive programs. The aim of this study was to evaluate three novel genotyping tests, namely Qiagen LQ, RH and PS, in clinical samples with and without abnormalities. For this, 305 cervical samples were processed and the results of the evaluated techniques were compared with those obtained in the HPVs diagnostic process in our lab, by using HC2 and Linear Array (LA technologies. Results The concordances and kappa statistics (k for each technique compared with HC2 were 98.69% (k = 0.94 for LQ, 98.03% (k = 0.91 for RH and 91.80% (k = 0.82 for PS. There was a very good agreement in HPVs type-specific concordance for the most prevalent types HPV16 (kappa range = 0.83-0.90, HPV18 (k.r.= 0.74-0.80 and HPV45 (k.r.= 0.82-0.90. Conclusions The three tests showed an overall good concordance for HPVs detection when compared with HR-HC2 system. LQ and RH rendered lower detection rate for multiple infections than LA genotyping. However, our understanding of the clinical significance of multiple HPVs infections is still incomplete and therefore the relevance of the lower ability to detect multiple infections needs to be evaluated.

  4. Whole genome association mapping of plant height in winter wheat (Triticum aestivum L..

    Directory of Open Access Journals (Sweden)

    Christine D Zanke

    Full Text Available The genetic architecture of plant height was investigated in a set of 358 recent European winter wheat varieties plus 14 spring wheat varieties based on field data in eight environments. Genotyping of diagnostic markers revealed the Rht-D1b mutant allele in 58% of the investigated varieties, while the Rht-B1b mutant was only present in 7% of the varieties. Rht-D1 was significantly associated with plant height by using a mixed linear model and employing a kinship matrix to correct for population stratification. Further genotyping data included 732 microsatellite markers, resulting in 770 loci, of which 635 markers were placed on the ITMI map plus a set of 7769 mapped SNP markers genotyped with the 90 k iSELECT chip. When Bonferroni correction was applied, a total of 153 significant marker-trait associations (MTAs were observed for plant height and the SSR markers (-log10 (P-value ≥ 4.82 and 280 (-log10 (P-value ≥ 5.89 for the SNPs. Linear regression between the most effective markers and the BLUEs for plant height indicated additive effects for the MTAs of different chromosomal regions. Analysis of syntenic regions in the rice genome revealed closely linked rice genes related to gibberellin acid (GA metabolism and perception, i.e. GA20 and GA2 oxidases orthologous to wheat chromosomes 1A, 2A, 3A, 3B, 5B, 5D and 7B, ent-kaurenoic acid oxidase orthologous to wheat chromosome 7A, ent-kaurene synthase on wheat chromosome 2B, as well as GA-receptors like DELLA genes orthologous to wheat chromosomes 4B, 4D and 7A and genes of the GID family orthologous to chromosomes 2B and 5B. The data indicated that besides the widely used GA-insensitive dwarfing genes Rht-B1 and Rht-D1 there is a wide spectrum of loci available that could be used for modulating plant height in variety development.

  5. Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data

    DEFF Research Database (Denmark)

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2018-01-01

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness...... is assessed by evaluating specific coincidences of alleles between the groups. When working with high throughput sequencing data calling genotypes accurately is not always possi