WorldWideScience

Sample records for vulgaris whole-genome oligonucleotide

  1. Construction and Evaluation of Desulfovibrio vulgaris Whole-Genome Oligonucleotide Microarrays

    Energy Technology Data Exchange (ETDEWEB)

    Z. He; Q. He; L. Wu; M.E. Clark; J.D. Wall; Jizhong Zhou; Matthew W. Fields

    2004-03-17

    Desulfovibrio vulgaris Hildenborough has been the focus of biochemical and physiological studies in the laboratory, and the metabolic versatility of this organism is widely recognized, particularly its reduction of sulfate, fumarate, iron, uranium and chromium. In addition, a Desulfovibrio sp. has been shown to use uranium as its sole electron acceptor. D. vulgaris is a δ-Proteobacterium with a genome size of 3.6 Mb and 3584 ORFs. Whole-genome microarrays of D. vulgaris were constructed using 70mer oligonucleotides. All ORFs in the genome were represented, with 3471 (97.1%) unique probes and 103 (2.9%) non-specific probes that may cross-hybridize with other ORFs. In preparation for use of the experimental microarrays, artificial probes and targets were designed to assess specificity and sensitivity and to identify optimal hybridization conditions for oligonucleotide microarrays. The results indicated that for 50mer and 70mer oligonucleotide arrays, hybridization at 45 °C to 50 °C, washing at 37 °C and a wash time of 2.5 to 5 minutes gave specific and strong hybridization signals. To evaluate the performance of the experimental microarrays, growth conditions were selected that were expected to give significant hybridization differences for different sets of genes. The initial evaluations were performed using D. vulgaris cells grown to logarithmic and stationary phases. Transcriptional analysis of D. vulgaris cells sampled during logarithmic-phase growth indicated that 25% of annotated ORFs were up-regulated and 3% of annotated ORFs were down-regulated compared to stationary-phase cells. The up-regulated genes included ORFs predicted to be involved in acyl chain biosynthesis, amino acid ABC transport, translation initiation factors, and ribosomal proteins. In the stationary-phase cells, the two most up-regulated ORFs (70-fold) were annotated as a carboxynorspermidine decarboxylase and a 2C-methyl-D-erythritol-2
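    The up- and down-regulated percentages above come from comparing per-ORF signals between the two growth phases. A minimal sketch of such a call follows, assuming background-corrected, normalized signals and a hypothetical 2-fold cutoff; the abstract does not state the threshold or statistical test the authors used, and the function names are illustrative.

```python
import math


def call_change(treatment: float, control: float, min_fold: float = 2.0) -> str:
    """Classify one ORF as up, down or unchanged from normalized signals in two conditions.

    The 2-fold default is an illustrative assumption, not the study's cutoff.
    """
    log_ratio = math.log2(treatment / control)
    cutoff = math.log2(min_fold)
    if log_ratio >= cutoff:
        return "up"
    if log_ratio <= -cutoff:
        return "down"
    return "unchanged"


def percent_changed(pairs: list[tuple[float, float]], min_fold: float = 2.0) -> dict[str, float]:
    """Percentage of ORFs in each class, given (treatment, control) signal pairs."""
    calls = [call_change(t, c, min_fold) for t, c in pairs]
    return {label: 100 * calls.count(label) / len(calls) for label in ("up", "down", "unchanged")}


# A 70-fold difference, as reported for the top stationary-phase ORFs, is far past a 2-fold cutoff.
print(call_change(700.0, 10.0))  # up
```

    Working on log2 ratios keeps up- and down-regulation symmetric: a 2-fold increase and a 2-fold decrease sit at +1 and -1.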

  2. Construction of Whole Genome Microarrays, and Expression Analysis of Desulfovibrio vulgaris cells in Metal-Reducing Conditions (Uranium and Chromium)

    Energy Technology Data Exchange (ETDEWEB)

    Fields, Matthew W.

    2005-06-01

    One of the major goals of the project is to construct whole-genome microarrays for Desulfovibrio vulgaris. Previous whole-genome microarrays constructed at ORNL were based on PCR amplimers, and we wanted to re-evaluate the type of microarray being built because oligonucleotide probes have several advantages. Microarrays have generally been constructed with two types of probes: PCR-generated probes, typically 200 to 2000 bp long, and oligonucleotide probes, typically 20 to 70 nt long. Producing PCR product-based DNA arrays can be a time-consuming procedure that includes PCR primer design, amplification, size verification, product purification, and product quantification. In addition, some ORFs are difficult to amplify, so constructing comprehensive arrays can be a challenge. To alleviate some of the problems associated with PCR product-based microarrays, oligonucleotide microarrays containing probes longer than 40 nt have recently been evaluated and used for whole-genome expression studies. These microarrays should have higher specificity and are easier to construct, and can thus provide an important alternative approach for monitoring gene expression. However, because of the smaller probe size, the detection sensitivity of oligonucleotide arrays is expected to be lower than that of PCR product-based arrays.

  3. Comparative analysis of copy number detection by whole-genome BAC and oligonucleotide array CGH

    Directory of Open Access Journals (Sweden)

    Bejjani Bassem A

    2010-06-01

    Background: Microarray-based comparative genomic hybridization (aCGH) is a powerful diagnostic tool for detecting DNA copy-number gains and losses associated with chromosome abnormalities, many of which are below the resolution of conventional chromosome analysis. It has been presumed that whole-genome oligonucleotide (oligo) arrays identify more clinically significant copy-number abnormalities than whole-genome bacterial artificial chromosome (BAC) arrays, yet this has not been systematically studied in a clinical diagnostic setting. Results: To determine the difference in detection rate between similarly designed BAC and oligo arrays, we developed whole-genome BAC and oligonucleotide microarrays and validated them in a side-by-side comparison of 466 consecutive clinical specimens submitted to our laboratory for aCGH. Of the 466 cases studied, 67 (14.3%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome BAC array, and 73 (15.6%) had one detectable by the whole-genome oligo array. However, because both platforms identified copy-number variants of unclear clinical significance, we designed a systematic method for interpreting copy-number alterations and tested an additional 3,443 cases by BAC array and 3,096 cases by oligo array. Of the cases tested on the BAC array, 17.6% had a copy-number abnormality of potential clinical significance, whereas the detection rate increased to 22.5% for the cases tested by oligo array. We also validated the oligo array for detection of mosaicism and found that it could routinely detect mosaicism at levels of 30% and greater. Conclusions: Although BAC arrays have faster turnaround times, the increased detection rate of oligo arrays makes them attractive for clinical cytogenetic testing.
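    Why a 30% mosaic is hard to detect can be seen from an idealized model. Assuming a diploid reference, a linear and noise-free array response, and a fraction f of cells carrying c copies, the expected log2 ratio is log2(((1 - f)·2 + f·c)/2). This is a textbook approximation, not the authors' calling algorithm; real platforms typically show some ratio compression, so observed values tend to be smaller.

```python
import math


def expected_log2_ratio(mosaic_fraction: float, abnormal_copies: int, normal_copies: int = 2) -> float:
    """Idealized aCGH log2(test/reference) for a mosaic copy-number change.

    Assumes a linear, noise-free signal and a reference with `normal_copies` copies.
    """
    if not 0.0 <= mosaic_fraction <= 1.0:
        raise ValueError("mosaic_fraction must be between 0 and 1")
    mean_copies = (1 - mosaic_fraction) * normal_copies + mosaic_fraction * abnormal_copies
    return math.log2(mean_copies / normal_copies)


# A 30% mosaic trisomy versus a non-mosaic trisomy:
print(round(expected_log2_ratio(0.3, 3), 3))  # 0.202
print(round(expected_log2_ratio(1.0, 3), 3))  # 0.585
```

    Under these assumptions a 30% mosaic gain shifts the ratio by only about a third of a full trisomy's shift, which is why mosaicism detection needs its own validation.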

  4. Final Report Construction of Whole Genome Microarrays, and Expression Analysis of Desulfovibrio vulgaris cells in Metal-Reducing Conditions

    Energy Technology Data Exchange (ETDEWEB)

    M.W. Fields; J.D. Wall; J. Keasling; J. Zhou

    2008-05-15

    We continue to use the oligonucleotide microarrays constructed with funding from this project to characterize growth responses of Desulfovibrio vulgaris relevant to metal-reducing conditions. To effectively immobilize heavy metals and radionuclides via sulfate reduction, it is important to understand the cellular responses to adverse factors observed at contaminated subsurface environments (e.g., nutrients, pH, contaminants, growth requirements and products). One of the major goals of the project is to construct whole-genome microarrays for Desulfovibrio vulgaris. First, to experimentally establish the criteria for designing gene-specific oligonucleotide probes, an oligonucleotide array was constructed that contained perfect-match (PM) and mismatch (MM) probes (50mers and 70mers) based on 4 genes. The effects of probe-target identity, continuous stretch, mismatch position, and hybridization free energy on specificity were examined. Little hybridization was observed at a probe-target identity below 85% for both 50mer and 70mer probes. At a probe-target identity of 94%, 50mer probes retained 33 to 48% of the PM signal intensity; at 96% identity, 70mer probes retained 43 to 55%. When the effects of sequence identity and continuous stretch were considered independently, a stretch probe (>15 bases) contributed an additional 9% of the PM signal intensity compared to a non-stretch probe (<15 bases) at the same identity level. Cross-hybridization increased as the length of the continuous stretch increased. A 35-base stretch for 50mer probes or a 50-base stretch for 70mer probes gave approximately 55% of the PM signal. Mismatches should be as close to the middle of an oligonucleotide probe as possible to minimize cross-hybridization. Little cross-hybridization was observed for probes with a minimal binding free energy greater than -30 kcal/mol for 50mer probes or -40 kcal/mol for 70mer probes. Based on the
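    The identity and continuous-stretch criteria above can be screened with a simple ungapped comparison. The sketch below slides a candidate probe along each non-target sequence and reports the best identity and the longest run of consecutive matches; the 85% identity cutoff comes from the text, while the function names are hypothetical. It assumes probe and non-target sequences are given in the same orientation, and it does not model the free-energy criterion, which would need a nearest-neighbor thermodynamic model.

```python
def compare_ungapped(probe: str, window: str) -> tuple[float, int]:
    """Return (fractional identity, longest run of consecutive matches) for equal-length sequences."""
    if len(probe) != len(window):
        raise ValueError("probe and window must have the same length")
    matches = run = longest = 0
    for p, w in zip(probe.upper(), window.upper()):
        if p == w:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # a mismatch breaks the continuous stretch
    return matches / len(probe), longest


def best_off_target_match(probe: str, non_target: str) -> tuple[float, int]:
    """Slide the probe along a non-target sequence; return the best (identity, stretch) pair.

    Windows are ranked by identity first, then by stretch length.
    """
    n = len(probe)
    best = (0.0, 0)
    for i in range(len(non_target) - n + 1):
        best = max(best, compare_ungapped(probe, non_target[i:i + n]))
    return best


def passes_identity_filter(probe: str, non_targets: list[str], max_identity: float = 0.85) -> bool:
    """Hypothetical filter: keep a probe only if no non-target window reaches max_identity."""
    return all(best_off_target_match(probe, seq)[0] < max_identity for seq in non_targets)


probe = "ACGT" * 10
print(passes_identity_filter(probe, ["T" * 80]))             # True
print(passes_identity_filter(probe, ["GG" + probe + "GG"]))  # False
```

    The stretch value is returned alongside identity so that a stricter design rule (for example, rejecting long stretches even at moderate identity) can be layered on top.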

  5. Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray

    OpenAIRE

    Carter, Mark G.; Sharov, Alexei A; VanBuren, Vincent; Dudekula, Dawood B.; Carmack, Condie E; Nelson, Charlie; Ko, Minoru SH

    2005-01-01

    The ability to quantitatively measure the expression of all genes in a given tissue or cell with a single assay is an exciting promise of gene-expression profiling technology. An in situ-synthesized 60-mer oligonucleotide microarray designed to detect transcripts from all mouse genes was validated, together with a set of exogenous RNA controls derived from the yeast genome (made freely available without restriction) that allow quantitative estimation of absolute endogenous transcript abundance.
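    Absolute abundance estimates of this kind rest on spike-in controls of known copy number. A minimal sketch, assuming the signal is linear in log-log space over the controls' range (a common simplification, not necessarily the authors' model): fit log10(intensity) against log10(copies) by least squares, then invert the fit for endogenous transcripts. The spike-in values below are made up for illustration.

```python
import math


def fit_standard_curve(copies: list[float], intensities: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(intensity) = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    ys = [math.log10(i) for i in intensities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_copies(intensity: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate copy number from an observed intensity."""
    return 10 ** ((math.log10(intensity) - intercept) / slope)


# Hypothetical spike-in series whose signal follows 50 * copies**0.9.
spike_copies = [1, 10, 100, 1000, 10000]
spike_signal = [50 * c ** 0.9 for c in spike_copies]
slope, intercept = fit_standard_curve(spike_copies, spike_signal)
print(round(estimate_copies(50 * 500 ** 0.9, slope, intercept)))  # 500
```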

  6. Transcript copy number estimation using a mouse whole-genome oligonucleotide microarray

    Science.gov (United States)

    Carter, Mark G; Sharov, Alexei A; VanBuren, Vincent; Dudekula, Dawood B; Carmack, Condie E; Nelson, Charlie; Ko, Minoru SH

    2005-01-01

    The ability to quantitatively measure the expression of all genes in a given tissue or cell with a single assay is an exciting promise of gene-expression profiling technology. An in situ-synthesized 60-mer oligonucleotide microarray designed to detect transcripts from all mouse genes was validated, together with a set of exogenous RNA controls derived from the yeast genome (made freely available without restriction) that allow quantitative estimation of absolute endogenous transcript abundance. PMID:15998450

  7. Comparative genomic analysis of Acidithiobacillus ferrooxidans strains using the A. ferrooxidans ATCC 23270 whole-genome oligonucleotide microarray.

    Science.gov (United States)

    Luo, Hailang; Shen, Li; Yin, Huaqun; Li, Qian; Chen, Qijiong; Luo, Yanjie; Liao, Liqin; Qiu, Guanzhou; Liu, Xueduan

    2009-05-01

    Acidithiobacillus ferrooxidans is an important microorganism used in biomining operations for metal recovery. Whole-genome diversity analysis based on the oligonucleotide microarray was used to analyze the gene content of 12 strains of A. ferrooxidans isolated from various mining areas in China. Of the 3100 open reading frames (ORFs) on the slides, 1235 ORFs were absent in at least 1 strain and 1385 ORFs were conserved in all strains. The hybridization results showed that these strains were highly diverse from a genomic perspective. The hybridization results for 4 major functional gene categories, namely electron transport, carbon metabolism, extracellular polysaccharides, and detoxification, were analyzed. A phylogenetic tree built from the hybridization signals to analyze the evolution of the 12 tested strains indicated that geographic distribution was the main factor influencing their diversity. A second phylogenetic tree, based on the hybridization signals of genes associated with bioleaching, revealed a correlation between the clustering of specific genes and geochemistry. The results revealed that the main factor was geochemistry, among which the following 6 factors were the most important: pH, Mg, Cu, S, Fe, and Al.
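    Gene-content comparisons like this one start from per-ORF presence/absence calls. The sketch below is hypothetical: it assumes each strain's signal has been normalized to the sequenced reference strain (ATCC 23270) and uses made-up ratio cutoffs, with an "uncertain" class between them; the abstract does not state the authors' thresholds. The pairwise distance (the fraction of confidently called ORFs whose calls differ) is the kind of matrix a tree-building method could take as input.

```python
from itertools import combinations


def call_orf(ratio: float, absent_below: float = 0.3, present_at: float = 0.6) -> str:
    """Hypothetical three-state call from a strain/reference signal ratio."""
    if ratio >= present_at:
        return "present"
    if ratio < absent_below:
        return "absent"
    return "uncertain"


def summarize(ratios: dict[str, list[float]]) -> tuple[int, int]:
    """Return (ORFs present in every strain, ORFs absent in at least one strain)."""
    calls = {strain: [call_orf(r) for r in values] for strain, values in ratios.items()}
    per_orf = list(zip(*calls.values()))
    conserved = sum(all(c == "present" for c in orf) for orf in per_orf)
    absent_somewhere = sum(any(c == "absent" for c in orf) for orf in per_orf)
    return conserved, absent_somewhere


def content_distance(a: list[float], b: list[float]) -> float:
    """Fraction of ORFs, confidently called in both strains, whose calls differ."""
    pairs = [(call_orf(x), call_orf(y)) for x, y in zip(a, b)]
    informative = [(x, y) for x, y in pairs if "uncertain" not in (x, y)]
    if not informative:
        return 0.0
    return sum(x != y for x, y in informative) / len(informative)


ratios = {
    "s1": [1.0, 0.1, 0.5, 0.8],
    "s2": [0.9, 0.8, 1.0, 0.9],
    "s3": [0.2, 0.9, 0.7, 1.1],
}
print(summarize(ratios))  # (1, 2)
for a, b in combinations(ratios, 2):
    print(a, b, round(content_distance(ratios[a], ratios[b]), 3))
```

    As in the study's counts, conserved and variable ORFs need not add up to the total when some calls are ambiguous.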

  8. Optimized design and assessment of whole genome tiling arrays.

    NARCIS (Netherlands)

    Graf, S.; Nielsen, F.G.G.; Kurtz, S.; Huynen, M.A.; Birney, E.; Stunnenberg, H.G.; Flicek, P.

    2007-01-01

    MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling arra
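    At its simplest, a tiling design places fixed-length probes at a regular step along the genome. The sketch below assumes a hard-masked input, where repeats are replaced by N, and skips any window that touches masked sequence. The probe length, step, masking rule and function name are illustrative assumptions; an optimized design would add further filters, such as checks on probe uniqueness.

```python
def tile_probes(sequence: str, probe_len: int, step: int) -> list[tuple[int, str]]:
    """Return (start, probe) pairs tiled every `step` bases, skipping windows that contain N."""
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    probes = []
    for start in range(0, len(sequence) - probe_len + 1, step):
        window = sequence[start:start + probe_len]
        if "N" not in window.upper():
            probes.append((start, window))
    return probes


print(tile_probes("ACGTACGTNNACGT", probe_len=4, step=2))
# [(0, 'ACGT'), (2, 'GTAC'), (4, 'ACGT'), (10, 'ACGT')]
```

    Note that the toy output keeps the repeated "ACGT" window three times; a uniqueness filter is exactly what a real design would add at this point.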

  9. Whole Genome Epidemiological Typing of Escherichia coli

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer

    is in general expensive and to some extent unreliable. Next generation sequencing has quickly become a widely available tool and has enabled even smaller laboratories to do whole genome sequencing (WGS). Having the entire genome available provides the opportunity to create the ultimate typing method. This Ph...... validating each position analyzed and ignoring the positions that cannot be validated, thereby creating a distance matrix that is used as input to a UPGMA method that creates the final phylogeny. The ND method was also implemented as a web server and published. If whole genome sequencing is to be used...... compared to the background strains, but only the SNP method was able to set one common threshold for outbreak isolates versus non-outbreak isolates for the entire dataset. Whole genome sequencing is a powerful but also a rather new tool. This PhD thesis has hopefully shed some light on how we can continue...
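    The pipeline outlined in this thesis, a distance matrix built only from validated positions and then clustered with UPGMA, can be sketched as follows. Assumptions: sequences are already aligned to a common reference, unvalidated positions are marked N, and distances are raw difference counts. The function names are hypothetical, and this is not the published ND implementation.

```python
from itertools import combinations


def nd_distance(a: str, b: str) -> int:
    """Count differing positions, ignoring any position marked N (not validated) in either sequence."""
    return sum(1 for x, y in zip(a, b) if x != "N" and y != "N" and x != y)


def upgma(dist: dict[str, dict[str, float]]) -> str:
    """Cluster by UPGMA and return a Newick string with branch lengths."""
    # label -> (Newick subtree, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in dist}
    d = {frozenset((a, b)): dist[a][b] for a, b in combinations(dist, 2)}
    counter = 0
    while len(clusters) > 1:
        # Closest pair; ties broken by label order for reproducibility.
        pair = min(d, key=lambda k: (d[k], sorted(k)))
        a, b = sorted(pair)
        tree_a, size_a, height_a = clusters.pop(a)
        tree_b, size_b, height_b = clusters.pop(b)
        height = d.pop(pair) / 2
        new = f"_node{counter}"
        counter += 1
        # Size-weighted average distance from the merged cluster to every remaining cluster.
        for other in clusters:
            d[frozenset((new, other))] = (
                size_a * d.pop(frozenset((a, other))) + size_b * d.pop(frozenset((b, other)))
            ) / (size_a + size_b)
        clusters[new] = (
            f"({tree_a}:{height - height_a:g},{tree_b}:{height - height_b:g})",
            size_a + size_b,
            height,
        )
    (tree, _, _), = clusters.values()
    return tree + ";"


seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAN", "C": "TTGTACCTAC"}
matrix = {x: {y: nd_distance(seqs[x], seqs[y]) for y in seqs} for x in seqs}
print(upgma(matrix))  # (C:1.5,(A:0,B:0):1.5);
```

    Here A and B differ only at a position that B could not validate, so their distance is zero and they join first.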

  10. Metabolic Adaptation after Whole Genome Duplication

    NARCIS (Netherlands)

    Hoek, M.J.A. van; Hogeweg, P.

    2009-01-01

    Whole genome duplications (WGDs) have been hypothesized to be responsible for major transitions in evolution. However, the effects of WGD and subsequent gene loss on cellular behavior and metabolism are still poorly understood. Here we develop a genome scale evolutionary model to study the dynamics

  11. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that do not use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method, all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced-dimensional space. These vectors are summed to provide a vector representation for each species, and the angles between these vectors provide distance measures that are used to construct species trees. Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for the 12 Drosophila species at higher dimensions, except for the generally accepted but difficult-to-discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values, or when resolution was reduced by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions: These results indicate that, using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between
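    The SVD procedure described in this abstract can be sketched with NumPy. Assumptions: "peptides" are taken to be overlapping dipeptides (the abstract says only "peptide frequency"), the peptide-by-protein matrix holds raw counts, and each protein's reduced vector is its projection onto the top-k left singular vectors. The function names are hypothetical. Angles between the summed species vectors serve as distances.

```python
from itertools import combinations, product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: "peptides" are overlapping dipeptides (400 features).
DIPEPTIDE_INDEX = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=2))}


def peptide_counts(protein: str) -> np.ndarray:
    """Overlapping dipeptide counts for one protein sequence."""
    counts = np.zeros(len(DIPEPTIDE_INDEX))
    for i in range(len(protein) - 1):
        idx = DIPEPTIDE_INDEX.get(protein[i:i + 2])
        if idx is not None:  # skip dipeptides containing non-standard residues
            counts[idx] += 1
    return counts


def species_angles(proteomes: dict[str, list[str]], k: int) -> dict[tuple[str, str], float]:
    """Angle (radians) between summed reduced-dimension protein vectors for each species pair."""
    owners, columns = [], []
    for species, proteins in proteomes.items():
        for protein in proteins:
            owners.append(species)
            columns.append(peptide_counts(protein))
    matrix = np.column_stack(columns)  # rows: dipeptides, columns: proteins
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    reduced = (s[:k, None] * vt[:k]).T  # one k-dimensional row per protein
    owners_arr = np.array(owners)
    totals = {sp: reduced[owners_arr == sp].sum(axis=0) for sp in proteomes}
    angles = {}
    for a, b in combinations(proteomes, 2):
        cos = totals[a] @ totals[b] / (np.linalg.norm(totals[a]) * np.linalg.norm(totals[b]))
        angles[(a, b)] = float(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles


toy = {"a": ["MKTAYIAK", "GLLV"], "b": ["MKTAYIAK", "GLLV"], "c": ["WWPPHH", "CCDDEE"]}
print({pair: round(angle, 3) for pair, angle in species_angles(toy, k=4).items()})
```

    In the toy data, species a and b share identical proteomes (angle near 0), while c shares no dipeptides with them (angle near pi/2). Lowering k discards the smaller singular values, which corresponds to the "fewer dimensions" setting in the abstract.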

  12. Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Plant Ramona N

    2006-08-01

    Background: Whole genome amplification is an increasingly common technique through which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis. Questions of amplification-induced error and template bias generated by these methods have previously been addressed through either small-scale (SNPs) or large-scale (CGH array, FISH) methodologies. Here we used whole genome sequencing to assess amplification-induced bias in both coding and non-coding regions of two bacterial genomes. DNA from Halobacterium species NRC-1 and Campylobacter jejuni was amplified by several common, commercially available protocols: multiple displacement amplification, primer extension pre-amplification and degenerate oligonucleotide primed PCR. The amplification-induced bias of each method was assessed by sequencing both genomes in their entirety using the 454 Sequencing System technology and comparing the results with those obtained from unamplified controls. Results: All amplification methodologies induced statistically significant bias relative to the unamplified control. For the Halobacterium species NRC-1 genome, assessed at 100-base resolution, the D-statistics from GenomiPhi-amplified material were 119 times greater than those from unamplified material, 164.0 times greater for Repli-G, 165.0 times greater for PEP-PCR and 252.0 times greater for DOP-PCR. For Campylobacter jejuni, also analyzed at 100-base resolution, the D-statistics from GenomiPhi-amplified material were 15 times greater than those from unamplified material, 19.8 times greater for Repli-G, 61.8 times greater for PEP-PCR and 220.5 times greater for DOP-PCR. Conclusion: Of the amplification methodologies examined in this paper, multiple displacement amplification generated the least bias and produced significantly higher yields of amplified DNA.
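    The abstract does not define its D-statistic. Assuming it is a one-sample Kolmogorov-Smirnov-style statistic (the maximum gap between the cumulative distribution of read coverage along the genome and a perfectly uniform expectation), it could be computed from read start positions binned at 100-base resolution as below. Under that assumption, the fold values above are ratios of amplified to unamplified D. The function names and counts are hypothetical.

```python
def bin_coverage(read_starts: list[int], genome_length: int, bin_size: int = 100) -> list[int]:
    """Count reads whose start position falls in each bin."""
    bins = [0] * ((genome_length + bin_size - 1) // bin_size)
    for pos in read_starts:
        bins[pos // bin_size] += 1
    return bins


def uniformity_d(bins: list[int]) -> float:
    """Maximum gap between observed cumulative coverage and a uniform cumulative distribution."""
    total, n = sum(bins), len(bins)
    cumulative, d = 0, 0.0
    for i, count in enumerate(bins, start=1):
        cumulative += count
        d = max(d, abs(cumulative / total - i / n))
    return d


# Toy per-bin counts: a nearly uniform control and a skewed amplified sample.
control = uniformity_d([5, 4, 6, 5])
amplified = uniformity_d([12, 2, 3, 3])
print(round(amplified / control, 1))
```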

  13. Whole Genome Amplification from Blood Spot Samples.

    Science.gov (United States)

    Sørensen, Karina Meden

    2015-01-01

    Whole genome amplification (WGA) is an invaluable technique when working with DNA extracted from blood spots, as the DNA obtained from this source is often too limited for extensive genetic analysis. Two techniques that amplify the entire genome are in common use; both are described here, with a focus on the benefits and drawbacks of each system. The quality of the input DNA extracted from the blood spot is essential for obtaining the best possible WGA result, but time consumption, flexibility in format and elution volume, and the price of the technology also influence the choice of system. Three DNA extraction techniques are also described, and these aspects are compared between the systems.

  14. Whole genome sequencing analysis of Plasmodium vivax using whole genome capture

    Directory of Open Access Journals (Sweden)

    Bright A

    2012-06-01

    Background: Malaria caused by Plasmodium vivax is an experimentally neglected severe disease with a substantial burden on human health. Because of technical limitations, little is known about the biology of this important human pathogen. Whole genome analysis methods applied to patient-derived material are thus likely to have a substantial impact on our understanding of P. vivax pathogenesis and epidemiology. For example, they will allow study of the evolution and population biology of the parasite, allow parasite transmission patterns to be characterized, and may facilitate the identification of new drug resistance genes. Because parasitemias are typically low and the parasite cannot be readily cultured, on-site leukocyte depletion of blood samples is typically needed to remove human DNA, which may be 1000X more abundant than parasite DNA. These features have precluded the analysis of archived blood samples and require laboratories in close proximity to field collection sites for optimal sample preparation before cryopreservation. Results: Here we show that in-solution hybridization capture can be used to separate P. vivax DNA from contaminating human DNA in the laboratory without the need for on-site leukocyte filtration. Using a whole genome capture method, we were able to enrich P. vivax DNA from less than 0.5% of bulk genomic DNA to a median of 55% (range 20%-80%). This level of enrichment allows efficient analysis of the samples by whole genome sequencing and does not introduce any gross biases into the data. With this method, we obtained greater than 5X coverage across 93% of the P. vivax genome for four P. vivax strains from Iquitos, Peru, which is similar to our results using leukocyte filtration (greater than 5X coverage across 96%). Conclusion: The whole genome capture technique will enable more efficient whole genome analysis of P. vivax from a larger geographic region and from valuable archived sample collections.
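    The two headline numbers here, fold enrichment of parasite DNA and the fraction of the genome covered at 5X or more, are simple to compute from per-base read depths. A minimal sketch with hypothetical function names and toy depths:

```python
def fraction_at_depth(depths: list[int], min_depth: int = 5) -> float:
    """Fraction of genome positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def fold_enrichment(fraction_before: float, fraction_after: float) -> float:
    """Ratio of the target-DNA fraction after capture to the fraction before capture."""
    return fraction_after / fraction_before


print(fraction_at_depth([0, 5, 6, 10]))  # 0.75
# Going from 0.5% to the reported 55% median is at least a 110-fold enrichment.
print(round(fold_enrichment(0.005, 0.55)))  # 110
```

    Because the starting fraction was below 0.5%, the 110-fold figure is a lower bound for the median sample.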

  15. Microbial species delineation using whole genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatis, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, polyphasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desirable, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cut-offs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.
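    Species assignment as described uses two numbers per genome pair: gANI, the average identity of aligned genes, and AF, the fraction of the genome's genes covered by those alignments. A minimal sketch, assuming the gene alignments are already available as (aligned length, identity) pairs and weighting identity by aligned length. The function names are hypothetical, and the cut-off values in the usage lines are placeholders, not the ones derived in this study.

```python
def gani_and_af(alignments: list[tuple[int, float]], total_gene_length: int) -> tuple[float, float]:
    """Length-weighted mean identity over aligned genes (gANI) and aligned fraction (AF)."""
    aligned = sum(length for length, _ in alignments)
    if aligned == 0:
        return 0.0, 0.0
    gani = sum(length * identity for length, identity in alignments) / aligned
    return gani, aligned / total_gene_length


def same_species(gani: float, af: float, gani_cutoff: float, af_cutoff: float) -> bool:
    """Hard cut-offs on both gANI and AF; the cut-off values must be supplied by the caller."""
    return gani >= gani_cutoff and af >= af_cutoff


g, af = gani_and_af([(1000, 0.98), (500, 0.95)], total_gene_length=3000)
print(round(g, 3), af)  # 0.97 0.5
print(same_species(g, af, gani_cutoff=0.95, af_cutoff=0.6))  # False (placeholder cut-offs)
```

    Requiring both numbers matters: a high identity over only half of the genes, as in this toy pair, still fails an AF cut-off.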

  16. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  17. Whole Genome Epidemiological Typing of Salmonella

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas

    Technological advances and the cost-effectiveness of high-throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. For typing of Salmonella, especially sub-typing within the same serotype or even the same clone, the genetic variation of the target genes being...... used for typing is crucial for successful discrimination. The core genes, i.e. the genes that are conserved in all members of a genus or species, are potentially good candidates for investigating genomic variation in phylogeny and epidemiology. A total of 2,882 core genes have been observed among 73...... evolution and remain useful as candidate genes for bacterial genome typing, even if they cannot be expected to differentiate highly clonal isolates, e.g. outbreak cases of Salmonella [I]. To achieve successful ‘real-time’ monitoring and identification of outbreaks, rapid and reliable sub-typing is essential...
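    The core-gene definition used here (genes conserved in all members of the group) amounts to a set intersection over per-genome gene inventories. A minimal sketch, assuming genes have already been clustered so that orthologs share an identifier; the function name and the toy inventories below are hypothetical.

```python
def core_and_accessory(gene_sets: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Core = genes in every genome; accessory = genes in some but not all genomes."""
    if not gene_sets:
        return set(), set()
    sets = list(gene_sets.values())
    core = set.intersection(*sets)
    return core, set.union(*sets) - core


genomes = {
    "isolate1": {"invA", "fliC", "gyrA"},
    "isolate2": {"invA", "gyrA", "sopE"},
    "isolate3": {"invA", "fliC", "gyrA"},
}
core, accessory = core_and_accessory(genomes)
print(sorted(core), sorted(accessory))  # ['gyrA', 'invA'] ['fliC', 'sopE']
```

    Adding genomes can only shrink the core set, which is why the core-gene count depends on how many isolates are compared.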

  18. Strategies and tools for whole genome alignments

    Energy Technology Data Exchange (ETDEWEB)

    Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas; Ishkhanov, Tigran; Ryaboy, Dmitriy; Rubin, Edward; Pachter, Lior; Dubchak, Inna

    2002-11-25

    The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.
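    A coverage figure such as "more than 90 percent of known coding exons" can be computed by intersecting exon coordinates with aligned blocks. The sketch below uses half-open [start, end) intervals on one chromosome and treats an exon as covered when the aligned blocks span at least `min_fraction` of its bases. That rule and the function names are assumptions, since the abstract does not define how coverage was counted.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[list[int]]:
    """Merge overlapping or touching half-open intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def exon_coverage(exons: list[tuple[int, int]], aligned_blocks: list[tuple[int, int]],
                  min_fraction: float = 1.0) -> float:
    """Fraction of exons whose bases are covered by aligned blocks at >= min_fraction."""
    blocks = merge_intervals(aligned_blocks)
    covered = 0
    for start, end in exons:
        # Merged blocks do not overlap, so summing per-block overlaps never double-counts bases.
        overlap = sum(max(0, min(end, b_end) - max(start, b_start)) for b_start, b_end in blocks)
        if overlap >= min_fraction * (end - start):
            covered += 1
    return covered / len(exons)


exons = [(0, 100), (200, 300), (400, 500)]
blocks = [(0, 60), (50, 120), (250, 300)]
print(exon_coverage(exons, blocks), exon_coverage(exons, blocks, min_fraction=0.5))
```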

  19. Vulgaris

    Directory of Open Access Journals (Sweden)

    Mukaddes Kavala

    2012-01-01

    Background. Thyroid disorders may affect all of the organ systems of the body and are also highly associated with a wide variety of skin disorders. The aim of this study was to investigate the prevalence of thyroid function abnormalities and thyroid autoimmunity in patients with pemphigus vulgaris (PV) and to determine the association of thyroid disorders with clinical involvement and with systemic corticosteroid treatment in patients with PV. Methods. The study included eighty patients with PV and eighty healthy individuals. Thyroid function (fT3, fT4, and TSH) and thyroid autoimmunity (anti-thyroid peroxidase (anti-TPO) and anti-thyroglobulin (anti-Tg) antibodies) were investigated in both groups. Primary thyroid disease (PTD) was diagnosed by one or more of the following criteria: (i) positive antithyroid antibodies, (ii) primary thyroid function abnormalities. Results. Significant changes in the serum thyroid profile were found in 16% (13/80) of the PV group and 5% (4/80) of the control group. Positive titers of antithyroid antibodies (anti-TPO and anti-Tg) were observed in 7 patients (9%) with PV and in one control (1.2%). Hashimoto thyroiditis was diagnosed in 9% of PV patients and was more prevalent in the mucosal form of PV. PTD was found in 13 (16%) PV patients, which was significantly higher than in controls. PTD was not associated with systemic corticosteroid use. Free T3 levels were significantly lower, and free T4 levels significantly higher, in the PV group than in the control group. Conclusions. PV may coexist with autoimmune thyroid diseases, especially Hashimoto thyroiditis, and with primary thyroid diseases. Thyroid function tests and thyroid autoantibody testing should be performed to detect underlying thyroid disease in patients with PV.
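    The abstract calls the 13/80 versus 4/80 difference significant without naming the test. One standard choice for a 2x2 table of this size is a two-sided Fisher's exact test, sketched below with exact integer arithmetic. This is an illustration only, not necessarily the authors' analysis, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are no more
    likely than the observed one. Integer numerators keep the comparison exact.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x: int) -> int:
        # Numerator of the hypergeometric probability that the top-left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, row1)


# PV: 13 with thyroid-profile changes, 67 without; controls: 4 with, 76 without.
print(round(fisher_exact_two_sided(13, 67, 4, 76), 3))
```

    Comparing integer numerators avoids floating-point ties when deciding which tables count as "at least as extreme".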

  20. BSMAP: whole genome bisulfite sequence MAPping program

    Directory of Open Access Journals (Sweden)

    Li Wei

    2009-07-01

    Background: Bisulfite sequencing is a powerful technique for studying DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next-generation sequencing technology, it can detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge because of the increased search space, the reduced complexity of bisulfite sequences, asymmetric cytosine-to-thymine alignments, and heterogeneous methylation at multiple CpGs. Results: We developed an efficient bisulfite read-mapping algorithm, BSMAP, to address these issues. BSMAP combines genome hashing and bitwise masking to achieve fast and accurate bisulfite mapping. Compared with existing bisulfite mapping approaches, BSMAP is faster, more sensitive and more flexible. Conclusion: BSMAP is the first general-purpose bisulfite mapping software. It can map high-throughput bisulfite reads at the whole-genome level with feasible memory and CPU usage. It is freely available under the GPL v3 license at http://code.google.com/p/bsmap/.
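    The asymmetric alignment problem mentioned in this abstract (a T in a bisulfite read may come from either a T or an unmethylated C in the reference) can be shown with a brute-force, forward-strand-only sketch. This is not BSMAP's hashing and bitwise-masking algorithm, only the matching rule and methylation call that any bisulfite mapper must implement; the function names are hypothetical.

```python
from typing import Optional


def bs_mismatches(read: str, ref_window: str) -> int:
    """Mismatches between a forward-strand bisulfite read and a reference window.

    A read T may match a reference C (unmethylated C converted to T); a read C
    against a reference T is still a mismatch.
    """
    return sum(1 for r, g in zip(read, ref_window) if not (r == g or (r == "T" and g == "C")))


def map_read(read: str, reference: str, max_mismatches: int = 2) -> Optional[tuple[int, int]]:
    """Return (position, mismatches) of the best hit, or None if no hit is within max_mismatches."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        mm = bs_mismatches(read, reference[pos:pos + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


def call_methylation(read: str, reference: str, pos: int) -> dict[int, str]:
    """Methylation state at each reference C covered by a read mapped at `pos`."""
    calls = {}
    for offset, (r, g) in enumerate(zip(read, reference[pos:pos + len(read)])):
        if g == "C" and r in "CT":
            calls[pos + offset] = "methylated" if r == "C" else "unmethylated"
    return calls


reference = "AACGTTCGAA"
read = "CGTTTG"  # the C at reference position 6 was converted to T
hit = map_read(read, reference)
print(hit, call_methylation(read, reference, hit[0]))
```

    Scanning every position is what makes naive bisulfite mapping slow; the hashing and masking in BSMAP exist to avoid exactly this brute-force search.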

  1. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    Dang Thanh Hai; Nguyen Dai Thanh; Pham Thi Minh Trang; Le Si Quang; Phan Thi Thu Hang; Dang Cao Cuong; Hoang Kim Phuc; Nguyen Huu Duc; Do Duc Dong; Bui Quang Minh; Pham Bao Son; Le Sy Vinh

    2015-03-01

    Here we present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that were consistent with Mendelian inheritance. Among them, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants, of which 27,604 (91.5%) were large indels. There were 6,681 large indels in the range 0.1–100 kbp in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs of length ≥ 300 bp. There were 235 contigs from the child genome, of which 199 (84.7%) significantly matched at least one contig from the father or mother genome. BLAST searches of these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified in our study demonstrate the need for more genome-wide studies, not only of the Kinh but also of other ethnic groups in Vietnam.
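    Filtering trio variants by Mendelian inheritance amounts to checking that the child's genotype can be formed from one paternal and one maternal allele. A minimal sketch for diploid, unphased genotype calls; the function name is hypothetical, and a production filter would also need to handle genotype quality and sex chromosomes.

```python
from itertools import product


def mendelian_consistent(child: tuple[str, str], father: tuple[str, str], mother: tuple[str, str]) -> bool:
    """True if the child's unordered genotype equals {one paternal allele, one maternal allele}."""
    target = sorted(child)
    return any(sorted((p, m)) == target for p, m in product(father, mother))


print(mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))  # True
print(mendelian_consistent(("G", "G"), ("A", "A"), ("A", "G")))  # False
```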

  2. Small Sample Whole-Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Hara, C A; Nguyen, C P; Wheeler, E K; Sorensen, K J; Arroyo, E S; Vrankovich, G P; Christian, A T

    2005-09-20

    Many challenges arise when trying to amplify and analyze human samples collected in the field, because of limited sample quantity and contamination of the starting material. Tests such as DNA fingerprinting and mitochondrial typing require a certain sample size and are carried out in large-volume reactions; in cases where insufficient sample is present, whole genome amplification (WGA) can be used. WGA allows very small quantities of DNA to be amplified in a way that enables subsequent DNA-based tests to be performed. A limiting step in WGA is sample preparation. To minimize the necessary sample size, we have developed two modifications of WGA: the first increases the amplified product from small, nanoscale, purified samples by using carrier DNA, while the second is a single-step method for cleaning and amplifying samples in one column. Conventional DNA cleanup involves binding the DNA to silica, washing away impurities, and then releasing the DNA for subsequent testing. We have eliminated losses associated with incomplete sample release, thereby decreasing the required amount of starting template for DNA testing. Both techniques address the limitations of sample size by providing ample copies of genomic samples. Carrier DNA, included in our WGA reactions, can be used when amplifying samples with the standard purification method, or in conjunction with our single-step DNA purification technique to potentially further decrease the amount of starting sample necessary for future forensic DNA-based assays.

  3. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  4. Rapid whole genome sequencing and precision neonatology.

    Science.gov (United States)

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow, or perceived as impractical, for the initial management of the critically ill neonate. Technological advances have made it possible to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and turnaround time of testing decrease, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows concomitant evaluation of the 5588 single gene diseases currently identified. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes, ranging from transition to palliative care for uniformly lethal conditions to alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at the time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to the interpretation of variants of unknown significance, the discovery of incidental findings related to adult-onset conditions and carrier status, and the implementation of medical therapies whose risks and benefits are poorly understood. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care.

  5. Comparative analysis of whole genome structure of Streptococcus suis using whole genome PCR scanning

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An outbreak associated with Streptococcus suis infection in humans emerged in Sichuan province, China, in 2005. The outbreak was atypical for its apparently large number of human cases, high fatality rate and geographical spread. To determine whether the bacterium had changed, we compared both human and animal isolates from the Sichuan outbreak with those collected previously within China and in other countries, using whole genome PCR scanning (WGPScanning), comparative sequencing of several known virulence factor genes, and multilocus sequence typing (MLST). WGPScanning showed that all primer pairs yielded PCR products of the expected sizes in all four strains tested. The nucleotide sequences of all the detected virulence factor genes were identical in the four strains, and MLST showed that the four isolates studied and the reference strain all belonged to the ST1 complex. No new genetic changes were found in the genome structure of the isolates from this Sichuan outbreak.
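
    The WGPScanning comparison described above reduces to checking, primer pair by primer pair, whether each isolate produced a product of the expected size. A minimal sketch, assuming expected sizes are predicted from a reference genome and observed sizes are read from gels; the 10% size tolerance, the primer-pair IDs, and the function name are hypothetical choices for illustration, not values from the study.

```python
def compare_scan(expected, observed, tolerance=0.10):
    """Return primer-pair IDs whose PCR product is missing or off-size.

    expected: {pair_id: size_bp predicted from the reference genome}
    observed: {pair_id: size_bp measured for one isolate}; absent = no product
    tolerance: allowed deviation as a fraction of the expected size (assumed 10%)
    """
    flagged = []
    for pair_id, exp_bp in expected.items():
        obs_bp = observed.get(pair_id)
        if obs_bp is None or abs(obs_bp - exp_bp) > tolerance * exp_bp:
            flagged.append(pair_id)
    return flagged


# Hypothetical example: P002 is shifted by ~2.5 kb and P003 gave no product.
expected = {"P001": 10000, "P002": 9500, "P003": 10200}
observed = {"P001": 10100, "P002": 7000}
print(compare_scan(expected, observed))  # ['P002', 'P003']
```

    An empty list for a strain corresponds to the result reported here, where every primer pair gave a product of the expected size; a missing or shifted product could point to an insertion, deletion, or rearrangement worth sequencing.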

  7. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  8. Whole-genome sequence-based analysis of thyroid function

    OpenAIRE

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 1...

  9. Whole-Genome Sequences of 26 Vibrio cholerae Isolates

    Science.gov (United States)

    Watve, Samit S.; Chande, Aroon T.; Rishishwar, Lavanya; Jordan, I. King

    2016-01-01

    The human pathogen Vibrio cholerae employs several adaptive mechanisms for environmental persistence, including natural transformation and type VI secretion, creating a reservoir for the spread of disease. Here, we report whole-genome sequences of 26 diverse V. cholerae isolates, significantly increasing the sequence diversity of publicly available V. cholerae genomes. PMID:28007852

  10. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  11. Whole-Genome Sequencing of Two Bartonella bacilliformis Strains

    Science.gov (United States)

    Guillen, Yolanda; Casadellà, Maria; García-de-la-Guarda, Ruth; Espinoza-Culupú, Abraham; Paredes, Roger; Ruiz, Joaquim

    2016-01-01

    Bartonella bacilliformis is the causative agent of Carrion’s disease, a highly endemic human bartonellosis in Peru. We performed a whole-genome assembly of two B. bacilliformis strains isolated from the blood of infected patients in the acute phase of Carrion’s disease from the Cusco and Piura regions in Peru. PMID:27389274

  12. Whole Genome Analysis of Epidemiologically Closely Related Staphylococcus aureus Isolates

    NARCIS (Netherlands)

    M. Schijffelen (Maarten); S.R. Konstantinov (Sergey); G. Lina (Gérard); I. Spiliopoulou (Iris); E. van Duijkeren (Engeline); E.C. Brouwer (Ellen); A.C. Fluit (Ad)

    2013-01-01

    The change of the bacteria from colonizers to pathogens is accompanied by a drastic change in expression profiles. These changes may be due to environmental signals or to mutational changes. We therefore compared the whole genome sequences of four sets of S. aureus isolates. Three sets

  13. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using eithe...
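
    The selection step described above (single marker analysis on sequence variants, then adding the top hits to the 54k panel) can be sketched as follows. This is an illustration under assumptions: genotypes are allele dosages (0, 1, 2), a plain least-squares regression per variant stands in for whatever association model the authors used, and the function names and variant IDs are hypothetical. The augmented marker list would then feed a genomic prediction model, which is not shown.

```python
import math


def single_marker_t(genotypes, phenotypes):
    """t-statistic of the slope in the simple regression phenotype ~ dosage."""
    n = len(genotypes)
    if n < 3:
        raise ValueError("need at least 3 animals")
    mx = sum(genotypes) / n
    my = sum(phenotypes) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    if sxx == 0:  # monomorphic variant carries no information
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    if slope == 0:
        return 0.0
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, phenotypes))
    if rss == 0:  # perfect fit
        return math.copysign(math.inf, slope)
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def augment_panel(panel_ids, sequence_variants, phenotypes, top_k):
    """Append the top_k sequence variants (largest |t|) not already in the panel."""
    ranked = sorted(
        sequence_variants,
        key=lambda vid: abs(single_marker_t(sequence_variants[vid], phenotypes)),
        reverse=True,
    )
    added = [vid for vid in ranked if vid not in panel_ids][:top_k]
    return list(panel_ids) + added


# Hypothetical toy data: 'qtl' tracks the phenotype, 'v3' is monomorphic.
pheno = [0.1, 1.1, 2.0, 0.0, 1.0, 2.1]
variants = {"qtl": [0, 1, 2, 0, 1, 2], "v2": [1, 0, 1, 2, 1, 0], "v3": [1] * 6}
print(augment_panel(["snp1", "snp2"], variants, pheno, top_k=1))  # ['snp1', 'snp2', 'qtl']
```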

  14. Whole genome amplification - Review of applications and advances

    Energy Technology Data Exchange (ETDEWEB)

    Hawkins, Trevor L.; Detter, J.C.; Richardson, Paul

    2001-11-15

    The concept of whole genome amplification has arisen in the past few years as modifications to the polymerase chain reaction (PCR) have been adapted to replicate regions of genomes that are of biological interest. The applications are many: forensics, embryonic disease diagnosis, bioterrorism genome detection, "immortalization" of clinical samples, microbial diversity, and genotyping. The key question is whether DNA can be replicated a genome at a time without bias or non-random distribution of the target. Several papers published in the last year, and others currently in preparation, may lead to the conclusion that whole genome amplification is indeed possible and can therefore open up a new avenue in molecular biology.

  15. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  16. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors, but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at the whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap-frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using the Agilent platform, transcriptome sequencing using the HiSeq 2000 (Illumina), and whole genome sequencing using the Complete Genomics Inc. platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosomes 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation, t(11;X), was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertions/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinct from those of other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.
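
    Copy number gains and losses from array CGH are typically called by thresholding segment-level log2(tumor/normal) ratios. The sketch below assumes segmentation has already been done and uses ±0.3 as an illustrative threshold; it is not the Agilent pipeline used in the study, and the segment values are invented. For reference, a single-copy gain or loss in a pure diploid sample gives log2(3/2) ≈ 0.58 or log2(1/2) = -1; admixed normal cells shrink these values toward 0, which is why thresholds are usually set well inside them.

```python
def call_copy_number(segments, gain=0.3, loss=-0.3):
    """Label segments as 'gain', 'loss' or 'neutral'.

    segments: iterable of (chrom, start, end, mean_log2_ratio), where the ratio is
    log2(tumor / normal) averaged over the probes in the segment.
    gain / loss: illustrative thresholds (assumptions, not study values).
    """
    calls = []
    for chrom, start, end, log2_ratio in segments:
        if log2_ratio >= gain:
            state = "gain"
        elif log2_ratio <= loss:
            state = "loss"
        else:
            state = "neutral"
        calls.append((chrom, start, end, state))
    return calls


# Hypothetical segments (coordinates and ratios invented for illustration).
segments = [("1", 150_000_000, 248_000_000, 0.45),
            ("3", 1, 90_000_000, -0.60),
            ("2", 1, 240_000_000, 0.05)]
for call in call_copy_number(segments):
    print(call)
```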

  17. Whole genome sequencing of clinical isolates of Giardia lamblia.

    Science.gov (United States)

    Hanevik, K; Bakken, R; Brattbakk, H R; Saghaug, C S; Langeland, N

    2015-02-01

    Clinical isolates from protozoan parasites such as Giardia lamblia are at present practically impossible to culture. By using simple cyst purification methods, we show that Giardia whole genome sequencing of clinical stool samples is possible. Immunomagnetic separation after sucrose gradient flotation gave superior results compared to sucrose gradient flotation alone. The method enables detailed analysis of a wide range of genes of interest for genotyping, virulence and drug resistance.

  18. Whole genome amplification of DNA for genotyping pharmacogenetics candidate genes.

    Directory of Open Access Journals (Sweden)

    Santosh ePhilips

    2012-03-01

    Whole genome amplification (WGA) technologies can be used to amplify genomic DNA when only small amounts of DNA are available. Multiple displacement amplification (MDA), based on phi29 polymerase, has been shown to accurately amplify DNA for a variety of genotyping assays; however, it has not been tested for genotyping many of the clinically relevant genes important for pharmacogenetic studies, such as the cytochrome P450 genes, which are typically difficult to genotype due to multiple pseudogenes, copy number variations, and high similarity to other related genes. We evaluated whole genome amplified samples for Taqman™ genotyping of SNPs in a variety of pharmacogenetic genes. In 24 DNA samples from the Coriell human diversity panel, the call rates and concordance between amplified (~200-fold amplification) and unamplified samples were 100% for two SNPs in CYP2D6 and one in ESR1. In samples from a breast cancer clinical trial (Trial 1), we compared the genotyping results in samples before and after WGA for four SNPs in CYP2D6, one SNP in CYP2C19, one SNP in CYP19A1, two SNPs in ESR1, and two SNPs in ESR2. The concordance rates were all >97%. Finally, we compared the allele frequencies of 143 SNPs determined in Trial 1 (whole genome amplified DNA) to the allele frequencies determined in unamplified DNA samples from a separate trial (Trial 2) that enrolled a similar population. The call rates and allele frequencies between the two trials were 98% and 99.7%, respectively. We conclude that whole genome amplified DNA is suitable for Taqman™ genotyping of a wide variety of pharmacogenetically relevant SNPs.
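
    Call rate and concordance, the two metrics reported above, can be computed per SNP as follows. A minimal sketch, assuming genotypes are "A/G"-style strings with None for a failed call, and that concordance is measured only over samples called in both the amplified and unamplified sets; that definition is an assumption, since the abstract does not spell it out.

```python
def call_rate(calls):
    """Fraction of samples with a genotype call (None marks a failed call)."""
    if not calls:
        return float("nan")
    return sum(g is not None for g in calls) / len(calls)


def concordance(amplified, unamplified):
    """Fraction of samples called in both sets whose genotypes agree.

    Genotypes are compared as unordered allele pairs, so 'A/G' matches 'G/A'.
    """
    pairs = [(a, u) for a, u in zip(amplified, unamplified)
             if a is not None and u is not None]
    if not pairs:
        return float("nan")
    same = sum(sorted(a.split("/")) == sorted(u.split("/")) for a, u in pairs)
    return same / len(pairs)


# Hypothetical calls for one SNP in four samples, after and before WGA.
wga = ["A/G", "G/G", None, "A/A"]
raw = ["G/A", "G/G", "A/G", "A/G"]
print(call_rate(wga), concordance(wga, raw))  # 0.75 0.6666666666666666
```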

  19. Whole-Genome Sequence Assembly for Mammalian Genomes: Arachne 2

    OpenAIRE

    Jaffe, David B.; Butler, Jonathan; Gnerre, Sante; Mauceli, Evan; Lindblad-Toh, Kerstin; Jill P. Mesirov; Michael C Zody; Lander, Eric S.

    2003-01-01

    We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal changes were simultaneously made and applied to the assembly of the mouse genome, during a six-month period of development: (1) Supercontigs (scaffolds) were iteratively broken and rej...

  20. Whole-genome molecular haplotyping of single cells

    OpenAIRE

    Fan, H. Christina; Wang, Jianbin; Potanina, Anastasia; Quake, Stephen R

    2010-01-01

    Conventional experimental methods of studying the human genome are limited by the inability to independently study the combination of alleles, or haplotype, on each of the homologous copies of the chromosomes. We developed a microfluidic device capable of separating and amplifying homologous copies of each chromosome from a single human metaphase cell. Single-nucleotide polymorphism (SNP) array analysis of amplified DNA enabled us to achieve completely deterministic, whole-genome, personal ha...

  1. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Directory of Open Access Journals (Sweden)

    David Koslicki

    With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.
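
    The idea of a k-mer based reconstruction is to express the sample's k-mer frequency vector as a nonnegative mixture of reference k-mer profiles, rather than classifying reads one at a time. The sketch below is a much-simplified illustration of that idea, not WGSQuikr itself: it solves a plain nonnegative least-squares problem by projected gradient descent, omits WGSQuikr's own formulation and reference databases, assumes each reference is at least k bp long, and its function names and default k are assumptions.

```python
from itertools import product


def kmer_profile(seq, k):
    """Normalized counts of all 4**k DNA k-mers; k-mers containing non-ACGT are skipped."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def estimate_proportions(references, sample, k=3, steps=5000):
    """Nonnegative least squares min ||A x - b||^2 by projected gradient descent.

    Column j of A is the k-mer profile of reference j; b is the sample's profile.
    The fitted x is rescaled to sum to 1 and returned as {reference: proportion}.
    """
    names = list(references)
    cols = [kmer_profile(references[n], k) for n in names]
    b = kmer_profile(sample, k)
    r = len(names)
    # Normal-equation pieces: G = A^T A (r x r) and h = A^T b.
    G = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    h = [sum(p * q for p, q in zip(ci, b)) for ci in cols]
    # Step 1/L, with L bounded by the largest absolute row sum of G (Gershgorin).
    lr = 1.0 / max(sum(abs(v) for v in row) for row in G)
    x = [1.0 / r] * r
    for _ in range(steps):
        # Gradient of 0.5 * ||A x - b||^2 is G x - h.
        grad = [sum(G[i][j] * x[j] for j in range(r)) - h[i] for i in range(r)]
        x = [max(0.0, xi - lr * gi) for xi, gi in zip(x, grad)]  # project onto x >= 0
    total = sum(x) or 1.0
    return {n: xi / total for n, xi in zip(names, x)}


# Hypothetical two-taxon example with a roughly 3:1 mixture.
refs = {"taxon_a": "ACGT" * 50, "taxon_b": "AATT" * 50}
print(estimate_proportions(refs, "ACGT" * 30 + "AATT" * 10))
```

    Once the sample's k-mers are counted, the fit depends on the number of references and k-mers rather than on the number of reads, which is the intuition behind avoiding read-by-read classification.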

  2. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, S. [Univ. Wisc.-Madison]; Zhou, S. [Univ. Wisc.-Madison]; Place, M. [Univ. Wisc.-Madison]; Zhang, Y. [Univ. Wisc.-Madison]; Briska, A. [Univ. Wisc.-Madison]; Goldstein, S. [Univ. Wisc.-Madison]; Churas, C. [Univ. Wisc.-Madison]; Runnheim, R. [Univ. Wisc.-Madison]; Forrest, D. [Univ. Wisc.-Madison]; Lim, A. [Univ. Wisc.-Madison]; Lapidus, A. [Univ. Wisc.-Madison]; Han, C. S. [Univ. Wisc.-Madison]; Roberts, G. P. [Univ. Wisc.-Madison]; Schwartz, D. C. [Univ. Wisc.-Madison]

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.
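
    Comparing an optical map with a sequence assembly requires the fragment sizes the assembly predicts for the same enzyme. A minimal in silico digest is sketched below; it assumes the standard recognition sites and top-strand cut positions for XbaI (T^CTAGA), NheI (G^CTAGC), and HindIII (A^AGCTT), all of which are palindromic, so scanning one strand finds every site. The function name is hypothetical, and aligning these predicted fragments to an optical map (the step that exposed the assembly error) is not shown.

```python
import re

# Recognition sequence and cut offset within the site (top strand).
ENZYMES = {
    "XbaI": ("TCTAGA", 1),     # T^CTAGA
    "NheI": ("GCTAGC", 1),     # G^CTAGC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def digest(sequence, enzyme, circular=False):
    """Return fragment lengths (bp) from an in silico digest, in sequence order."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # A zero-width lookahead finds every occurrence, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    if not cuts:
        return [len(seq)]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])  # fragment spanning the origin
        return sizes
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 24 bp sequence with two HindIII sites.
toy = "AAAA" + "AAGCTT" + "CCCCCC" + "AAGCTT" + "GG"
print(digest(toy, "HindIII"))                 # [5, 12, 7]
print(digest(toy, "HindIII", circular=True))  # [12, 12]
```

    Set circular=True for a complete circular chromosome, so the last and first fragments are joined across the origin.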

  3. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest, Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz, David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source of hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (Xba I, Nhe I, and Hind III) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the Hind III map acted as a scaffold for high resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  4. Whole genome microarray analysis, from neonatal blood cards

    Directory of Open Access Journals (Sweden)

    Hogan Michael E

    2009-07-01

    Background: Neonatal blood, obtained from a heel stick and stored dry on paper cards, has been the standard for birth defects screening for 50 years. Such dried blood samples are used primarily for analysis of small-molecule analytes. More recently, the DNA complement of such dried blood cards has been used for targeted genetic testing, such as for single nucleotide polymorphisms in cystic fibrosis. Expansion of such testing to include polygenic traits, and perhaps whole genome scanning, has been discussed as a formal possibility. However, until now the amount of DNA that might be obtained from such dried blood cards has been limiting, due to inefficient DNA recovery technology. Results: A new technology is employed for efficient DNA release from a standard neonatal blood card. Using standard Guthrie cards, stored an average of ten years post-collection, about 1/40th of the air-dried neonatal blood specimen (two 3 mm punches) was processed to obtain DNA that was sufficient in mass and quality for direct use in microarray-based whole genome scanning. Using that same DNA release technology, it is also shown that approximately 1/250th of the original purified DNA (about 1 ng) could be subjected to whole genome amplification, thus yielding an additional microgram of amplified DNA product. That amplified DNA product was then used in microarray analysis and yielded statistical concordance of 99% or greater to the primary, unamplified DNA sample. Conclusion: Together, these data suggest that DNA obtained from less than 10% of a standard neonatal blood specimen, stored dry for several years on a Guthrie card, can support a program of genome-wide neonatal genetic testing.

  5. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer, S. E.; Dunn, J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi, from strain B31, has been available for more than a decade and has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  6. The potential of whole genome NGS for infectious disease diagnosis.

    Science.gov (United States)

    Lecuit, Marc; Eloit, Marc

    2015-01-01

    Non-targeted identification of microbes is now possible directly in biological samples, based on whole-genome NGS (WG-NGS) techniques that allow deep sequencing of nucleic acids, data mining, and sorting of pathogen sequences without any a priori hypothesis. WG-NGS was initially used only as a research tool because of its cost, complexity and lack of standardization. Recent improvements in sample preparation and bioinformatics pipelines, together with decreasing cost, now allow actionable diagnostics in patients. The potential and limits of WG-NGS, and its possible future indications, are discussed here. WG-NGS will likely soon become a standard procedure in microbiological diagnosis.

  7. Whole genome comparison of donor and cloned dogs.

    Science.gov (United States)

    Kim, Hak-Min; Cho, Yun Sung; Kim, Hyunmin; Jho, Sungwoong; Son, Bongjun; Choi, Joung Yoon; Kim, Sangsoo; Lee, Byeong Chun; Bhak, Jong; Jang, Goo

    2013-10-21

    Cloning is a process that produces genetically identical organisms. However, the degree of genomic resemblance between clones and their donors needs to be determined. In this report, the genomes of a cloned dog and its donor were compared. Compared with a human monozygotic twin, the genome of the cloned dog showed little difference from the genome of the nuclear donor dog in terms of single nucleotide variations, chromosomal instability, and telomere lengths. These findings suggest that cloning by somatic cell nuclear transfer produced an almost identical genome. The whole genome sequence data of the donor and cloned dogs can provide a resource for further investigations of epigenetic contributions to phenotypic differences.

  8. Improving pan-genome annotation using whole genome multiple alignment

    Directory of Open Access Journals (Sweden)

    Salzberg Steven L

    2011-06-01

    Background: Rapid annotation and comparison of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. Results: We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and genes disrupted by draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. Conclusions: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.
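
    One of the anomaly types described above, inconsistently located translation initiation sites, can be illustrated by projecting each ortholog's annotated start codon onto a whole genome alignment and comparing columns. A hypothetical sketch, assuming that projection has already been done; Mugsy-Annotator's actual data model and scoring are not reproduced here, and taking the majority column as the consensus is an assumption.

```python
from collections import Counter


def flag_inconsistent_starts(start_columns):
    """Compare where each genome's annotated start codon falls in the alignment.

    start_columns: {genome_id: alignment column of the annotated start codon}
    Returns (consensus_column, sorted list of genomes that disagree with it).
    The consensus is the most common column; ties go to the column seen first.
    """
    consensus, _ = Counter(start_columns.values()).most_common(1)[0]
    outliers = sorted(g for g, col in start_columns.items() if col != consensus)
    return consensus, outliers


def annotation_report(groups):
    """Map each ortholog-group ID to its outlier genomes, skipping consistent groups."""
    report = {}
    for group_id, start_columns in groups.items():
        _, outliers = flag_inconsistent_starts(start_columns)
        if outliers:
            report[group_id] = outliers
    return report


# Hypothetical groups: g3's start in og1 sits 15 columns upstream of the others.
groups = {"og1": {"g1": 120, "g2": 120, "g3": 105, "g4": 120},
          "og2": {"g1": 10, "g2": 10}}
print(annotation_report(groups))  # {'og1': ['g3']}
```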

  9. Whole genome amplification and its impact on CGH array profiles

    Directory of Open Access Journals (Sweden)

    Meldrum Cliff

    2008-07-01

    Background: Some array comparative genomic hybridisation (array CGH) platforms require micrograms of DNA to generate reliable and reproducible data. For studies where there are limited amounts of genetic material, whole genome amplification (WGA) is an attractive method for generating sufficient quantities of genomic material from minuscule amounts of starting material. A range of WGA methods are available, and the multiple displacement amplification (MDA) approach has been shown to be highly accurate, although amplification bias has been reported. In the current study, WGA was used to amplify DNA extracted from whole blood. In total, six array CGH experiments were performed to investigate whether the use of whole genome amplified DNA (wgaDNA) produces reliable and reproducible results. Four experiments were conducted on amplified DNA compared to unamplified DNA, and two experiments on unamplified DNA compared to unamplified DNA. Findings: All the experiments involving wgaDNA resulted in a high proportion of losses and gains of genomic material. Previously, amplification bias has been overcome by using amplified DNA as both the test and reference DNA. Our data suggest that this approach may not be effective, as the gains and losses introduced by WGA appear to be random and are not reproducible between different experiments using the same DNA. Conclusion: In light of these findings, the use of both amplified test and reference DNA on CGH arrays may not provide an accurate representation of copy number variation in the DNA.

  10. Deep whole-genome sequencing of 100 southeast Asian Malays.

    Science.gov (United States)

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

  11. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is becoming the method of choice for characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regard to accuracy, resolution and analysis speed in comparison to several other methods, including serotyping, PCR, pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multivirulence-locus sequence typing (MVLST). These methods have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major foodborne diseases in humans. Backward compatibility of WGS with former methods can be maintained by extracting the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes capable of differentiating 12 serovars, and national reference laboratories still perform serotyping and PCR-based serogrouping as a first-level classification method for Listeria monocytogenes surveillance. Whole genome sequence-based core genome MLST analysis of an L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes, and it was possible to group them into the IIa, IIc, IVb or IIb clusters generated by minimum spanning tree (MST) and neighbor joining (NJ) tree data analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  12. Whole-genome landscapes of major melanoma subtypes.

    Science.gov (United States)

    Hayward, Nicholas K; Wilmott, James S; Waddell, Nicola; Johansson, Peter A; Field, Matthew A; Nones, Katia; Patch, Ann-Marie; Kakavand, Hojabr; Alexandrov, Ludmil B; Burke, Hazel; Jakrot, Valerie; Kazakoff, Stephen; Holmes, Oliver; Leonard, Conrad; Sabarinathan, Radhakrishnan; Mularoni, Loris; Wood, Scott; Xu, Qinying; Waddell, Nick; Tembe, Varsha; Pupo, Gulietta M; De Paoli-Iseppi, Ricardo; Vilain, Ricardo E; Shang, Ping; Lau, Loretta M S; Dagg, Rebecca A; Schramm, Sarah-Jane; Pritchard, Antonia; Dutton-Regester, Ken; Newell, Felicity; Fitzgerald, Anna; Shang, Catherine A; Grimmond, Sean M; Pickett, Hilda A; Yang, Jean Y; Stretch, Jonathan R; Behren, Andreas; Kefford, Richard F; Hersey, Peter; Long, Georgina V; Cebon, Jonathan; Shackleton, Mark; Spillane, Andrew J; Saw, Robyn P M; López-Bigas, Núria; Pearson, John V; Thompson, John F; Scolyer, Richard A; Mann, Graham J

    2017-05-11

    Melanoma of the skin is a common cancer only in Europeans, whereas it arises in internal body surfaces (mucosal sites) and on the hands and feet (acral sites) in people throughout the world. Here we report analysis of whole-genome sequences from cutaneous, acral and mucosal subtypes of melanoma. The heavily mutated landscape of coding and non-coding mutations in cutaneous melanoma resolved novel signatures of mutagenesis attributable to ultraviolet radiation. However, acral and mucosal melanomas were dominated by structural changes and mutation signatures of unknown aetiology, not previously identified in melanoma. The number of genes affected by recurrent mutations disrupting non-coding sequences was similar to that affected by recurrent mutations to coding sequences. Significantly mutated genes included BRAF, CDKN2A, NRAS and TP53 in cutaneous melanoma, BRAF, NRAS and NF1 in acral melanoma and SF3B1 in mucosal melanoma. Mutations affecting the TERT promoter were the most frequent of all; however, neither they nor ATRX mutations, which correlate with alternative telomere lengthening, were associated with greater telomere length. Most melanomas had potentially actionable mutations, most in components of the mitogen-activated protein kinase and phosphoinositol kinase pathways. The whole-genome mutation landscape of melanoma reveals diverse carcinogenic processes across its subtypes, some unrelated to sun exposure, and extends potential involvement of the non-coding genome in its pathogenesis.

  13. Whole genome sequencing: an efficient approach to ensuring food safety

    Science.gov (United States)

    Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.

    2017-09-01

    Whole genome sequencing (WGS) is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. WGS can also help clarify the contamination and colonization routes of foodborne pathogens within the food production environment, and it allows efficient tracking of pathogens' entry routes and distribution from farm to consumer. Tracking foodborne pathogens along the food processing, distribution, retail and consumer continuum is of the utmost importance for supporting outbreak investigations and for taking rapid action to control and prevent foodborne outbreaks. Therefore, WGS will likely replace most of the numerous workflows used in public health laboratories to characterize foodborne pathogens with one consolidated, efficient workflow.

  14. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers have made genomics more feasible in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle of strain characterisation and epidemiological analysis, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  15. A whole genome RNAi screen identifies replication stress response genes.

    Science.gov (United States)

    Kavanaugh, Gina; Ye, Fei; Mohni, Kareem N; Luzwick, Jessica W; Glick, Gloria; Cortez, David

    2015-11-01

    Proper DNA replication is critical to maintain genome stability. When the DNA replication machinery encounters obstacles to replication, replication forks stall and the replication stress response is activated. This response includes activation of cell cycle checkpoints, stabilization of the replication fork, and DNA damage repair and tolerance mechanisms. Defects in the replication stress response can result in alterations to the DNA sequence causing changes in protein function and expression, ultimately leading to disease states such as cancer. To identify additional genes that control the replication stress response, we performed a three-parameter, high content, whole genome siRNA screen measuring DNA replication before and after a challenge with replication stress as well as a marker of checkpoint kinase signalling. We identified over 200 replication stress response genes and subsequently analyzed how they influence cellular viability in response to replication stress. These data will serve as a useful resource for understanding the replication stress response.

  16. Nitrogen regulation in Sinorhizobium meliloti probed with whole genome arrays.

    Science.gov (United States)

    Davalos, Marcela; Fourment, Joëlle; Lucas, Antoine; Bergès, Hélène; Kahn, Daniel

    2004-12-01

    Using whole genome arrays, we systematically investigated nitrogen regulation in the plant symbiotic bacterium Sinorhizobium meliloti. The use of glutamate instead of ammonium as a nitrogen source induced nitrogen catabolic genes independently of the carbon source, including two glutamine synthetase genes, various amino acid transporters and the glnKamtB operon. These responses depended on both the ntrC and glnB nitrogen regulators. Glutamate-repressible genes included glutamate synthase and a H+-translocating pyrophosphate synthase. The smc01041-ntrBC operon was negatively autoregulated in a glnB-dependent fashion, indicating an involvement of phosphorylated NtrC. In addition to the nitrogen response, glutamate remodelled the expression of carbon metabolism genes by inhibiting expression of the Entner-Doudoroff and pentose phosphate pathways, and by stimulating gluconeogenic genes independently of ntrC.

  17. Origin of the Yeast Whole-Genome Duplication.

    Directory of Open Access Journals (Sweden)

    Kenneth H Wolfe

    2015-08-01

    Whole-genome duplications (WGDs) are rare evolutionary events with profound consequences. They double an organism's genetic content, immediately creating a reproductive barrier between it and its ancestors and providing raw material for the divergence of gene functions between paralogs. Almost all eukaryotic genome sequences bear evidence of ancient WGDs, but the causes of these events and the timing of intermediate steps have been difficult to discern. One of the best-characterized WGDs occurred in the lineage leading to the baker's yeast Saccharomyces cerevisiae. Marcet-Houben and Gabaldón now show that, rather than simply doubling the DNA of a single ancestor, the yeast WGD likely involved mating between two different ancestral species followed by a doubling of the genome to restore fertility.

  18. Whole-genome characterization of chemoresistant ovarian cancer.

    Science.gov (United States)

    Patch, Ann-Marie; Christie, Elizabeth L; Etemadmoghadam, Dariush; Garsed, Dale W; George, Joshy; Fereday, Sian; Nones, Katia; Cowin, Prue; Alsop, Kathryn; Bailey, Peter J; Kassahn, Karin S; Newell, Felicity; Quinn, Michael C J; Kazakoff, Stephen; Quek, Kelly; Wilhelm-Benartzi, Charlotte; Curry, Ed; Leong, Huei San; Hamilton, Anne; Mileshkin, Linda; Au-Yeung, George; Kennedy, Catherine; Hung, Jillian; Chiew, Yoke-Eng; Harnett, Paul; Friedlander, Michael; Quinn, Michael; Pyman, Jan; Cordner, Stephen; O'Brien, Patricia; Leditschke, Jodie; Young, Greg; Strachan, Kate; Waring, Paul; Azar, Walid; Mitchell, Chris; Traficante, Nadia; Hendley, Joy; Thorne, Heather; Shackleton, Mark; Miller, David K; Arnau, Gisela Mir; Tothill, Richard W; Holloway, Timothy P; Semple, Timothy; Harliwong, Ivon; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Idrisoglu, Senel; Bruxner, Timothy J C; Christ, Angelika N; Poudel, Barsha; Holmes, Oliver; Anderson, Matthew; Leonard, Conrad; Lonie, Andrew; Hall, Nathan; Wood, Scott; Taylor, Darrin F; Xu, Qinying; Fink, J Lynn; Waddell, Nick; Drapkin, Ronny; Stronach, Euan; Gabra, Hani; Brown, Robert; Jewell, Andrea; Nagaraj, Shivashankar H; Markham, Emma; Wilson, Peter J; Ellul, Jason; McNally, Orla; Doyle, Maria A; Vedururu, Ravikiran; Stewart, Collin; Lengyel, Ernst; Pearson, John V; Waddell, Nicola; deFazio, Anna; Grimmond, Sean M; Bowtell, David D L

    2015-05-28

    Patients with high-grade serous ovarian cancer (HGSC) have experienced little improvement in overall survival, and standard treatment has not advanced beyond platinum-based combination chemotherapy during the past 30 years. To better understand the drivers of clinical phenotypes, here we use whole-genome sequencing of tumour and germline DNA samples from 92 patients with primary refractory, resistant, sensitive and matched acquired resistant disease. We show that gene breakage commonly inactivates the tumour suppressors RB1, NF1, RAD51B and PTEN in HGSC, and contributes to acquired chemotherapy resistance. CCNE1 amplification was common in primary resistant and refractory disease. We observed several molecular events associated with acquired resistance, including multiple independent reversions of germline BRCA1 or BRCA2 mutations in individual patients, loss of BRCA1 promoter methylation, an alteration in molecular subtype, and recurrent promoter fusion associated with overexpression of the drug efflux pump MDR1.

  19. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    BACKGROUND: Genomics studies are being revolutionized by next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived specifically to address the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired-end spacing. Thousands of read datasets were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated with an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by aligning the assemblies to the original genome source of the reads. The results were presented on a website, which includes a data graphing tool, all created to help the user rapidly compare the feasibility and effectiveness of different sequencing and assembly strategies before testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, together with conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  20. Whole genome sequence analysis of the TALLYHO/Jng mouse.

    Science.gov (United States)

    Denvir, James; Boskovic, Goran; Fan, Jun; Primerano, Donald A; Parkman, Jacaline K; Kim, Jung Han

    2016-11-11

    The TALLYHO/Jng (TH) mouse is a polygenic model for obesity and type 2 diabetes first described in the literature in 2001. The TH strain originated from an outbred colony of the Theiler Original strain, and mice derived from this source were selectively bred for male hyperglycemia, establishing an inbred strain at The Jackson Laboratory. TH mice manifest many of the disease phenotypes observed in human obesity and type 2 diabetes. We sequenced the whole genome of TH mice maintained at Marshall University to a depth of approximately 64.8X coverage using data from three next generation sequencing runs. Genome-wide, we found approximately 4.31 million homozygous single nucleotide polymorphisms (SNPs) and 1.10 million homozygous small insertions and deletions (indels), of which 98,899 SNPs and 163,720 indels were unique to the TH strain compared to 28 previously sequenced inbred mouse strains. To identify potentially clinically relevant genes, we intersected our list of SNP and indel variants with human orthologous genes in which variants were associated in GWAS with obesity, diabetes, and metabolic syndrome, and with genes previously shown to confer a monogenic obesity phenotype in humans, and found several candidate variants that could be functionally tested using TH mice. Further, we filtered our list of variants to those occurring in an obesity quantitative trait locus, tabw2, identified in TH mice, found a missense polymorphism in the Cidec gene, and characterized this variant's effect on protein function. We generated a complete catalog of variants in TH mice using the data from whole genome sequencing. Our findings will facilitate the identification of causal variants that underlie metabolic diseases in TH mice and will enable identification of candidate susceptibility genes for complex human obesity and type 2 diabetes.

  1. Genomic V exons from whole genome shotgun data in reptiles.

    Science.gov (United States)

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages among terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed among the established mammalian clans, while their IGKV genes fall within a single clan that is almost entirely separate from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences are available in a free public repository (http://vgenerepertoire.org).

  2. Review: Whole genome amplification in preimplantation genetic diagnosis

    Institute of Scientific and Technical Information of China (English)

    Ying-ming ZHENG; Ning WANG; Lei LI; Fan JIN

    2011-01-01

    Preimplantation genetic diagnosis (PGD) refers to a procedure for genetically analyzing embryos prior to implantation, improving the chance of conception for patients at high risk of transmitting specific inherited disorders. This method has been widely used for a large number of genetic disorders since its first successful application in the early 1990s. Polymerase chain reaction (PCR) and fluorescent in situ hybridization (FISH) are the two main methods in PGD, but they have inherent shortcomings that limit the scope of genetic diagnosis. Fortunately, several whole genome amplification (WGA) techniques have been developed to overcome these problems. WGA yields sufficient DNA for multiple tasks that require abundant DNA. Moreover, WGA products can serve as a template for multi-locus and multi-gene analyses during subsequent DNA testing. In this review, we focus on the currently available WGA techniques and their applications, as well as new technical trends in the use of WGA products.

  3. Information recovery from low coverage whole-genome bisulfite sequencing.

    Science.gov (United States)

    Libertini, Emanuele; Heath, Simon C; Hamoudi, Rifat A; Gut, Marta; Ziller, Michael J; Czyz, Agata; Ruotti, Victor; Stunnenberg, Hendrik G; Frontini, Mattia; Ouwehand, Willem H; Meissner, Alexander; Gut, Ivo G; Beck, Stephan

    2016-06-27

    The cost of whole-genome bisulfite sequencing (WGBS) remains a bottleneck for many studies, and it is therefore imperative to extract as much information as possible from a given dataset. This is particularly important because, even at the recommended 30X coverage for reference methylomes, up to 50% of high-resolution features such as differentially methylated positions (DMPs) cannot be called with current methods, as determined by saturation analysis. To address this limitation, we have developed a tool that dynamically segments WGBS methylomes into blocks of comethylation (COMETs) from which lost information can be recovered in the form of differentially methylated COMETs (DMCs). Using this tool, we demonstrate recovery of ∼30% of the lost DMP information content as DMCs even at very low (5X) coverage. This is twice the amount that can be recovered using an existing method based on differentially methylated regions (DMRs). In addition, we explored the relationship between COMETs and haplotypes in lymphoblastoid cell lines of African and European origin. Using best-fit analysis, we show COMETs to be correlated in a population-specific manner, suggesting that this type of dynamic segmentation may be useful for integrated (epi)genome-wide association studies in the future.

  4. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  5. Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.

    Science.gov (United States)

    Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao

    2017-04-01

    Chinese clearhead icefish, Protosalanx hyalocranius, is a representative icefish species with economic importance and a distinctive appearance. Because of its great economic value in China, the fish was introduced into Lake Dianchi and several other lakes from Lake Taihu half a century ago. Like the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as a transparent body and nearly scaleless skin. Here, we sequenced the whole genome of this surface-dwelling fish and generated a draft genome assembly, aiming to explore the molecular mechanisms underlying these traits. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with a scaffold N50 of 1.163 Mb. The genome completeness was estimated to be 98.39 % by CEGMA evaluation. Finally, we annotated 19 884 protein-coding genes and observed that repeat sequences account for 24.43 % of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms of the cavefish-like characters.
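    The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The sketch below is only an illustration of that definition, not the authors' pipeline; the function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of sequence (e.g. scaffold) lengths.

    Hypothetical helper for illustration: sorts lengths from longest to
    shortest and returns the first length at which the running sum
    reaches at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2*running with total to avoid rounding issues with odd totals.
        if 2 * running >= total:
            return length


# Toy example: total = 30 bases; 10 + 6 = 16 >= 15, so N50 is 6.
print(n50([2, 3, 4, 5, 6, 10]))
```

    Because N50 depends on the total assembly size, two assemblies of the same genome with different amounts of short, fragmentary sequence can report different N50 values even when their longest scaffolds are identical.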

  6. Whole-genome landscape of pancreatic neuroendocrine tumours.

    Science.gov (United States)

    Scarpa, Aldo; Chang, David K; Nones, Katia; Corbo, Vincenzo; Patch, Ann-Marie; Bailey, Peter; Lawlor, Rita T; Johns, Amber L; Miller, David K; Mafficini, Andrea; Rusev, Borislav; Scardoni, Maria; Antonello, Davide; Barbi, Stefano; Sikora, Katarzyna O; Cingarlini, Sara; Vicentini, Caterina; McKay, Skye; Quinn, Michael C J; Bruxner, Timothy J C; Christ, Angelika N; Harliwong, Ivon; Idrisoglu, Senel; McLean, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wilson, Peter J; Anderson, Matthew J; Fink, J Lynn; Newell, Felicity; Waddell, Nick; Holmes, Oliver; Kazakoff, Stephen H; Leonard, Conrad; Wood, Scott; Xu, Qinying; Nagaraj, Shivashankar Hiriyur; Amato, Eliana; Dalai, Irene; Bersani, Samantha; Cataldo, Ivana; Dei Tos, Angelo P; Capelli, Paola; Davì, Maria Vittoria; Landoni, Luca; Malpaga, Anna; Miotto, Marco; Whitehall, Vicki L J; Leggett, Barbara A; Harris, Janelle L; Harris, Jonathan; Jones, Marc D; Humphris, Jeremy; Chantrill, Lorraine A; Chin, Venessa; Nagrial, Adnan M; Pajic, Marina; Scarlett, Christopher J; Pinho, Andreia; Rooman, Ilse; Toon, Christopher; Wu, Jianmin; Pinese, Mark; Cowley, Mark; Barbour, Andrew; Mawson, Amanda; Humphrey, Emily S; Colvin, Emily K; Chou, Angela; Lovell, Jessica A; Jamieson, Nigel B; Duthie, Fraser; Gingras, Marie-Claude; Fisher, William E; Dagg, Rebecca A; Lau, Loretta M S; Lee, Michael; Pickett, Hilda A; Reddel, Roger R; Samra, Jaswinder S; Kench, James G; Merrett, Neil D; Epari, Krishna; Nguyen, Nam Q; Zeps, Nikolajs; Falconi, Massimo; Simbolo, Michele; Butturini, Giovanni; Van Buren, George; Partelli, Stefano; Fassan, Matteo; Khanna, Kum Kum; Gill, Anthony J; Wheeler, David A; Gibbs, Richard A; Musgrove, Elizabeth A; Bassi, Claudio; Tortora, Giampaolo; Pederzoli, Paolo; Pearson, John V; Waddell, Nicola; Biankin, Andrew V; Grimmond, Sean M

    2017-03-02

    The diagnosis of pancreatic neuroendocrine tumours (PanNETs) is increasing owing to more sensitive detection methods, and this increase is creating challenges for clinical management. We performed whole-genome sequencing of 102 primary PanNETs and defined the genomic events that characterize their pathogenesis. Here we describe the mutational signatures they harbour, including a deficiency in G:C > T:A base excision repair due to inactivation of MUTYH, which encodes a DNA glycosylase. Clinically sporadic PanNETs contain a larger-than-expected proportion of germline mutations, including previously unreported mutations in the DNA repair genes MUTYH, CHEK2 and BRCA2. Together with mutations in MEN1 and VHL, these mutations occur in 17% of patients. Somatic mutations, including point mutations and gene fusions, were commonly found in genes involved in four main pathways: chromatin remodelling, DNA damage repair, activation of mTOR signalling (including previously undescribed EWSR1 gene fusions), and telomere maintenance. In addition, our gene expression analyses identified a subgroup of tumours associated with hypoxia and HIF signalling.

  7. Current Developments in Prokaryotic Single Cell Whole Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Goudeau, Danielle; Nath, Nandita; Ciobanu, Doina; Cheng, Jan-Fang; Malmstrom, Rex

    2014-03-14

    Our approach to prokaryotic single-cell whole genome amplification (WGA) at the JGI continues to evolve. To increase both the quality and the number of single-cell genomes produced, we explore all aspects of the process, from cell sorting to sequencing. For example, we now use specialized reagents, acoustic liquid handling, and reduced reaction volumes to eliminate non-target DNA contamination in WGA reactions. More specifically, we use a cleaner commercial WGA kit from Qiagen that employs a UV decontamination procedure initially developed at the JGI, and we use the Labcyte Echo for tip-less liquid transfer to set up 2 µL reactions. Acoustic liquid handling also dramatically reduces reagent costs. In addition, we are exploring new cell lysis methods, including treatment with Proteinase K, lysozyme, and various detergents, to complement standard alkaline lysis and allow more efficient disruption of a wider range of cells. Incomplete lysis is a major hurdle for WGA on some environmental samples, especially rhizosphere, peatland, and other soils. Finding effective lysis strategies that are also compatible with WGA is challenging, and we are currently assessing the impact of various strategies on genome recovery.

  8. Whole genomes redefine the mutational landscape of pancreatic cancer

    Science.gov (United States)

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K.; Kassahn, Karin S.; Bailey, Peter; Johns, Amber L.; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C. J.; Robertson, Alan J.; Fadlullah, Muhammad Z. H.; Bruxner, Tim J. C.; Christ, Angelika N.; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J.; Fink, J. Lynn; Holmes, Oliver; Kazakoff, Stephen H.; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J.; Lee, Hong C.; Jones, Marc D.; Nagrial, Adnan M.; Humphris, Jeremy; Chantrill, Lorraine A.; Chin, Venessa; Steinmann, Angela M.; Mawson, Amanda; Humphrey, Emily S.; Colvin, Emily K.; Chou, Angela; Scarlett, Christopher J.; Pinho, Andreia V.; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S.; Kench, James G.; Pettitt, Jessica A.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B.; Graham, Janet S.; Niclou, Simone P.; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A.; Gill, Anthony J.; Eshleman, James R.; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A.; Pearson, John V.; Biankin, Andrew V.; Grimmond, Sean M.

    2015-01-01

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded. PMID:25719666

  9. A whole-genome phylogeny of the family Pasteurellaceae.

    Science.gov (United States)

    Bonaventura, Maria Pia Di; Lee, Ernest K; Desalle, Rob; Planet, Paul J

    2010-03-01

    A phylogenomic approach was used to generate an amino acid phylogeny for 12 whole genomes representing 10 species in the family Pasteurellaceae. Orthology of genes was determined using an approach similar to OrthologID (http://nypg.bio.nyu.edu/orthologid/about.html) and resulted in the generation of a matrix with 3130 genes with 1,194,615 aligned amino acid characters of which 239,504 characters are phylogenetically informative. Phylogenetic analysis of the concatenated matrix using all standard approaches (maximum parsimony, maximum likelihood, and Bayesian analysis) results in a single extremely robust phylogenetic hypothesis for the species examined in this study. Remarkably, no single gene partition gives the same tree as the concatenated analysis. By analyzing partitioned support in the data matrix, we show that there is very little negative support emanating from individual gene partitions to suggest that the concatenated hypothesis is not tenable. The large number of characters in the matrix allows us to test hypotheses concerning missing data and character number in phylogenomic studies, and we conclude that matrices constructed using genome level information are very robust to missing data. We show that a very large number of concatenated gene sequences (>160) are needed to reliably obtain the same topology as the overall analysis. Copyright 2009 Elsevier Inc. All rights reserved.
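The abstract reports that 239,504 of the 1,194,615 aligned characters (about 20%) are phylogenetically informative. A common definition, assumed here rather than taken from the paper, is the parsimony-informative site: a column in which at least two character states each occur in at least two taxa. The sketch below counts such columns in a toy alignment. The gap/missing-data symbols and the sequences are hypothetical.

```python
from collections import Counter

# Characters treated as gaps or missing data (an assumption; conventions vary).
MISSING = set("-?X")

def is_parsimony_informative(column):
    """True if at least two character states each occur in at least two taxa."""
    counts = Counter(c for c in column if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def count_informative(alignment):
    """Count informative columns in a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must all have the same aligned length")
    return sum(is_parsimony_informative([seq[i] for seq in alignment])
               for i in range(length))

# Toy amino acid alignment: 4 taxa x 5 sites (hypothetical data).
toy = ["MKVLA", "MKILA", "MRILG", "MRVLG"]
print(count_informative(toy))  # 3 (sites 2, 3 and 5)

# Fraction of informative characters reported in the abstract.
print(f"{239_504 / 1_194_615:.1%}")  # 20.0%
```

Constant columns (such as site 1) and columns where only one state is shared by two or more taxa do not count, because every tree explains them equally well under parsimony.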

  10. Signatures of selection in tilapia revealed by whole genome resequencing.

    Science.gov (United States)

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identifying selection signatures is important in evolutionary biology and for detecting genes that can accelerate genetic improvement. However, signatures of both artificial and natural selection have been identified at the whole-genome level in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that LD block sizes in tilapia ranged from 10 to 100 kb. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and for accelerating the genetic improvement of tilapia.

  11. Cryptococcus gattii in the Age of Whole-Genome Sequencing.

    Science.gov (United States)

    Meyer, Wieland

    2015-11-17

    Cryptococcus gattii, the sister species of Cryptococcus neoformans, is an emerging pathogen that gained importance in connection with the ongoing cryptococcosis outbreak on Vancouver Island. Many molecular studies have divided this species into four major lineages: VGI, VGII, VGIII, and VGIV. This commentary summarizes the whole-genome sequencing (WGS) studies that have been carried out with this species, re-emphasizing the phylogenetic relationships, showing chromosomal rearrangements between those four groups, and identifying VGII as the ancestral population within C. gattii. In addition, WGS specific to VGII, including the Vancouver Island outbreak genotypes and those from the Pacific Northwest region of the United States, has placed the origin of this lineage in South America and identified specific genes responsible for either brain or lung infection. It also showed that many genotypes are spread across a number of different continents, as previously shown by multilocus sequence typing (MLST). Finally, it showed that recombination occurs more frequently between mitochondrial than between nuclear genomes.

  12. Whole-Genome Sequencing for National Surveillance of Shigella flexneri

    Directory of Open Access Journals (Sweden)

    Marie A. Chattaway

    2017-09-01

    National surveillance of Shigella flexneri ensures the rapid detection of outbreaks to facilitate public health investigation and intervention strategies. In this study, we used whole-genome sequencing (WGS) to type S. flexneri in order to detect linked cases and support epidemiological investigations. We prospectively analyzed 330 isolates of S. flexneri received at the Gastrointestinal Bacteria Reference Unit at Public Health England between August 2015 and January 2016. Traditional phenotypic and WGS sub-typing methods were compared. PCR was carried out on isolates exhibiting phenotypic/genotypic discrepancies with respect to serotype. Phylogenetic relationships between isolates were analyzed by WGS using single nucleotide polymorphism (SNP) typing to facilitate cluster detection. For 306/330 (93%) isolates, the serotype derived from the genome was concordant with phenotypic serology. Discrepant results between the phenotypic and genotypic tests were attributed to novel O-antigen synthesis/modification gene combinations or to indels in O-antigen synthesis/modification genes that rendered them dysfunctional. SNP typing identified 36 clusters of two or more isolates. WGS provided microbiological evidence of epidemiologically linked clusters and detected novel O-antigen synthesis/modification gene combinations associated with two outbreaks. WGS provided reliable and robust data for monitoring trends in the incidence of different serotypes over time. SNP typing can be used to facilitate outbreak investigations in real time, thereby informing surveillance strategies and providing opportunities for timely public health interventions.
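SNP-based cluster detection is often done by computing pairwise SNP distances between isolates and linking any isolates that fall within a distance threshold (single linkage). The sketch below illustrates only that general idea. The abstract does not state the threshold or the exact algorithm used at Public Health England, so the toy SNP strings, the `threshold` value and the treatment of `N` as a missing call are all assumptions.

```python
from itertools import combinations

def snp_distance(a, b):
    """Count differing positions between two aligned SNP strings,
    skipping positions where either isolate has 'N' (no call)."""
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))

def single_linkage_clusters(profiles, threshold):
    """Link isolates within `threshold` SNPs; links are transitive (single linkage)."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if snp_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Keep clusters of two or more isolates, as counted in the abstract.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)

# Hypothetical core-genome SNP strings for four isolates.
profiles = {
    "isolate_1": "ACGTACGT",
    "isolate_2": "ACGTACGA",
    "isolate_3": "ACGAACGA",
    "isolate_4": "TTTTTTTT",
}
print(single_linkage_clusters(profiles, threshold=1))
# [['isolate_1', 'isolate_2', 'isolate_3']]
```

Because linkage is transitive, isolate_1 and isolate_3 end up in the same cluster even though they differ by two SNPs; each is within one SNP of isolate_2.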

  13. Evolution after whole-genome duplication: a network perspective.

    Science.gov (United States)

    Zhu, Yun; Lin, Zhenguo; Nakhleh, Luay

    2013-11-06

    Gene duplication plays an important role in the evolution of genomes and interactomes. Elucidating how evolution after gene duplication interplays at the sequence and network level is of great interest. In this work, we analyze a data set of gene pairs that arose through whole-genome duplication (WGD) in yeast. All these pairs have the same duplication time, making them ideal for evolutionary investigation. We investigated the interplay between evolution after WGD at the sequence and network levels and correlated these two levels of divergence with gene expression and fitness data. We find that molecular interactions involving WGD genes evolve at rates that are three orders of magnitude slower than the rates of evolution of the corresponding sequences. Furthermore, we find that divergence of WGD pairs correlates strongly with gene expression and fitness data. Because of the role of gene duplication in determining redundancy in biological systems and particularly at the network level, we investigated the role of interaction networks in elucidating the evolutionary fate of duplicated genes. We find that gene neighborhoods in interaction networks provide a mechanism for inferring these fates, and we developed an algorithm for achieving this task. Further epistasis analysis of WGD pairs categorized by their inferred evolutionary fates demonstrated the utility of these techniques. Finally, we find that WGD pairs and other pairs of paralogous genes of small-scale duplication origin share similar properties, giving good support for generalizing our results from WGD pairs to evolution after gene duplication in general.
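One simple way to quantify the network-level divergence of a WGD pair is the overlap of the two copies' interaction partners, expressed for example as a Jaccard index. This is an illustrative measure only. It is not necessarily the one used by the authors, and it is not their fate-inference algorithm. The gene names and edges below are hypothetical.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected interaction network as a dict of neighbor sets."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph

def neighborhood_overlap(graph, a, b):
    """Jaccard similarity of the partners of paralogs a and b.

    Any interaction between the two copies themselves is ignored.
    Returns 1.0 for identical neighborhoods, 0.0 for disjoint ones
    (and 0.0 when neither copy has any partners).
    """
    na = graph.get(a, set()) - {a, b}
    nb = graph.get(b, set()) - {a, b}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

# Hypothetical WGD pair (A, B) and its interaction partners.
edges = [("A", "X"), ("A", "Y"), ("A", "Z"),
         ("B", "X"), ("B", "Y"), ("B", "W"), ("A", "B")]
g = build_graph(edges)
print(neighborhood_overlap(g, "A", "B"))  # 0.5: shares X, Y out of X, Y, Z, W
```

A pair with overlap near 1.0 has retained largely redundant interactions, while a value near 0.0 indicates that the copies have rewired toward different partners.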

  14. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    Francioli, Laurent C.; Menelaou, Andronild; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Setten, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marschall, Tobias; Schonhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Matthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; McCarroll, Steven A.; Beekman, Marian; De Craen, Anton J. M.; Suchiman, H. Eka D.; Hofman, Albert; Oostra, Ben; Uitterlinden, Andre G.; Willemsen, Gonneke; Platteel, Mathieu; Veldink, Jan H.; Van den Berg, Leonard H.; Pitts, Steven J.; Potluri, Shobha; Sundar, Purnima; Cox, David R.; Sunyaev, Shamil R.; Den Dunnen, Johan T.; Stoneking, Mark; De Knijff, Peter; Kayser, Manfred; Li, Qibin; Li, Yingrui; Du, Yuanping; Chen, Ruoyan; Cao, Hongzhi; Li, Ning; Cao, Sujie; Wang, Jun; Bovenberg, Jasper A.; Peer, Itsik; Slagboom, P. Eline; Van Duijn, Cornelia M.; Boomsma, Dorret I.; Van Ommen, Gert-Jan B.; De Bakker, Paul I. W.; Swertz, Morris A.; Wijmenga, Cisca

    2014-01-01

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring

  15. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  16. Utilization of touch preparations and whole genome amplification for loss of heterozygosity analysis in prostate cancer

    Energy Technology Data Exchange (ETDEWEB)

    Wick, M.J.; Halling, K.; Thibodeau, S.N. [Mayo Clinic and Foundation, Rochester, MN (United States)]

    1994-09-01

    Loss of heterozygosity (LOH) analyses have been used extensively to identify tumor suppressor genes in a variety of tumor systems. In an effort to localize such genes in prostate cancer, we have examined tissue for LOH with the use of PCR-based assays for a variety of microsatellites. However, the highly infiltrative nature of prostate carcinoma makes it virtually impossible, by conventional methods, to obtain tumor DNA that is uncontaminated with DNA from normal cells. Thus, we have examined the use of touch preparations as a means to increase the percentage of tumor DNA for our LOH analyses. This method, which involves lightly touching the cut surface of fresh prostate tissue to the surface of a microscope slide, allows for selection of tumor cell clusters. DNA from these cells can then be used in a variety of PCR-based assays. In this study, we demonstrate that tumor cell clusters can be used effectively for LOH analysis. Our studies also demonstrate that use of the touch preparation technique reduces or eliminates normal cell contamination. However, the small quantity of DNA in these clusters prohibits analysis at multiple loci. Therefore, we have examined whole genome amplification (WGA) of tumor cell clusters as a method of avoiding this difficulty. Random 15-base oligonucleotides were used as primers for WGA of cell cluster DNA. Aliquots of the WGA product were then subjected to a second round of PCR in which microsatellite markers demonstrating allelic loss in prostate cancer were amplified. Our studies indicate that analysis of limited quantities of prostate tumor DNA at multiple loci can be accomplished by coupling the touch preparation technique with WGA. This method may have ramifications for the analysis of tissue from which sufficient quantities of DNA are difficult to procure.
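Allelic loss at a microsatellite marker is commonly scored by comparing the ratio of the two allele signals in tumor DNA with the same ratio in matched normal DNA. The abstract does not give its scoring rule, so the ratio formula and the 0.5/2.0 cut-offs below are assumptions (published studies use a range of thresholds), and the peak heights are invented for illustration.

```python
import math

def allelic_imbalance_ratio(tumor, normal):
    """(T1/T2) / (N1/N2) for the two alleles of a microsatellite marker.

    tumor, normal: (allele_1_signal, allele_2_signal), e.g. PCR peak heights.
    """
    t1, t2 = tumor
    n1, n2 = normal
    if n1 <= 0 or n2 <= 0:
        raise ValueError("uninformative marker: normal DNA must show both alleles")
    if t1 <= 0 and t2 <= 0:
        raise ValueError("no tumor signal at this marker")
    if t2 <= 0:
        return math.inf  # allele 2 completely lost in tumor DNA
    return (t1 / t2) / (n1 / n2)

def call_loh(tumor, normal, low=0.5, high=2.0):
    """Call LOH when the ratio is at or beyond either (hypothetical) cut-off."""
    ratio = allelic_imbalance_ratio(tumor, normal)
    return ratio <= low or ratio >= high

# Hypothetical peak heights (tumor, matched normal).
print(call_loh((1200, 400), (1000, 1000)))  # True: ratio 3.0
print(call_loh((900, 1000), (1000, 1000)))  # False: ratio 0.9
print(call_loh((0, 800), (950, 1000)))      # True: allele 1 lost, ratio 0.0
```

Normal-cell contamination pulls the tumor ratio back toward the normal ratio, which is why enriching tumor cells (for example with touch preparations) makes such calls more reliable.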

  17. Whole-genome synthesis and characterization of viable S13-like bacteriophages.

    Directory of Open Access Journals (Sweden)

    Yuchen Liu

    BACKGROUND: Unprecedented progress in high-throughput DNA sequencing and de novo gene synthesis technologies has allowed us to create living organisms in the absence of a natural template. METHODOLOGY/PRINCIPAL FINDINGS: The sequence of the wild-type S13 phage genome was downloaded from GenBank. Two synonymous mutations were introduced into the wt-S13 genome to generate the m1-S13 genome. Another mutant, the m2-S13 genome, was obtained by engineering two nonsynonymous mutations in the capsid protein coding region of the wt-S13 genome. A chimeric phage genome was designed by replacing the F capsid protein open reading frame (ORF) from phage S13 with the F capsid protein ORF from phage G4. The whole genomes of all four phages were assembled from a series of chemically synthesized short overlapping oligonucleotides. The linear synthesized genomes were circularized and electroporated into E. coli C, the standard laboratory host of S13 phage. All four phages were recovered and plaques were visualized. Sequencing confirmed the accuracy of these synthetic genomes. The synthetic phages were capable of lysing their bacterial host and tolerating general environmental conditions. While no phenotypic differences among the variant strains were observed when grown in LB medium with CaCl2, the S13/G4 chimera was found to be much more sensitive to the absence of calcium and to have a lower adsorption rate under calcium-free conditions. CONCLUSIONS/SIGNIFICANCE: The bacteriophage S13 and its variants can be chemically synthesized. The major capsid gene of phage G4 is functional in the phage S13 life cycle. These results support a previously proposed evolutionary hypothesis that a homologous recombination event involving gene F of quite divergent ancestral lineages should be included in the history of the microvirid family.

  18. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges, such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast genome (>30 Mbp). The pezizomycotine (filamentous ascomycete) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. About 12 percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9 percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90 percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  19. Whole genome analysis of epidemiologically closely related Staphylococcus aureus isolates.

    Directory of Open Access Journals (Sweden)

    Maarten Schijffelen

    The transition of bacteria from colonizers to pathogens is accompanied by a drastic change in expression profiles. These changes may be due to environmental signals or to mutational changes. We therefore compared the whole genome sequences of four sets of S. aureus isolates. Three sets were from the same patients. The isolates of each pair (S1800/S1805, S2396/S2395, S2398/S2397; an isolate from colonization and an isolate from infection, respectively) were obtained within <30 days of each other, and the isolate from infection caused skin infections. The isolates were then compared for differences in gene content and SNPs. In addition, a set of isolates from a colonized pig and a farmer from the same farm at the same time (S0462 and S0460) was analyzed. The isolate pair S1800/S1805 showed a difference in a prophage, but prophages are easily lost or acquired. However, S1805 contained an integrative conjugative element not present in S1800. In addition, 92 SNPs were present in a variety of genes, and the isolates S1800 and S1805 were not considered a pair. Between S2395/S2396, two SNPs were present: one in an intergenic region and one a synonymous mutation in a putative membrane protein. Between S2397/S2398, only one synonymous mutation, in a putative lipoprotein, was found. The two farm isolates were very similar and showed 12 SNPs in genes belonging to a number of different functional categories. However, we cannot pinpoint any gene that explains the change from carrier status to infection. The data indicate that differences exist between the isolate from infection and the colonizing isolate for S2395/S2396 and S2397/S2398, as well as between isolates from different hosts, but S1800/S1805 are not clonal.

  20. Cost analysis of whole genome sequencing in German clinical practice.

    Science.gov (United States)

    Plöthner, Marika; Frank, Martin; von der Schulenburg, J-Matthias Graf

    2017-06-01

    Whole genome sequencing (WGS) is an emerging tool in clinical diagnostics. However, little has been said about its procedure costs, owing to a dearth of related cost studies. This study helps fill this research gap by analyzing the execution costs of WGS within the setting of German clinical practice. First, to estimate costs, a sequencing process related to clinical practice was undertaken. Once relevant resources were identified, a quantification and monetary evaluation was conducted using data and information from expert interviews with clinical geneticists and personnel at private enterprises and hospitals. This study focuses on identifying the costs associated with the standard sequencing process, and the procedure costs for a single WGS were analyzed on the basis of two sequencing platforms, namely the HiSeq 2500 and the HiSeq X Ten, both by Illumina, Inc. In addition, sensitivity analyses were performed to assess the influence of various uses of sequencing platforms and various coverage values on a fixed-cost degression. In the base case scenario, which features 80% utilization and 30-times coverage, the cost of a single WGS analysis with the HiSeq 2500 was estimated at €3858.06. The cost of sequencing materials was estimated at €2848.08; related personnel costs (€396.94) and acquisition/maintenance costs (€607.39) were also found. In comparison, sequencing with the latest technology (i.e., HiSeq X Ten) was approximately 63% cheaper, at €1411.20. The estimated costs of WGS currently exceed the prediction of a 'US$1000 per genome' by more than a factor of 3.8. In particular, the material costs alone exceed this predicted cost.
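The headline comparisons in this abstract can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the abstract. The three itemized HiSeq 2500 components sum to €3852.41, about €5.65 less than the reported €3858.06 total, and the abstract does not say what accounts for the difference.

```python
# Base-case figures from the abstract (EUR per genome, 80% utilization, 30x).
hiseq_2500_total = 3858.06
hiseq_x_ten_total = 1411.20
hiseq_2500_components = {
    "sequencing materials": 2848.08,
    "personnel": 396.94,
    "acquisition/maintenance": 607.39,
}

saving = 1 - hiseq_x_ten_total / hiseq_2500_total
itemized = round(sum(hiseq_2500_components.values()), 2)
# Like the abstract, this compares euros directly with the US$1000 target,
# ignoring the exchange rate.
factor_over_target = hiseq_2500_total / 1000

print(f"HiSeq X Ten saving: {saving:.0%}")              # 63%
print(f"itemized HiSeq 2500 costs: EUR {itemized}")     # EUR 3852.41
print(f"factor over target: {factor_over_target:.2f}")  # 3.86
```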

  1. A whole-genome association study for pig reproductive traits.

    Science.gov (United States)

    Onteru, S K; Fan, B; Du, Z-Q; Garrick, D J; Stalder, K J; Rothschild, M F

    2012-02-01

    A whole-genome association study was performed for reproductive traits in commercial sows using the PorcineSNP60 BeadChip and Bayesian statistical methods. The traits included total number born (TNB), number born alive (NBA), number of stillborn (SB), number of mummified foetuses at birth (MUM) and gestation length (GL) in each of the first three parities. We report the associations of informative QTL and the genes within the QTL for each reproductive trait in different parities. These results provide evidence of gene effects having temporal impacts on reproductive traits in different parities. Many QTL identified in this study are new for pig reproductive traits. Around 48% of all genes located in the identified QTL regions were predicted to be involved in placental functions. The genomic regions containing genes important for foetal development (e.g. MEF2C) and uterine functions (e.g. PLSCR4) were associated with TNB and NBA in the first two parities. Similarly, QTL in other foetal development (e.g. HNRNPD and AHR) and placental (e.g. RELL1 and CD96) genes were associated with SB and MUM in different parities. The QTL with genes related to utero-placental blood flow (e.g. VEGFA) and hematopoiesis (e.g. MAFB) were associated with GL differences among sows in this population. Pathway analyses using genes within QTL identified some modest underlying biological pathways, which are interesting candidates (e.g. the nucleotide metabolism pathway for SB) for pig reproductive traits in different parities. Further validation studies on large populations are warranted to improve our understanding of the complex genetic architecture of pig reproductive traits.

  2. Post-Fragmentation Whole Genome Amplification-Based Method

    Science.gov (United States)

    Benardini, James; LaDuc, Myron T.; Langmore, John

    2011-01-01

    This innovation is derived from a proprietary amplification scheme that is based upon random fragmentation of the genome into a series of short, overlapping templates. The resulting shorter DNA strands are fragments with defined 3′ and 5′ termini. Specific primers to these termini are then used to isothermally amplify this library into potentially unlimited quantities that can be used immediately for multiple downstream applications, including gel electrophoresis, quantitative polymerase chain reaction (QPCR), comparative genomic hybridization microarrays, SNP analysis, and sequencing. The standard reaction can be performed with minimal hands-on time and can produce amplified DNA in as little as three hours. Post-fragmentation whole genome amplification-based technology provides a robust and accurate method of amplifying femtogram levels of starting material into microgram yields with no detectable allele bias. The amplified DNA also facilitates the preservation of samples (e.g., spacecraft samples) by amplifying scarce amounts of template DNA into microgram concentrations in just a few hours. With further optimization, this could be a feasible technology for sample preservation in potential future sample return missions. The research and technology development described here can be pivotal in dealing with backward/forward biological contamination from planetary missions. Such efforts rely heavily on an increasing understanding of the burden and diversity of microorganisms present on spacecraft surfaces throughout assembly and testing. The development and implementation of these technologies could significantly improve the comprehensiveness and resolving power of spacecraft-associated microbial population censuses, and are important to the continued evolution and advancement of planetary protection capabilities. Current molecular procedures for assaying spacecraft-associated microbial burden and diversity have inherent sample loss issues at
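To put "femtogram levels of starting material into microgram yields" in perspective, the fold amplification and the equivalent number of perfect doublings can be computed directly. The 1 fg and 1 µg endpoints are illustrative assumptions. Because the method is isothermal, the doubling count only expresses scale and does not describe how the reaction actually proceeds.

```python
import math

def fold_and_doublings(start_grams, end_grams):
    """Fold amplification and the equivalent number of perfect doublings."""
    fold = end_grams / start_grams
    return fold, math.log2(fold)

# Illustrative endpoints: 1 femtogram of template to 1 microgram of product.
fold, doublings = fold_and_doublings(1e-15, 1e-6)
print(f"{fold:.0e}-fold, about {doublings:.1f} doublings")  # 1e+09-fold, about 29.9 doublings
```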

  3. Whole genome methylation profiling by immunoprecipitation of methylated DNA.

    Science.gov (United States)

    Sharp, Andrew J

    2012-01-01

    I provide a protocol for DNA methylation profiling based on immunoprecipitation of methylated DNA using commercially available monoclonal antibodies that specifically recognize 5-methylcytosine. Quantifying the level of enrichment of the resulting DNA enables DNA methylation to be assayed at any genomic locus, including entire chromosomes or genomes if appropriate microarray or high-throughput sequencing platforms are used. In previous studies (1, 2), I have used hybridization to oligonucleotide arrays from Roche NimbleGen Inc., which allow any genomic region of interest to be interrogated, depending on the array design. For example, using modern tiling arrays comprising millions of oligonucleotide probes, several complete human chromosomes can be assayed at densities of one probe per 100 bp or greater, sufficient to yield high-quality data. However, other methods such as quantitative real-time PCR or high-throughput sequencing can be used, measuring methylation at a single locus or across the entire genome, respectively. While the data produced by single-locus assays are relatively simple to analyze and interpret, global assays such as microarrays or high-throughput sequencing require more complex statistical approaches to effectively identify regions of differential methylation; a brief outline of some approaches is given.
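Enrichment in immunoprecipitation experiments like this is often summarized per probe as a log2 ratio of immunoprecipitated (IP) to input signal, then averaged across neighboring probes to call enriched regions. The protocol's own statistical approaches are not described in the abstract, so the sketch below is only a generic illustration. The pseudocount, window size, cut-off and signal values are all hypothetical.

```python
import math

def log2_enrichment(ip, inp, pseudocount=1.0):
    """Per-probe log2((IP + c) / (input + c)); the pseudocount c avoids log(0)."""
    return [math.log2((i + pseudocount) / (n + pseudocount))
            for i, n in zip(ip, inp)]

def enriched_windows(scores, window=3, cutoff=1.0):
    """Start indices of sliding windows whose mean log2 ratio exceeds `cutoff`."""
    hits = []
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window > cutoff:
            hits.append(start)
    return hits

# Hypothetical signals for six adjacent probes.
ip = [10, 12, 80, 90, 85, 11]
inp = [10, 11, 20, 20, 21, 12]
scores = log2_enrichment(ip, inp)
print([round(s, 2) for s in scores])
print(enriched_windows(scores))  # [1, 2, 3]
```

Averaging over a window trades resolution for robustness: a single noisy probe is less likely to trigger a call, which matters at array densities of roughly one probe per 100 bp.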

  4. Copy Number Variation Analysis by Array Analysis of Single Cells Following Whole Genome Amplification.

    Science.gov (United States)

    Dimitriadou, Eftychia; Zamani Esteki, Masoud; Vermeesch, Joris Robert

    2015-01-01

    Whole genome amplification is required to obtain sufficient material for copy number variation analysis of the genome of an individual cell. Here, we describe the protocols we use for copy number variation analysis of non-fixed single cells by array-based approaches following single-cell isolation and whole genome amplification. We focus on two alternative protocols, an isothermal and a PCR-based whole genome amplification method, followed by either array comparative genomic hybridization (aCGH) or SNP array analysis, respectively.
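For array data, copy number is usually read from log2 ratios of test to reference signal. Against a diploid reference, a single-copy loss is expected near log2(1/2) = -1 and a single-copy gain near log2(3/2) ≈ 0.585. The sketch below labels probes with hypothetical ±0.3 thresholds and merges consecutive identical calls into segments. It is not the authors' pipeline: real single-cell data after WGA are noisier, and dedicated segmentation methods are normally used instead of fixed thresholds.

```python
import math
from itertools import groupby

def call_copy_number(log2_ratios, gain=0.3, loss=-0.3):
    """Label each probe as 'gain', 'loss' or 'neutral' (hypothetical thresholds)."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral"
            for r in log2_ratios]

def segments(calls):
    """Collapse runs of identical calls into (call, start, end) with end exclusive."""
    out, pos = [], 0
    for call, run in groupby(calls):
        n = len(list(run))
        out.append((call, pos, pos + n))
        pos += n
    return out

# Expected log2 ratios against a diploid reference.
print(round(math.log2(3 / 2), 3), math.log2(1 / 2))  # 0.585 -1.0

# Hypothetical probe-level log2 ratios along one chromosome.
ratios = [0.02, -0.05, 0.61, 0.55, 0.58, 0.01, -0.98, -1.02]
print(segments(call_copy_number(ratios)))
# [('neutral', 0, 2), ('gain', 2, 5), ('neutral', 5, 6), ('loss', 6, 8)]
```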

  5. Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high incidence setting

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Soborg, B; Koch, A;

    2016-01-01

    In East Greenland, a dramatic increase of tuberculosis (TB) incidence has been observed in recent years. Classical genotyping suggests a genetically similar Mycobacterium tuberculosis (Mtb) strain population as cause, however, precise transmission patterns are unclear. We performed whole genome...

  6. New perspectives on microbial community distortion after whole-genome amplification

    Science.gov (United States)

    Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass, however potential selective biases during the amplification processes are poorly understood. Here, we describe the e...

  7. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification

    NARCIS (Netherlands)

    Direito, S.O.L.; Zaura, E.; Little, M.; Ehrenfreund, P.; Röling, W.F.M.

    2014-01-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement amplific

  8. Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data

    NARCIS (Netherlands)

    J.M. Bryant (Josephine); A. Schürch (Anita); H. van Deutekom (Henk); S.R. Harris (Simon); J.L. de Beer (Jessica); V. de Jager (Victor); K. Kremer (Kristin); S.A.F.T. van Hijum (Sacha); R.J. Siezen (Roland); M.W. Borgdorff (Martien); S.D. Bentley (Stephen); J. Parkhill (Julian); D. van Soolingen (Dick)

    2013-01-01

    Background: Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate kno

  9. Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data

    NARCIS (Netherlands)

    Bryant, J.M.; Schürch, A.C.; Deutekom, van H.; Harris, S.R.; Beer, de J.L.; Jager, de V.C.L.; Kremer, K.; Hijum, van S.A.F.T.; Siezen, R.J.; Borgdorff, M.; Bentley, S.D.; Parkhill, J.; Soolingen, van D.

    2013-01-01

    BACKGROUND: Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate knowledge of th

  10. Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data.

    NARCIS (Netherlands)

    Bryant, J.M.; Schurch, A.C.; Deutekom, H. van; Harris, S.R.; Beer, J.L. de; Jager, V. de; Kremer, K.; Hijum, S.A.F.T. van; Siezen, R.J.; Borgdorff, M.; Bentley, S.D.; Parkhill, J.; Soolingen, D. van

    2013-01-01

    BACKGROUND: Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate knowledge of th

  11. A Whole Genome Pairwise Comparative and Functional Analysis of Geobacter sulfurreducens PCA

    OpenAIRE

    2013-01-01

    Geobacter species are involved in electricity production, bioremediation, and various environmentally friendly activities. Whole genome comparative analyses of Geobacter sulfurreducens PCA, Geobacter bemidjiensis Bem, Geobacter sp. FRC-32, Geobacter lovleyi SZ, Geobacter sp. M21, Geobacter metallireducens GS-15 and Geobacter uraniireducens Rf4 have been made to find similarities and dissimilarities among them. For whole genome comparison of Geobacter species, an in-house tool, Geobacter Compa...

  12. Whole-Genome Sequence of the Nitrogen-Fixing Symbiotic Rhizobium Mesorhizobium loti Strain TONO

    Science.gov (United States)

    Hirakawa, Hideki; Sato, Shusei; Saeki, Kazuhiko; Hayashi, Makoto

    2016-01-01

    Mesorhizobium loti is the nitrogen-fixing microsymbiont for legumes of the genus Lotus. Here, we report the whole-genome sequence of a Mesorhizobium loti strain, TONO, which is used as a symbiont for the model legume Lotus japonicus. The whole-genome sequence of the strain TONO will be a solid platform for comparative genomics analyses and for the identification of genes responsible for the symbiotic properties of Mesorhizobium species.

  13. Whole genome sequencing as the ultimate tool to diagnose tuberculosis

    Directory of Open Access Journals (Sweden)

    Dick van Soolingen

    2016-01-01

    In the past two decades, DNA techniques have been increasingly used in the laboratory diagnosis of tuberculosis (TB). The (sub)species of the Mycobacterium tuberculosis complex are usually identified using reverse line blot techniques. Resistance is predicted by detecting mutations in genes associated with resistance. Nevertheless, all cases are still subjected to cumbersome phenotypic resistance testing. A strain-characteristic DNA fingerprint, used to investigate the epidemiology of TB, is produced by 24-locus variable number tandem repeat (VNTR) typing. However, most of the molecular techniques in the diagnosis of TB can eventually be replaced by whole genome sequencing (WGS). Many international TB reference laboratories are currently working on the introduction of WGS; however, standardization in the international context is lacking. The European Centre for Infectious Disease Prevention and Control in Stockholm, Sweden, organizes a yearly round of quality control on VNTR typing and, in 2015, for the first time also on WGS. In this first proficiency study, only three out of eight international TB laboratories produced WGS results in line with those of the reference laboratory. The whole process of DNA isolation, purification, quantification, sequencing, and analysis/interpretation of data is still under development. In this presentation, many aspects that influence the quality and interpretation of WGS results will be covered. The turn-around time, analysis, and utility of WGS will be discussed. Moreover, the experiences in the use of WGS in the molecular epidemiology of TB in The Netherlands are detailed. It can be concluded that many difficulties still have to be overcome. The state of the art is that bacteria still have to be cultured to obtain sufficient quality and quantity of DNA for successful WGS. The quality of sequencing has improved significantly over the past 7 years, and the detection of mutations has, therefore

  14. Removing the bottleneck in whole genome sequencing of Mycobacterium tuberculosis for rapid drug resistance analysis: a call to action

    Directory of Open Access Journals (Sweden)

    Ruth McNerney

    2017-03-01

    Whole genome sequencing (WGS) can provide a comprehensive analysis of Mycobacterium tuberculosis mutations that cause resistance to anti-tuberculosis drugs. With the deployment of bench-top sequencers and rapid analytical software, WGS is poised to become a useful tool to guide treatment. However, direct sequencing from clinical specimens to provide a full drug resistance profile remains a serious challenge. This article reviews current practices for extracting M. tuberculosis DNA and possible solutions for sampling sputum. Techniques under consideration include enzymatic digestion, physical disruption, chemical degradation, detergent solubilization, solvent extraction, ligand-coated magnetic beads, silica columns, and oligonucleotide pull-down baits. Selective amplification of genomic bacterial DNA in sputum prior to WGS may provide a solution, and differential lysis to reduce the levels of contaminating human DNA is also being explored. To remove this bottleneck and accelerate access to WGS for patients with suspected drug-resistant tuberculosis, it is suggested that a coordinated and collaborative approach be taken to more rapidly optimize, compare, and validate methodologies for sequencing from patient samples.

  15. Implementation of High Resolution Whole Genome Array CGH in the Prenatal Clinical Setting: Advantages, Challenges, and Review of the Literature

    Directory of Open Access Journals (Sweden)

    Paola Evangelidou

    2013-01-01

    Array Comparative Genomic Hybridization analysis is replacing postnatal chromosomal analysis in cases of intellectual disabilities, and it has been postulated that it might also become the first-tier test in prenatal diagnosis. In this study, array CGH was applied to 64 prenatal samples using whole genome oligonucleotide arrays (BlueGnome, Ltd.) on DNA extracted from chorionic villi, amniotic fluid, foetal blood, and skin samples. Results were confirmed with Fluorescence In Situ Hybridization or Real-Time PCR. Fifty-three cases had a normal karyotype and abnormal ultrasound findings, and seven samples had balanced rearrangements, five of which also had ultrasound findings. The value of array CGH in the characterization of previously known aberrations in five samples is also presented. Seventeen out of 64 samples carried copy number alterations, giving a detection rate of 26.5%. Ten of these represent benign variants or variants of unknown significance, giving the method a diagnostic capacity of 10.9%. When karyotyping is also performed, the additional diagnostic capacity of the method is 5.1% (3/59). This study indicates the ability of array CGH to identify chromosomal abnormalities that cannot be detected during routine prenatal cytogenetic analysis, thereby increasing the overall detection rate. In addition, a thorough review of the literature is presented.
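    The percentages in this abstract are simple ratios; the one step left implicit is that 17 - 10 = 7 samples carried clinically relevant alterations. A minimal Python sketch of the arithmetic, with counts copied from the abstract (the function and variable names are illustrative):

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


TOTAL_SAMPLES = 64   # prenatal samples analysed by array CGH
WITH_CNA = 17        # samples carrying copy number alterations
BENIGN_OR_VUS = 10   # benign variants or variants of unknown significance

detection_rate = percent(WITH_CNA, TOTAL_SAMPLES)                       # 17/64
diagnostic_capacity = percent(WITH_CNA - BENIGN_OR_VUS, TOTAL_SAMPLES)  # 7/64
additional_capacity = percent(3, 59)  # additional yield relative to karyotyping

print(f"detection rate:      {detection_rate:.2f}%")
print(f"diagnostic capacity: {diagnostic_capacity:.2f}%")
print(f"additional capacity: {additional_capacity:.2f}%")
```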

  16. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia

    Science.gov (United States)

    Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005
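    The clustering rates reported above (340/1692 overall, 168/470 for Beijing strains) count every isolate whose MIRU-24 profile is shared by at least one other isolate. The Python sketch below computes that ratio from a list of profiles; it assumes this "n method" definition (clustered isolates divided by all isolates), which is consistent with the reported figures, and the three 24-character profiles are hypothetical values made up for illustration.

```python
from collections import Counter


def clustering_rate(profiles: list[str]) -> float:
    """Fraction of isolates whose genotype profile is shared with at least one other isolate.

    Assumes the 'n method': clustered isolates / all isolates.
    """
    if not profiles:
        return 0.0
    counts = Counter(profiles)
    clustered = sum(n for n in counts.values() if n >= 2)
    return clustered / len(profiles)


# Hypothetical MIRU-24 profiles: one copy-number character per locus, 24 loci.
profile_a = "223325173533424352332212"
profile_b = "244233352644425173353823"
profile_c = "233325173533424352332212"

isolates = [profile_a, profile_a, profile_b, profile_c, profile_c, profile_c]
print(f"{clustering_rate(isolates):.3f}")  # 5 of 6 isolates are clustered
print(f"{340 / 1692:.1%}")                 # overall rate reported in the abstract
```

    Isolates with a unique profile (profile_b here) lower the rate; a stricter "n-1" definition, which subtracts one index case per cluster, would give lower values than those reported.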

  17. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly

    Energy Technology Data Exchange (ETDEWEB)

    Shou, S. [Univ. Wisc.-Madison]; Kvikstad, E. [Univ. Wisc.-Madison]; Kile, A. [Univ. Wisc.-Madison]; Severin, J.; Forrest, D. [Univ. Wisc.-Madison]; Runnheim, R. [Univ. Wisc.-Madison]; Churas, C. [Univ. Wisc.-Madison]; Hickman, J. W. [Univ. Wisc.-Madison]; Mackenzie, C. [University of Texas–Houston Medical School]; Choudhary, M. [University of Texas–Houston Medical School]; Donohue, T. [Univ. Wisc.-Madison]; Kaplan, S. [University of Texas–Houston Medical School]; Schwartz, D. C. [Univ. Wisc.-Madison]

    2003-09-01

    Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.
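    Comparing sequence contigs with an optical map generally requires predicting, from the sequence alone, where the mapping enzyme would cut. The Python sketch below performs such an in silico digest of a linear sequence for the two enzymes named above (EcoRI, G^AATTC; HindIII, A^AGCTT). It only illustrates the idea and is not the authors' alignment software; the toy sequence is made up.

```python
import re

# Recognition site and top-strand cut offset for the two enzymes in the abstract.
# Both sites are palindromic, so scanning the top strand finds every site.
ENZYMES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1)}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """0-based top-strand cut positions for every (possibly overlapping) site."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def fragment_lengths(seq: str, enzyme: str) -> list[int]:
    """In silico digest of a linear sequence; fragment lengths in map order."""
    site, offset = ENZYMES[enzyme]
    bounds = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy = "AAAGAATTCCCCCAAGCTTGG"
print(fragment_lengths(toy, "EcoRI"))    # [4, 17]
print(fragment_lengths(toy, "HindIII"))  # [14, 7]
```

    A circular chromosome would need the first and last fragments joined; that case is left out to keep the sketch short.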

  18. PEMapper and PECaller provide a simplified approach to whole-genome sequencing.

    Science.gov (United States)

    Johnston, H Richard; Chopra, Pankaj; Wingo, Thomas S; Patel, Viren; Epstein, Michael P; Mulle, Jennifer G; Warren, Stephen T; Zwick, Michael E; Cutler, David J

    2017-03-07

    The analysis of human whole-genome sequencing data presents significant computational challenges. The sheer size of the datasets places an enormous burden on computational, disk array, and network resources. Here, we present an integrated computational package, PEMapper/PECaller, that was designed specifically to minimize the burden on networks and disk arrays, create output files that are minimal in size, and run in a highly computationally efficient way, with the single goal of enabling whole-genome sequencing at scale. In addition to improved computational efficiency, we implement a statistical framework that allows for a base-by-base error model, allowing this package to perform as well as or better than the widely used Genome Analysis Toolkit (GATK) in all key measures of performance on human whole-genome sequences.

  19. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    Science.gov (United States)

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular, this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing, however, generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  20. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database of the papaya genome sequence. The transgenic construct- and event-specific sequences identified belonged to a GM papaya developed to resist infection by Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled the identification of unknown transgenic construct- and event-specific sequences in GM papaya and the development of a reliable method for detecting them in papaya food commodities.

  1. Whole-Genome Sequences of Two Borrelia afzelii and Two Borrelia garinii Lyme Disease Agent Isolates

    Energy Technology Data Exchange (ETDEWEB)

    Casjens, S.R.; Dunn, J.; Mongodin, E. F.; Qiu, W.-G.; Luft, B. J.; Fraser-Liggett, C. M.; Schutzer, S. E.

    2011-12-01

    Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04.

  2. Whole genome sequence of Enterobacter ludwigii type strain EN-119T, isolated from clinical specimens.

    Science.gov (United States)

    Li, Gengmi; Hu, Zonghai; Zeng, Ping; Zhu, Bing; Wu, Lijuan

    2015-04-01

    Enterobacter ludwigii strain EN-119(T) is the type strain of E. ludwigii, which belongs to the E. cloacae complex (Ecc). This strain was first reported and named in 2005 and has since been found in many hospitals. In this paper, the whole genome of this strain was sequenced. The total genome size of EN-119(T) is 4,952,770 bp, with 4578 coding sequences, 88 tRNAs and 10 rRNAs. The genome sequence of EN-119(T) is the first whole genome sequence of E. ludwigii and will further our understanding of the Ecc.

  3. Whole-Genome de novo Sequencing Of Quail And Grey Partridge

    DEFF Research Database (Denmark)

    Holm, Lars-Erik; Panitz, Frank; Burt, Dave;

    2011-01-01

    The development in sequencing methods has made it possible to perform whole genome de novo sequencing of species without large commercial interests. Within the EU-financed QUANTOMICS project (KBBE-2A-222664), we have performed de novo sequencing of quail (Coturnix coturnix) and grey partridge...... comparative studies towards the chicken genome and will aid in identifying evolutionarily conserved sequences within the Galliformes. The obtained sequences from quail and partridge represent a beginning of generating the whole genome sequence for these species. The continuation of establishing the genome...

  4. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background: Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of parame

  5. Whole genome scan to detect quantitative trait loci for bovine milk protein composition

    NARCIS (Netherlands)

    Schopen, G.C.B.; Koks, P.D.; Arendonk, van J.A.M.; Bovenhuis, H.; Visker, M.H.P.W.

    2009-01-01

    The objective of this study was to perform a whole genome scan to detect quantitative trait loci (QTL) for milk protein composition in 849 Holstein–Friesian cows originating from seven sires. One morning milk sample was analysed for the major milk proteins using capillary zone electrophoresis. A gen

  6. Whole genome analysis of Klebsiella pneumoniae T2-1-1 from human oral cavity

    Directory of Open Access Journals (Sweden)

    Kok-Gan Chan

    2016-03-01

    Klebsiella pneumoniae T2-1-1 was isolated from human tongue debris, subjected to whole genome sequencing on the HiSeq platform, and annotated using RAST. The nucleotide sequence of this genome was deposited in DDBJ/EMBL/GenBank under the accession JAQL00000000.

  7. Whole-genome characterization and genotyping of global WU polyomavirus strains

    NARCIS (Netherlands)

    Bialasiewicz, Seweryn; Rockett, Rebecca; Whiley, David W.; Abed, Yacine; Allander, Tobias; Binks, Michael; Boivin, Guy; Cheng, Allen C.; Chung, Ju-Young; Ferguson, Patricia E.; Gilroy, Nicole M.; Leach, Amanda J.; Lindau, Cecilia; Rossen, John W.; Sorrell, Tania C.; Nissen, Michael D.; Sloots, Theo P.

    2010-01-01

    Exploration of the genetic diversity of WU polyomavirus (WUV) has been limited in terms of the specimen numbers and particularly the sizes of the genomic fragments analyzed. Using whole-genome sequencing of 48 WUV strains collected in four continents over a 5-year period and 16 publicly available wh

  8. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin;

    2013-01-01

    BACKGROUND: Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re...

  9. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian;

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate-oligonucleoti...

  10. Whole-Genome Scans Provide Evidence of Adaptive Evolution in Malawian Plasmodium falciparum Isolates

    DEFF Research Database (Denmark)

    Ocholla, Harold; Preston, Mark D; Mipando, Mwapatsa

    2014-01-01

    BACKGROUND: Selection by host immunity and antimalarial drugs has driven extensive adaptive evolution in Plasmodium falciparum and continues to produce ever-changing landscapes of genetic variation. METHODS: We performed whole-genome sequencing of 69 P. falciparum isolates from Malawi and used ...

  11. Consequences of splitting whole-genome sequencing effort over multiple breeds on imputation accuracy

    NARCIS (Netherlands)

    Bouwman, A.C.; Veerkamp, R.F.

    2014-01-01

    The aim of this study was to determine the consequences of splitting sequencing effort over multiple breeds for imputation accuracy from a high-density SNP chip towards whole-genome sequence. Such information would assist, for instance, numerically smaller cattle breeds, but also pig and chicken

  12. CViT: “Chromosome Visualization Tool” – A whole-genome viewer

    Science.gov (United States)

    CViT (Chromosome Visualization Tool) is a Perl utility for quickly generating images of features on a whole genome at once. It reads GFF3-format data representing chromosomes (linkage groups or pseudomolecules), and features on those chromosomes. It can display features on any chromosomal unit syste...
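    CViT itself is written in Perl; as a language-neutral illustration of the input it consumes, here is a minimal Python sketch that groups GFF3 feature lines by their first column (the chromosome, linkage group or pseudomolecule). It relies only on the GFF3 layout of nine tab-separated columns with semicolon-separated `tag=value` attributes. It skips comment and directive lines, does not decode percent-escaped values, and does not specifically handle an embedded `##FASTA` section. It is not CViT's parser, and the example records are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Feature:
    seqid: str          # column 1: chromosome / linkage group / pseudomolecule
    feature_type: str   # column 3, e.g. "gene"
    start: int          # column 4, 1-based
    end: int            # column 5, inclusive
    strand: str         # column 7: "+", "-", "." or "?"
    attributes: dict[str, str] = field(default_factory=dict)  # column 9


def parse_gff3(lines):
    """Group GFF3 feature lines by seqid (simplified; see caveats above)."""
    by_seqid: dict[str, list[Feature]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line, comment, or ## directive
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        attributes = dict(
            item.split("=", 1) for item in attrs.split(";") if "=" in item
        )
        by_seqid[seqid].append(
            Feature(seqid, ftype, int(start), int(end), strand, attributes)
        )
    return by_seqid


example = [
    "##gff-version 3",
    "chr1\tdemo\tchromosome\t1\t50000\t.\t.\t.\tID=chr1",
    "chr1\tdemo\tgene\t1200\t3400\t.\t+\t.\tID=gene1;Name=abc1",
]
for seqid, features in parse_gff3(example).items():
    for f in features:
        print(seqid, f.feature_type, f.start, f.end, f.attributes.get("Name"))
```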

  13. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits

    DEFF Research Database (Denmark)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L

    2017-01-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader alleli...

  14. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans)

    OpenAIRE

    2015-01-01

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy.

  15. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans).

    Science.gov (United States)

    Tran, Phuong N; Tan, Nicholas E H; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J; Dailey, Lucas K; Hudson, André O; Savka, Michael A

    2015-11-19

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy.

  16. Whole Genome PCR Scanning (WGPS) of C. burnetii strains from ruminants

    NARCIS (Netherlands)

    Sidi-Boumedine, Karim; Adam, Gilbert; Angen, Oysten; Aspán, A.; Bossers, A.; Roest, H.I.J.; Prigent, Myriam; Thiéry, R.; Rousset, Elodie

    2015-01-01

    Coxiella burnetii is the causative agent of Q fever, a zoonosis that spreads from ruminants to humans via the inhalation of aerosols contaminated by livestock's birth products. This study aimed to compare the genomes of strains isolated from ruminants by “Whole Genome PCR Scanning (WGPS)” in order t

  17. Clinical Application of Whole Genome Sequencing In Patients with Primary Immunodeficiency

    Science.gov (United States)

    Mousallem, Talal; Urban, Thomas J.; McSweeney, K. Melodi; Kleinstein, Sarah E.; Zhu, Mingfu; Adeli, Mehdi; Parrott, Roberta E.; Roberts, Joseph L.; Krueger, Brian; Buckley, Rebecca H.; Goldstein, David B

    2016-01-01

    Summary: This report illustrates the value of whole genome sequencing (WGS) in elucidating the genetic cause of disease in patients with primary immunodeficiency (PID). As sequencing costs decline, we predict that utilization of next generation sequencing (NGS) in the clinical setting will increase. PMID:25981738

  18. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer

    NARCIS (Netherlands)

    Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi

    2014-01-01

    Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and e

  19. Multiplex SNP analysis on whole genome amplified DNA from archived dried bloodspots, a validation study

    DEFF Research Database (Denmark)

    Tvedegaard, Kristine C.; Parner, Erik; Hooper, Craig W.

    Kristine C. Tvedegaard (1), Erik Parner (1), Craig W. Hooper (2), Jørn Atterman (1), Niels Gregersen (3), Poul Thorsen (1); (1) Institute of Public Health, NANEA at Department of Epidemiology, University of Aarhus...

  20. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci

    DEFF Research Database (Denmark)

    Rasmusen, L. H.; Dargis, R.; Iversen, Katrine Højholt

    2016-01-01

    with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were...

  1. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    NARCIS (Netherlands)

    Yuen, Ryan K C; Merico, Daniele; Bookman, Matt; Howe, Jennifer L.; Thiruvahindrapuram, Bhooma; Patel, Rohan V.; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A.; Walker, Susan; Marshall, Christian R.; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L.; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J.; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R.; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J.; Wei, John; Xu, Lizhen; Tasse, Anne Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie Mackinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M.; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H.; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A.; Parr, Jeremy R.; Spence, Sarah J.; Vorstman, Jacob; Frey, Brendan J.; Robinson, James T.; Strug, Lisa J.; Fernandez, Bridget A.; Elsabbagh, Mayada; Carter, Melissa T.; Hallmayer, Joachim; Knoppers, Bartha M.; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H.; Glazer, David; Pletcher, Mathew T.; Scherer, Stephen W.

    2017-01-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information,

  2. Whole-genome sequencing for identification of the source in hospital-acquired Legionnaires' disease

    DEFF Research Database (Denmark)

    Rosendahl Madsen, A M; Holm, A; Jensen, T G

    2017-01-01

    -genome sequencing to identify the source of infection in hospital-acquired Legionnaires' disease. Phylogenetic analyses showed close relatedness between one patient isolate and a strain found in hospital water, confirming suspicion of nosocomial infection. It was found that whole-genome sequencing can be a useful...

  3. Analysis of common k-mers for whole genome sequences using SSB-tree.

    Science.gov (United States)

    Choi, Jeong-Hyeon; Cho, Hwan-Gue

    2002-01-01

    As sequenced genomes become larger and the sequencing process becomes faster, there is a need for tools that analyze sequences on a whole-genome scale. However, in-memory algorithms such as suffix trees and suffix arrays are not applicable to the analysis of whole genome sequence sets, since individual whole genomes range in size from several million to hundreds of billions of base pairs. To manipulate such huge sequence data effectively, an indexed data structure for external memory is necessary. In this paper, we introduce a workbench called SequeX for the analysis and visualization of whole genome sequences using the SSB-tree (Static SB-tree). It consists of two parts: the analysis query subsystem and the visualization subsystem. The query subsystem supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization subsystem helps biologists understand whole-genome structure and features through a sequence viewer, an annotation viewer, a CGR (Chaos Game Representation) viewer, and a k-mer viewer. The system also supports a user-friendly programming interface based on JavaScript for batch processing and for user-specific extensions. SequeX can be used to identify conserved genes or sequences by analyzing common k-mers and annotation. We analyze the common k-mers of 72 microbial genomes available from Entrez and find that the longest k-mer common to all 72 sequences is an 11-mer, and that only 11 such sequences exist. Finally, we note that many common k-mers occur in conserved regions such as CDS, rRNA, and tRNA.
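    The common k-mer analysis described above can be illustrated on small inputs with ordinary in-memory hash sets. The Python sketch below swaps SequeX's external-memory SSB-tree for Python `set` intersection, so it only suits sequences that fit comfortably in RAM, which is exactly the limitation the SSB-tree was designed to remove. It relies on the fact that if some k-mer occurs in every sequence, so does its prefix of length k-1, so the search can stop at the first k with no common k-mer. The sequences are made up.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_kmers(seqs: list[str], k: int) -> set[str]:
    """k-mers present in every sequence (in-memory intersection, not an SSB-tree)."""
    if not seqs:
        return set()
    common = kmers(seqs[0], k)
    for seq in seqs[1:]:
        common &= kmers(seq, k)
        if not common:
            break  # the intersection can only shrink, so stop early
    return common


def longest_common_kmers(seqs: list[str]) -> tuple[int, set[str]]:
    """Largest k for which some k-mer occurs in every sequence, plus those k-mers."""
    best_k, best = 0, set()
    k = 1
    while True:
        found = common_kmers(seqs, k)
        if not found:
            return best_k, best  # no common k-mer, hence none for any larger k
        best_k, best = k, found
        k += 1


genomes = ["ACGTACGGA", "TTACGTACG", "GACGTACAA"]
print(longest_common_kmers(genomes))  # (6, {'ACGTAC'})
```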

  4. Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding

    Science.gov (United States)

    Jiang, Yanliang; Ninwichian, Parichart; Liu, Shikai; Zhang, Jiaren; Kucuktas, Huseyin; Sun, Fanyue; Kaltenboeck, Ludmilla; Sun, Luyang; Bao, Lisui; Liu, Zhanjiang

    2013-01-01

    With the rapid advances in next-generation sequencing technologies, more and more species are being added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by next-generation sequencing platforms. To improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. A two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, which dramatically reduced the cost. A total of 94,111,841 100-bp reads and 315,277 assembled contigs were identified as containing physical map contig-specific tags. The physical map contig-specific sequences, along with the currently available BAC end sequences, were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of the whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link the physical map, the genetic linkage map and draft whole genome sequences, and can consequently improve whole genome sequence assembly and scaffolding as well as genome-wide comparative analysis. The strategy developed in this study could also be adopted in other species whose whole genome assembly still faces challenges. PMID:24205335

  5. Generation of physical map contig-specific sequences useful for whole genome sequence scaffolding.

    Directory of Open Access Journals (Sweden)

    Yanliang Jiang

    With the rapid advances in next-generation sequencing technologies, more and more species are being added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by next-generation sequencing platforms. To improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. A two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, which dramatically reduced the cost. A total of 94,111,841 100-bp reads and 315,277 assembled contigs were identified as containing physical map contig-specific tags. The physical map contig-specific sequences, along with the currently available BAC end sequences, were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of the whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link the physical map, the genetic linkage map and draft whole genome sequences, and can consequently improve whole genome sequence assembly and scaffolding as well as genome-wide comparative analysis. The strategy developed in this study could also be adopted in other species whose whole genome assembly still faces challenges.

  6. Whole genome amplification for CGH analysis: Linker-adapter PCR as the method of choice for difficult and limited samples.

    Science.gov (United States)

    Pirker, Christine; Raidl, Maria; Steiner, Elisabeth; Elbling, Leonilla; Holzmann, Klaus; Spiegl-Kreinecker, Sabine; Aubele, Michaela; Grasl-Kraupp, Bettina; Marosi, Christine; Micksche, Michael; Berger, Walter

    2004-09-01

    Comparative genomic hybridization (CGH) is a powerful method to investigate chromosomal imbalances in tumor cells. However, DNA quantity and quality can be limiting factors for successful CGH analysis. The aim of this study was to investigate the applicability of degenerate oligonucleotide-primed PCR (DOP-PCR) and a recently developed linker-adapter-mediated PCR (LA-PCR) for whole genome amplification for use in CGH, especially for difficult source material. We comparatively analyzed DNA of variable quality derived from different cell/tissue types. Additionally, dilution experiments down to the DNA content of a single cell were performed. FISH and/or classical cytogenetic analyses were used as controls. For high-quality DNA samples, both methods were equally suitable for CGH. When very small amounts of these DNA samples (equivalent to one or a few human diploid cells) were analyzed, DOP-PCR-CGH, but not LA-PCR-CGH, frequently produced false-positive signals (e.g., gains in 1p and 16p, and losses in chromosome 4q). For formalin-fixed paraffin-embedded tissues, success rates with LA-PCR-CGH were significantly higher than with DOP-PCR-CGH. DNA of lower quality could frequently be analyzed correctly by LA-PCR-CGH but was prone to give false-positive and/or false-negative results by DOP-PCR-CGH. LA-PCR is superior to DOP-PCR for amplification of DNA for CGH analysis, especially in the case of very limited or partly degraded source material. Copyright 2004 Wiley-Liss, Inc.

  7. Determining the cause of recurrent Clostridium difficile infection using whole genome sequencing.

    Science.gov (United States)

    Sim, James Heng Chiak; Truong, Cynthia; Minot, Samuel S; Greenfield, Nick; Budvytiene, Indre; Lohith, Akshar; Anikst, Victoria; Pourmand, Nader; Banaei, Niaz

    2017-01-01

    Understanding the contribution of relapse and reinfection to recurrent Clostridium difficile infection (CDI) has implications for therapy and infection prevention, respectively. We used whole genome sequencing to determine the relation of C. difficile strains isolated from patients with recurrent CDI at an academic medical center in the United States. Thirty-five toxigenic C. difficile isolates from 16 patients with 19 recurrent CDI episodes, with a median of 53.5 days (range, 13-362) between episodes, were whole genome sequenced on the Illumina MiSeq platform. In 84% (16) of recurrences, the cause of recurrence was relapse with the prior strain of C. difficile. In 16% (3) of recurrent episodes, reinfection with a new strain of C. difficile was the cause. In conclusion, the majority of CDI recurrences at our institution were due to infection with the same strain rather than infection with a new strain. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Whole genome sequencing of Mycobacterium tuberculosis SB24 isolated from Sabah, Malaysia

    Directory of Open Access Journals (Sweden)

    Noraini Philip

    2016-09-01

    Mycobacterium tuberculosis (M. tuberculosis) is the causative agent of tuberculosis (TB), which causes millions of deaths every year. We have sequenced the genome of M. tuberculosis isolated from the cerebrospinal fluid (CSF) of a patient diagnosed with tuberculous meningitis (TBM). The isolated strain was referred to as M. tuberculosis SB24. Genomic DNA of M. tuberculosis SB24 was extracted and subjected to whole genome sequencing using the PacBio platform. The draft genome size of M. tuberculosis SB24 was determined to be 4,452,489 bp with a G + C content of 65.6%. The whole genome shotgun project has been deposited in NCBI SRA under the accession number SRP076503.

  9. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin

    2013-01-01

    BACKGROUND: Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations, were analyzed. RESULTS: A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertions/deletions were identified. Among the SNPs detected, 25.5% were not described previously. We found ... that artificial selection during domestication led to a more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genomic regions appear to be affected by artificial selection for preferred agricultural traits ...

  10. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Science.gov (United States)

    Satoh, Soichirou; Mimuro, Mamoru; Tanaka, Ayumi

    2013-01-01

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, a reasonable method for constructing phylogenetic trees from complete genome sequences has not yet been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  11. Applications of the double-barreled data in whole-genome shotgun sequence assembly and analysis

    Institute of Scientific and Technical Information of China (English)

    HAN Yujun; WANG Jing; GU Xiaocheng; YU Jun; LI Songgang; NI Peixiang; LÜ Hong; YE Jia; HU Jianfei; CHEN Chen; HUANG Xiangang; CONG Lijuan; LI Guangyuan

    2005-01-01

    Double-barreled (DB) data have been widely used for the assembly of large genomes. Based on the experience of building the whole-genome working draft of Oryza sativa L. ssp. indica, we present here the prevailing and improved uses of DB data in the assembly procedure and report on novel applications in the subsequent data-mining processes, such as acquiring precise insert fragment information for each clone across the genome, and a new kind of low-cost whole-genome microarray. With the increasing number of organisms being sequenced, we believe that DB data will play an important role both in other assembly procedures and in future genomic studies.

  12. Whole genome multilocus sequence typing as an epidemiologic tool for Yersinia pestis.

    Science.gov (United States)

    Kingry, Luke C; Rowe, Lori A; Respicio-Kingry, Laurel B; Beard, Charles B; Schriefer, Martin E; Petersen, Jeannine M

    2016-04-01

    Human plague is a severe and often fatal zoonotic disease caused by Yersinia pestis. For public health investigations of human cases, nonintensive whole genome molecular typing tools, capable of defining epidemiologic relationships, are advantageous. Whole genome multilocus sequence typing (wgMLST) is a recently developed methodology that simplifies genomic analyses by transforming millions of base pairs of sequence into character data for each gene. We sequenced 13 US Y. pestis isolates with known epidemiologic relationships. Sequences were assembled de novo, and multilocus sequence typing alleles were assigned by comparison against 3979 open reading frames from the reference strain CO92. Allele-based cluster analysis accurately grouped the 13 isolates, as well as 9 publicly available Y. pestis isolates, by their epidemiologic relationships. Our findings indicate wgMLST is a simplified, sensitive, and scalable tool for epidemiologic analysis of Y. pestis strains.
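The abstract describes wgMLST as turning genome sequence into one allele call per gene and then grouping isolates by allele-based cluster analysis. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes allele IDs have already been assigned per locus, uses the count of differing loci as the distance, and groups isolates by single-linkage clustering with a hypothetical threshold. The names `allele_distance`, `cluster_isolates` and the toy profiles are invented for illustration.

```python
from itertools import combinations

# Hypothetical allele profile: locus name -> allele ID assigned by comparing
# each assembled gene against a reference set of open reading frames.
Profile = dict[str, int]


def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci called in both isolates whose allele IDs differ."""
    shared = a.keys() & b.keys()
    return sum(1 for locus in shared if a[locus] != b[locus])


def cluster_isolates(profiles: dict[str, Profile], threshold: int) -> list[set[str]]:
    """Single-linkage grouping (via union-find): isolates within `threshold`
    allele differences of each other end up in the same cluster."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for x, y in combinations(profiles, 2):
        if allele_distance(profiles[x], profiles[y]) <= threshold:
            parent[find(x)] = find(y)

    clusters: dict[str, set[str]] = {}
    for name in profiles:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy data (not from the study): iso1 and iso2 differ at a single locus.
profiles = {
    "iso1": {"locusA": 1, "locusB": 1, "locusC": 1},
    "iso2": {"locusA": 1, "locusB": 1, "locusC": 2},
    "iso3": {"locusA": 2, "locusB": 3, "locusC": 4},
}
print(cluster_isolates(profiles, threshold=1))
```

Only loci called in both isolates are compared, so a missing call does not count as a difference; a real analysis might instead use a hierarchical tree rather than a fixed cut-off.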

  13. Single Cell HLA Matching Feasibility by Whole Genomic Amplification and Nested PCR

    Institute of Scientific and Technical Information of China (English)

    Xiao-hong Li; Fang-yin Meng

    2004-01-01

    PCR-based single-cell DNA analysis has been widely used in forensic science, preimplantation genetic diagnosis and other fields. However, the original sample cannot be retrieved after single-cell PCR, so the amount of information gained is limited. The HLA system is so complex that complete HLA typing from a single cell is very difficult. A Taq polymerase-based method that uses random primers to amplify the whole genome, termed whole genome amplification (WGA), has proven useful for increasing the number of copies from minimal samples. In this study, we established a technique that uses WGA to amplify the HLA-A and HLA-B loci simultaneously in a single cell.

  14. Error rates, PCR recombination, and sampling depth in HIV-1 whole genome deep sequencing.

    Science.gov (United States)

    Zanini, Fabio; Brodin, Johanna; Albert, Jan; Neher, Richard A

    2016-12-27

    Deep sequencing is a powerful and cost-effective tool to characterize the genetic diversity and evolution of virus populations. While modern sequencing instruments readily cover viral genomes many thousands of times over and very rare variants can in principle be detected, sequencing errors, amplification biases, and other artifacts can limit sensitivity and complicate data interpretation. For this reason, the number of studies using whole genome deep sequencing to characterize viral quasi-species in clinical samples is still limited. We have previously undertaken a large-scale whole genome deep sequencing study of HIV-1 populations. Here we discuss the challenges, error profiles, control experiments, and computational tests we developed to quantify the accuracy of variant frequency estimation.

  15. Economic evidence on identifying clinically actionable findings with whole-genome sequencing: a scoping review.

    OpenAIRE

    2016-01-01

    The American College of Medical Genetics and Genomics (ACMG) recommends that mutations in 56 genes for 24 conditions are clinically actionable and should be reported as secondary findings after whole-genome sequencing (WGS). Our aim was to identify published economic evaluations of detecting mutations in these genes among the general population or among targeted/high-risk populations and conditions and identify gaps in knowledge. A targeted PubMed search from 1994 through November 2014 was pe...

  16. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation

    OpenAIRE

    2016-01-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris ...

  17. Evolutionary insight from whole-genome sequencing of Pseudomonas aeruginosa from cystic fibrosis patients

    DEFF Research Database (Denmark)

    Marvig, Rasmus Lykke; Madsen Sommer, Lea Mette; Jelsbak, Lars;

    2015-01-01

    is suggested to be due to the large genetic repertoire of P. aeruginosa and its ability to genetically adapt to the host environment. Here, we review the recent work that has applied whole-genome sequencing to understand P. aeruginosa population genomics, within-host microevolution and diversity, mutational ... mechanisms, genetic adaptation and transmission events. Finally, we summarize the advances in relation to medical applications and laboratory evolution experiments ...

  18. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    Energy Technology Data Exchange (ETDEWEB)

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are both important and challenging because of the extent of microbial evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.
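The agreement between CGA-based similarities and the other measures is reported as correlation coefficients (r). Assuming these are Pearson product-moment correlations over paired similarity values for the same strain pairs (the abstract says "linear relationships" but does not name the statistic), r can be computed as in this minimal sketch. The function name `pearson_r` and the toy values are illustrative only.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two paired lists of similarity values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)                   # spread of x
    syy = sum((b - my) ** 2 for b in y)                   # spread of y
    return sxy / math.sqrt(sxx * syy)


# Toy paired similarities (not study data): array-based vs. reassociation.
print(pearson_r([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # → 0.5
```

The P values quoted in the abstract would come from a separate significance test on r, which this sketch does not attempt.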

  19. Whole-Genome Duplications Spurred the Functional Diversification of the Globin Gene Superfamily in Vertebrates

    OpenAIRE

    Hoffmann, Federico G.; Opazo, Juan C; Storz, Jay F.

    2011-01-01

    It has been hypothesized that two successive rounds of whole-genome duplication (WGD) in the stem lineage of vertebrates provided genetic raw materials for the evolutionary innovation of many vertebrate-specific features. However, it has seldom been possible to trace such innovations to specific functional differences between paralogous gene products that derive from a WGD event. Here, we report genomic evidence for a direct link between WGD and key physiological innovations in the vertebrate...

  20. Whole-Genome Sequence of Rummeliibacillus stabekisii Strain PP9 Isolated from Antarctic Soil.

    Science.gov (United States)

    da Mota, Fábio Faria; Vollú, Renata Estebanez; Jurelevicius, Diogo; Seldin, Lucy

    2016-05-26

    The whole genome of Rummeliibacillus stabekisii PP9, isolated from a soil sample from Antarctica, consists of a circular chromosome of 3,412,092 bp and a circular plasmid of 8,647 bp, with 3,244 protein-coding genes, 12 copies of the 16S-23S-5S rRNA operon, 101 tRNA genes, and 6 noncoding RNAs (ncRNAs).

  1. An integrated computational pipeline and database to support whole-genome sequence annotation.

    Science.gov (United States)

    Mungall, C J; Misra, S; Berman, B P; Carlson, J; Frise, E; Harris, N; Marshall, B; Shu, S; Kaminker, J S; Prochnik, S E; Smith, C D; Smith, E; Tupy, J L; Wiel, C; Rubin, G M; Lewis, S E

    2002-01-01

    We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture.

  2. Whole-Genome Sequence of Endophytic Plant Growth-Promoting Escherichia coli USML2.

    Science.gov (United States)

    Tharek, Munirah; Sim, Kee-Shin; Khairuddin, Dzulaikha; Ghazali, Amir Hamzah; Najimudin, Nazalan

    2017-05-11

    Escherichia coli strain USML2 was originally isolated from the inner leaf tissues of surface-sterilized phytopathogenic-free oil palm (Elaeis guineensis Jacq.). We present here the whole-genome sequence of this plant-endophytic strain. The genome consists of a single circular chromosome of 4,502,758 bp, 4,315 predicted coding sequences, and a G+C content of 50.8%. Copyright © 2017 Tharek et al.

  3. Whole Genome Sequencing of Sugar Beet and Transcriptional Profiling of Beet Curly Top Resistance

    Science.gov (United States)

    The genome of the sugar beet (Beta vulgaris subsp. vulgaris) doubled haploid line (KDH13) has been sequenced using Illumina HiSeq2000 next generation sequencing platform. This line (PI663862) was released by USDA-ARS as a genetic stock resistant to beet curly top. Sequencing of a standard paired end...

  4. Analysis on n-gram statistics and linguistic features of whole genome protein sequences

    Institute of Scientific and Technical Information of China (English)

    DONG Qi-wen; WANG Xiao-long; LIN Lei

    2008-01-01

    To obtain a statistical analysis of the large number of genomic and proteomic sequences available for different organisms, the n-grams of whole genome protein sequences from 20 organisms were extracted. Their linguistic features were analyzed by two tests developed for the analysis of natural languages and symbolic sequences: Zipf's power law and Shannon entropy. Natural genome proteins and artificial genome proteins were compared with each other, and some statistical features of n-grams were discovered. The results show that: the n-grams of whole genome protein sequences approximately follow Zipf's law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple unigram model can distinguish different organisms; and there exist organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed n-gram analysis of whole genome protein sequences will result in a powerful model for mapping the relationship between protein sequence, structure and function.
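The two tests named in the abstract can be illustrated on a single protein string. This is a minimal sketch, not the authors' code: it counts overlapping n-grams, computes the Shannon entropy (in bits) of the empirical n-gram distribution, and estimates a Zipf exponent as the least-squares slope of log(frequency) against log(rank). The helper names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(seq: str, n: int) -> Counter:
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy in bits of the empirical n-gram distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def zipf_slope(counts: Counter) -> float:
    """Least-squares slope of log(frequency) versus log(rank).
    A slope near -1 corresponds to the classic Zipf power law."""
    freqs = [f for _, f in counts.most_common()]  # sorted by frequency, high to low
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary toy sequence, not from the study
trigrams = ngram_counts(protein, 3)
print(len(trigrams), round(shannon_entropy(trigrams), 3))
```

A genome-scale analysis would pool n-grams over all proteins of an organism; with a single short sequence most n-grams occur once, so the slope is not meaningful here.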

  5. Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set.

    Science.gov (United States)

    Pilipenko, Valentina V; He, Hua; Kurowski, Brad G; Alexander, Eileen S; Zhang, Xue; Ding, Lili; Mersha, Tesfaye B; Kottyan, Leah; Fardo, David W; Martin, Lisa J

    2014-01-01

    Although the technical and analytic complexity of whole genome sequencing is generally appreciated, best practices for data cleaning and quality control have not been defined. Family-based data can be used to guide the standardization of specific quality control metrics in non-family-based data. Given the low mutation rate, Mendelian inheritance errors are likely to result from erroneous genotype calls. Thus, our goal was to identify the characteristics that determine Mendelian inheritance errors. To accomplish this, we used chromosome 3 whole genome sequencing family-based data from Genetic Analysis Workshop 18 (GAW18). Mendelian inheritance errors were provided as part of the GAW18 data set. Additionally, for binary variants we calculated Mendelian inheritance errors using PLINK. Based on our analysis, nonbinary single-nucleotide variants have an inherently high number of Mendelian inheritance errors. Furthermore, in binary variants, Mendelian inheritance errors are not randomly distributed. Indeed, we identified 3 Mendelian inheritance error peaks that were enriched with repetitive elements. However, these peaks can be reduced by applying a single filter from the sequencing file. In summary, we demonstrated that erroneous sequencing calls are nonrandomly distributed across the genome and that quality control metrics can dramatically reduce the number of Mendelian inheritance errors. Appropriate quality control will allow optimal use of genetic data to realize the full potential of whole genome sequencing.
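For a binary (biallelic) variant, a Mendelian inheritance error means the child's genotype cannot be formed from one allele inherited from each parent. The sketch below illustrates that rule for parent-child trios; it is a simplified illustration, not PLINK's implementation. It assumes genotypes are coded as the number of alternate alleles (0, 1 or 2) with `None` for a missing call, and the names `is_mendelian_error` and `error_rate_by_site` are hypothetical.

```python
# Alternate-allele counts a parent of each genotype can transmit to a child.
TRANSMITTABLE = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_error(father, mother, child) -> bool:
    """True if the child's genotype cannot arise from one allele inherited
    from each parent. Trios with any missing call are not scored."""
    if None in (father, mother, child):
        return False
    possible = {f + m for f in TRANSMITTABLE[father] for m in TRANSMITTABLE[mother]}
    return child not in possible


def error_rate_by_site(sites):
    """sites: mapping of site ID -> list of (father, mother, child) genotypes.
    Returns the fraction of fully called trios with an error at each site."""
    rates = {}
    for site, trios in sites.items():
        scored = [t for t in trios if None not in t]
        errors = sum(is_mendelian_error(*t) for t in scored)
        rates[site] = errors / len(scored) if scored else 0.0
    return rates


# Toy trios (not GAW18 data): 0/0 parents cannot have a heterozygous child.
print(error_rate_by_site({"s1": [(0, 0, 1), (1, 1, 2)], "s2": [(0, 2, 1), (None, 0, 0)]}))
# → {'s1': 0.5, 's2': 0.0}
```

Plotting such per-site rates along a chromosome is one way the nonrandom error peaks described above could be visualized.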

  6. Targeted analysis of whole genome sequence data to diagnose genetic cardiomyopathy.

    Science.gov (United States)

    Golbus, Jessica R; Puckelwartz, Megan J; Dellefave-Castillo, Lisa; Fahrenbach, John P; Nelakuditi, Viswateja; Pesce, Lorenzo L; Pytel, Peter; McNally, Elizabeth M

    2014-12-01

    Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of >50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift toward comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1 to 14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and segregation analysis, where available. Three of 3 previously identified primary mutations were detected by this analysis. In 6 subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and had additional pathological correlation to provide evidence for causality. For 2 subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. These pilot data demonstrate that ≈30 to 40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes. © 2014 American Heart Association, Inc.
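The targeted step restricts genome-wide calls to a gene panel and uses public frequency data to flag rare variants. The sketch below shows only that filtering idea under stated assumptions: the `Variant` record, the consequence classes, and the `max_freq` cut-off of 0.001 are hypothetical choices, and it omits the multiple pathogenicity prediction algorithms the study also used.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence classes treated as potentially pathogenic.
PROTEIN_ALTERING = frozenset({"missense", "nonsense", "frameshift", "splice_site"})


@dataclass
class Variant:
    gene: str
    consequence: str                  # e.g. "missense", "synonymous"
    population_freq: Optional[float]  # allele frequency in a public database, None if absent


def targeted_candidates(variants, panel, max_freq=0.001):
    """Keep rare, protein-altering variants that fall in panel genes.
    A variant absent from population databases is treated as rare."""
    return [
        v for v in variants
        if v.gene in panel
        and v.consequence in PROTEIN_ALTERING
        and (v.population_freq is None or v.population_freq <= max_freq)
    ]


# Toy input; the panel in the study covered 204 genes.
panel = {"MYH7", "TTN"}
calls = [
    Variant("MYH7", "missense", None),
    Variant("MYH7", "synonymous", None),
    Variant("TTN", "nonsense", 0.05),
    Variant("GENE_X", "missense", 0.0),
]
print([(v.gene, v.consequence) for v in targeted_candidates(calls, panel)])
# → [('MYH7', 'missense')]
```

Candidates surviving this filter would still need the clinical and segregation review described in the abstract.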

  7. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat

    Directory of Open Access Journals (Sweden)

    Huajing Teng

    2016-07-01

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches.

  8. Rapid Identification of Potential Drugs for Diabetic Nephropathy Using Whole-Genome Expression Profiles of Glomeruli

    Directory of Open Access Journals (Sweden)

    Jingsong Shi

    2016-01-01

    Objective. To investigate potential drugs for diabetic nephropathy (DN) using whole-genome expression profiles and the Connectivity Map (CMAP). Methodology. Eighteen Chinese Han DN patients and six normal controls were included in this study. Whole-genome expression profiles of microdissected glomeruli were measured using the Affymetrix human U133 plus 2.0 chip. Differentially expressed genes (DEGs) between late-stage and early-stage DN samples and the CMAP database were used to identify potential drugs for DN using bioinformatics methods. Results. (1) A total of 1065 DEGs (FDR … 1.5) were found in late-stage DN patients compared with early-stage DN patients. (2) Piperlongumine, 15d-PGJ2 (15-delta prostaglandin J2), vorinostat, and trichostatin A were predicted to be the most promising potential drugs for DN, acting as NF-κB inhibitors, histone deacetylase inhibitors (HDACIs), PI3K pathway inhibitors, or PPARγ agonists, respectively. Conclusion. Using whole-genome expression profiles and the CMAP database, we rapidly predicted potential DN drugs, and therapeutic potential was confirmed by previously published studies. Animal experiments and clinical trials are needed to confirm both the safety and efficacy of these drugs in the treatment of DN.

  9. Personalized oncogenomics: clinical experience with malignant peritoneal mesothelioma using whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Brandon S Sheffield

    Peritoneal mesothelioma is a rare and sometimes lethal malignancy that presents a clinical challenge for both diagnosis and management. Recent studies have led to a better understanding of the molecular biology of peritoneal mesothelioma. Translation of the emerging data into better treatments and outcome is needed. From two patients with peritoneal mesothelioma, we derived whole genome sequences, RNA expression profiles, and targeted deep sequencing data. Molecular data were made available for translation into a clinical treatment plan. Treatment responses and outcomes were later examined in the context of molecular findings. Molecular studies presented here provide the first reported whole genome sequences of peritoneal mesothelioma. Mutations in known mesothelioma-related genes NF2, CDKN2A, LATS2, amongst others, were identified. Activation of MET-related signaling pathways was demonstrated in both cases. A hypermutated phenotype was observed in one case (434 vs. 18 single nucleotide variants) and was associated with a favourable outcome despite sarcomatoid histology and multifocal disease. This study represents the first report of whole genome analyses of peritoneal mesothelioma, a key step in the understanding and treatment of this disease.

  10. Personalized oncogenomics: clinical experience with malignant peritoneal mesothelioma using whole genome sequencing.

    Science.gov (United States)

    Sheffield, Brandon S; Tinker, Anna V; Shen, Yaoqing; Hwang, Harry; Li-Chang, Hector H; Pleasance, Erin; Ch'ng, Carolyn; Lum, Amy; Lorette, Julie; McConnell, Yarrow J; Sun, Sophie; Jones, Steven J M; Gown, Allen M; Huntsman, David G; Schaeffer, David F; Churg, Andrew; Yip, Stephen; Laskin, Janessa; Marra, Marco A

    2015-01-01

    Peritoneal mesothelioma is a rare and sometimes lethal malignancy that presents a clinical challenge for both diagnosis and management. Recent studies have led to a better understanding of the molecular biology of peritoneal mesothelioma. Translation of the emerging data into better treatments and outcome is needed. From two patients with peritoneal mesothelioma, we derived whole genome sequences, RNA expression profiles, and targeted deep sequencing data. Molecular data were made available for translation into a clinical treatment plan. Treatment responses and outcomes were later examined in the context of molecular findings. Molecular studies presented here provide the first reported whole genome sequences of peritoneal mesothelioma. Mutations in known mesothelioma-related genes NF2, CDKN2A, LATS2, amongst others, were identified. Activation of MET-related signaling pathways was demonstrated in both cases. A hypermutated phenotype was observed in one case (434 vs. 18 single nucleotide variants) and was associated with a favourable outcome despite sarcomatoid histology and multifocal disease. This study represents the first report of whole genome analyses of peritoneal mesothelioma, a key step in the understanding and treatment of this disease.

  11. Enhanced de novo assembly of high throughput pyrosequencing data using whole genome mapping.

    Science.gov (United States)

    Onmus-Leone, Fatma; Hang, Jun; Clifford, Robert J; Yang, Yu; Riley, Matthew C; Kuschner, Robert A; Waterman, Paige E; Lesho, Emil P

    2013-01-01

    Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method, we achieved 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel bla NDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.

  12. Whole-genome sequencing of uropathogenic Escherichia coli reveals long evolutionary history of diversity and virulence.

    Science.gov (United States)

    Lo, Yancy; Zhang, Lixin; Foxman, Betsy; Zöllner, Sebastian

    2015-08-01

    Uropathogenic Escherichia coli (UPEC) are phenotypically and genotypically very diverse. This diversity makes it challenging to understand the evolution of UPEC adaptations responsible for causing urinary tract infections (UTI). To gain insight into the relationship between evolutionary divergence and adaptive paths to uropathogenicity, we sequenced at deep coverage (190×) the genomes of 19 E. coli strains from urinary tract infection patients from the same geographic area. Our sample consisted of 14 UPEC isolates and 5 non-UTI-causing (commensal) rectal E. coli isolates. After identifying strain variants using de novo assembly-based methods, we clustered the strains based on pairwise sequence differences using a neighbor-joining algorithm. We examined evolutionary signals on the whole-genome phylogeny and contrasted these signals with those found on gene trees constructed based on specific uropathogenic virulence factors. The whole-genome phylogeny showed that the divergence between UPEC and commensal E. coli strains without known UPEC virulence factors happened over 32 million generations ago. Pairwise diversity between any two strains was also high, suggesting multiple genetic origins of uropathogenic strains in a small geographic region. Contrasting the whole-genome phylogeny with three gene trees constructed from common uropathogenic virulence factors, we detected no selective advantage of these virulence genes over other genomic regions. These results suggest that UPEC acquired uropathogenicity a long time ago and used it opportunistically to cause extraintestinal infections.

  13. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    Science.gov (United States)

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches.

  14. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Johanna Hasmats

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  15. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Science.gov (United States)

    Hasmats, Johanna; Gréen, Henrik; Orear, Cedric; Validire, Pierre; Huss, Mikael; Käller, Max; Lundeberg, Joakim

    2014-01-01

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.
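    The two percentages above are computed against different denominators. A minimal sketch, assuming each variant is keyed by a hypothetical (chromosome, position, ref, alt) tuple and that the 74% figure is relative to all variants found in either dataset (the abstract does not state the denominator), makes the distinction explicit. The variant sets in the usage lines are toy data.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def concordance(amplified: Set[Variant], unamplified: Set[Variant]) -> Tuple[float, float]:
    """Return (shared / all variants, shared / variants found after amplification)."""
    shared = amplified & unamplified
    union = amplified | unamplified
    frac_of_all = len(shared) / len(union) if union else 0.0
    frac_of_amplified = len(shared) / len(amplified) if amplified else 0.0
    return frac_of_all, frac_of_amplified


if __name__ == "__main__":
    wga = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
           ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")}
    raw = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
           ("chr2", 50, "G", "A"), ("chr2", 75, "A", "T")}
    print(concordance(wga, raw))  # prints (0.6, 0.75)
```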

  16. Whole-Genome Sequencing in Microbial Forensic Analysis of Gamma-Irradiated Microbial Materials.

    Science.gov (United States)

    Broomall, Stacey M; Ait Ichou, Mohamed; Krepps, Michael D; Johnsky, Lauren A; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; Betters, Janet L; Redmond, Brady W; Rivers, Bryan A; Liem, Alvin T; Hill, Jessica M; Fochler, Edward T; Roth, Pierce A; Rosenzweig, C Nicole; Skowronski, Evan W; Gibbons, Henry S

    2015-11-13

    Effective microbial forensic analysis of materials used in a potential biological attack requires robust methods of morphological and genetic characterization of the attack materials in order to enable the attribution of the materials to potential sources and to exclude other potential sources. The genetic homogeneity and potential intersample variability of many of the category A to C bioterrorism agents offer a particular challenge to the generation of attributive signatures, potentially requiring whole-genome or proteomic approaches to be utilized. Currently, irradiation of mail is standard practice at several government facilities judged to be at particularly high risk. Thus, initial forensic signatures would need to be recovered from inactivated (nonviable) material. In the study described in this report, we determined the effects of high-dose gamma irradiation on forensic markers of bacterial biothreat agent surrogate organisms with a particular emphasis on the suitability of genomic DNA (gDNA) recovered from such sources as a template for whole-genome analysis. While irradiation of spores and vegetative cells affected the retention of Gram and spore stains and sheared gDNA into small fragments, we found that irradiated material could be utilized to generate accurate whole-genome sequence data on the Illumina and Roche 454 sequencing platforms. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  17. Selective Whole-Genome Amplification Is a Robust Method That Enables Scalable Whole-Genome Sequencing of Plasmodium vivax from Unprocessed Clinical Samples

    Directory of Open Access Journals (Sweden)

    Annie N. Cowell

    2017-02-01

    Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens.

  18. Selective Whole-Genome Amplification Is a Robust Method That Enables Scalable Whole-Genome Sequencing of Plasmodium vivax from Unprocessed Clinical Samples

    Science.gov (United States)

    Loy, Dorothy E.; Sundararaman, Sesh A.; Valdivia, Hugo; Fisch, Kathleen; Lescano, Andres G.; Baldeviano, G. Christian; Durand, Salomon; Gerbasi, Vince; Sutherland, Colin J.; Nolder, Debbie; Vinetz, Joseph M.; Hahn, Beatrice H.

    2017-01-01

    Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens. PMID:28174312
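    For context on coverage figures such as "24×" and "≥5 reads": under the idealized Lander-Waterman assumption that reads start uniformly at random, per-base depth is approximately Poisson-distributed with mean equal to the average coverage. The sketch below assumes that model rather than reproducing the authors' analysis; it gives the expected fraction of bases covered by at least k reads. Real data from enriched clinical samples are usually less uniform, so observed fractions can fall well below this baseline.

```python
import math


def fraction_at_least(mean_depth: float, k: int) -> float:
    """P(depth >= k) when per-base depth ~ Poisson(mean_depth)."""
    # P(X >= k) = 1 - sum_{i=0}^{k-1} exp(-lambda) * lambda**i / i!
    term = math.exp(-mean_depth)  # probability of depth 0
    cdf = 0.0
    for i in range(k):
        cdf += term                       # add P(X = i)
        term *= mean_depth / (i + 1)      # advance to P(X = i + 1)
    return 1.0 - cdf


if __name__ == "__main__":
    # Idealized expectation at 24x mean coverage, threshold of 5 reads.
    print(f"{fraction_at_least(24.0, 5):.7f}")  # prints 0.9999994
```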

  19. Whole genome sequencing of M. tuberculosis in Kazakhstan: preliminary data

    Directory of Open Access Journals (Sweden)

    Ulykbek Kairov

    2014-03-01

    Background: Tuberculosis is a major public health problem which infects one third of the world’s population, resulting in more than two million deaths every year. The emergence of whole genome sequencing (WGS) technologies as a primary research tool has allowed for the detection of genetic diversity in Mycobacterium tuberculosis (MTB) with unprecedented resolution. WGS has been used to address a broad range of topics, including the dynamics of evolution, transmission, and treatment. To our knowledge, studies involving WGS of Kazakhstani strains of M. tuberculosis have not yet been performed. Aim: To perform whole genome sequencing of M. tuberculosis strains isolated in Kazakhstan and analyze the sequence data (first experience and preliminary data). Results: In the present report, we announce the whole-genome sequences of two clinical isolates of Mycobacterium tuberculosis, MTB-489 and MTB-476, isolated from the Almaty region. These strains were part of a repository that was created during our project “Creating prerequisites of personalized approach in the diagnosis and treatment of tuberculosis, based on whole genome-sequencing of M. tuberculosis”. The two strains were isolated from sputum samples of patients P1 and P2, and both were phenotypically drug-susceptible. Sequence data were compared with the publicly available data on M. tuberculosis laboratory strain H37Rv and others. Sequencing was performed on a Roche 454 GS FLX+ next-generation sequencing platform using a standard protocol for a shotgun genome library. For MTB-476 and MTB-489, respectively, 96 Mbp (average read length 520 bp, approximately 21.8X coverage) and 104.2 Mbp (average read length 589 bp, approximately 23.7X coverage) were generated. The genome of MTB-476 consists of 257 contigs, 4204 CDS, 46 tRNAs and 3 rRNAs. MTB
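    Fold coverage is total sequenced bases divided by genome size, so the yields reported above imply a genome size. A quick arithmetic check using only the abstract's numbers shows that both runs imply roughly 4.4 Mb, in line with the size of the M. tuberculosis H37Rv reference genome (about 4.41 Mb). The function name is illustrative.

```python
def implied_genome_size(total_bases: float, fold_coverage: float) -> float:
    """Invert coverage = total_bases / genome_size to recover genome size."""
    return total_bases / fold_coverage


if __name__ == "__main__":
    # Yields and coverages as reported in the abstract.
    runs = {"MTB-476": (96e6, 21.8), "MTB-489": (104.2e6, 23.7)}
    for name, (bases, cov) in runs.items():
        print(f"{name}: ~{implied_genome_size(bases, cov) / 1e6:.2f} Mb")
    # prints "MTB-476: ~4.40 Mb" and "MTB-489: ~4.40 Mb"
```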

  20. Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Nora Rieber

    The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina's HiSeq2000, Life Technologies' SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics' technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies' platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other
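    GC bias of the kind compared above is commonly assessed by splitting the reference into fixed windows, computing each window's GC fraction, and comparing mean depth across GC bins. A minimal sketch of that idea follows; it assumes a reference string and a per-base depth list as hypothetical inputs, and it is not the authors' method. A flat profile across bins indicates little GC bias.

```python
from collections import defaultdict
from typing import Dict, List


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def coverage_by_gc(ref: str, depth: List[int], window: int = 100,
                   n_bins: int = 10) -> Dict[int, float]:
    """Mean depth per GC bin; bin i covers GC fractions [i/n_bins, (i+1)/n_bins)."""
    if len(ref) != len(depth):
        raise ValueError("depth must have one value per reference base")
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for start in range(0, len(ref) - window + 1, window):
        gc = gc_fraction(ref[start:start + window])
        b = min(int(gc * n_bins), n_bins - 1)  # GC == 1.0 goes in the top bin
        totals[b] += sum(depth[start:start + window]) / window
        counts[b] += 1
    return {b: totals[b] / counts[b] for b in sorted(totals)}


if __name__ == "__main__":
    # Toy input: an AT-only window at depth 30 and a GC-only window at depth 10.
    ref = "AT" * 50 + "GC" * 50
    depth = [30] * 100 + [10] * 100
    print(coverage_by_gc(ref, depth))  # prints {0: 30.0, 9: 10.0}
```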

  1. Characterization of Genomic Variants Associated with Scout and Recruit Behavioral Castes in Honey Bees Using Whole-Genome Sequencing

    National Research Council Canada - National Science Library

    Southey, Bruce R; Zhu, Ping; Carr-Markell, Morgan K; Liang, Zhengzheng S; Zayed, Amro; Li, Ruiqiang; Robinson, Gene E; Rodriguez-Zas, Sandra L

    2016-01-01

    .... Whole-genome sequencing of 44 Apis mellifera scouts and recruits was undertaken to detect variants and further understand the genetic architecture underlying the behavioral differences between scouts and recruits...

  2. The role of whole genome sequencing in antimicrobial susceptibility testing of bacteria: report from the EUCAST Subcommittee

    DEFF Research Database (Denmark)

    Ellington, M J; Ekelund, O; Aarestrup, Frank Møller

    2017-01-01

    Whole genome sequencing (WGS) offers the potential to predict antimicrobial susceptibility from a single assay. The European Committee on Antimicrobial Susceptibility Testing established a subcommittee to review the current development status of WGS for bacterial antimicrobial susceptibility test...

  3. Whole-genome sequencing identifies emergence of a quinolone resistance mutation in a case of Stenotrophomonas maltophilia bacteremia.

    Science.gov (United States)

    Pak, Theodore R; Altman, Deena R; Attie, Oliver; Sebra, Robert; Hamula, Camille L; Lewis, Martha; Deikus, Gintaras; Newman, Leah C; Fang, Gang; Hand, Jonathan; Patel, Gopi; Wallach, Fran; Schadt, Eric E; Huprikar, Shirish; van Bakel, Harm; Kasarskis, Andrew; Bashir, Ali

    2015-11-01

    Whole-genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole-genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy.

  4. Use of whole genome sequencing to determine the microevolution of Mycobacterium tuberculosis during an outbreak.

    Directory of Open Access Journals (Sweden)

    Midori Kato-Maeda

    RATIONALE: Current tools available to study the molecular epidemiology of tuberculosis do not provide information about the directionality and sequence of transmission for tuberculosis cases occurring over a short period of time, such as during an outbreak. Recently, whole genome sequencing has been used to study molecular epidemiology of Mycobacterium tuberculosis over short time periods. OBJECTIVE: To describe the microevolution of M. tuberculosis during an outbreak caused by one drug-susceptible strain. METHOD AND MEASUREMENTS: We included 9 patients with tuberculosis diagnosed during a period of 22 months, from a population-based study of the molecular epidemiology in San Francisco. Whole genome sequencing was performed using Illumina's sequencing by synthesis technology. A custom program written in Python was used to determine single nucleotide polymorphisms which were confirmed by PCR product Sanger sequencing. MAIN RESULTS: We obtained an average of 95.7% (94.1-96.9%) coverage for each isolate and an average fold read depth of 73 (1 to 250). We found 7 single nucleotide polymorphisms among the 9 isolates. The single nucleotide polymorphisms data confirmed all except one known epidemiological link. The outbreak strain resulted in 5 bacterial variants originating from the index case A1 with 0-2 mutations per transmission event that resulted in a secondary case. CONCLUSIONS: Whole genome sequencing analysis from a recent outbreak of tuberculosis enabled us to identify microevolutionary events observable during transmission, to determine 0-2 single nucleotide polymorphisms per transmission event that resulted in a secondary case, and to identify new epidemiologic links in the chain of transmission.
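    The SNP-identification step is described only as a custom Python program. A minimal sketch of the underlying comparison follows. It assumes that each isolate is already represented as a consensus sequence aligned position-by-position to the same reference and that "N" marks an uncalled base. This is a simplification, since real pipelines call variants from read alignments and filter on depth and quality. The sketch lists SNP positions and pairwise SNP distances between isolates, and the isolate names and sequences in the usage lines are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def snp_positions(reference: str, sample: str, missing: str = "N") -> List[int]:
    """0-based positions where sample differs from reference, ignoring missing calls."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (r, s) in enumerate(zip(reference, sample))
            if r != missing and s != missing and r != s]


def pairwise_snp_distances(samples: Dict[str, str],
                           missing: str = "N") -> Dict[Tuple[str, str], int]:
    """Number of called positions at which each pair of isolates differs."""
    return {(a, b): len(snp_positions(samples[a], samples[b], missing))
            for a, b in combinations(sorted(samples), 2)}


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    isolates = {"A1": "ACGTACGTAC", "B2": "ACGTTCGTAC", "C3": "ACGTTCGAAN"}
    print(snp_positions(ref, isolates["C3"]))  # prints [4, 7]
    print(pairwise_snp_distances(isolates))
    # prints {('A1', 'B2'): 1, ('A1', 'C3'): 2, ('B2', 'C3'): 1}
```

    Counting 0-2 SNPs per transmission event, as reported above, amounts to reading such pairwise distances along the inferred transmission chain.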

  5. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    Energy Technology Data Exchange (ETDEWEB)

    Fröhlich, Eleonore, E-mail: eleonore.froehlich@medunigraz.at [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Meindl, Claudia; Wagner, Karin [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Leitinger, Gerd [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Institute for Cell Biology, Histology and Embryology, Medical University of Graz, Harrachgasse 21, 8010 Graz (Austria); Roblegg, Eva [Institute of Pharmaceutical Sciences, Department of Pharmaceutical Technology, Karl-Franzens-University of Graz, Universitätsplatz 1, 8010 Graz (Austria)

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and of short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether microarray screening would identify targets of NP action other than, or in addition to, those detected by commonly used cell-based assays. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. - Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions identified as regulated by microarray were confirmed by cell-based assays.

  6. Whole genome sequence analysis suggests intratumoral heterogeneity in dissemination of breast cancer to lymph nodes.

    Directory of Open Access Journals (Sweden)

    Kevin Blighe

    BACKGROUND: Intratumoral heterogeneity may help drive resistance to targeted therapies in cancer. In breast cancer, the presence of nodal metastases is a key indicator of poorer overall survival. The aim of this study was to identify somatic genetic alterations in early dissemination of breast cancer by whole genome next generation sequencing (NGS) of a primary breast tumor, a matched locally-involved axillary lymph node and healthy normal DNA from blood. METHODS: Whole genome NGS was performed on 12 µg (range 11.1-13.3 µg) of DNA isolated from fresh-frozen primary breast tumor, axillary lymph node and peripheral blood following the DNA nanoball sequencing protocol. Single nucleotide variants, insertions, deletions, and substitutions were identified through a bioinformatic pipeline and compared to CIN25, a key set of genes associated with tumor metastasis. RESULTS: Whole genome sequencing revealed overlapping variants between the tumor and node, but also variants that were unique to each. Novel mutations unique to the node included those found in two CIN25 targets, TGIF2 and CCNB2, which are related to transcription cyclin activity and chromosomal stability, respectively, and a unique frameshift in PDS5B, which is required for accurate sister chromatid segregation during cell division. We also identified dominant clonal variants that progressed from tumor to node, including SNVs in TP53 and ARAP3, which mediates rearrangements to the cytoskeleton and cell shape, and an insertion in TOP2A, the expression of which is significantly associated with tumor proliferation and can segregate breast cancers by outcome. CONCLUSION: This case study provides preliminary evidence that primary tumor and early nodal metastasis have largely overlapping somatic genetic alterations. There were very few mutations unique to the involved node. However, significant conclusions regarding early dissemination need analysis of a larger number of patient samples.

  7. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic

    Directory of Open Access Journals (Sweden)

    Sealfon Rachel

    2012-09-01

    Background: Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Results: Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions: Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.

  8. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling.

    Science.gov (United States)

    Meinel, Thomas; Krause, Antje

    2012-01-01

    In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes have led to a huge diversity in published phylogenies. Comparison of those and, moreover, identification of the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on in total 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.

  9. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing

    OpenAIRE

    Bowers, John E.; Pearl, Stephanie A; Burke, John M.

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads f...

  10. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation.

    Science.gov (United States)

    Sharma, C; Kumar, N; Pandey, R; Meis, J F; Chowdhary, A

    2016-09-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L).

  11. Whole genome sequences and annotation of Micrococcus luteus SUBG006, a novel phytopathogen of mango

    Directory of Open Access Journals (Sweden)

    Purvi M. Rakhashiya

    2015-12-01

    The actinobacterium Micrococcus luteus SUBG006 was isolated from infected leaves of Mangifera indica L. vr. Nylon in Rajkot (22.30°N, 70.78°E), Gujarat, India. The genome size is 3.86 Mb with a G + C content of 69.80% and contains 112 rRNA sequences (5S, 16S and 23S). The whole-genome sequence has been deposited in DDBJ/EMBL/GenBank under the accession number JOKP00000000.

  12. Sequence Determination from Overlapping Fragments: A Simple Model of Whole-Genome Shotgun Sequencing

    Science.gov (United States)

    Derrida, Bernard; Fink, Thomas M.

    2002-02-01

    Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general problem we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments.
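    As an illustration of the assembly step that this model abstracts (not of the authors' analytical solution), a toy greedy assembler repeatedly merges the pair of fragments with the longest suffix-prefix overlap. It assumes error-free fragments, none contained within another, and a hypothetical minimum overlap length. When the sequence contains repeats, greedy merging can produce a wrong or fragmented result, which is one reason recovery is not guaranteed.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start           # earliest match is the longest overlap
        start += 1


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair until one contig remains or nothing overlaps."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining fragments do not overlap: assembly stays fragmented
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags


if __name__ == "__main__":
    # Toy fragments sampled from ATTAGACCTGCCGGAATAC.
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # prints ['ATTAGACCTGCCGGAATAC']
```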

  13. When aging meets microgravity: whole genome promoters and enhancers transcription landscape in zebrafish onboard ISS

    Science.gov (United States)

    Arshanovskii, Kirill; Gusev, Oleg; Sychev, Vladimir; Poddubko, Svetlana; Deviatiiarov, Ruslan

    2016-07-01

    To gain new insights into changes in gene regulation under conditions of real spaceflight, we conducted a whole-genome analysis of the dynamics of promoter and enhancer transcription in zebrafish during prolonged exposure to real spaceflight. Within the framework of the Russia-Japan joint experiments "Aquatic Habitat"-"Aquarium", we performed Cap Analysis of Gene Expression (CAGE) on zebrafish over 7 to 40 days of real spaceflight onboard the ISS. The analysis showed that both gene expression patterns and the architecture (shapes and types) of promoters are affected by the spaceflight environment.

  14. Whole genome shotgun sequence of Bacillus amyloliquefaciens TF28, a biocontrol endophytic bacterium

    OpenAIRE

    Zhang, Shumei; Jiang, Wei; Li, Jing; Meng, Liqiang; Cao, Xu; Hu, Jihua; Liu, Yushuai; Chen, Jingyu; Sha, Changqing

    2016-01-01

    Bacillus amyloliquefaciens TF28 is a biocontrol endophytic bacterium that is capable of inhibiting a broad range of plant pathogenic fungi. The strain has the potential to be developed into a biocontrol agent for use in agriculture. Here we report the whole-genome shotgun sequence of the strain. The genome size of B. amyloliquefaciens TF28 is 3,987,635 bp which consists of 3754 protein-coding genes, 65 tandem repeat sequences, 47 minisatellite DNA, 2 microsatellite DNA, 63 tRNA, 7 rRNA, 6 s...

  15. A green-cotyledon/stay-green mutant exemplifies the ancient whole-genome duplications in soybean.

    Science.gov (United States)

    Nakano, Michiharu; Yamada, Tetsuya; Masuda, Yu; Sato, Yutaka; Kobayashi, Hideki; Ueda, Hiroaki; Morita, Ryouhei; Nishimura, Minoru; Kitamura, Keisuke; Kusaba, Makoto

    2014-10-01

    The recent whole-genome sequencing of soybean (Glycine max) revealed that soybean experienced whole-genome duplications 59 million and 13 million years ago, and it has an octoploid-like genome in spite of its diploid nature. We analyzed a natural green-cotyledon mutant line, Tenshin-daiseitou. The physiological analysis revealed that Tenshin-daiseitou shows a non-functional stay-green phenotype in senescent leaves, which is similar to that of the mutant of Mendel's green-cotyledon gene I, the ortholog of SGR in pea. The identification of gene mutations and genetic segregation analysis suggested that defects in GmSGR1 and GmSGR2 were responsible for the green-cotyledon/stay-green phenotype of Tenshin-daiseitou, which was confirmed by RNA interference (RNAi) transgenic soybean experiments using GmSGR genes. The characterized green-cotyledon double mutant d1d2 was found to have the same mutations, suggesting that GmSGR1 and GmSGR2 are D1 and D2. Among the examined d1d2 strains, the d1d2 strain K144a showed a lower Chl a/b ratio in mature seeds than other strains but not in senescent leaves, suggesting a seed-specific genetic factor of the Chl composition in K144a. Analysis of the soybean genome sequence revealed four genomic regions with microsynteny to the Arabidopsis SGR1 region, which included the GmSGR1 and GmSGR2 regions. The other two regions contained GmSGR3a/GmSGR3b and GmSGR4, respectively, which might be pseudogenes or genes with a function that is unrelated to Chl degradation during seed maturation and leaf senescence. These GmSGR genes were thought to be produced by the two whole-genome duplications, and they provide a good example of such whole-genome duplication events in the evolution of the soybean genome.

  16. Whole genome shotgun sequence of Bacillus amyloliquefaciens TF28, a biocontrol entophytic bacterium.

    Science.gov (United States)

    Zhang, Shumei; Jiang, Wei; Li, Jing; Meng, Liqiang; Cao, Xu; Hu, Jihua; Liu, Yushuai; Chen, Jingyu; Sha, Changqing

    2016-01-01

    Bacillus amyloliquefaciens TF28 is a biocontrol endophytic bacterium that is capable of inhibiting a broad range of plant pathogenic fungi. The strain has the potential to be developed into a biocontrol agent for use in agriculture. Here we report the whole-genome shotgun sequence of the strain. The genome of B. amyloliquefaciens TF28 is 3,987,635 bp and contains 3754 protein-coding genes, 65 tandem repeat sequences, 47 minisatellite DNA, 2 microsatellite DNA, 63 tRNA, 7 rRNA, 6 sRNA, 3 prophage and CRISPR domains.

  17. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast

    DEFF Research Database (Denmark)

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L.

    2015-01-01

    interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering for construction of improved strains. Here we used high-throughput microfluidics for the screening of yeast libraries, generated by UV...... to construct efficient cell factories for protein secretion. The combined use of microfluidics screening and whole-genome sequencing to map the mutations associated with the improved phenotype can easily be adapted for other products and cell types to identify novel engineering targets, and this approach could...

  18. Increase of ethanol tolerance of Saccharomyces cerevisiae by error-prone whole genome amplification.

    Science.gov (United States)

    Luhe, Annette Lin; Tan, Lily; Wu, Jinchuan; Zhao, Hua

    2011-05-01

    Saccharomyces cerevisiae was transformed for higher ethanol tolerance by error-prone whole genome amplification. The resulting PCR products were transformed back into the parental strain for homologous recombination to create a library of mutants with perturbed genomic networks. A few rounds of transformation led to the isolation of mutants that grew in 9% (v/v) ethanol and 100 g l⁻¹ glucose, compared with untransformed yeast, which grew only in 6% (v/v) ethanol and 100 g l⁻¹ glucose. © Springer Science+Business Media B.V. 2011

  19. Reflections on the cost of "low-cost" whole genome sequencing: framing the health policy debate.

    Directory of Open Access Journals (Sweden)

    Timothy Caulfield

    2013-11-01

    The cost of whole genome sequencing is dropping rapidly. There has been a great deal of enthusiasm about the potential for this technological advance to transform clinical care. Given the interest and significant investment in genomics, this seems an ideal time to consider what the evidence tells us about potential benefits and harms, particularly in the context of health care policy. The scale and pace of adoption of this powerful new technology should be driven by clinical need, clinical evidence, and a commitment to put patients at the centre of health care policy.

  20. The effect of whole genome amplification on samples originating from more than one donor

    DEFF Research Database (Denmark)

    Thacker, C.R.; Balogh, M.K.; Børsting, Claus;

    2006-01-01

    In this study, the GenomiPhi™ DNA Amplification Kit (Amersham Biosciences) was used to investigate the potential of whole genome amplification (WGA) when considering samples originating from more than one donor. DNA was extracted from blood samples, quantified and normalised before being mixed...... found to match the expected peak ratios regardless of the starting concentration of DNA. With samples mixed in the ratio of 1:7 and 1:15, and when the concentration of starting material was at the manufacturer's lower limit, too few minor component peaks were found to allow for statistical analysis...

  1. Whole genome duplication affects evolvability of flowering time in an autotetraploid plant.

    Directory of Open Access Journals (Sweden)

    Sara L Martin

    Whole genome duplications have occurred recurrently throughout the evolutionary history of eukaryotes. The resulting genetic and phenotypic changes can influence physiological and ecological responses to the environment; however, the impact of genome copy number on evolvability has rarely been examined experimentally. Here, we evaluate the effect of genome duplication on the ability to respond to selection for early flowering time in lines drawn from naturally occurring diploid and autotetraploid populations of the plant Chamerion angustifolium (fireweed). We contrast this with the result of four generations of selection on synthesized neoautotetraploids, whose genic variability is similar to that of diploids but whose genome copy number is similar to that of autotetraploids. In addition, we examine correlated responses to selection in all three groups. Diploid, extant tetraploid and neoautotetraploid lines all responded to selection with significant reductions in time to flowering. Evolvability, measured as realized heritability, was significantly lower in extant tetraploids (b̂_T = 0.31) than in diploids (b̂_T = 0.40). Neotetraploids exhibited the highest evolutionary response (b̂_T = 0.55). The rapid shift in flowering time in neotetraploids was associated with an increase in phenotypic variability across generations, but not with change in genome size or phenotypic correlations among traits. Our results suggest that whole genome duplications, without hybridization, may initially alter evolutionary rate, and that the dynamic nature of neoautopolyploids may contribute to the prevalence of polyploidy throughout eukaryotes.

  2. Whole-genome single-nucleotide-polymorphism analysis for discrimination of Clostridium botulinum group I strains.

    Science.gov (United States)

    Gonzalez-Escalona, Narjol; Timme, Ruth; Raphael, Brian H; Zink, Donald; Sharma, Shashi K

    2014-04-01

    Clostridium botulinum is a genetically diverse Gram-positive bacterium producing extremely potent neurotoxins (botulinum neurotoxins A through G [BoNT/A-G]). The complete genome sequences of three strains harboring only the BoNT/A1 nucleotide sequence are publicly available. Although these strains contain a toxin cluster (HA(+) OrfX(-)) associated with hemagglutinin genes, little is known about the genomes of subtype A1 strains (termed HA(-) OrfX(+)) that lack hemagglutinin genes in the toxin gene cluster. We sequenced the genomes of three BoNT/A1-producing C. botulinum strains: two strains with the HA(+) OrfX(-) cluster (69A and 32A) and one strain with the HA(-) OrfX(+) cluster (CDC297). Whole-genome phylogenetic single-nucleotide-polymorphism (SNP) analysis of these strains along with other publicly available C. botulinum group I strains revealed five distinct lineages. Strains 69A and 32A clustered with the C. botulinum type A1 Hall group, and strain CDC297 clustered with the C. botulinum type Ba4 strain 657. This study reports the use of whole-genome SNP sequence analysis for discrimination of C. botulinum group I strains and demonstrates the utility of this analysis in quickly differentiating C. botulinum strains harboring identical toxin gene subtypes. This analysis further supports previous work showing that strains CDC297 and 657 likely evolved from a common ancestor and independently acquired separate BoNT/A1 toxin gene clusters at distinct genomic locations.

  3. Inference of homologous recombination in bacteria using whole-genome sequences.

    Science.gov (United States)

    Didelot, Xavier; Lawson, Daniel; Darling, Aaron; Falush, Daniel

    2010-12-01

    Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces a homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Markov chain Monte Carlo algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/.

  4. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota Morota

    2014-10-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  5. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing

    Directory of Open Access Journals (Sweden)

    John E. Bowers

    2016-07-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species.

  6. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing.

    Science.gov (United States)

    Bowers, John E; Pearl, Stephanie A; Burke, John M

    2016-07-07

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species.

  7. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce.

    Science.gov (United States)

    Reyes-Chin-Wo, Sebastian; Wang, Zhiwen; Yang, Xinhua; Kozik, Alexander; Arikit, Siwaret; Song, Chi; Xia, Liangfeng; Froenicke, Lutz; Lavelle, Dean O; Truco, María-José; Xia, Rui; Zhu, Shilin; Xu, Chunyan; Xu, Huaqin; Xu, Xun; Cox, Kyle; Korf, Ian; Meyers, Blake C; Michelmore, Richard W

    2017-04-12

    Lettuce (Lactuca sativa) is a major crop and a member of the large, highly successful Compositae family of flowering plants. Here we present a reference assembly for the species and family. This was generated using whole-genome shotgun Illumina reads plus in vitro proximity ligation data to create large superscaffolds; it was validated genetically and superscaffolds were oriented in genetic bins ordered along nine chromosomal pseudomolecules. We identify several genomic features that may have contributed to the success of the family, including genes encoding Cycloidea-like transcription factors, kinases, enzymes involved in rubber biosynthesis and disease resistance proteins that are expanded in the genome. We characterize 21 novel microRNAs, one of which may trigger phasiRNAs from numerous kinase transcripts. We provide evidence for a whole-genome triplication event specific but basal to the Compositae. We detect 26% of the genome in triplicated regions containing 30% of all genes that are enriched for regulatory sequences and depleted for genes involved in defence.

  8. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    Science.gov (United States)

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution [1,2]. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes [3,4]. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  9. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

    Directory of Open Access Journals (Sweden)

    Shea N. Gardner

    2014-01-01

    Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.

  10. Whole-Genome Sequencing Analysis of Sapovirus Detected in South Korea.

    Science.gov (United States)

    Choi, Hye Lim; Suh, Chang-Il; Park, Seung-Won; Jin, Ji-Young; Cho, Han-Gil; Paik, Soon-Young

    2015-01-01

    Sapovirus (SaV), a virus residing in the intestines, is an important cause of gastroenteritis in humans. Human SaV genomes are classified into various genogroups and genotypes. Whole-genome analysis and phylogenetic analysis of ROK62, a SaV isolated in South Korea, were carried out. The 7429-nucleotide ROK62 genome contains three open reading frames (ORFs). ROK62 belongs to genotype SaV GI-1 and shares 94% nucleotide sequence identity with other SaVs, namely Manchester and Mc114. SaV infections have recently been increasing worldwide, particularly in countries neighboring South Korea; however, very few studies have been conducted within South Korea. As the first whole-genome sequence analysis of SaV in South Korea, this research will help provide a reference for the detection of recombination, tracking of epidemic spread, and development of diagnostic methods for SaV.

  11. Whole-Genome Sequencing Analysis of Sapovirus Detected in South Korea.

    Directory of Open Access Journals (Sweden)

    Hye Lim Choi

    Sapovirus (SaV), a virus residing in the intestines, is an important cause of gastroenteritis in humans. Human SaV genomes are classified into various genogroups and genotypes. Whole-genome analysis and phylogenetic analysis of ROK62, a SaV isolated in South Korea, were carried out. The 7429-nucleotide ROK62 genome contains three open reading frames (ORFs). ROK62 belongs to genotype SaV GI-1 and shares 94% nucleotide sequence identity with other SaVs, namely Manchester and Mc114. SaV infections have recently been increasing worldwide, particularly in countries neighboring South Korea; however, very few studies have been conducted within South Korea. As the first whole-genome sequence analysis of SaV in South Korea, this research will help provide a reference for the detection of recombination, tracking of epidemic spread, and development of diagnostic methods for SaV.

  12. Whole genome sequence typing to investigate the Apophysomyces outbreak following a tornado in Joplin, Missouri, 2011.

    Science.gov (United States)

    Etienne, Kizee A; Gillece, John; Hilsabeck, Remy; Schupp, Jim M; Colman, Rebecca; Lockhart, Shawn R; Gade, Lalitha; Thompson, Elizabeth H; Sutton, Deanna A; Neblett-Fanfair, Robyn; Park, Benjamin J; Turabelidze, George; Keim, Paul; Brandt, Mary E; Deak, Eszter; Engelthaler, David M

    2012-01-01

    Case reports of Apophysomyces spp. in immunocompetent hosts have been a result of traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011, a tornado occurred in Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group; this indicated multiple genotypes from a single or multiple sources. Although no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces.

  13. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary, and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. Each strain carries between 13 and 137 nonsense mutations, and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

  14. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce

    Science.gov (United States)

    Reyes-Chin-Wo, Sebastian; Wang, Zhiwen; Yang, Xinhua; Kozik, Alexander; Arikit, Siwaret; Song, Chi; Xia, Liangfeng; Froenicke, Lutz; Lavelle, Dean O.; Truco, María-José; Xia, Rui; Zhu, Shilin; Xu, Chunyan; Xu, Huaqin; Xu, Xun; Cox, Kyle; Korf, Ian; Meyers, Blake C.; Michelmore, Richard W.

    2017-01-01

    Lettuce (Lactuca sativa) is a major crop and a member of the large, highly successful Compositae family of flowering plants. Here we present a reference assembly for the species and family. This was generated using whole-genome shotgun Illumina reads plus in vitro proximity ligation data to create large superscaffolds; it was validated genetically and superscaffolds were oriented in genetic bins ordered along nine chromosomal pseudomolecules. We identify several genomic features that may have contributed to the success of the family, including genes encoding Cycloidea-like transcription factors, kinases, enzymes involved in rubber biosynthesis and disease resistance proteins that are expanded in the genome. We characterize 21 novel microRNAs, one of which may trigger phasiRNAs from numerous kinase transcripts. We provide evidence for a whole-genome triplication event specific but basal to the Compositae. We detect 26% of the genome in triplicated regions containing 30% of all genes that are enriched for regulatory sequences and depleted for genes involved in defence. PMID:28401891

  15. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs that had opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample ranged from 39.29 to 49.36 million. Approximately 188 genes were differentially expressed in adipose tissue and were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, metabolism of fatty acids, retinol, caffeine and arachidonic acid, and immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  16. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing

    Science.gov (United States)

    Ranjan, Ravi; Rani, Asha; Metwally, Ahmed; McGee, Halvor S.; Perkins, David L.

    2016-01-01

    The human microbiome has emerged as a major player in regulating human health and disease. Translational studies of the microbiome have the potential to indicate clinical applications such as fecal transplants and probiotics. However, one major issue is accurate identification of the microbes constituting the microbiota. Studies of the microbiome have frequently utilized sequencing of the conserved 16S ribosomal RNA (rRNA) gene. We present a comparative study of an alternative approach using shotgun whole genome sequencing (WGS). In the present study, we analyzed the human fecal microbiome, compiling a total of 194.1 × 10⁶ reads from a single sample using multiple sequencing methods and platforms. Specifically, after establishing the reproducibility of our methods with extensive multiplexing, we compared: 1) the 16S rRNA amplicon versus the WGS method, 2) the Illumina HiSeq versus MiSeq platforms, 3) the analysis of reads versus de novo assembled contigs, and 4) the effect of shorter versus longer reads. Our study demonstrates that shotgun whole genome sequencing has multiple advantages compared with the 16S amplicon method, including enhanced detection of bacterial species, increased detection of diversity and increased prediction of genes. In addition, increased length, either due to longer reads or the assembly of contigs, improved the accuracy of species detection. PMID:26718401

  17. Comparative whole genome sequence analysis of wild-type and cidofovir-resistant monkeypoxvirus

    Directory of Open Access Journals (Sweden)

    John Huggins

    2010-05-01

    We performed whole genome sequencing of a cidofovir {[(S)-1-(3-hydroxy-2-phosphonylmethoxypropyl)cytosine] [HPMPC]}-resistant (CDV-R) strain of Monkeypoxvirus (MPV). Whole-genome comparison with the wild-type (WT) strain revealed 55 single-nucleotide polymorphisms (SNPs) and one tandem-repeat contraction. Over one-third of all identified SNPs were located within genes comprising the poxvirus replication complex, including the DNA polymerase, RNA polymerase, mRNA capping methyltransferase, DNA processivity factor, and poly-A polymerase. Four polymorphic sites were found within the DNA polymerase gene. DNA polymerase mutations observed at positions 314 and 684 in MPV were consistent with CDV-R loci previously identified in Vaccinia virus (VACV). These data suggest the mechanism of CDV resistance may be highly conserved across Orthopoxvirus (OPV) species. SNPs were also identified within virulence genes such as the A-type inclusion protein, serine protease inhibitor-like protein SPI-3, Schlafen ATPase and thymidylate kinase, among others. Aberrant chain extension induced by CDV may lead to diverse alterations in gene expression and viral replication that may result in both adaptive and attenuating mutations. Defining the potential contribution of substitutions in the replication complex and RNA processing machinery reported here may yield further insight into CDV resistance and may augment current therapeutic development strategies.

  18. Whole genome analysis of linezolid resistance in Streptococcus pneumoniae reveals resistance and compensatory mutations

    Directory of Open Access Journals (Sweden)

    Légaré Danielle

    2011-10-01

    Full Text Available Abstract Background Several mutations were present in the genome of Streptococcus pneumoniae linezolid-resistant strains but the role of several of these mutations had not been experimentally tested. To analyze the role of these mutations, we reconstituted resistance by serial whole genome transformation of a novel resistant isolate into two strains with a sensitive background. We sequenced the parent mutant and two independent transformants exhibiting similar minimum inhibitory concentrations of linezolid. Results Comparative genomic analyses revealed that transformants acquired G2576T transversions in every gene copy of 23S rRNA and that the number of altered copies correlated with the level of linezolid resistance and cross-resistance to florfenicol and chloramphenicol. One of the transformants also acquired a mutation present in the parent mutant leading to the overexpression of an ABC transporter (spr1021). The acquisition of these mutations conferred a fitness cost, however, which was further enhanced by the acquisition of a mutation in an RNA methyltransferase implicated in resistance. Interestingly, the fitness of the transformants could be restored in part by the acquisition of altered copies of the L3 and L16 ribosomal proteins and by mutations leading to the overexpression of the spr1887 ABC transporter that were present in the original linezolid-resistant mutant. Conclusions Our results demonstrate the usefulness of whole genome approaches at detecting major determinants of resistance as well as compensatory mutations that alleviate the fitness cost associated with resistance.

  19. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence.

    Directory of Open Access Journals (Sweden)

    Frederick E Dewey

    2011-09-01

    Full Text Available Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with a history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

  20. Computel: computation of mean telomere length from whole-genome next-generation sequencing data.

    Science.gov (United States)

    Nersisyan, Lilit; Arakelyan, Arsen

    2015-01-01

    Telomeres are the ends of eukaryotic chromosomes, consisting of consecutive short repeats that protect chromosome ends from degradation. Telomeres shorten with each cell division, leading to replicative cell senescence. Deregulation of telomere length homeostasis is associated with the development of various age-related diseases and cancers. A number of experimental techniques exist for telomere length measurement; however, until recently, the absence of tools for extracting telomere lengths from high-throughput sequencing data has significantly obscured the association of telomere length with molecular processes in normal and diseased conditions. We have developed Computel, a program in R for computing mean telomere length from whole-genome next-generation sequencing data. Computel is open source, and is freely available at https://github.com/lilit-nersisyan/computel. It utilizes a short-read alignment-based approach and integrates various popular tools for sequencing data analysis. We validated it with synthetic and experimental data, and compared its performance with the previously available software. The results have shown that Computel outperforms existing software in accuracy, independence of results from sequencing conditions, stability against inherent sequencing errors, and better ability to distinguish pure telomeric sequences from interstitial telomeric repeats. By providing a highly reliable methodology for determining telomere lengths from whole-genome sequencing data, Computel should help to elucidate the role of telomeres in cellular health and disease.
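
The abstract does not spell out Computel's calculation, so the following is only an illustration of the general idea behind coverage-normalized telomere length estimation from whole-genome reads, not Computel's alignment-based method. It assumes that reads containing at least `min_repeats` tandem copies of the human telomeric hexamer TTAGGG (or its reverse complement CCCTAA) are fully telomeric, that sequencing coverage is uniform, and that a haploid human genome carries 46 telomere ends. Under those assumptions, telomeric bases ≈ coverage × 46 × mean length. The function names and thresholds are hypothetical.

```python
TELO_FWD = "TTAGGG"
TELO_REV = "CCCTAA"


def is_telomeric(read: str, min_repeats: int = 7) -> bool:
    """True if the read contains at least `min_repeats` tandem telomeric hexamers."""
    read = read.upper()
    return TELO_FWD * min_repeats in read or TELO_REV * min_repeats in read


def mean_telomere_length(reads, haploid_genome_size: int,
                         telomere_ends: int = 46, min_repeats: int = 7) -> float:
    """Coverage-normalized mean telomere length (bp) from raw read sequences.

    Hypothetical sketch: assumes uniform coverage and that each telomeric read
    contributes its full length to the telomeric base count.
    """
    total_bases = 0
    telomeric_bases = 0
    for read in reads:
        total_bases += len(read)
        if is_telomeric(read, min_repeats):
            telomeric_bases += len(read)
    if total_bases == 0:
        raise ValueError("no reads supplied")
    coverage = total_bases / haploid_genome_size  # mean haploid coverage
    # Each haploid genome copy contributes `telomere_ends` telomeres of mean length L,
    # so telomeric_bases ~= coverage * telomere_ends * L.
    return telomeric_bases / (coverage * telomere_ends)
```

Real pipelines must also handle interstitial telomeric repeats, which the abstract identifies as a key difficulty; this sketch does not.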

  1. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing.

    Science.gov (United States)

    Ranjan, Ravi; Rani, Asha; Metwally, Ahmed; McGee, Halvor S; Perkins, David L

    2016-01-22

    The human microbiome has emerged as a major player in regulating human health and disease. Translational studies of the microbiome have the potential to indicate clinical applications such as fecal transplants and probiotics. However, one major issue is accurate identification of microbes constituting the microbiota. Studies of the microbiome have frequently utilized sequencing of the conserved 16S ribosomal RNA (rRNA) gene. We present a comparative study of an alternative approach using whole genome shotgun sequencing (WGS). In the present study, we analyzed the human fecal microbiome compiling a total of 194.1 × 10^6 reads from a single sample using multiple sequencing methods and platforms. Specifically, after establishing the reproducibility of our methods with extensive multiplexing, we compared: 1) The 16S rRNA amplicon versus the WGS method, 2) the Illumina HiSeq versus MiSeq platforms, 3) the analysis of reads versus de novo assembled contigs, and 4) the effect of shorter versus longer reads. Our study demonstrates that whole genome shotgun sequencing has multiple advantages compared with the 16S amplicon method including enhanced detection of bacterial species, increased detection of diversity and increased prediction of genes. In addition, increased length, either due to longer reads or the assembly of contigs, improved the accuracy of species detection.
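
The abstract does not say how diversity was quantified. One widely used measure, shown here purely as an illustration, is the Shannon index H = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i estimated from read counts. The function name and the list-of-counts input are assumptions.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing (0 * ln 0 := 0)
            p = c / total
            h -= p * math.log(p)
    return h
```

Detecting more species, as reported for WGS, generally raises H when the additional taxa have nonzero abundance.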

  2. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, no reasonable method of using complete genome sequences to construct phylogenetic trees has yet been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.
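
The abstract does not state which clustering algorithm turns the average similarity values into a tree. As one possible illustration, the sketch below converts a symmetric matrix of pairwise average similarities (assumed to lie in [0, 1]) into distances d = 1 - s and clusters them with UPGMA, returning a Newick topology without branch lengths. The distance transform, the choice of UPGMA rather than, for example, neighbor joining, and the function name are all assumptions.

```python
def upgma_newick(names, sim):
    """Cluster taxa by UPGMA on distances d = 1 - similarity; return a Newick topology."""
    n = len(names)
    if n == 0:
        raise ValueError("need at least one taxon")
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (Newick label, size)
    dist = {(i, j): 1.0 - sim[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        (a, b), _ = min(dist.items(), key=lambda kv: kv[1])  # closest pair, a < b
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        new = next_id
        next_id += 1
        for c in clusters:
            # UPGMA update: size-weighted average of the merged clusters' distances
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            dist[(c, new)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance that still refers to the two merged clusters
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[new] = (f"({label_a},{label_b})", size_a + size_b)
    label, _ = next(iter(clusters.values()))
    return label + ";"
```

UPGMA assumes a roughly constant evolutionary rate, which lateral gene transfer can violate; that is one reason bootstrap support, as noted in the abstract, matters.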

  3. Whole genome phylogeny of Prochlorococcus marinus group of cyanobacteria: genome alignment and overlapping gene approach.

    Science.gov (United States)

    Prabha, Ratna; Singh, Dhananjaya P; Gupta, Shailendra K; Rai, Anil

    2014-06-01

    Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium dominating the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans in the world were assigned to two different groups: a tightly clustered high-light (HL)-adapted clade and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed the phylogeny of this genus on the basis of complete genome sequences through genome alignment, overlapping-gene content and gene-order approaches. In contrast to the tree based on the 16S rRNA gene, the phylogenetic tree of P. marinus obtained by comparing whole genome sequences corresponded well with the HL/LL ecotypic distinction of twelve strains and showed consistency with the phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes, and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency between the physiological and genetic phylogenies based on whole genome sequences is thus established. These observations improve our understanding of the adaptation and diversification of these organisms under evolutionary pressure.

  4. Genetic diagnosis of a Chinese multiple endocrine neoplasia type 2A family through whole genome sequencing

    Indian Academy of Sciences (India)

    ZHEN-FANG DU; PENG-FEI LI; JIAN-QIANG ZHAO; ZHI-LIE CAO; FENG LI; JU-MING MA; XIAO-PING QI

    2017-06-01

    Approximately 98% of patients with multiple endocrine neoplasia type 2A (MEN 2A) have an identifiable RET mutation. Prophylactic or early total thyroidectomy or pheochromocytoma/parathyroid removal in patients can be preventative or curative and has become standard management. The general strategy for RET screening of at-risk family members is to sequence the most commonly affected exons and, if negative, to extend sequencing to additional exons. However, different families with MEN 2A due to the same RET mutation often show significant variability in the clinical presentation of disease and the aggressiveness of medullary thyroid carcinoma (MTC), which implies that additional genetic loci exist beyond the RET coding region; this variability is presumably due to an additive effect of disease-modifying factors. Whole genome sequencing (WGS) greatly expands the breadth of screening from genes associated with a particular disease to the whole genome and, potentially, all the information that the genome contains about diseases or traits. In this study, we performed WGS on a typical Chinese MEN 2A proband and identified the pathogenic RET p.C634R mutation. We also identified several neutral variants within RET and pheochromocytoma-related genes. Moreover, we found several interesting structural variants, including genetic deletions (RSPO1, OVCH2 and AP3S1, etc.) and fusion transcripts (FSIP1-BAZ2A, etc.).

  5. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    Directory of Open Access Journals (Sweden)

    Roy Michael Robins-Browne

    2016-11-01

    Full Text Available The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods.

  6. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts.

    Directory of Open Access Journals (Sweden)

    Yi-Cheng Guo

    Full Text Available Dekkera yeasts have often been considered as alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya). However, the Dekkera yeasts, which separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD) but still occupy a niche similar to S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to those of the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in transcription and translation exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified 12 vacuolar H+-ATPase (V-ATPase) function genes under positive selection, which assist in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights into the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage.

  7. Computel: computation of mean telomere length from whole-genome next-generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Lilit Nersisyan

    Full Text Available Telomeres are the ends of eukaryotic chromosomes, consisting of consecutive short repeats that protect chromosome ends from degradation. Telomeres shorten with each cell division, leading to replicative cell senescence. Deregulation of telomere length homeostasis is associated with the development of various age-related diseases and cancers. A number of experimental techniques exist for telomere length measurement; however, until recently, the absence of tools for extracting telomere lengths from high-throughput sequencing data has significantly obscured the association of telomere length with molecular processes in normal and diseased conditions. We have developed Computel, a program in R for computing mean telomere length from whole-genome next-generation sequencing data. Computel is open source, and is freely available at https://github.com/lilit-nersisyan/computel. It utilizes a short-read alignment-based approach and integrates various popular tools for sequencing data analysis. We validated it with synthetic and experimental data, and compared its performance with the previously available software. The results have shown that Computel outperforms existing software in accuracy, independence of results from sequencing conditions, stability against inherent sequencing errors, and better ability to distinguish pure telomeric sequences from interstitial telomeric repeats. By providing a highly reliable methodology for determining telomere lengths from whole-genome sequencing data, Computel should help to elucidate the role of telomeres in cellular health and disease.

  8. From days to hours: reporting clinically actionable variants from whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Sumit Middha

    Full Text Available As the cost of whole genome sequencing (WGS) decreases, clinical laboratories will be looking at broadly adopting this technology to screen for variants of clinical significance. To fully leverage this technology in a clinical setting, results need to be reported quickly, as the turnaround time could potentially impact patient care. The latest sequencers can sequence a whole human genome in about 24 hours. However, depending on the computing infrastructure available, the processing of data can take several days, with the majority of computing time devoted to aligning reads to genomic regions that are to date not clinically interpretable. In an attempt to accelerate the reporting of clinically actionable variants, we have investigated the utility of a multi-step alignment algorithm focused on aligning reads and calling variants in genomic regions of clinical relevance prior to processing the remaining reads on the whole genome. This iterative workflow significantly accelerates the reporting of clinically actionable variants with no loss of accuracy when compared to genotypes obtained with the OMNI SNP platform or to variants detected with a standard workflow that combines Novoalign and GATK.

  9. Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion.

    Science.gov (United States)

    Xi, Ruibin; Hadjipanayis, Angela G; Luquette, Lovelace J; Kim, Tae-Min; Lee, Eunjung; Zhang, Jianhua; Johnson, Mark D; Muzny, Donna M; Wheeler, David A; Gibbs, Richard A; Kucherlapati, Raju; Park, Peter J

    2011-11-15

    DNA copy number variations (CNVs) play an important role in the pathogenesis and progression of cancer and confer susceptibility to a variety of human disorders. Array comparative genomic hybridization has been used widely to identify CNVs genome wide, but the next-generation sequencing technology provides an opportunity to characterize CNVs genome wide with unprecedented resolution. In this study, we developed an algorithm to detect CNVs from whole-genome sequencing data and applied it to a newly sequenced glioblastoma genome with a matched control. This read-depth algorithm, called BIC-seq, can accurately and efficiently identify CNVs by minimizing the Bayesian information criterion. Using BIC-seq, we identified hundreds of CNVs as small as 40 bp in the cancer genome sequenced at 10× coverage, whereas we could only detect large CNVs (> 15 kb) in the array comparative genomic hybridization profiles for the same genome. Eighty percent (12/15) of the small variants tested (110 bp to 14 kb) were experimentally validated by quantitative PCR, demonstrating high sensitivity and true positive rate of the algorithm. We also extended the algorithm to detect recurrent CNVs in multiple samples and to derive error bars for breakpoints using a Gibbs sampling approach. We propose this statistical approach as a principled yet practical and efficient method to estimate CNVs in whole-genome sequencing data.
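
The abstract describes BIC-seq only at a high level. The sketch below illustrates the general idea of BIC-driven read-depth segmentation under simplifying assumptions: tumor and matched-normal read counts are already binned, each segment's share of tumor reads is modeled as binomial, and adjacent segments are merged greedily while a merge lowers BIC = -2 log L + λ · (number of segments) · ln N, with N the total read count. The function names and the tuning parameter `lam` are illustrative; this is not the published BIC-seq implementation.

```python
import math


def _binom_loglik(t, n):
    """Binomial log-likelihood of t tumor vs n normal reads at the MLE p = t / (t + n)."""
    total = t + n
    ll = 0.0
    if t > 0:
        ll += t * math.log(t / total)
    if n > 0:
        ll += n * math.log(n / total)
    return ll


def bic_segment(tumor, normal, lam=1.0):
    """Greedily merge adjacent bins while a merge lowers BIC; return (start, end, tumor_fraction)."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must have the same number of bins")
    total_reads = sum(tumor) + sum(normal)
    # Each merge removes one segment (one binomial parameter), saving lam * ln(N) of penalty.
    penalty = lam * math.log(total_reads) if total_reads > 1 else 0.0
    segs = [[i, i + 1, t, n] for i, (t, n) in enumerate(zip(tumor, normal))]
    while len(segs) > 1:
        best_delta, best_i = 0.0, None
        for i in range(len(segs) - 1):
            a, b = segs[i], segs[i + 1]
            split = _binom_loglik(a[2], a[3]) + _binom_loglik(b[2], b[3])
            merged = _binom_loglik(a[2] + b[2], a[3] + b[3])
            delta = -2.0 * (merged - split) - penalty  # change in BIC if merged
            if delta < best_delta:
                best_delta, best_i = delta, i
        if best_i is None:  # no remaining merge lowers BIC
            break
        a, b = segs[best_i], segs.pop(best_i + 1)
        segs[best_i] = [a[0], b[1], a[2] + b[2], a[3] + b[3]]
    return [(s, e, t / (t + n) if t + n else float("nan")) for s, e, t, n in segs]
```

For example, four bins at a 1:1 tumor/normal ratio followed by four at 3:1 collapse into two segments with tumor fractions 0.5 and 0.75. Raising `lam` favors fewer, larger segments.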

  10. Whole-Genome Mapping as a Novel High-Resolution Typing Tool for Legionella pneumophila.

    Science.gov (United States)

    Bosch, Thijs; Euser, Sjoerd M; Landman, Fabian; Bruin, Jacob P; IJzerman, Ed P; den Boer, Jeroen W; Schouls, Leo M

    2015-10-01

    Legionella is the causative agent for Legionnaires' disease (LD) and is responsible for several large outbreaks in the world. More than 90% of LD cases are caused by Legionella pneumophila, and studies on the origin and transmission routes of this pathogen rely on adequate molecular characterization of isolates. Current typing of L. pneumophila mainly depends on sequence-based typing (SBT). However, studies have shown that in some outbreak situations, SBT does not have sufficient discriminatory power to distinguish between related and nonrelated L. pneumophila isolates. In this study, we used a novel high-resolution typing technique, called whole-genome mapping (WGM), to differentiate between epidemiologically related and nonrelated L. pneumophila isolates. Assessment of the method by various validation experiments showed highly reproducible results, and WGM was able to confirm two well-documented Dutch L. pneumophila outbreaks. Comparison of whole-genome maps of the two outbreaks together with WGMs of epidemiologically nonrelated L. pneumophila isolates showed major differences between the maps, and WGM yielded a higher discriminatory power than SBT. In conclusion, WGM can be a valuable alternative to perform outbreak investigations of L. pneumophila in real time since the turnaround time from culture to comparison of the L. pneumophila maps is less than 24 h.

  11. Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data.

    Directory of Open Access Journals (Sweden)

    Frederick E Dewey

    2015-10-01

    Full Text Available High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.
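
Using Mendelian inheritance for quality control relies on a simple rule: at an autosomal site, a child's genotype must be formed from one allele of the father and one allele of the mother. The sketch below flags trio sites that break this rule; such sites are likely genotyping or sequencing errors, or candidate de novo events. Genotypes are assumed to be unphased pairs of allele strings, and the function names and data layout are hypothetical, not the framework's actual implementation.

```python
from itertools import product

Genotype = tuple[str, str]  # unphased pair of alleles, e.g. ("A", "G")


def is_mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child's unordered genotype can be built from one paternal and one maternal allele."""
    target = sorted(child)
    return any(sorted((p, m)) == target for p, m in product(father, mother))


def flag_violations(sites):
    """Return IDs of sites whose (father, mother, child) genotypes violate Mendelian inheritance."""
    return [site_id for site_id, (f, m, c) in sites.items()
            if not is_mendelian_consistent(f, m, c)]
```

Sex chromosomes and copy-number changes need separate rules; this sketch covers only diploid autosomal sites.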

  12. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    Full Text Available The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then validate its organization and distribution experimentally. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  13. Kernel-based whole-genome prediction of complex traits: a review

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics. PMID:25360145
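
As a minimal illustration of the kind of kernel-based whole-genome regression this review discusses, the sketch below fits a Gaussian-kernel ridge regression (an RKHS-style predictor) to marker genotypes coded 0/1/2. It centers phenotypes on their mean, solves (K + λI)α = y - μ and predicts ŷ = μ + Σ α_i k(x_i, x*), with k(x, z) = exp(-‖x - z‖² / h). The bandwidth `h`, the ridge parameter `lam`, the pure-Python solver and the function names are assumptions made to keep the example self-contained; in practice such parameters are tuned and a numerical library is used.

```python
import math


def gaussian_kernel(x, z, h):
    """Gaussian kernel exp(-||x - z||^2 / h) between two marker vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / h)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_kernel_ridge(X, y, h, lam):
    """Return (mu, alpha) where alpha solves (K + lam * I) alpha = y - mu."""
    mu = sum(y) / len(y)
    n = len(X)
    k = [[gaussian_kernel(X[i], X[j], h) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return mu, solve(k, [v - mu for v in y])


def predict(X, mu, alpha, x_new, h):
    """Predicted genetic value: mu + sum_i alpha_i * k(x_i, x_new)."""
    return mu + sum(a * gaussian_kernel(xi, x_new, h) for a, xi in zip(alpha, X))
```

A very small `lam` nearly interpolates the training phenotypes, while a very large `lam` shrinks every prediction toward the mean. This trade-off is why predictive performance is judged on a separate validation set, as the review emphasizes.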

  14. HomologMiner: looking for homologous genomic groups in whole genomes.

    Science.gov (United States)

    Hou, Minmei; Berman, Piotr; Hsu, Chih-Hao; Harris, Robert S

    2007-04-15

    Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism to obtain new functions. Several tools are available for de novo repeat sequence identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene, or a partial gene coding a protein domain. We developed a program called HomologMiner to identify homologous groups applicable to genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). Groups obtained include gene families (e.g. olfactory receptor gene family, zinc finger families), unannotated interspersed repeats and additional homologous groups that resulted from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide preliminary analysis of the output on the three genomes mentioned above, and show several applications including identifying boundaries of tandem gene clusters and novel interspersed repeat families. All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.

  15. Fast and low-cost decentralized surveillance of transmission of tuberculosis based on strain-specific PCRs tailored from whole genome sequencing data: a pilot study.

    Science.gov (United States)

    Pérez-Lago, L; Martínez Lirola, M; Herranz, M; Comas, I; Bouza, E; García-de-Viedma, D

    2015-03-01

    Molecular epidemiology has transformed our knowledge of how tuberculosis (TB) is transmitted. Whole genome sequencing (WGS) has reached unprecedented levels of accuracy. However, it has increased technical requirements and costs, and analysis of data delays results. Our objective was to find a way to reconcile speed and ease of implementation with the high resolution of WGS. The targeted regional allele-specific oligonucleotide PCR (TRAP) assay presented here is based on allele-specific PCR targeting strain-specific single nucleotide polymorphisms, identified from WGS, and makes it possible to track actively transmitted Mycobacterium tuberculosis strains. A TRAP assay was optimized to track the most actively transmitted strains in a population in Almería, Southeast Spain, with high rates of TB. TRAP was transferred to the local laboratory where transmission was occurring. It performed well from cultured isolates and directly from sputa, enabling new secondary cases of infection from the actively transmitted strains to be detected. TRAP constitutes a fast, simple and low-cost tool that could modify surveillance of TB transmission. This pilot study could help to define a new model to survey TB transmission based on a decentralized multinodal network of local laboratories applying fast and low-cost TRAPs, which are developed by central reference centres, tailored to the specific demands of transmission at each local node.

  16. Ichthyosis vulgaris

    DEFF Research Database (Denmark)

    Thyssen, J P; Godoy-Gijon, E; Elias, P M

    2013-01-01

    Ichthyosis vulgaris is caused by loss-of-function mutations in the filaggrin gene (FLG) and is characterized clinically by xerosis, scaling, keratosis pilaris, palmar and plantar hyperlinearity, and a strong association with atopic disorders. According to the published studies presented...... in this review article, FLG mutations are observed in approximately 7·7% of Europeans and 3·0% of Asians, but appear to be infrequent in darker-skinned populations. This clinical review article provides an overview of ichthyosis vulgaris epidemiology, related disorders and pathomechanisms. Not only does...... ichthyosis vulgaris possess a wide clinical spectrum, recent studies suggest that carriers of FLG mutations may have a generally altered risk of developing common diseases, even beyond atopic disorders. Mechanistic studies have shown increased penetration of allergens and chemicals in filaggrin...

  17. Pemphigus vulgaris

    Directory of Open Access Journals (Sweden)

    Sandhya Tamgadge

    2011-01-01

    Full Text Available Pemphigus vulgaris is a chronic autoimmune mucocutaneous disease that initially manifests in the form of intraoral lesions, which spread to other mucous membranes and the skin. The etiology of pemphigus vulgaris is still unknown, although the disease has attracted considerable interest. The pemphigus group of diseases is characterized by the production of autoantibodies against intercellular substances, and these diseases are thus classified as autoimmune diseases. Most patients are initially misdiagnosed and improperly treated for months or even years. Dental professionals must be sufficiently familiar with the clinical manifestations of pemphigus vulgaris to ensure early diagnosis and treatment, since this in turn determines the prognosis and course of the disease. This article presents a case report with unknown etiology along with an overview of the disease.

  18. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background: Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next-generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low-frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods: Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow-up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results: Using next-generation sequencing in combination with Sanger sequencing and statistical analysis, we defined a read frequency cut-off of 30% to identify low-frequency M. tuberculosis variants with high confidence. Using this cut-off, we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions: Our study demonstrated true levels of genetic diversity
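    The 30% read-frequency cut-off described above amounts to a simple filter on variant calls. The Python sketch below assumes each call carries an alternative-allele read count and a total read depth; the `VariantCall` and `high_confidence_variants` names, the positions and the read counts are hypothetical, and the abstract does not say whether exactly 30% passes, so the inclusive `>=` comparison is an assumption rather than the authors' in-house pipeline.

```python
from dataclasses import dataclass

# Read-frequency cut-off reported in the abstract (30%).
# Treating the boundary as inclusive (>=) is an assumption.
READ_FREQUENCY_CUTOFF = 0.30


@dataclass
class VariantCall:
    """Hypothetical minimal record for one variant call."""
    position: int     # genome coordinate (illustrative values only)
    alt_reads: int    # reads supporting the alternative allele
    total_reads: int  # total reads covering the position

    @property
    def read_frequency(self) -> float:
        # Fraction of reads carrying the variant; 0.0 when there is no coverage.
        return self.alt_reads / self.total_reads if self.total_reads else 0.0


def high_confidence_variants(calls, cutoff=READ_FREQUENCY_CUTOFF):
    """Keep only calls whose alternative-allele read frequency reaches the cut-off."""
    return [call for call in calls if call.read_frequency >= cutoff]


calls = [
    VariantCall(position=1000, alt_reads=45, total_reads=100),  # 45% -> kept
    VariantCall(position=2000, alt_reads=12, total_reads=80),   # 15% -> dropped
    VariantCall(position=3000, alt_reads=60, total_reads=150),  # 40% -> kept
]
print([call.position for call in high_confidence_variants(calls)])  # [1000, 3000]
```

    Lowering `cutoff` would admit more minority variants at the cost of confidence, which is the trade-off the study set out to calibrate.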

  19. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Directory of Open Access Journals (Sweden)

    Alexander T Dilthey

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a

  20. Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data

    Directory of Open Access Journals (Sweden)

    Fujiyama Asao

    2010-04-01

    Background: Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory-domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. Results: We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for γ-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacks a polyketide synthesis operon, has a disrupted plipastatin production operon, and possesses previously unidentified transposases.
Conclusions: The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B

  1. Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data.

    Science.gov (United States)

    Nishito, Yukari; Osana, Yasunori; Hachiya, Tsuyoshi; Popendorf, Kris; Toyoda, Atsushi; Fujiyama, Asao; Itaya, Mitsuhiro; Sakakibara, Yasubumi

    2010-04-16

    Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory-domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for gamma-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacks a polyketide synthesis operon, has a disrupted plipastatin production operon, and possesses previously unidentified transposases. The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks

  2. Whole-genome thermodynamic analysis reduces siRNA off-target effects.

    Directory of Open Access Journals (Sweden)

    Xi Chen

    Small interfering RNAs (siRNAs) are important tools for knocking down targeted genes and have been widely applied in biological and biomedical research. To design siRNAs, two important aspects must be considered: the potency in knocking down target genes and the off-target effect on any nontarget genes. Although many studies have produced useful tools to design potent siRNAs, off-target prevention has mostly been delegated to sequence-level alignment tools such as BLAST. We hypothesize that whole-genome thermodynamic analysis can identify potential off-targets with higher precision and help us avoid siRNAs that may have strong off-target effects. To validate this hypothesis, two siRNA sets were designed to target three human genes: IDH1, ITPR2 and TRIM28. They were selected from the output of two popular siRNA design tools, siDirect and siDesign. Both tools incorporate sequence-level screening to avoid off-targets, so their output is believed to be optimal. However, one of the sets we tested has off-target genes predicted by Picky, a whole-genome thermodynamic analysis tool. Picky can identify off-target genes that may hybridize to a siRNA within a user-specified melting temperature range. Our experiments validated that some off-target genes predicted by Picky can indeed be inhibited by siRNAs. Similar experiments were performed using commercially available siRNAs, and a few off-target genes were also found to be inhibited as predicted by Picky. In summary, we demonstrate that whole-genome thermodynamic analysis can identify off-target genes that are missed in sequence-level screening. Because Picky prediction is deterministic according to thermodynamics, if a siRNA candidate has no Picky-predicted off-targets, it is unlikely to cause off-target effects. Therefore, we recommend including Picky as an additional screening step in siRNA design.

  3. Supplementary Material for: Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-01-01

    Background: Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next-generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low-frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods: Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow-up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results: Using next-generation sequencing in combination with Sanger sequencing and statistical analysis, we defined a read frequency cut-off of 30% to identify low-frequency M. tuberculosis variants with high confidence. Using this cut-off, we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions: Our study demonstrated true levels of genetic

  4. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Science.gov (United States)

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant

  5. Tolerance of Whole-Genome Doubling Propagates Chromosomal Instability and Accelerates Cancer Genome Evolution

    DEFF Research Database (Denmark)

    Dewhurst, Sally M.; McGranahan, Nicholas; Burrell, Rebecca A.;

    2014-01-01

    The contribution of whole-genome doubling to chromosomal instability (CIN) and tumor evolution is unclear. We use long-term culture of isogenic tetraploid cells from a stable diploid colon cancer progenitor to investigate how a genome-doubling event affects genome stability over time. Rare cells … that survive genome doubling demonstrate increased tolerance to chromosome aberrations. Tetraploid cells do not exhibit increased frequencies of structural or numerical CIN per chromosome. However, the tolerant phenotype in tetraploid cells, coupled with a doubling of chromosome aberrations per cell, allows … chromosome abnormalities to evolve specifically in tetraploids, recapitulating chromosomal changes in genomically complex colorectal tumors. Finally, a genome-doubling event is independently predictive of poor relapse-free survival in early-stage disease in two independent cohorts in multivariate analyses...

  6. Clostridium botulinum Group II Isolate Phylogenomic Profiling Using Whole-Genome Sequence Data.

    Science.gov (United States)

    Weedmark, K A; Mabon, P; Hayden, K L; Lambert, D; Van Domselaar, G; Austin, J W; Corbett, C R

    2015-09-01

    Clostridium botulinum group II isolates (n = 163) from different geographic regions, outbreaks, and neurotoxin types and subtypes were characterized in silico using whole-genome sequence data. Two clusters representing a variety of botulinum neurotoxin (BoNT) types and subtypes were identified by multilocus sequence typing (MLST) and core single nucleotide polymorphism (SNP) analysis. While one cluster included BoNT/B4/F6/E9 and nontoxigenic members, the other comprised a wide variety of different BoNT/E subtype isolates and a nontoxigenic strain. In silico MLST and core SNP methods were consistent in terms of clade-level isolate classification; however, core SNP analysis showed higher resolution capability. Furthermore, core SNP analysis correctly distinguished isolates by outbreak and location. This study illustrated the utility of next-generation sequence-based typing approaches for isolate characterization and source attribution and identified discrete SNP loci and MLST alleles for isolate comparison.
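    Core SNP comparisons like the one above typically start from pairwise SNP distances between isolates. As a minimal sketch, assuming the core SNP alignment is already available as one equal-length string per isolate, the Python code below counts the differing positions for every pair of isolates. The function names, isolate names and sequences are invented for illustration; the abstract does not describe the study's own pipeline.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of alignment positions at which two core SNP sequences differ."""
    if len(a) != len(b):
        raise ValueError("core SNP sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(isolates: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every unordered pair of isolates."""
    return {
        (i, j): snp_distance(isolates[i], isolates[j])
        for i, j in combinations(sorted(isolates), 2)
    }


# Invented toy alignment: three isolates, five core SNP positions.
core_snps = {"isolate_A": "ACGTA", "isolate_B": "ACGTT", "isolate_C": "TCGAT"}
print(distance_matrix(core_snps))
# {('isolate_A', 'isolate_B'): 1, ('isolate_A', 'isolate_C'): 3, ('isolate_B', 'isolate_C'): 2}
```

    A matrix like this can then feed a clustering or tree-building step; because every variable core position contributes, such distances can separate isolates more finely than MLST allele profiles, consistent with the higher resolution the abstract reports for core SNP analysis.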

  7. Whole-genome linkage analysis in mapping alcoholism genes using single-nucleotide polymorphisms and microsatellites.

    Science.gov (United States)

    Wang, Shuang; Huang, Song; Liu, Nianjun; Chen, Liang; Oh, Cheongeun; Zhao, Hongyu

    2005-12-30

    There is currently great interest in using single-nucleotide polymorphisms (SNPs) in genetic linkage and association studies because of the abundance of SNPs as well as the availability of high-throughput genotyping technologies. In this study, we compared the performance of whole-genome scans using SNPs with microsatellites on 143 pedigrees from the Collaborative Studies on Genetics of Alcoholism provided by Genetic Analysis Workshop 14. A total of 315 microsatellites and 10,081 SNPs from Affymetrix on 22 autosomal chromosomes were used in our analyses. We found that the results from the two scans had good overall concordance. One region on chromosome 2 and two regions on chromosome 7 showed significant linkage signals (i.e., NPL ≥ 2) for alcoholism in both the SNP and microsatellite scans. The discrepancies between the two scans may be explained by differences in information content between the SNPs and the microsatellites.

  8. Whole-genome sequencing of a malignant granular cell tumor with metabolic response to pazopanib

    Science.gov (United States)

    Wei, Lei; Liu, Song; Conroy, Jeffrey; Wang, Jianmin; Papanicolau-Sengos, Antonios; Glenn, Sean T.; Murakami, Mitsuko; Liu, Lu; Hu, Qiang; Conroy, Jacob; Miles, Kiersten Marie; Nowak, David E.; Liu, Biao; Qin, Maochun; Bshara, Wiam; Omilian, Angela R.; Head, Karen; Bianchi, Michael; Burgher, Blake; Darlak, Christopher; Kane, John; Merzianu, Mihai; Cheney, Richard; Fabiano, Andrew; Salerno, Kilian; Talati, Chetasi; Khushalani, Nikhil I.; Trump, Donald L.; Johnson, Candace S.; Morrison, Carl D.

    2015-01-01

    Granular cell tumors are uncommon soft tissue neoplasms. Malignant granular cell tumors comprise […] T transitions, particularly when immediately preceded by a 5′ G. A loss-of-function mutation was detected in a newly recognized tumor suppressor candidate, BRD7. No mutations were found in known targets of pazopanib. However, we identified a receptor tyrosine kinase pathway mutation in GFRA2 that warrants further evaluation. To the best of our knowledge, this is only the second reported case of a malignant granular cell tumor exhibiting a response to pazopanib, and the first whole-genome sequencing of this uncommon tumor type. The findings provide insight into the genetic basis of malignant granular cell tumors and identify potential targets for further investigation. PMID:27148567

  9. Hepatitis C virus whole genome sequencing: Current methods/issues and future challenges.

    Science.gov (United States)

    Trémeaux, Pauline; Caporossi, Alban; Thélu, Marie-Ange; Blum, Michael; Leroy, Vincent; Morand, Patrice; Larrat, Sylvie

    2016-10-01

    Therapy for hepatitis C is currently undergoing a revolution. The arrival of new antiviral agents targeting viral proteins reinforces the need for better knowledge of the viral strains infecting each patient. Hepatitis C virus (HCV) whole genome sequencing provides essential information for precise typing, study of the viral natural history or identification of resistance-associated variants. HCV whole genome sequencing was first performed with Sanger sequencing; the arrival of next-generation sequencing (NGS) has since simplified the technical process and provided more detailed data on the nature and evolution of viral quasi-species. We review the different techniques used for HCV complete genome sequencing and their applications, both before and after the advent of NGS. The progress brought by new and future technologies is also discussed, as well as the remaining difficulties, which are largely due to genomic variability.

  10. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation

    Directory of Open Access Journals (Sweden)

    C. Sharma

    2016-09-01

    Candida auris is an emerging multidrug-resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming, as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. Multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we show for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC > 64 mg/L).

  11. Whole genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing

    Science.gov (United States)

    Harris, Simon R.; Clarke, Ian N.; Seth-Smith, Helena M. B.; Solomon, Anthony W.; Cutcliffe, Lesley T.; Marsh, Peter; Skilton, Rachel J.; Holland, Martin J.; Mabey, David; Peeling, Rosanna W.; Lewis, David A.; Spratt, Brian G.; Unemo, Magnus; Persson, Kenneth; Bjartling, Carina; Brunham, Robert; de Vries, Henry J.C.; Morré, Servaas A.; Speksnijder, Arjen; Bébéar, Cécile M.; Clerc, Maïté; de Barbeyrac, Bertille; Parkhill, Julian; Thomson, Nicholas R.

    2012-01-01

    Chlamydia trachomatis is responsible for both trachoma and sexually transmitted infections causing substantial morbidity and economic cost globally. Despite this, our knowledge of its population and evolutionary genetics is limited. Here we present a detailed whole genome phylogeny from representative strains of both trachoma and lymphogranuloma venereum (LGV) biovars from temporally and geographically diverse sources. Our analysis demonstrates that predicting phylogenetic structure using the ompA gene, traditionally used to classify Chlamydia, is misleading because extensive recombination in this region masks true relationships. We show that in many instances ompA is a chimera that can be exchanged in part or whole, both within and between biovars. We also provide evidence for exchange of, and recombination within, the cryptic plasmid, another important diagnostic target. We have used our phylogenetic framework to show how genetic exchange has manifested itself in ocular, urogenital and LGV C. trachomatis strains, including the epidemic LGV serotype L2b. PMID:22406642

  12. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected … itself. Depending on the trait’s economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage … was observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from...

  13. Evaluating potential for whole-genome studies in Kosrae, an isolated population in Micronesia.

    Science.gov (United States)

    Bonnen, Penelope E; Pe'er, Itsik; Plenge, Robert M; Salit, Jackie; Lowe, Jennifer K; Shapero, Michael H; Lifton, Richard P; Breslow, Jan L; Daly, Mark J; Reich, David E; Jones, Keith W; Stoffel, Markus; Altshuler, David; Friedman, Jeffrey M

    2006-02-01

    Whole-genome association studies are predicted to be especially powerful in isolated populations owing to increased linkage disequilibrium (LD) and decreased allelic diversity, but this possibility has not been empirically tested. We compared genome-wide data on 113,240 SNPs typed on 30 trios from the Pacific island of Kosrae to the same markers typed in the 270 samples from the International HapMap Project. The extent of LD is longer and haplotype diversity is lower in Kosrae than in the HapMap populations. More than 98% of Kosraen haplotypes are present in HapMap populations, indicating that HapMap will be useful for genetic studies on Kosrae. The long-range LD around common alleles and the limited diversity improve the efficiency of genetic studies in this population and augment the power to detect association of 'hidden SNPs'.
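    Statements about the extent of LD rest on pairwise LD statistics. One standard measure is r², computed for two biallelic SNPs as r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA pB, pA and pB are allele frequencies and pAB is the frequency of the haplotype carrying both alleles. The Python sketch below assumes phased haplotypes with alleles coded 0/1; it illustrates the statistic only, and the abstract does not say which LD measure or software the authors used.

```python
def r_squared(haplotypes: list[tuple[int, int]]) -> float:
    """LD r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele at SNP 1, allele at SNP 2), coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP 2
    # Frequency of the haplotype carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Complete LD: allele 1 at SNP 1 always occurs with allele 1 at SNP 2.
print(r_squared([(1, 1), (1, 1), (0, 0), (0, 0)]))  # 1.0
# No LD: all four haplotypes equally common.
print(r_squared([(1, 0), (0, 1), (1, 1), (0, 0)]))  # 0.0
```

    In a population with longer-range LD, r² would tend to stay high between SNPs that lie farther apart, which is one way to read the abstract's point that fewer markers can capture more of the genome there.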

  14. A Danish Salmonella Bareilly outbreak investigated by the use of whole genome sequencing

    DEFF Research Database (Denmark)

    Torpdahl, M.; Kiil, K.; Litrup, E.

    2013-01-01

    for cluster analysis, outbreak investigations, comparison with food and animal isolates as well as making international inquiries. In this case, we found Bareilly with the same PFGE profile from 8 patients. Seven of the cases could be traced back to an unknown food source served at a specific restaurant. … At the same time, four broiler flocks tested positive for Bareilly. Bareilly is also rare in Danish food production, and it was the first time in more than 10 years that Bareilly was isolated in broiler flocks. PFGE was performed on these isolates as well and the profiles from humans … with several band changes and others are defined by one PFGE profile thereby excluding closely related profiles. We decided to investigate whether whole genome sequencing (WGS) could resolve this issue and be useful in outbreak investigations. Several analyses were performed, including a SNP tree based...

  15. Whole-genome sequence comparisons reveal the evolution of Vibrio cholerae O1.

    Science.gov (United States)

    Kim, Eun Jin; Lee, Chan Hee; Nair, G Balakrish; Kim, Dong Wook

    2015-08-01

    The analysis of the whole-genome sequences of Vibrio cholerae strains from previous and current cholera pandemics has demonstrated that genomic changes and alterations in phage CTX (particularly in the gene encoding the B subunit of cholera toxin) were major features in the evolution of V. cholerae. Recent studies have revealed the genetic mechanisms in these bacteria by which new variants of V. cholerae are generated from type-specific strains; these mechanisms suggest that certain strains are selected by environmental or human factors over time. By understanding the mechanisms and driving forces of historical and current changes in the V. cholerae population, it would be possible to predict the direction of such changes and the evolution of new variants; this has implications for the battle against cholera. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Landscape of somatic mutations in 560 breast cancer whole genome sequences

    Science.gov (United States)

    Nik-Zainal, Serena; Davies, Helen; Staaf, Johan; Ramakrishna, Manasa; Glodzik, Dominik; Zou, Xueqing; Martincorena, Inigo; Alexandrov, Ludmil B.; Martin, Sancha; Wedge, David C.; Van Loo, Peter; Ju, Young Seok; Smid, Marcel; Brinkman, Arie B; Morganella, Sandro; Aure, Miriam R.; Lingjærde, Ole Christian; Langerød, Anita; Ringnér, Markus; Ahn, Sung-Min; Boyault, Sandrine; Brock, Jane E.; Broeks, Annegien; Butler, Adam; Desmedt, Christine; Dirix, Luc; Dronov, Serge; Fatima, Aquila; Foekens, John A.; Gerstung, Moritz; Hooijer, Gerrit KJ; Jang, Se Jin; Jones, David R.; Kim, Hyung-Yong; King, Tari A.; Krishnamurthy, Savitri; Lee, Hee Jin; Lee, Jeong-Yeon; Li, Yilong; McLaren, Stuart; Menzies, Andrew; Mustonen, Ville; O’Meara, Sarah; Pauporté, Iris; Pivot, Xavier; Purdie, Colin A.; Raine, Keiran; Ramakrishnan, Kamna; Rodríguez-González, F. Germán; Romieu, Gilles; Sieuwerts, Anieta M.; Simpson, Peter T; Shepherd, Rebecca; Stebbings, Lucy; Stefansson, Olafur A; Teague, Jon; Tommasi, Stefania; Treilleux, Isabelle; Van den Eynden, Gert G.; Vermeulen, Peter; Vincent-Salomon, Anne; Yates, Lucy; Caldas, Carlos; van’t Veer, Laura; Tutt, Andrew; Knappskog, Stian; Tan, Benita Kiat Tee; Jonkers, Jos; Borg, Åke; Ueno, Naoto T; Sotiriou, Christos; Viari, Alain; Futreal, P. Andrew; Campbell, Peter J; Span, Paul N.; Van Laere, Steven; Lakhani, Sunil R; Eyfjord, Jorunn E.; Thompson, Alastair M.; Birney, Ewan; Stunnenberg, Hendrik G; van de Vijver, Marc J; Martens, John W.M.; Børresen-Dale, Anne-Lise; Richardson, Andrea L.; Kong, Gu; Thomas, Gilles; Stratton, Michael R.

    2016-01-01

    We analysed whole genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. 93 protein-coding cancer genes carried likely driver mutations. Some non-coding regions exhibited high mutation frequencies but most have distinctive structural features probably causing elevated mutation rates and do not harbour driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed 12 base substitution and six rearrangement signatures. Three rearrangement signatures, characterised by tandem duplications or deletions, appear associated with defective homologous recombination based DNA repair: one with deficient BRCA1 function; another with deficient BRCA1 or BRCA2 function; the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operative, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer. PMID:27135926

  17. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci.

    Science.gov (United States)

    Rasmussen, L H; Dargis, R; Højholt, K; Christensen, J J; Skovgaard, O; Justesen, U S; Rosenvinge, F S; Moser, C; Lukjancenko, O; Rasmussen, S; Nielsen, X C

    2016-10-01

    Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were observed in single gene analyses. Species identification based on single gene analysis showed its limitations when more strains were included. In contrast, analyses incorporating more sequence data, like MLSA, SNPs and core-genome analyses, provided more distinct clustering. The core-genome tree showed the most distinct clustering.
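Concatenating loci, as in MLSA, gives more resolution than a single gene. The minimal sketch below illustrates this: it concatenates aligned loci for each strain and computes pairwise p-distances, which are the proportion of differing sites. It assumes the loci are already aligned and that every strain has every locus. Tree building (for example neighbour joining) would follow and is not shown. The names are hypothetical.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore alignment gaps
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0

def distance_matrix(loci: dict[str, dict[str, str]]) -> dict[tuple[str, str], float]:
    """loci maps locus name -> {strain: aligned sequence}.
    Loci are concatenated per strain (MLSA-style) before comparison."""
    strains = sorted(next(iter(loci.values())))
    concat = {s: "".join(loci[locus][s] for locus in sorted(loci)) for s in strains}
    return {(s, t): p_distance(concat[s], concat[t]) for s, t in combinations(strains, 2)}
```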

  18. High-resolution whole-genome association study of Parkinson disease.

    Science.gov (United States)

    Maraganore, Demetrius M; de Andrade, Mariza; Lesnick, Timothy G; Strain, Kari J; Farrer, Matthew J; Rocca, Walter A; Pant, P V Krishna; Frazer, Kelly A; Cox, David R; Ballinger, Dennis G

    2005-11-01

    We performed a two-tiered, whole-genome association study of Parkinson disease (PD). For tier 1, we individually genotyped 198,345 uniformly spaced and informative single-nucleotide polymorphisms (SNPs) in 443 sibling pairs discordant for PD. For tier 2a, we individually genotyped 1,793 PD-associated SNPs (P […] genetic hypotheses regarding susceptibility to PD (n=941 SNPs). In the analysis of the combined tier 1 and tier 2b data, the two SNPs with the lowest P values (P = 9.07 × 10^-6; P = 2.96 × 10^-5) tagged the PARK10 late-onset PD susceptibility locus. Independent replication across populations will clarify the role of the genomic loci tagged by these SNPs in conferring PD susceptibility.

  19. Whole-Genome Sequencing to Determine Origin of Multinational Outbreak of Sarocladium kiliense Bloodstream Infections.

    Science.gov (United States)

    Etienne, Kizee A; Roe, Chandler C; Smith, Rachel M; Vallabhaneni, Snigdha; Duarte, Carolina; Escadon, Patricia; Castaneda, Elizabeth; Gomez, Beatriz L; de Bedout, Catalina; López, Luisa F; Salas, Valentina; Hederra, Luz Maria; Fernandez, Jorge; Pidal, Paola; Hormazabel, Juan Carlos; Otaiza, Fernando; Vannberg, Fredrik O; Gillece, John; Lemmer, Darrin; Driebe, Elizabeth M; Englethaler, David M; Litvintseva, Anastasia P

    2016-03-01

    We used whole-genome sequence typing (WGST) to investigate an outbreak of Sarocladium kiliense bloodstream infections (BSI) associated with receipt of contaminated antinausea medication among oncology patients in Colombia and Chile during 2013-2014. Twenty-five outbreak isolates (18 from patients and 7 from medication vials) and 11 control isolates unrelated to this outbreak were subjected to WGST to elucidate a source of infection. All outbreak isolates were nearly indistinguishable ([…] 21,000 single-nucleotide polymorphisms were identified from unrelated control isolates, suggesting a point source for this outbreak. S. kiliense has been previously implicated in healthcare-related infections; however, the lack of available typing methods has precluded the ability to substantiate point sources. Using WGST to investigate outbreaks caused by eukaryotic pathogens that lack reference genomes or existing genotyping methods enables accurate source identification to guide implementation of appropriate control and prevention measures.

  20. Whole-genome sequence analysis of Zika virus, amplified from urine of traveler from the Philippines.

    Science.gov (United States)

    Gu, Se Hun; Song, Dong Hyun; Lee, Daesang; Jang, Jeyoun; Kim, Min Young; Jung, Jaehun; Woo, Koung In; Kim, Mirang; Seog, Woong; Oh, Hong Sang; Choi, Byung Seop; Ahn, Jong-Seong; Park, Quehn; Jeong, Seong Tae

    2017-08-09

    Zika virus (ZIKV) (genus Flavivirus, family Flaviviridae) is an emerging pathogen associated with microcephaly and Guillain-Barré syndrome. The rapid spread of ZIKV disease in over 60 countries and the large numbers of travel-associated cases have caused worldwide concern. Thus, intensified surveillance of cases among immigrants and tourists from ZIKV-endemic areas is important for disease control and prevention. In this study, using Next Generation Sequencing, we report the first whole-genome sequence of ZIKV strain AFMC-U, amplified from the urine of a traveler returning to Korea from the Philippines. Phylogenetic analysis showed geographically specific clustering. Our results underscore the importance of examining urine in the diagnosis of ZIKV infection.

  1. Lysis of a Single Cyanobacterium for Whole Genome Amplification

    Directory of Open Access Journals (Sweden)

    Richard N. Zare

    2013-08-01

    Bacterial species from natural environments, exhibiting a great degree of genetic diversity that has yet to be characterized, pose a specific challenge to whole genome amplification (WGA) from single cells. A major challenge is establishing an effective, compatible, and controlled lysis protocol. We present a novel lysis protocol that can be used to extract genomic information from a single cyanobacterium of Synechocystis sp. PCC 6803, which is known to have multilayer cell wall structures that resist conventional lysis methods. Simple but effective strategies for releasing genomic DNA from captured cells while retaining cellular identities for single-cell analysis are presented. Successful sequencing of genetic elements from single-cell amplicons prepared by multiple displacement amplification (MDA) is demonstrated for selected genes (15 loci nearly equally spaced throughout the main chromosome).

  2. Small homologous blocks in phytophthora genomes do not point to an ancient whole-genome duplication.

    Science.gov (United States)

    van Hooff, Jolien J E; Snel, Berend; Seidl, Michael F

    2014-05-01

    Genomes of the plant-pathogenic genus Phytophthora are characterized by small duplicated blocks consisting of two consecutive genes (2HOM blocks) and by an elevated abundance of similarly aged gene duplicates. Both properties, in particular the presence of 2HOM blocks, have been attributed to a whole-genome duplication (WGD) at the last common ancestor of Phytophthora. However, large intraspecies synteny, which would be compelling evidence for a WGD, has not been detected. Here, we revisited the WGD hypothesis by deducing the age of 2HOM blocks. Two independent timing methods reveal that the majority of 2HOM blocks arose after divergence of the Phytophthora lineages. In addition, a large proportion of the 2HOM block copies colocalize on the same scaffold. Therefore, the presence of 2HOM blocks does not support a WGD at the last common ancestor of Phytophthora. Thus, genome evolution of Phytophthora is likely driven by alternative mechanisms, such as bursts of transposon activity.
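The 2HOM pattern described above can be located with a simple scan. The sketch below works under several stated assumptions. Each scaffold is given as an ordered list of gene-family IDs, and family assignment is assumed to be done already. Orientation is ignored, and tandem pairs of the same family are skipped. The sketch only finds candidate blocks and checks whether all copies of a block share a scaffold. It does not reproduce the dating methods used in the study.

```python
from collections import defaultdict

def find_2hom_blocks(scaffolds: dict[str, list[str]]) -> dict[tuple[str, str], list[tuple[str, int]]]:
    """scaffolds maps scaffold name -> ordered list of gene-family IDs.
    Returns each adjacent family pair seen at two or more positions,
    with the (scaffold, index of first gene) of every occurrence."""
    hits = defaultdict(list)
    for name, families in scaffolds.items():
        for i in range(len(families) - 1):
            a, b = families[i], families[i + 1]
            if a == b:
                continue  # tandem duplicate, not a two-gene block
            key = (a, b) if a < b else (b, a)  # ignore orientation
            hits[key].append((name, i))
    return {k: v for k, v in hits.items() if len(v) >= 2}

def colocalized(blocks: dict[tuple[str, str], list[tuple[str, int]]]) -> list[tuple[str, str]]:
    """Blocks whose copies all lie on a single scaffold."""
    return [k for k, locs in blocks.items() if len({s for s, _ in locs}) == 1]
```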

  3. Determining the repertoire of immunodominant proteins via whole-genome amplification of intracellular pathogens.

    Directory of Open Access Journals (Sweden)

    Michael J Dark

    Culturing many obligate intracellular bacteria is difficult or impossible. However, these organisms have numerous adaptations allowing for infection persistence and immune system evasion, making them some of the most interesting to study. Recent advances in genome sequencing and amplification, namely pyrosequencing and phi29 amplification, have allowed examination of whole-genome sequences of intracellular bacteria without culture. We have applied both techniques to the model obligate intracellular pathogen Anaplasma marginale and the human pathogen Anaplasma phagocytophilum in order to examine the ability of phi29 amplification to determine the sequence of genes allowing for immune system evasion and long-term persistence in the host. When compared to traditional pyrosequencing, phi29-mediated genome amplification had similar genome coverage, with no additional gaps in coverage. Additionally, all msp2 functional pseudogenes from two strains of A. marginale were detected and extracted from the phi29-amplified genomes, highlighting its utility in determining the full complement of genes involved in immune evasion.

  4. Overview of HBV whole genome data in public repositories and the Chinese HBV reference sequences

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The number of Hepatitis B virus (HBV) whole-genome sequences in public nucleotide databases (GenBank, EMBL, and DDBJ) had reached 866 by January 1, 2007. Coming from 46 countries and regions, these sequences were categorized into eight genotypes (A-H). Based on statistical and phylogenetic analysis of all available complete HBV genomic data, we present an overview of HBV sequences in public databases. From all 229 HBV genomes registered from Chinese regions, together with 59 sequences from our research group, we report the establishment of reference sequences for the HBV strains prevailing in China. These analyses provide clues to the effects of HBV genotypes on clinical progression in the host, the geographic distribution of the infection, and the viral evolutionary history. Moreover, the viral reference sequences should be helpful in identifying various HBV mutations. Based on the analysis of various public databases, we suggest that a Chinese HBV database containing clinical information should be constructed.

  5. Clinical Decision Support for Whole Genome Sequence Information Leveraging a Service-Oriented Architecture: a Prototype

    Science.gov (United States)

    Welch, Brandon M.; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time. PMID:25954430
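The sketch below gives a rough illustration of the service-oriented idea. It exposes one stateless rule as a JSON-in, JSON-out function. The request format, field names, classification labels and alert text are hypothetical and are not the prototype's interface. The gene set lists MLH1, MSH2, MSH6, PMS2 and EPCAM, which are the genes commonly associated with Lynch syndrome.

```python
import json

# Genes commonly associated with Lynch syndrome (illustrative rule base only).
LYNCH_GENES = {"MLH1", "MSH2", "MSH6", "PMS2", "EPCAM"}

def cds_service(request_json: str) -> str:
    """Toy CDS endpoint: a JSON list of variants in, a JSON list of alerts out.
    The request/response fields are hypothetical."""
    variants = json.loads(request_json)["variants"]
    alerts = [
        {"gene": v["gene"], "message": "Pathogenic variant in a Lynch syndrome gene"}
        for v in variants
        if v["gene"] in LYNCH_GENES and v.get("classification") == "pathogenic"
    ]
    return json.dumps({"alerts": alerts})
```

Keeping the rule behind a plain text-in, text-out boundary is what lets an EHR or any other client call it without sharing code.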

  6. A strategic stakeholder approach for addressing further analysis requests in whole genome sequencing research.

    Science.gov (United States)

    Thornock, Bradley Steven O

    2016-01-01

    Whole genome sequencing (WGS) can be a cost-effective and efficient means of diagnosis for some children, but it also raises a number of ethical concerns. One such concern is how researchers derive and communicate results from WGS, including future requests for further analysis of stored sequences. The purpose of this paper is to think about what is at stake, and for whom, in any solution that is developed to deal with such requests. To accomplish this task, this paper will utilize stakeholder theory, a common method used in business ethics. Several scenarios that connect stakeholder concerns and WGS will also be posited and analyzed. This paper concludes by developing criteria composed of a series of questions that researchers can answer in order to more effectively address requests for further analysis of stored sequences.

  7. Methylated DNA is over-represented in whole-genome bisulfite sequencing data

    Directory of Open Access Journals (Sweden)

    Lexiang eJi

    2014-10-01

    The development of whole-genome bisulfite sequencing (WGBS) has led to a number of exciting discoveries about the role of DNA methylation leading to a plethora of novel testable hypotheses. Methods for constructing sodium bisulfite-converted and amplified libraries have recently advanced to the point that the bottleneck for experiments that use WGBS has shifted to data analysis and interpretation. Here we present empirical evidence for an over-representation of reads from methylated DNA in WGBS. This enrichment for methylated DNA is exacerbated by higher cycles of PCR and is influenced by the type of uracil-insensitive DNA polymerase used for amplifying the sequencing library. Future efforts to computationally correct for this enrichment bias will be essential to increasing the accuracy of determining methylation levels for individual cytosines. It is especially critical for studies that seek to accurately quantify DNA methylation levels in populations that may segregate for allelic DNA methylation states.
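The methylation level of a cytosine is usually estimated as methylated reads divided by total reads, so over-sampling methylated fragments inflates that estimate. The sketch below makes the effect concrete. It assumes a simple multiplicative bias in which methylated reads are over-represented by a factor k. That factor would have to be estimated separately, for example from spike-in controls. This model is an illustrative assumption, not a correction proposed in the abstract.

```python
def naive_level(meth_reads: int, unmeth_reads: int) -> float:
    """Usual per-cytosine estimate: methylated / (methylated + unmethylated)."""
    total = meth_reads + unmeth_reads
    if total == 0:
        raise ValueError("no coverage at this cytosine")
    return meth_reads / total

def corrected_level(meth_reads: int, unmeth_reads: int, enrichment: float) -> float:
    """Down-weight methylated reads by an assumed enrichment factor k (k = 1 means no bias)."""
    if enrichment <= 0:
        raise ValueError("enrichment factor must be positive")
    m = meth_reads / enrichment  # undo the assumed over-representation
    total = m + unmeth_reads
    if total == 0:
        raise ValueError("no coverage at this cytosine")
    return m / total
```

With 6 methylated and 4 unmethylated reads, the naive estimate is 0.6. If methylated fragments were over-sampled 1.5-fold, the adjusted estimate is 0.5.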

  8. Bioinformatics Workflow for Clinical Whole Genome Sequencing at Partners HealthCare Personalized Medicine

    Directory of Open Access Journals (Sweden)

    Ellen A. Tsai

    2016-02-01

    Effective implementation of precision medicine will be enhanced by a thorough understanding of each patient’s genetic composition to better treat his or her presenting symptoms or mitigate the onset of disease. This ideally includes the sequence information of a complete genome for each individual. At Partners HealthCare Personalized Medicine, we have developed a clinical process for whole genome sequencing (WGS) with application in both healthy individuals and those with disease. In this manuscript, we will describe our bioinformatics strategy to efficiently process and deliver genomic data to geneticists for clinical interpretation. We describe the handling of data from FASTQ to the final variant list for clinical review for the final report. We will also discuss our methodology for validating this workflow and the cost implications of running WGS.
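The abstract does not list the workflow's filtering criteria. The function below is a minimal, hypothetical sketch of the last step, which turns called variants into a reviewable list. It keeps PASS records from VCF text that meet QUAL and read-depth (DP) thresholds. The thresholds are placeholders, and alignment and variant calling from FASTQ are out of scope.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO field like 'DP=35;AF=0.5' into a dict (flags map to '')."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out

def filter_vcf(lines, min_qual: float = 30.0, min_depth: int = 10):
    """Yield (chrom, pos, ref, alt) for records that pass hypothetical QUAL and DP thresholds."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
        if flt not in ("PASS", "."):  # '.' means no filters were applied
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        depth = parse_info(info).get("DP", "")
        if not depth.isdigit() or int(depth) < min_depth:
            continue
        yield chrom, int(pos), ref, alt
```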

  9. Whole genome sequence of Pantoea ananatis R100, an antagonistic bacterium isolated from rice seed.

    Science.gov (United States)

    Wu, Liwen; Liu, Ruifang; Niu, Yaofang; Lin, Haiyan; Ye, Weijun; Guo, Longbiao; Hu, Xingming

    2016-05-10

    Pantoea ananatis is a bacterial species that was first reported as a plant pathogen. Recently, several papers have also described its biocontrol ability. In 2003, P. ananatis R100, which showed strong antagonism against several plant pathogens, was isolated from rice seeds. In this study, the whole genome sequence of this strain was determined using SMRT Cell technology. The total genome size of R100 is 4,857,861 bp, with 4659 coding genes (CDS), 82 tRNAs and 22 rRNAs. The genome sequence of R100 may shed light on research into antagonistic P. ananatis. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  10. Draft whole genome sequence of the cyanide-degrading bacterium Pseudomonas pseudoalcaligenes CECT5344.

    Science.gov (United States)

    Luque-Almagro, Víctor M; Acera, Felipe; Igeño, Ma Isabel; Wibberg, Daniel; Roldán, Ma Dolores; Sáez, Lara P; Hennig, Magdalena; Quesada, Alberto; Huertas, Ma José; Blom, Jochen; Merchán, Faustino; Escribano, Ma Paz; Jaenicke, Sebastian; Estepa, Jessica; Guijo, Ma Isabel; Martínez-Luque, Manuel; Macías, Daniel; Szczepanowski, Rafael; Becerra, Gracia; Ramirez, Silvia; Carmona, Ma Isabel; Gutiérrez, Oscar; Manso, Isabel; Pühler, Alfred; Castillo, Francisco; Moreno-Vivián, Conrado; Schlüter, Andreas; Blasco, Rafael

    2013-01-01

    Pseudomonas pseudoalcaligenes CECT5344 is a Gram-negative bacterium able to tolerate cyanide and to use it as the sole nitrogen source. We report here the first draft of the whole genome sequence of a P. pseudoalcaligenes strain that assimilates cyanide. Three aspects are especially emphasized in this manuscript. First, some general features of the genome are presented and discussed in the context of other Pseudomonadaceae genomes, including genome size, G + C content, core genome and singletons, among other features. Second, the genome is analysed in the context of cyanide metabolism, describing genes probably involved in cyanide assimilation, like those encoding nitrilases, and genes related to cyanide resistance, like the cio genes encoding the cyanide insensitive oxidases. Finally, the presence of genes probably involved in other processes with great biotechnological potential, like production of bioplastics and biodegradation of pollutants, is also discussed.
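Genome size and G + C content, two of the general features mentioned above, are straightforward to compute from a draft assembly in FASTA format. In this minimal sketch, ambiguous bases such as N count toward length but are excluded from the G + C denominator. That convention is an assumption, and tools differ on it.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record id: uppercase sequence}; the id is the first word of the header."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records

def genome_stats(records: dict[str, str]) -> tuple[int, float]:
    """Total assembly length and G+C percentage over unambiguous A/C/G/T bases."""
    seq = "".join(records.values())
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return len(seq), (100.0 * gc / acgt if acgt else 0.0)
```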

  11. Bioinformatics tools and databases for whole genome sequence analysis of Mycobacterium tuberculosis.

    Science.gov (United States)

    Faksri, Kiatichai; Tan, Jun Hao; Chaiprasert, Angkana; Teo, Yik-Ying; Ong, Rick Twee-Hee

    2016-11-01

    Tuberculosis (TB) is an infectious disease of global public health importance caused by Mycobacterium tuberculosis complex (MTC) in which M. tuberculosis (Mtb) is the major causative agent. Recent advancements in genomic technologies such as next generation sequencing have enabled high throughput cost-effective generation of whole genome sequence information from Mtb clinical isolates, providing new insights into the evolution, genomic diversity and transmission of the Mtb bacteria, including molecular mechanisms of antibiotic resistance. However, the large volume of sequencing data generated has necessitated effective and efficient management, storage, analysis and visualization of the data and results through the development of novel and customized bioinformatics software tools and databases. In this review, we aim to provide a comprehensive survey of the current freely available bioinformatics software tools and publicly accessible databases for genomic analysis of Mtb, both for identifying disease transmission in molecular epidemiology and for rapidly determining the antibiotic profiles of clinical isolates for prompt and optimal patient treatment.

  12. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

    KAUST Repository

    Coll, Francesc

    2015-05-27

    Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data complexity has restricted their clinical application. A library (1,325 mutations) predictive of DR for 15 anti-tuberculosis drugs was compiled and validated for 11 of them using genomic-phenotypic data from 792 strains. A rapid online ‘TB-Profiler’ tool was developed to report DR and strain-type profiles directly from raw sequences. Using our DR mutation library, in silico diagnostic accuracy was superior to some commercial diagnostics and alternative databases. The library will facilitate sequence-based drug-susceptibility testing.
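At its core, a mutation library of this kind is a lookup from (gene, mutation) to drug. The sketch below uses a four-entry dictionary of well-known resistance mutations (rpoB S450L, katG S315T, embB M306V, gyrA D94G) purely for illustration. It is not the published 1,325-mutation library or TB-Profiler's code. It also assumes that variant calling and amino-acid annotation have already been done.

```python
# Illustrative subset only; NOT the published 1,325-mutation library.
MINI_LIBRARY = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("embB", "M306V"): "ethambutol",
    ("gyrA", "D94G"): "fluoroquinolones",
}

def predict_resistance(calls) -> dict[str, list[str]]:
    """calls: iterable of (gene, amino-acid change) found in a sample.
    Returns drug -> sorted list of supporting mutations; unknown mutations are ignored."""
    report: dict[str, list[str]] = {}
    for gene, change in calls:
        drug = MINI_LIBRARY.get((gene, change))
        if drug is not None:
            report.setdefault(drug, []).append(f"{gene}_{change}")
    return {drug: sorted(muts) for drug, muts in report.items()}
```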

  13. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered a category A select agent with the potential to be misused in bioterrorism. Molecular typing based on DNA sequence, such as canSNP typing or MLVA, has become the accepted standard for this organism. Because of the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Owing to the high resolution of MLST+, we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.
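Gene-by-gene typing treats each annotated gene as a locus and assigns every distinct sequence at that locus an allele number. Strains are then compared by counting loci with different alleles. The minimal sketch below assumes allele numbers have already been called. It skips loci missing from either strain, which is one of several possible conventions.

```python
from itertools import combinations

def allelic_distance(profile_a: dict[str, int], profile_b: dict[str, int]) -> int:
    """Count loci called in both strains whose allele numbers differ."""
    shared = profile_a.keys() & profile_b.keys()  # loci missing in either strain are skipped
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)

def pairwise_distances(profiles: dict[str, dict[str, int]]) -> dict[tuple[str, str], int]:
    """Allelic distance for every pair of strains, keyed by sorted strain names."""
    return {
        (x, y): allelic_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }
```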

  14. CCor: A whole genome network-based similarity measure between two genes.

    Science.gov (United States)

    Hu, Yiming; Zhao, Hongyu

    2016-12-01

    Measuring the similarity between genes is often the starting point for building gene regulatory networks. Most similarity measures used in practice consider only pairwise information, with a few also considering network structure. Although the theoretical properties of pairwise measures are well understood in the statistics literature, little is known about the statistical properties of similarity measures based on network structure. In this article, we consider a new whole genome network-based similarity measure, called CCor, that makes use of information from all the genes in the network. We derive a concentration inequality for CCor and compare it with the commonly used Pearson correlation coefficient for inferring network modules. Both theoretical analysis and a real data example demonstrate the advantages of CCor over existing measures for inferring gene modules.
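The sketch below contrasts a purely pairwise measure with a network-aware one. It computes the Pearson correlation of two genes, and also a simple alternative that correlates the two genes' correlation profiles against all other genes, so that every gene in the data contributes. This "correlation of correlations" is an illustrative assumption in the same spirit as CCor. It is not claimed to be the exact definition used in the article.

```python
import math

def pearson(x, y) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) for a constant vector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_profile_similarity(expr, i: int, j: int) -> float:
    """expr: list of per-gene expression vectors (genes x samples).
    Correlates gene i's and gene j's Pearson correlations with every other gene."""
    others = [k for k in range(len(expr)) if k not in (i, j)]
    ri = [pearson(expr[i], expr[k]) for k in others]
    rj = [pearson(expr[j], expr[k]) for k in others]
    return pearson(ri, rj)
```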

  15. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas;

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples, this could further reduce diagnostic time and thereby improve control and treatment. A major bottleneck is the availability of fast and reliable bioinformatics tools. This study was conducted to evaluate the applicability of WGS directly on clinical samples and to develop easy-to-use bioinformatics tools for analysis of the sequencing data. Thirty-five random urine samples from patients with suspected urinary tract infections were examined using conventional … information and drastically reduce diagnostic time. This may prove very useful, but the need for data analysis is still a hurdle to clinical implementation. To overcome this problem, a publicly available bioinformatics tool was developed in this study.

  16. Use of Whole Genome Sequencing and Patient Interviews To Link a Case of Sporadic Listeriosis to Consumption of Prepackaged Lettuce.

    Science.gov (United States)

    Jackson, K A; Stroika, S; Katz, L S; Beal, J; Brandt, E; Nadon, C; Reimer, A; Major, B; Conrad, A; Tarr, C; Jackson, B R; Mody, R K

    2016-05-01

    We report on a case of listeriosis in a patient who probably consumed a prepackaged romaine lettuce-containing product recalled for Listeria monocytogenes contamination. Although definitive epidemiological information demonstrating exposure to the specific recalled product was lacking, the patient reported consumption of a prepackaged romaine lettuce-containing product of either the recalled brand or a different brand. A multinational investigation found that patient and food isolates from the recalled product were indistinguishable by pulsed-field gel electrophoresis and were highly related by whole genome sequencing, differing by four alleles by whole genome multilocus sequence typing and by five high-quality single nucleotide polymorphisms, suggesting a common source. To our knowledge, this is the first time prepackaged lettuce has been identified as a likely source for listeriosis. This investigation highlights the power of whole genome sequencing, as well as the continued need for timely and thorough epidemiological exposure data to identify sources of foodborne infections.

  17. [Pathological Diagnoses and Whole-genome Sequence Analyses of the Jaagsiekte Sheep Retrovirus in Xinjiang, China].

    Science.gov (United States)

    Yang, Sufang; Liang, Tian; Zhao, Qingliang; Zhang, Dianqing; Si, Junqiang; Zhang, Jing; Yang, Xia; Sheng, Jinliang

    2015-05-01

    To carry out pathologic diagnoses and whole-genome sequence analyses of the Jaagsiekte sheep retrovirus (JSRV) in Xinjiang, China, we first examined sheep suspected of JSRV infection. Then, the extracted virus suspension was observed by transmission electron microscopy (TEM). Total RNAs from lungs of JSRV-infected sheep were extracted and reverse-transcribed using a cDNA synthesis kit. Six pairs of primers were designed according to the exogenous reference virus strain (AF105220). Reverse transcription-polymerase chain reaction was carried out from JSRV-infected tissue, and the whole genome of the JSRV was sequenced. Our results showed: flow of nasal fluid ("wheelbarrow test"); adenoma lesions of different sizes in the lungs; papillary hyperplasia of alveolar epithelial cells; alveolar cavities filled with macrophages; and dissolved nuclei in central lesions. TEM revealed JSRV particles with diameters of 88 nm to 125.4 nm. The full-length viral genome sequence was 7456 bp. BLAST analyses showed nucleotide homology of 96% and 95% with the representative strains from the USA (AF105220) and UK (AF357971), respectively. Nucleotide homology was 89.8% and 89.9% with the endogenous Jaagsiekte sheep retrovirus Inner Mongolia strain (DQ838493) and USA strain (EF680300), respectively. The specific pathogenic amino-acid sequence "YXXM" was found in the TM region, similar to the exogenous JSRV; this gene has been reported to be oncogenic. This is the first report of the complete genomic sequence of the exogenous JSRV from Xinjiang, and could lay the foundation for study of the biological characteristics and pathogenic mechanisms of the pulmonary adenomatosis virus in sheep.
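Searching a translated sequence for a motif like YXXM, where X means any residue, takes a single regular expression. The minimal sketch below uses a lookahead so that overlapping matches are also reported. Positions are 1-based, and the input protein sequence is assumed to be already translated.

```python
import re

# Y, any two residues, then M; the lookahead lets overlapping matches be reported.
YXXM = re.compile(r"(?=(Y..M))")

def find_yxxm(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched 4-residue motif) pairs."""
    return [(m.start() + 1, m.group(1)) for m in YXXM.finditer(protein.upper())]
```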

  18. Comparative Whole-Genome Mapping To Determine Staphylococcus aureus Genome Size, Virulence Motifs, and Clonality

    Science.gov (United States)

    Pantrang, Madhulatha; Stahl, Buffy; Briska, Adam M.; Stemper, Mary E.; Wagner, Trevor K.; Zentz, Emily B.; Callister, Steven M.; Lovrich, Steven D.; Henkhaus, John K.; Dykes, Colin W.

    2012-01-01

    Despite being a clonal pathogen, Staphylococcus aureus continues to acquire virulence and antibiotic-resistance genes located on mobile genetic elements such as genomic islands, prophages, pathogenicity islands, and the staphylococcal chromosomal cassette mec (SCCmec) by horizontal gene transfer from other staphylococci. The potential virulence of an S. aureus strain is often determined by comparing its pulsed-field gel electrophoresis (PFGE) or multilocus sequence typing profiles to those of known epidemic or virulent clones and by PCR of the toxin genes. Whole-genome mapping (formerly optical mapping), which is high-resolution ordered restriction mapping of a bacterial genome, is a relatively new genomic tool that allows comparative analysis across entire bacterial genomes to identify regions of genomic similarities and dissimilarities, including small and large insertions and deletions. We explored whether whole-genome maps (WGMs) of methicillin-resistant S. aureus (MRSA) could be used to predict the presence of methicillin resistance, SCCmec type, and Panton-Valentine leukocidin (PVL)-producing genes on an S. aureus genome. We determined the WGMs of 47 diverse clinical isolates of S. aureus, including well-characterized reference MRSA strains, and annotated the signature restriction pattern in SCCmec types, arginine catabolic mobile element (ACME), and PVL-carrying prophage, PhiSa2 or PhiSa2-like regions on the genome. WGMs of these isolates accurately characterized them as MRSA or methicillin-sensitive S. aureus based on the presence or absence of the SCCmec motif, ACME and the unique signature pattern for the prophage insertion that harbored the PVL genes. Susceptibility to methicillin resistance and the presence of mecA, SCCmec types, and PVL genes were confirmed by PCR. A WGM clustering approach was further able to discriminate isolates within the same PFGE clonal group. These results showed that WGMs could be used not only to genotype S. aureus but also to
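Whole-genome maps are measured experimentally, but the expected ordered restriction map of a sequenced genome can be computed in silico for comparison. The minimal sketch below does this for a linear sequence. It uses the NcoI recognition site (C^CATGG) only as an example enzyme. A real bacterial chromosome is circular, and the enzyme and sizing tolerances used in the study are not taken from this abstract.

```python
def restriction_map(seq: str, site: str = "CCATGG", cut_offset: int = 1) -> list[int]:
    """Fragment lengths, in genome order, from cutting a linear sequence at every
    occurrence of `site`; cut_offset is the cut position within the site.
    Defaults use NcoI (C^CATGG) purely as an example enzyme."""
    seq, site = seq.upper(), site.upper()
    cuts, start = [], seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

Comparing two such ordered fragment lists, allowing for sizing error, is the basic operation behind spotting insertions and deletions between strains.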

  19. The genome BLASTatlas-a GeneWiz extension for visualization of whole-genome homology.

    Science.gov (United States)

    Hallin, Peter F; Binnewies, Tim T; Ussery, David W

    2008-05-01

    The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context of regions. Additional information can be added to these plots, and as an example we have added circles showing the probability of the DNA helix opening up under superhelical tension. The tool is SOAP compliant and WSDL (web services description language) files are located on our website: (http://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence enabling automation of repeated tasks. This tool can be relevant in many pangenomic as well as in metagenomic studies, by giving a quick overview of clusters of insertion sites, genomic islands and overall homology between a reference sequence and a data set.

  20. Comparison of whole genome sequences from human and non-human Escherichia coli O26 strains

    Directory of Open Access Journals (Sweden)

    Keri N Norman

    2015-03-01

    Shiga toxin-producing Escherichia coli (STEC) O26 is the second leading E. coli serogroup responsible for human illness outbreaks, behind E. coli O157:H7. Recent outbreaks have been linked to emerging pathogenic O26:H11 strains harboring stx2 only. Cattle have been recognized as an important reservoir of O26 strains harboring stx1; however, the reservoir of these emerging stx2 strains is unknown. The objective of this study was to identify nucleotide polymorphisms in human and cattle-derived strains in order to compare differences in polymorphism-derived genotypes and virulence gene profiles between the two host species. Whole genome sequencing was performed on 182 epidemiologically unrelated O26 strains, including 109 human-derived strains and 73 non-human-derived strains. A panel of 289 O26 strains (241 STEC and 48 non-STEC) was subsequently genotyped using a set of 283 polymorphisms identified by whole genome sequencing, resulting in 64 unique genotypes. Phylogenetic analyses identified seven clusters within the O26 strains. The seven clusters did not distinguish between isolates originating from humans or cattle; however, clusters did correspond with particular virulence gene profiles. Human and non-human-derived strains harboring stx1 clustered separately from strains harboring stx2, strains harboring eae, and non-STEC strains. Strains harboring stx2 were more closely related to non-STEC strains and strains harboring eae than to strains harboring stx1. The finding of human and cattle-derived strains with the same polymorphism-derived genotypes and similar virulence gene profiles provides evidence that similar strains are found in cattle and humans and that transmission between the two species may occur.
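Turning polymorphism calls into genotypes amounts to grouping strains that have identical calls across the same ordered panel. The sketch below does that grouping. It then reports genotypes that occur in more than one host species, which is the kind of observation the conclusion rests on. The input format, one string of base calls per strain, is a hypothetical simplification.

```python
from collections import defaultdict

def group_genotypes(calls: dict[str, str]) -> dict[str, list[str]]:
    """calls maps strain -> string of base calls at the same ordered SNP panel.
    Strains with identical strings share a genotype."""
    groups = defaultdict(list)
    for strain, genotype in calls.items():
        groups[genotype].append(strain)
    return {g: sorted(s) for g, s in groups.items()}

def shared_genotypes(groups: dict[str, list[str]], host_of: dict[str, str]) -> list[str]:
    """Genotypes observed in strains from more than one host species."""
    return sorted(g for g, strains in groups.items() if len({host_of[s] for s in strains}) > 1)
```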

  1. Whole-genome sequencing approaches for conservation biology: advantages, limitations, and practical recommendations.

    Science.gov (United States)

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-07-26

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved (huWGR) or resolved haplotypes (hrWGR), the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq), and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This requirement, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in non-model species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons, and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g. structural variants and mutations in regulatory elements), increasing its power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfills all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not-too-distant future where the analysis of whole genomes becomes a routine task in many non-model species and fields, including conservation biology. This article is protected by copyright. All rights reserved.

  2. Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs)

    Science.gov (United States)

    Sims, Gregory E.; Kim, Sung-Hou

    2011-01-01

    A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenetic distances. We present two phylogenies that accentuate different aspects of E. coli/Shigella genomic evolution: (i) one based on the compositions of all possible features of length l = 24 (∼8.4 million features), which are likely to reveal the phenetic grouping and relationship among the organisms, and (ii) the other based on the compositions of core features with low frequency and low variability (∼0.56 million features), which account for ∼69% of all commonly shared features among 38 taxa examined and are likely to have genome-wide lineal evolutionary signal. Shigella appears as a single clade when all possible features are used without filtering of noncore features. However, results using core features show that Shigella consists of at least two distantly related subclades, implying that the subclades evolved into a single clade because of a high degree of convergence influenced by mobile genetic elements and niche adaptation. In both FFP trees, the basal group of the E. coli/Shigella phylogeny is the B2 phylogroup, which contains primarily uropathogenic strains, suggesting that the E. coli/Shigella ancestor was likely a facultative or opportunistic pathogen. The extant commensal strains diverged relatively late and appear to be the result of reductive evolution of genomes. We also identify clade-distinguishing features and their associated genomic regions within each phylogroup. Such features may provide useful information for understanding evolution of the groups and for quick diagnostic identification of each phylogroup. PMID:21536867
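The FFP idea described in this record can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it counts overlapping l-mers in each genome, normalizes the counts into a frequency profile, and compares two profiles with the Jensen-Shannon divergence. The choice of divergence and the function names (`feature_frequency_profile`, `jensen_shannon_divergence`) are assumptions of this sketch; real genomes with l = 24 would need a far more memory-efficient representation than a Python dictionary.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(seq: str, l: int) -> dict[str, float]:
    """Count every overlapping l-mer (feature) in seq and normalize counts to sum to 1."""
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (in bits) between two frequency profiles.

    Assumption of this sketch: JSD is used as the profile distance.
    Returns 0.0 for identical profiles and 1.0 for profiles with no shared features.
    """
    features = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of features.
    m = {f: 0.5 * (p.get(f, 0.0) + q.get(f, 0.0)) for f in features}

    def kl_to_mixture(a: dict[str, float]) -> float:
        # Kullback-Leibler divergence D(A || M); zero-frequency terms contribute nothing.
        return sum(a[f] * log2(a[f] / m[f]) for f in a if a[f] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


if __name__ == "__main__":
    # Toy sequences and a small l, for illustration only.
    fa = feature_frequency_profile("ACGTACGTTGCA", 3)
    fb = feature_frequency_profile("ACGTACGAAGCA", 3)
    print(f"JSD(a, b) = {jensen_shannon_divergence(fa, fb):.3f}")
```

A pairwise matrix of such divergences over all taxa could then be passed to any standard distance-based tree-building method.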

  3. Integrated clinical, whole-genome, and transcriptome analysis of multisampled lethal metastatic prostate cancer

    Science.gov (United States)

    Bova, G. Steven; Kallio, Heini M.L.; Annala, Matti; Kivinummi, Kati; Högnäs, Gunilla; Häyrynen, Sergei; Rantapero, Tommi; Kivinen, Virpi; Isaacs, William B.; Tolonen, Teemu; Nykter, Matti; Visakorpi, Tapio

    2016-01-01

    We report the first combined analysis of whole-genome sequence, detailed clinical history, and transcriptome sequence of multiple prostate cancer metastases in a single patient (A21). Whole-genome and transcriptome sequence was obtained from nine anatomically separate metastases, and targeted DNA sequencing was performed in cancerous and noncancerous foci within the primary tumor specimen removed 5 yr before death. Transcriptome analysis revealed increased expression of androgen receptor (AR)-regulated genes in liver metastases that harbored an AR p.L702H mutation, suggesting a dominant effect by the mutation despite its being present in only one of an estimated 16 copies per cell. The metastases harbored several alterations to the PI3K/AKT pathway, including a clonal truncal mutation in PIK3CG present in all metastatic sites studied. The list of truncal genomic alterations shared by all metastases included homozygous deletion of TP53, hemizygous deletion of RB1 and CHD1, and amplification of FGFR1. If the patient were treated today, given this knowledge, the use of second-generation androgen-directed therapies, cessation of glucocorticoid administration, and therapeutic inhibition of the PI3K/AKT pathway or FGFR1 receptor could provide personalized benefit. Three previously unreported truncal clonal missense mutations (ABCC4 p.R891L, ALDH9A1 p.W89R, and ASNA1 p.P75R) were expressed at the RNA level and assessed as druggable. The truncal status of mutations may be critical for effective actionability and merits further study. Our findings suggest that a large set of deeply analyzed cases could serve as a powerful guide to more effective prostate cancer basic science and personalized cancer medicine clinical trials. PMID:27148588

  4. Whole-genome analysis of multienvironment or multitrait QTL in MAGIC.

    Science.gov (United States)

    Verbyla, Arūnas P; Cavanagh, Colin R; Verbyla, Klara L

    2014-09-18

    Multiparent Advanced Generation Inter-Cross (MAGIC) populations are now being utilized to more accurately identify the underlying genetic basis of quantitative traits through quantitative trait loci (QTL) analyses and subsequent gene discovery. The expanded genetic diversity present in such populations and the amplified number of recombination events mean that QTL can be identified at a higher resolution. Most QTL analyses are conducted separately for each trait within a single environment. Separate analysis does not take advantage of the underlying correlation structure found in multienvironment or multitrait data. By using this information in a joint analysis, be it multienvironment or multitrait, it is possible to gain a greater understanding of genotype- or QTL-by-environment interactions or of pleiotropic effects across traits. Furthermore, this can result in improvements in accuracy for a range of traits or in a specific target environment and can influence selection decisions. Data derived from MAGIC populations allow founder probabilities of all founder alleles to be calculated for each individual within the population. This presents an additional layer of complexity and information that can be utilized to identify QTL. A whole-genome approach is proposed for multienvironment and multitrait QTL analysis in MAGIC. The whole-genome approach simultaneously incorporates all founder probabilities at each marker for all individuals in the analysis, rather than using a genome scan. A dimension reduction technique is implemented, which allows for high-dimensional genetic data. For each QTL identified, sizes of effects for each founder allele, the percentage of genetic variance explained, and a score to reflect the strength of the QTL are found. The approach was demonstrated to perform well in a small simulation study and for two experiments, using a wheat MAGIC population. Copyright © 2014 Verbyla et al.

  5. A first generation whole genome RH map of the river buffalo with comparison to domestic cattle

    Directory of Open Access Journals (Sweden)

    Tantia Madhu S

    2008-12-01

    Background: The recently constructed river buffalo whole-genome radiation hybrid panel (BBURH5000) has already been used to generate preliminary radiation hybrid (RH) maps for several chromosomes, and buffalo-bovine comparative chromosome maps have been constructed. Here, we present the first-generation whole genome RH map (WG-RH) of the river buffalo generated from cattle-derived markers. The RH maps aligned to bovine genome sequence assembly Btau_4.0, providing valuable comparative mapping information for both species. Results: A total of 3990 markers were typed on the BBURH5000 panel, of which 3072 were cattle-derived SNPs. The remaining 918 were classified as cattle sequence-tagged sites (STSs), including coding genes, ESTs, and microsatellites. Average retention frequency per chromosome was 27.3%, calculated with 3093 scorable markers distributed in 43 linkage groups covering all autosomes (24) and the X chromosome at a LOD ≥ 8. The estimated total length of the WG-RH map is 36,933 cR5000. Fewer than 15% of the markers (472) could not be placed within any linkage group at a LOD score ≥ 8. Linkage group order for each chromosome was determined by incorporation of markers previously assigned by FISH and by alignment with the bovine genome sequence assembly (Btau_4.0). Conclusion: We obtained radiation hybrid chromosome maps for the entire river buffalo genome based on cattle-derived markers. The alignments of our RH maps to the current bovine genome sequence assembly (Btau_4.0) indicate regions of possible rearrangements between the chromosomes of both species. The river buffalo represents an important agricultural species whose genetic improvement has lagged behind other species due to limited prior genomic characterization. We present the first-generation RH map, which provides a more extensive resource for positional candidate cloning of genes associated with complex traits and also for large-scale physical mapping of the river buffalo
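Retention frequency in an RH panel is, in general terms, the proportion of hybrid cell lines in which a marker is detected. The sketch below assumes a per-hybrid scoring code of `1` (retained), `0` (not retained) and `2` (ambiguous, excluded from the denominator); this coding and the function names are illustrative assumptions, not the BBURH5000 pipeline. Averaging the marker-level values over the markers of one chromosome gives a per-chromosome figure of the same kind as the 27.3% reported in this record.

```python
from statistics import mean


def retention_frequency(scores: str) -> float:
    """Fraction of unambiguously scored hybrids that retain a marker.

    scores: one character per hybrid, '1' = retained, '0' = absent,
    '2' = ambiguous (assumed coding; ambiguous calls are excluded).
    """
    scored = [s for s in scores if s in "01"]
    if not scored:
        raise ValueError("marker has no unambiguous scores")
    return scored.count("1") / len(scored)


def mean_retention(markers: dict[str, str]) -> float:
    """Average retention frequency across all markers assigned to one chromosome."""
    return mean(retention_frequency(s) for s in markers.values())


if __name__ == "__main__":
    # Hypothetical scores for three markers on a five-hybrid toy panel.
    chromosome_markers = {"m1": "10120", "m2": "00010", "m3": "11001"}
    print(f"mean retention: {mean_retention(chromosome_markers):.1%}")
```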

  6. Views of American OB/GYNs on the ethics of prenatal whole-genome sequencing.

    Science.gov (United States)

    Bayefsky, Michelle J; White, Amina; Wakim, Paul; Hull, Sara Chandros; Wasserman, David; Chen, Stephanie; Berkman, Benjamin E

    2016-12-01

    Given public demand for genetic information, the potential to perform prenatal whole-genome sequencing (PWGS) non-invasively in the future, and decreasing costs of whole-genome sequencing, it is likely that OB/GYN practice will include PWGS. The goal of this project was to explore OB/GYNs' views on the ethical issues surrounding PWGS and their preparedness for counseling patients on its use. A national survey was administered to 2500 members of American Congress of Obstetricians and Gynecologists. A total of 1114 respondents completed the survey (response rate = 45%). OB/GYNs are most concerned with ordering non-medical fetal genetic information, are worried about increasing parental anxiety, and feel it is appropriate to be directive when counseling parents about PWGS. Furthermore, most OB/GYNs have limited knowledge of genetics, rely heavily on genetic counselors and would like more guidance regarding the clinical adoption of PWGS. OB/GYNs do not completely accept or reject PWGS, but a substantial number have significant ethical and practical concerns. They are most concerned with issues that will directly affect their practices and interactions with patients, such as increasing parental anxiety and costs of care. Professional guidance would be instrumental in directing the adoption of PWGS and alleviating the ethical burden posed by PWGS on individual OB/GYNs. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  7. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie, E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate- and heavy-metal-contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage, and also an amplification bias, when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and "clusters of orthologous groups" (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.
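Genome coverage in the validation step of this record can be read as breadth of coverage: the fraction of reference bases overlapped by at least one amplified read. A minimal Python sketch, assuming reads have already been aligned to the reference and are supplied as half-open `(start, end)` intervals (the input format and the function name are hypothetical), merges overlapping intervals and divides the covered length by the genome length.

```python
def breadth_of_coverage(genome_length: int, alignments: list[tuple[int, int]]) -> float:
    """Fraction of reference positions covered by at least one aligned read.

    alignments: half-open [start, end) reference intervals, one per aligned read.
    """
    covered = 0
    run_start = run_end = None
    for start, end in sorted(alignments):
        # Clip each interval to the reference bounds; skip empty intervals.
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue
        if run_end is None or start > run_end:
            # Gap before this read: close the previous merged run.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or adjacent read: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


if __name__ == "__main__":
    # Toy 100-bp reference with three reads; the merged runs cover 30 bp.
    print(breadth_of_coverage(100, [(0, 10), (5, 20), (50, 60)]))  # 0.3
```

Merging intervals first avoids double-counting bases covered by overlapping reads, which would otherwise inflate the estimate.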

  8. Rapid bacterial whole-genome sequencing to enhance diagnostic and public health microbiology.

    Science.gov (United States)

    Reuter, Sandra; Ellington, Matthew J; Cartwright, Edward J P; Köser, Claudio U; Török, M Estée; Gouliouris, Theodore; Harris, Simon R; Brown, Nicholas M; Holden, Matthew T G; Quail, Mike; Parkhill, Julian; Smith, Geoffrey P; Bentley, Stephen D; Peacock, Sharon J

    2013-08-12

    The latest generation of benchtop DNA sequencing platforms can provide an accurate whole-genome sequence (WGS) for a broad range of bacteria in less than a day. These platforms could be used to more effectively contain the spread of multidrug-resistant pathogens. The objective was to compare WGS with standard clinical microbiology practice for the investigation of nosocomial outbreaks caused by multidrug-resistant bacteria, the identification of genetic determinants of antimicrobial resistance, and the typing of other clinically important pathogens. This was a laboratory-based study of hospital inpatients with a range of bacterial infections at Cambridge University Hospitals NHS Foundation Trust, a secondary and tertiary referral center in England, comparing WGS with standard diagnostic microbiology using stored bacterial isolates and clinical information. Specimens were taken and processed as part of routine clinical care, and cultured isolates were stored and referred for additional reference laboratory testing as necessary. Isolates underwent DNA extraction and library preparation prior to sequencing on the Illumina MiSeq platform. Bioinformatic analyses were performed by persons blinded to the clinical, epidemiologic, and antimicrobial susceptibility data. We investigated 2 putative nosocomial outbreaks, one caused by vancomycin-resistant Enterococcus faecium and the other by carbapenem-resistant Enterobacter cloacae; WGS accurately discriminated between outbreak and nonoutbreak isolates and was superior to conventional typing methods. We compared WGS with standard methods for the identification of the mechanism of carbapenem resistance in a range of gram-negative bacteria (Acinetobacter baumannii, E cloacae, Escherichia coli, and Klebsiella pneumoniae). This demonstrated concordance between phenotypic and genotypic results, and the ability to determine whether resistance was attributable to the presence of carbapenemases or other resistance mechanisms. Whole-genome sequencing was used to recapitulate

  9. Phylogenetics and differentiation of Salmonella Newport lineages by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Guojie Cao

    Salmonella Newport has ranked in the top three Salmonella serotypes associated with foodborne outbreaks from 1995 to 2011 in the United States. In the current study, we selected 26 S. Newport strains isolated from diverse sources and geographic locations and then conducted 454 shotgun pyrosequencing to obtain 16-24× coverage of high-quality draft genomes for each strain. Comparative genomic analysis of 28 S. Newport strains (including 2 reference genomes) and 15 outgroup genomes identified more than 140,000 informative SNPs. The resulting phylogenetic tree consisted of four sublineages and indicated that S. Newport had a clear geographic structure. Strains from Asia were divergent from those from the Americas. Our findings demonstrated that analysis using whole genome sequencing data resulted in a more accurate picture of phylogeny compared to that using single genes or small sets of genes. We selected loci around the mutS gene of S. Newport to differentiate distinct lineages, including those between the invH and mutS genes at the 3' end of Salmonella Pathogenicity Island 1 (SPI-1), the ste fimbrial operon, and the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated proteins (cas). These genes in the outgroup genomes held high similarity with either S. Newport Lineage II or III at the same loci. S. Newport Lineages II and III have different evolutionary histories in this region, and our data demonstrated gene flow and homologous recombination events around mutS. The findings suggested that S. Newport Lineages II and III diverged early in the serotype evolution and have evolved largely independently. Moreover, we identified genes that could delineate sublineages within the phylogenetic tree and that could be used as potential biomarkers for trace-back investigations during outbreaks. Thus, whole genome sequencing data enabled us to better understand the genetic background of pathogenicity and evolutionary history of S
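The record does not define "informative SNPs"; a common reading in phylogenetics is parsimony-informative sites, where at least two different alleles each occur in at least two sequences. Assuming that definition, and assuming a core-genome alignment of equal-length strings (both assumptions of this sketch, not the study's pipeline), such sites can be located as follows. Ambiguous characters such as `N` or `-` are simply ignored.

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """True if at least two distinct nucleotides each occur in at least two sequences."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment: list[str]) -> list[int]:
    """0-based positions of parsimony-informative columns in an alignment."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i for i in range(length)
        if is_parsimony_informative("".join(seq[i] for seq in alignment))
    ]


if __name__ == "__main__":
    # Toy alignment: only column 1 (A,A,C,C) has two alleles each seen twice.
    toy_alignment = ["AAGT", "AAGT", "ACTT", "ACGC"]
    print(informative_sites(toy_alignment))  # [1]
```

Singleton variants (an allele seen in only one strain) are excluded by this definition because they cannot favor one tree topology over another under parsimony.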

  10. Light whole genome sequence for SNP discovery across domestic cat breeds

    Directory of Open Access Journals (Sweden)

    Driscoll Carlos

    2010-06-01

    Background: The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models, as well as in the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologous to human scourges (cancer, SARS, and AIDS, respectively). However, to realize this biomedical potential, a high-density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. Description: To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly, allowing the discovery of over three million SNPs. To reduce potential false-positive SNPs due to the low-coverage assembly, a low upper limit was placed on sequence coverage and a high lower limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (a female Abyssinian, a female American shorthair, a male Cornish Rex, a female European Burmese, a female Persian, a female Siamese and a male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between the African wildcat and domestic cat breeds. An empirical sample of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. Conclusions: These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.
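The two filters described in this record (an upper limit on sequence coverage and a lower limit on the quality of the discrepant bases) can be expressed as a simple predicate. The record type, field names, and the thresholds in the example are hypothetical; the abstract does not give the actual cut-offs. Capping coverage is a common guard against collapsed repeats or paralogous copies, which attract unusually many reads in a low-coverage assembly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSnp:
    """A putative variant site in the assembly (hypothetical record type)."""
    position: int
    depth: int                                   # reads covering the site
    discrepant_base_qualities: tuple[int, ...]   # Phred qualities of the non-consensus bases


def passes_filters(snp: CandidateSnp, max_depth: int, min_quality: int) -> bool:
    """Keep a site only if coverage is not excessive and every discrepant base is high quality."""
    if not snp.discrepant_base_qualities:
        return False  # no discrepant base, so no evidence for a variant
    return snp.depth <= max_depth and min(snp.discrepant_base_qualities) >= min_quality


def filter_snps(candidates: list[CandidateSnp], max_depth: int, min_quality: int) -> list[CandidateSnp]:
    """Return the candidates that pass both the coverage cap and the quality floor."""
    return [s for s in candidates if passes_filters(s, max_depth, min_quality)]


if __name__ == "__main__":
    # Illustrative thresholds only (not taken from the study).
    candidates = [
        CandidateSnp(100, 3, (45, 50)),
        CandidateSnp(200, 12, (50,)),    # rejected: depth above the cap
        CandidateSnp(300, 4, (45, 20)),  # rejected: one low-quality discrepant base
    ]
    kept = filter_snps(candidates, max_depth=6, min_quality=40)
    print([s.position for s in kept])  # [100]
```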

  11. A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum deduced from comparative whole genome analysis.

    Directory of Open Access Journals (Sweden)

    Peter G Kroth

    BACKGROUND: Diatoms are unicellular algae responsible for approximately 20% of global carbon fixation. Their evolution by secondary endocytobiosis resulted in a complex cellular structure and metabolism compared to algae with primary plastids. METHODOLOGY/PRINCIPAL FINDINGS: The whole genome sequence of the diatom Phaeodactylum tricornutum has recently been completed. We identified and annotated genes for enzymes involved in carbohydrate pathways based on extensive EST support and comparison to the whole genome sequence of a second diatom, Thalassiosira pseudonana. Protein localization to mitochondria was predicted based on identified similarities to mitochondrial localization motifs in other eukaryotes, whereas protein localization to plastids was based on the presence of signal peptide motifs in combination with plastid localization motifs previously shown to be required in diatoms. We identified genes potentially involved in a C4-like photosynthesis in P. tricornutum and, on the basis of sequence-based putative localization of relevant proteins, discuss possible differences in carbon concentrating mechanisms and CO2 fixation between the two diatoms. We also identified genes encoding enzymes involved in photorespiration with one interesting exception: glycerate kinase was not found in either P. tricornutum or T. pseudonana. Various Calvin cycle enzymes were found in up to five different isoforms, distributed between plastids, mitochondria and the cytosol. Diatoms store energy either as lipids or as chrysolaminaran (a beta-1,3-glucan) outside of the plastids. We identified various beta-glucanases and large membrane-bound glucan synthases. Interestingly, most of the glucanases appear to contain C-terminal anchor domains that may attach the enzymes to membranes.
CONCLUSIONS/SIGNIFICANCE: Here we present a detailed synthesis of carbohydrate metabolism in diatoms based on the genome sequences of Thalassiosira pseudonana and Phaeodactylum tricornutum

  12. Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

    Directory of Open Access Journals (Sweden)

    Abiyad Baig

    2015-11-01

    Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but whose genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of the core genome from 793 to only 397 genes. Divergence was clear when phylogenetic analysis was performed on reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates and normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six out of nine divergent isolates possessed an S. suis-like capsule region with variation in capsular gene sequences, but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of S. suis species, but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates is warranted to understand the diversity of the species.
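The shrinking core genome (793 to 397 genes) follows from how a core genome is defined: the set of genes present in every genome compared, so adding a more divergent genome can only keep or reduce it. A minimal sketch, assuming genes have already been clustered into orthologue groups with shared identifiers (the cluster IDs and genome names in the example are hypothetical):

```python
def core_genome(gene_sets: dict[str, set[str]]) -> set[str]:
    """Orthologue clusters present in every genome (intersection of all gene sets)."""
    if not gene_sets:
        return set()
    # set.intersection always returns a new set, so callers' inputs are never aliased.
    return set.intersection(*gene_sets.values())


if __name__ == "__main__":
    normal = {
        "normal_1": {"c1", "c2", "c3", "c4"},
        "normal_2": {"c1", "c2", "c3", "c5"},
    }
    divergent = {"divergent_1": {"c1", "c3", "c9"}}
    print(sorted(core_genome(normal)))                   # ['c1', 'c2', 'c3']
    print(sorted(core_genome({**normal, **divergent})))  # ['c1', 'c3']
```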

  13. Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes

    Directory of Open Access Journals (Sweden)

    Feltus F

    2011-04-01

    Background: We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results: The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. Conclusions: BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massive over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.

  14. Whole genome sequencing of Mycobacterium tuberculosis for detection of drug resistance: a systematic review.

    Science.gov (United States)

    Papaventsis, D; Casali, N; Kontsevaya, I; Drobniewski, F; Cirillo, D M; Nikolayevskyy, V

    2017-02-01

    We conducted a systematic review to determine the diagnostic accuracy of whole genome sequencing (WGS) of Mycobacterium tuberculosis for the detection of resistance to first- and second-line anti-tuberculosis (TB) drugs. The study was conducted according to the criteria of the Preferred Reporting Items for Systematic Reviews group. A total of 20 publications were included. The sensitivity, specificity, positive-predictive value and negative-predictive value of WGS were determined using phenotypic drug susceptibility testing methods as a reference standard. Anti-TB agents tested included all first-line drugs, a variety of reserve drugs, as well as new drugs. Polymorphisms in a total of 53 genes were tested for associations with drug resistance. Pooled sensitivity and specificity values for detection of resistance to selected first-line drugs were 0.98 (95% CI 0.93-0.98) and 0.98 (95% CI 0.98-1.00) for rifampicin and 0.97 (95% CI 0.94-0.99) and 0.93 (95% CI 0.91-0.96) for isoniazid, respectively. Because of high heterogeneity in study designs, lack of data, limited knowledge of resistance mechanisms and unclear exclusion of phylogenetic markers, the analytical performance of WGS varied significantly for the remaining first-line drugs, reserve drugs and new drugs. Whole genome sequencing could be considered a promising alternative to existing phenotypic and molecular drug susceptibility testing methods for rifampicin and isoniazid, pending standardization of analytical pipelines. To ensure clinical relevance of WGS for detection of M. tuberculosis complex drug resistance, future studies should include information on clinical outcomes. Crown Copyright © 2016. Published by Elsevier Ltd. All rights reserved.
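The four accuracy measures in this review are defined from a 2×2 table comparing WGS-predicted resistance against phenotypic drug susceptibility testing (the reference standard). The sketch below computes them for a single study; the counts in the example are made up, and the pooled estimates reported in the abstract would additionally require a meta-analytic model across studies, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accuracy:
    sensitivity: float  # resistant isolates correctly called resistant
    specificity: float  # susceptible isolates correctly called susceptible
    ppv: float          # positive-predictive value
    npv: float          # negative-predictive value


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Accuracy of WGS calls against phenotypic DST as the reference standard.

    tp: resistant by both WGS and DST; fp: WGS resistant, DST susceptible;
    fn: WGS susceptible, DST resistant; tn: susceptible by both.
    """
    return Accuracy(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


if __name__ == "__main__":
    # Hypothetical counts for one drug in one study.
    result = diagnostic_accuracy(tp=90, fp=5, fn=10, tn=95)
    print(f"sensitivity={result.sensitivity:.2f} specificity={result.specificity:.2f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of resistance in the tested population, which is one reason they can vary widely between studies.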

  15. Tumor Touch Imprints as Source for Whole Genome Analysis of Neuroblastoma Tumors

    Science.gov (United States)

    Brunner, Clemens; Brunner-Herglotz, Bettina; Ziegler, Andrea; Frech, Christian; Amann, Gabriele; Ladenstein, Ruth; Ambros, Inge M.; Ambros, Peter F.

    2016-01-01

    Introduction Tumor touch imprints (TTIs) are routinely used for the molecular diagnosis of neuroblastomas by interphase fluorescence in-situ hybridization (I-FISH). However, in order to facilitate a comprehensive, up-to-date molecular diagnosis of neuroblastomas and to identify new markers to refine risk and therapy stratification methods, whole genome approaches are needed. We examined the applicability of an ultra-high density SNP array platform that identifies copy number changes of varying sizes down to a few exons for the detection of genomic changes in tumor DNA extracted from TTIs. Material and Methods DNAs were extracted from TTIs of 46 neuroblastoma and 4 other pediatric tumors. The DNAs were analyzed on the Cytoscan HD SNP array platform to evaluate numerical and structural genomic aberrations. The quality of the data obtained from TTIs was compared to that from randomly chosen fresh or fresh frozen solid tumors (n = 212) and I-FISH validation was performed. Results SNP array profiles were obtained from 48 (out of 50) TTI DNAs of which 47 showed genomic aberrations. The high marker density allowed for single gene analysis, e.g. loss of nine exons in the ATRX gene and the visualization of chromothripsis. Data quality was comparable to fresh or fresh frozen tumor SNP profiles. SNP array results were confirmed by I-FISH. Conclusion TTIs are an excellent source for SNP array processing with the advantage of simple handling, distribution and storage of tumor tissue on glass slides. The minimal amount of tumor tissue needed to analyze whole genomes makes TTIs an economic surrogate source in the molecular diagnostic work up of tumor samples. PMID:27560999

  16. Acne vulgaris

    OpenAIRE

    Purdy, Sarah; DeBerker, David

    2008-01-01

    Acne vulgaris affects over 80% of teenagers, and persists beyond the age of 25 years in 3% of men and 12% of women. Typical lesions of acne include comedones, inflammatory papules, and pustules. Nodules and cysts occur in more severe acne, and can cause scarring and psychological distress.

  17. Acne vulgaris

    OpenAIRE

    Purdy, Sarah; de Berker, David

    2011-01-01

    Acne vulgaris affects over 80% of teenagers, and persists beyond the age of 25 years in 3% of men and 12% of women. Typical lesions of acne include comedones, inflammatory papules, and pustules. Nodules and cysts occur in more severe acne, and can cause scarring and psychological distress.

  18. Whole-genome sequences of 13 endophytic bacteria isolated from shrub willow (Salix) grown in Geneva, New York.

    Science.gov (United States)

    Gan, Huan You; Gan, Han Ming; Savka, Michael A; Triassi, Alexander J; Wheatley, Matthew S; Smart, Lawrence B; Fabio, Eric S; Hudson, André O

    2014-05-08

    Shrub willow, Salix spp. and hybrids, is an important bioenergy crop. Here we report the whole-genome sequences and annotation of 13 endophytic bacteria from stem tissues of Salix purpurea grown in nature and from commercial cultivars and Salix viminalis × Salix miyabeana grown in bioenergy fields in Geneva, New York.

  19. Whole-genome pyrosequencing of an epidemic multidrug-resistant Acinetobacter baumannii strain belonging to the European clone II group

    DEFF Research Database (Denmark)

    Iacono, M.; Villa, L.; Fortini, D.

    2008-01-01

    The whole-genome sequence of an epidemic, multidrug-resistant Acinetobacter baumannii strain (strain ACICU) belonging to the European clone II group and carrying the plasmid-mediated bla(OXA-58) carbapenem resistance gene was determined. The A. baumannii ACICU genome was compared with the genomes...

  20. Whole-Genome Sequencing for Routine Pathogen Surveillance in Public Health : a Population Snapshot of Invasive Staphylococcus aureus in Europe

    NARCIS (Netherlands)

    Aanensen, David M; Feil, Edward J; Holden, Matthew T G; Dordel, Janina; Yeats, Corin A; Fedosejev, Artemij; Goater, Richard; Castillo-Ramírez, Santiago; Corander, Jukka; Colijn, Caroline; Chlebowicz, Monika A; Schouls, Leo; Heck, Max; Pluister, Gerlinde; Ruimy, Raymond; Kahlmeter, Gunnar; Åhman, Jenny; Matuschek, Erika; Friedrich, Alexander W; Parkhill, Julian; Bentley, Stephen D; Spratt, Brian G; Grundmann, Hajo

    2016-01-01

    The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with

  1. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli

    DEFF Research Database (Denmark)

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming c...

  2. A phylogenetic strategy based on a legume-specific whole genome duplication yields symbiotic cytokinin type-A Response Regulators

    NARCIS (Netherlands)

    Camp, Op den R.; Mita, De S.; Lillo, A.; Cao, Q.; Limpens, E.H.M.; Bisseling, T.; Geurts, R.

    2011-01-01

    Legumes host their rhizobium symbiont in novel root organs, called nodules. Nodules originate from differentiated root cortical cells that de-differentiate and subsequently form nodule primordia, a process controlled by cytokinin. A whole genome duplication (WGD) has occurred at the root of the legu

  3. A phylogenetic strategy based on a legume-specific whole genome duplication yields symbiotic cytokinin type-A Response Regulators

    NARCIS (Netherlands)

    Camp, Op den R.; Mita, De S.; Lillo, A.; Cao, Q.; Limpens, E.H.M.; Bisseling, T.; Geurts, R.

    2011-01-01

    Legumes host their rhizobium symbiont in novel root organs, called nodules. Nodules originate from differentiated root cortical cells that de-differentiate and subsequently form nodule primordia, a process controlled by cytokinin. A whole genome duplication (WGD) has occurred at the root of the

  4. Direct DNA Extraction from Mycobacterium tuberculosis Frozen Stocks as a Reculture-Independent Approach to Whole-Genome Sequencing.

    Science.gov (United States)

    Bjorn-Mortensen, K; Zallet, J; Lillebaek, T; Andersen, A B; Niemann, S; Rasmussen, E M; Kohl, T A

    2015-08-01

    Culturing before DNA extraction represents a major time-consuming step in whole-genome sequencing of slow-growing bacteria, such as Mycobacterium tuberculosis. We report a workflow to extract DNA from frozen isolates without reculturing. Prepared libraries and sequence data were comparable with results from recultured aliquots of the same stocks. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  5. Direct DNA Extraction from Mycobacterium tuberculosis Frozen Stocks as a Reculture-Independent Approach to Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Zallet, J; Lillebaek, T

    2015-01-01

    Culturing before DNA extraction represents a major time-consuming step in whole-genome sequencing of slow-growing bacteria, such as Mycobacterium tuberculosis. We report a workflow to extract DNA from frozen isolates without reculturing. Prepared libraries and sequence data were comparable...

  6. Strategies for imputation to whole genome sequence using a single or multi-breed reference population in cattle

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Guldbrandtsen, Bernt; Sahana, Goutam

    2014-01-01

    Background The advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from...

  7. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  8. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections.

    Directory of Open Access Journals (Sweden)

    Pimlapas Leekitcharoenphon

    Twenty-six Salmonella enterica serovar Eko isolates from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed, and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections.

  9. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Thorup Nielsen, Mette

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolates from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed, and camel and cattle were identified as the primary reservoirs and the most likely...

  10. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants

    NARCIS (Netherlands)

    Pandit, Aridaman; de Boer, Rob J

    2014-01-01

    BACKGROUND: Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers relativ

  11. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland

    Science.gov (United States)

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten

    2016-01-01

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered a potential and valuable resource of biological control agents and biofertilizers for agriculture.

  12. Draft Whole-Genome Sequence of a Haemophilus quentini Strain Isolated from an Infant in the United Kingdom

    Science.gov (United States)

    Baxter, Laura; Thompson, Sarah; Collery, Mark M.; Hand, Daniel C.; Fink, Colin G.

    2016-01-01

    Haemophilus quentini is a rare and distinct genospecies of Haemophilus that has been suggested as a cause of neonatal bacteremia and urinary tract infections in men. We present the draft whole-genome sequence of H. quentini MP1 isolated from an infant in the United Kingdom, aiding future identification and detection of this pathogen.

  13. Selection of Unique Escherichia coli Clones by Random Amplified Polymorphic DNA (RAPD): Evaluation by Whole Genome Sequencing

    Science.gov (United States)

    Nielsen, Karen L.; Godfrey, Paul A.; Stegger, Marc; Andersen, Paal S.; Feldgarden, Michael; Frimodt-Møller, Niels

    2014-01-01

    Identifying and characterizing clonal diversity is important when analysing fecal flora. We evaluated random amplified polymorphic DNA (RAPD) PCR, applied for selection of Escherichia coli isolates, by whole genome sequencing. RAPD was fast and reproducible as a screening method for selection of distinct E. coli clones in fecal swabs. PMID:24912108

  14. High-Quality Draft Whole-Genome Sequences of Three Strains of Enterobacter Isolated from Jamaican Dioscorea cayenensis (Yellow Yam)

    OpenAIRE

    Gan, Han Ming; Triassi, Alexander J.; Wheatley, Matthew S.; Savka, Michael A.; Hudson, André O.

    2014-01-01

    Here we report the whole-genome sequences of three endophytic bacteria, Enterobacter sp. strain DC1, Enterobacter sp. strain DC3, and Enterobacter sp. strain DC4, from root tubers of the yellow yam plant, Dioscorea cayenensis. Preliminary analyses suggest that the genomes of the three bacteria contain genes involved in acetoin and indole-3-acetic acid metabolism.

  15. Whole-genome amplified DNA from stored dried blood spots is reliable in high resolution melting curve and sequencing analysis

    DEFF Research Database (Denmark)

    Winkel, Bo G; Hollegaard, Mads Vilhelm; Olesen, Morten S;

    2011-01-01

    The use of dried blood spot (DBS) samples in genomic workup has been limited by the relatively low amounts of genomic DNA (gDNA) they contain. It remains to be proven that whole genome amplified DNA (wgaDNA) from stored DBS samples constitutes a reliable alternative to gDNA. We wanted to compare m...

  16. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland.

    Science.gov (United States)

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten; Lefort, François

    2016-10-06

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered a potential and valuable resource of biological control agents and biofertilizers for agriculture. Copyright © 2016 Crovadore et al.

  17. Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods

    DEFF Research Database (Denmark)

    Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik;

    2017-01-01

    , consensus whole-genome sequences, as well as descriptions of the known phylogeny in a variety of formats) publicly available, with the hope that other groups may find this data useful for benchmarking and exploring the performance of epidemiological methods. All data is freely available at: https://cge.cbs.dtu.dk/services/evolution_data.php....

  18. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture.

    Science.gov (United States)

    Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang; Estrada, Karol; Rosello-Diez, Alberto; Leo, Paul J; Dahia, Chitra L; Park-Min, Kyung Hyun; Tobias, Jonathan H; Kooperberg, Charles; Kleinman, Aaron; Styrkarsdottir, Unnur; Liu, Ching-Ti; Uggla, Charlotta; Evans, Daniel S; Nielson, Carrie M; Walter, Klaudia; Pettersson-Kymmer, Ulrika; McCarthy, Shane; Eriksson, Joel; Kwan, Tony; Jhamai, Mila; Trajanoska, Katerina; Memari, Yasin; Min, Josine; Huang, Jie; Danecek, Petr; Wilmot, Beth; Li, Rui; Chou, Wen-Chi; Mokry, Lauren E; Moayyeri, Alireza; Claussnitzer, Melina; Cheng, Chia-Ho; Cheung, Warren; Medina-Gómez, Carolina; Ge, Bing; Chen, Shu-Huang; Choi, Kwangbom; Oei, Ling; Fraser, James; Kraaij, Robert; Hibbs, Matthew A; Gregson, Celia L; Paquette, Denis; Hofman, Albert; Wibom, Carl; Tranah, Gregory J; Marshall, Mhairi; Gardiner, Brooke B; Cremin, Katie; Auer, Paul; Hsu, Li; Ring, Sue; Tung, Joyce Y; Thorleifsson, Gudmar; Enneman, Anke W; van Schoor, Natasja M; de Groot, Lisette C P G M; van der Velde, Nathalie; Melin, Beatrice; Kemp, John P; Christiansen, Claus; Sayers, Adrian; Zhou, Yanhua; Calderari, Sophie; van Rooij, Jeroen; Carlson, Chris; Peters, Ulrike; Berlivet, Soizik; Dostie, Josée; Uitterlinden, Andre G; Williams, Stephen R; Farber, Charles; Grinberg, Daniel; LaCroix, Andrea Z; Haessler, Jeff; Chasman, Daniel I; Giulianini, Franco; Rose, Lynda M; Ridker, Paul M; Eisman, John A; Nguyen, Tuan V; Center, Jacqueline R; Nogues, Xavier; Garcia-Giralt, Natalia; Launer, Lenore L; Gudnason, Vilmunder; Mellström, Dan; Vandenput, Liesbeth; Amin, Najaf; van Duijn, Cornelia M; Karlsson, Magnus K; Ljunggren, Östen; Svensson, Olle; Hallmans, Göran; Rousseau, François; Giroux, Sylvie; Bussière, Johanne; Arp, Pascal P; Koromani, Fjorda; Prince, Richard L; Lewis, Joshua R; Langdahl, Bente L; Hermann, A Pernille; Jensen, Jens-Erik B; Kaptoge, Stephen; Khaw, Kay-Tee; Reeve, Jonathan; Formosa, Melissa M; Xuereb-Anastasi, Angela; Åkesson, Kristina; McGuigan, Fiona E; Garg, Gaurav; Olmos, Jose M; Zarrabeitia, Maria T; Riancho, Jose A; Ralston, Stuart H; Alonso, Nerea; Jiang, Xi; Goltzman, David; Pastinen, Tomi; Grundberg, Elin; Gauguier, Dominique; Orwoll, Eric S; Karasik, David; Davey-Smith, George; Smith, Albert V; Siggeirsdottir, Kristin; Harris, Tamara B; Zillikens, M Carola; van Meurs, Joyce B J; Thorsteinsdottir, Unnur; Maurano, Matthew T; Timpson, Nicholas J; Soranzo, Nicole; Durbin, Richard; Wilson, Scott G; Ntzani, Evangelia E; Brown, Matthew A; Stefansson, Kari; Hinds, David A; Spector, Tim; Cupples, L Adrienne; Ohlsson, Claes; Greenwood, Celia M T; Jackson, Rebecca D; Rowe, David W; Loomis, Cynthia A; Evans, David M; Ackert-Bicknell, Cheryl L; Joyner, Alexandra L; Duncan, Emma L; Kiel, Douglas P; Rivadeneira, Fernando; Richards, J Brent

    2015-10-01

    The extent to which low-frequency (minor allele frequency (MAF) between 1% and 5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is largely unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10), a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., P_meta = 2 × 10^-14), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10^-11; n_cases = 98,742 and n_controls = 409,511). Using an En1^cre/flox mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., P_meta = 1 × 10^-11). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture
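    The frequency bands used in this abstract can be made concrete with a small helper. This is a minimal Python sketch, not code from the study: the function names `minor_allele_frequency` and `maf_category` are hypothetical, and treating 5% as the inclusive upper edge of the low-frequency band is an assumption, since the abstract only gives the band as 1 to 5%.

```python
def minor_allele_frequency(alt_allele_count: int, total_alleles: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("counts must satisfy 0 <= alt <= total and total > 0")
    freq = alt_allele_count / total_alleles
    return min(freq, 1.0 - freq)


def maf_category(maf: float) -> str:
    """Bin a MAF: rare (<= 1%), low-frequency (1% to 5%), otherwise common.

    Assumption: 5% is treated as the inclusive upper edge of the
    low-frequency band; the abstract does not state how ties are handled.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf <= 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


# The two lead variants, using the MAFs quoted in the abstract.
for rsid, maf in [("rs11692564", 0.016), ("rs148771817", 0.012)]:
    print(rsid, maf_category(maf))  # both fall in the low-frequency band
```

    Both reported variants land in the low-frequency band, consistent with how the abstract describes them.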

  19. Assessment of REPLI-g Multiple Displacement Whole Genome Amplification (WGA) Techniques for Metagenomic Applications.

    Science.gov (United States)

    Ahsanuddin, Sofia; Afshinnekoo, Ebrahim; Gandara, Jorge; Hakyemezoğlu, Mustafa; Bezdan, Daniela; Minot, Samuel; Greenfield, Nick; Mason, Christopher E

    2017-04-01

    Amplification of minute quantities of DNA is a fundamental challenge in low-biomass metagenomic and microbiome studies because of potential biases in coverage, guanine-cytosine (GC) content, and altered species abundances. Whole genome amplification (WGA), although widely used, is notorious for introducing artifact sequences, either by amplifying laboratory contaminants or by nonrandom amplification of a sample's DNA. In this study, we investigate the effect of REPLI-g multiple displacement amplification (MDA; Qiagen, Valencia, CA, USA) on sequencing data quality and species abundance detection in 8 paired metagenomic samples and 1 titrated, mixed control sample. We extracted and sequenced genomic DNA (gDNA) from 8 environmental samples and compared the quality of the sequencing data for the MDA samples and their corresponding non-MDA samples. The degree of REPLI-g MDA bias was evaluated by sequence metrics, species composition, and cross-validating observed species abundance and species diversity estimates using the One Codex and MetaPhlAn taxonomic classification tools. Here, we provide evidence of the overall efficacy of REPLI-g MDA on retaining sequencing data quality and species abundance measurements while providing increased yields of high-fidelity DNA. We find that species abundance estimates are largely consistent across samples, even with REPLI-g amplification, as demonstrated by Spearman's rank-order coefficient (R^2 > 0.8). However, REPLI-g MDA often produced fewer classified reads at the species, genus, and family levels, resulting in decreased species diversity. We also observed some areas with the PCR "jackpot effect," with varying input DNA values for the Metagenomics Research Group (MGRG) controls at specific genomic loci. We visualize this effect in whole genome coverage plots and with sequence composition analyses and note these caveats of the MDA method. Despite overall concordance of species abundance between the amplified and unamplified samples
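    Spearman's rank-order coefficient, used in this study to compare species abundance estimates with and without amplification, is the Pearson correlation computed on the two rank vectors. The Python sketch below is illustrative only and is not the authors' pipeline: the helper names are hypothetical, tied values receive the average of their ranks, and the abundance vectors are made up. The abstract labels the statistic as an R-squared value without saying whether it is rho or its square, so the sketch returns rho itself.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical relative abundances of five species, unamplified vs amplified.
unamplified = [0.40, 0.25, 0.15, 0.12, 0.08]
amplified = [0.35, 0.30, 0.10, 0.15, 0.10]
print(round(spearman_rho(unamplified, amplified), 3))
```

    Working on ranks rather than raw abundances makes the comparison insensitive to monotone distortions, which is one reason a rank statistic suits amplification-bias checks.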

  20. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.

    Science.gov (United States)

    Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S

    2015-06-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed), filtering for high pairwise linkage disequilibrium, and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome
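    Genomic BLUP, one of the two models compared in this study, relies on a genomic relationship matrix built from marker genotypes. The abstract does not say how that matrix was constructed; the sketch below assumes VanRaden's first method, a widely used choice, and uses NumPy with randomly generated 0/1/2 genotypes standing in for the 54k array and QTL marker sets. `vanraden_g` is a hypothetical helper name, and estimating allele frequencies from the sample itself is also an assumption.

```python
import numpy as np


def vanraden_g(genotypes: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix following VanRaden's method 1 (assumed here).

    genotypes: animals x markers array; each entry counts copies (0, 1 or 2)
    of one chosen allele.
    """
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0             # sample allele frequency per marker
    z = m - 2.0 * p                      # centre each marker on its expected count
    scale = 2.0 * np.sum(p * (1.0 - p))  # scales G to resemble a pedigree relationship matrix
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    return z @ z.T / scale


# Hypothetical data: 5 animals, 100 array SNPs plus 10 extra QTL markers.
rng = np.random.default_rng(0)
array_snps = rng.integers(0, 3, size=(5, 100))
qtl_markers = rng.integers(0, 3, size=(5, 10))

g_array = vanraden_g(array_snps)
g_combined = vanraden_g(np.hstack([array_snps, qtl_markers]))
print(g_array.shape, g_combined.shape)
```

    Simply appending the QTL columns, as here, is only one way to include them; the abstract does not specify how the extra markers entered the model.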

  1. Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples.

    Directory of Open Access Journals (Sweden)

    Craig April

    BACKGROUND: We have developed a gene expression assay (Whole-Genome DASL) capable of generating whole-genome gene expression profiles from degraded samples such as formalin-fixed, paraffin-embedded (FFPE) specimens. METHODOLOGY/PRINCIPAL FINDINGS: We demonstrated a similar level of sensitivity in gene detection between matched fresh-frozen (FF) and FFPE samples, with the number and overlap of probes detected in the FFPE samples being approximately 88% and 95% of that in the corresponding FF samples, respectively; 74% of the differentially expressed probes overlapped between the FF and FFPE pairs. The WG-DASL assay is also able to detect 1.3- to 1.5-fold and 1.5- to 2-fold changes in intact and FFPE samples, respectively. The dynamic range for the assay is approximately 3 logs. Comparing the WG-DASL assay with an in vitro transcription-based labeling method yielded fold-change correlations of R^2 approximately 0.83, while fold-change comparisons with quantitative RT-PCR assays yielded R^2 approximately 0.86 and R^2 approximately 0.55 for intact and FFPE samples, respectively. Additionally, the WG-DASL assay yielded high self-correlations (R^2 > 0.98) with low intact RNA inputs ranging from 1 ng to 100 ng; reproducible expression profiles were also obtained with 250 pg total RNA (R^2 approximately 0.92), with approximately 71% of the probes detected in 100 ng total RNA also detected at the 250 pg level. When FFPE samples were assayed, 1 ng total RNA yielded self-correlations of R^2 approximately 0.80, while still maintaining a correlation of R^2 approximately 0.75 with standard FFPE inputs (200 ng). CONCLUSIONS/SIGNIFICANCE: Taken together, these results show that the WG-DASL assay provides a reliable platform for genome-wide expression profiling in archived materials. It also possesses utility within clinical settings where only limited quantities of samples may be available (e.g. microdissected material) or when minimally invasive procedures are performed (e
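    The fold-change correlations quoted in this abstract compare, probe by probe, the changes measured by two methods. A minimal Python sketch, assuming the comparison is the squared Pearson correlation of per-probe log2 fold changes (the abstract does not give its exact computation); the helper names and the intensities are hypothetical.

```python
import math
from typing import Sequence


def log2_fold_changes(treated: Sequence[float], control: Sequence[float]) -> list[float]:
    """Per-probe log2(treated / control); intensities must be positive."""
    if len(treated) != len(control):
        raise ValueError("profiles must have the same number of probes")
    return [math.log2(t / c) for t, c in zip(treated, control)]


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors of length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy * sxy / (sxx * syy)


# Made-up intensities for four probes measured by two methods (condition vs reference).
method_a = log2_fold_changes([800, 150, 300, 90], [400, 300, 310, 45])
method_b = log2_fold_changes([620, 120, 280, 100], [300, 260, 300, 48])
print(round(r_squared(method_a, method_b), 3))
```

    Correlating log2 ratios rather than raw ratios treats up- and down-regulation symmetrically, which is the usual reason fold changes are compared on a log scale.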

  2. Assessment of REPLI-g Multiple Displacement Whole Genome Amplification (WGA) Techniques for Metagenomic Applications

    Science.gov (United States)

    Ahsanuddin, Sofia; Afshinnekoo, Ebrahim; Gandara, Jorge; Hakyemezoğlu, Mustafa; Bezdan, Daniela; Minot, Samuel; Greenfield, Nick; Mason, Christopher E.

    2017-01-01

    Amplification of minute quantities of DNA is a fundamental challenge in low-biomass metagenomic and microbiome studies because of potential biases in coverage, guanine-cytosine (GC) content, and altered species abundances. Whole genome amplification (WGA), although widely used, is notorious for introducing artifact sequences, either by amplifying laboratory contaminants or by nonrandom amplification of a sample’s DNA. In this study, we investigate the effect of REPLI-g multiple displacement amplification (MDA; Qiagen, Valencia, CA, USA) on sequencing data quality and species abundance detection in 8 paired metagenomic samples and 1 titrated, mixed control sample. We extracted and sequenced genomic DNA (gDNA) from 8 environmental samples and compared the quality of the sequencing data for the MDA samples and their corresponding non-MDA samples. The degree of REPLI-g MDA bias was evaluated by sequence metrics, species composition, and cross-validating observed species abundance and species diversity estimates using the One Codex and MetaPhlAn taxonomic classification tools. Here, we provide evidence of the overall efficacy of REPLI-g MDA on retaining sequencing data quality and species abundance measurements while providing increased yields of high-fidelity DNA. We find that species abundance estimates are largely consistent across samples, even with REPLI-g amplification, as demonstrated by Spearman’s rank-order coefficient (R^2 > 0.8). However, REPLI-g MDA often produced fewer classified reads at the species, genus, and family levels, resulting in decreased species diversity. We also observed some areas with the PCR “jackpot effect,” with varying input DNA values for the Metagenomics Research Group (MGRG) controls at specific genomic loci. We visualize this effect in whole genome coverage plots and with sequence composition analyses and note these caveats of the MDA method. Despite overall concordance of species abundance between the amplified and unamplified

  3. Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications

    Directory of Open Access Journals (Sweden)

    Asadollahi Mohammad A

    2010-12-01

    Background: Rapid and efficient design and construction of microbial cell factories are made possible by the enabling technology of metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, where selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood well enough to enable further metabolic engineering. Results: In this work we show that whole genome high-throughput sequencing and annotation can be used to identify single nucleotide polymorphisms (SNPs) between Saccharomyces cerevisiae strains S288c and CEN.PK113-7D. The yeast strain S288c was the first eukaryote sequenced, serving as the reference genome for the Saccharomyces Genome Database, while CEN.PK113-7D is a preferred laboratory strain for industrial biotechnology research. A total of 13,787 high-quality SNPs were detected between the two strains (reference strain: S288c). Considering only metabolic genes (782 of 5,596 annotated genes), a total of 219 metabolism-specific SNPs are distributed across 158 metabolic genes, with 85 of the SNPs being nonsynonymous (i.e., encoding amino acid modifications). Amongst metabolic SNPs detected, there was pathway enrichment in the galactose uptake pathway (GAL1, GAL10) and the ergosterol biosynthetic pathway (ERG8, ERG9). Physiological characterization confirmed a strong deficiency in galactose uptake and metabolism in S288c compared to CEN.PK113-7D, and similarly, ergosterol content in CEN.PK113-7D was significantly higher in both glucose and galactose supplemented cultivations compared to S288c. Furthermore, DNA microarray profiling of S288c and CEN.PK113-7D in both glucose and galactose batch cultures did not provide a clear hypothesis for major phenotypes observed, suggesting that
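    Deciding whether a coding SNP is nonsynonymous, as in the count of 85 reported in this abstract, comes down to translating the reference and mutated codons and comparing the amino acids. The Python sketch below is not the authors' annotation pipeline: it assumes the standard nuclear genetic code (NCBI translation table 1), a codon already read on the coding strand, and that a substitution creating a stop codon counts as nonsynonymous. `CODON_TABLE` and `classify_snp` are hypothetical names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1): amino acids for codons in TCAG order, '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution at `position` (0, 1 or 2) of a codon.

    Returns "synonymous" if the amino acid is unchanged, else "nonsynonymous"
    (a change to a stop codon is counted as nonsynonymous here).
    """
    codon, alt_base = codon.upper(), alt_base.upper()
    if len(codon) != 3 or any(b not in BASES for b in codon):
        raise ValueError("codon must be three of T, C, A, G")
    if position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("invalid position or alternative base")
    if codon[position] == alt_base:
        raise ValueError("alternative base equals the reference base")
    mutant = codon[:position] + alt_base + codon[position + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same else "nonsynonymous"


print(classify_snp("CTT", 2, "C"))  # synonymous: CTT and CTC both encode Leu
print(classify_snp("GAA", 0, "A"))  # nonsynonymous: Glu (GAA) becomes Lys (AAA)
```

    Genome-wide tools also have to map each SNP onto its gene model, strand and reading frame before this per-codon step; the sketch leaves that part out.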

  4. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  5. Whole Genome Expression Profiling and Signal Pathway Screening of MSCs in Ankylosing Spondylitis

    Directory of Open Access Journals (Sweden)

    Yuxi Li

    2014-01-01

    The pathogenesis of dysfunctional immunoregulation of mesenchymal stem cells (MSCs) in ankylosing spondylitis (AS) is thought to be a complex process that involves multiple genetic alterations. In this study, MSCs derived from both healthy donors and AS patients were cultured in normal media or media mimicking an inflammatory environment. Whole genome expression profiling analysis of 33,351 genes was performed and differentially expressed genes related to AS were analyzed by GO term analysis and KEGG pathway analysis. Our results showed that in normal media 676 genes were differentially expressed in AS, 354 upregulated and 322 downregulated, while in an inflammatory environment 1767 genes were differentially expressed in AS, 1230 upregulated and 537 downregulated. GO analysis showed that these genes were mainly related to cellular processes, physiological processes, biological regulation, regulation of biological processes, and binding. In addition, by KEGG pathway analysis, 14 key genes from the MAPK signaling pathway and 8 key genes from the TLR signaling pathway were identified as differentially regulated. The results of qRT-PCR verified the expression variation of the 9 genes mentioned above. Our study found that in an inflammatory environment ankylosing spondylitis pathogenesis may be related to activation of the MAPK and TLR signaling pathways.

  6. Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications.

    Science.gov (United States)

    Tank, David C; Eastman, Jonathan M; Pennell, Matthew W; Soltis, Pamela S; Soltis, Douglas E; Hinchliff, Cody E; Brown, Joseph W; Sessa, Emily B; Harmon, Luke J

    2015-07-01

    Our growing understanding of the plant tree of life provides a novel opportunity to uncover the major drivers of angiosperm diversity. Using a time-calibrated phylogeny, we characterized hot and cold spots of lineage diversification across the angiosperm tree of life by modeling evolutionary diversification using stepwise AIC (MEDUSA). We also tested the whole-genome duplication (WGD) radiation lag-time model, which postulates that increases in diversification tend to lag behind established WGD events. Diversification rates have been incredibly heterogeneous throughout the evolutionary history of angiosperms and reveal a pattern of 'nested radiations' - increases in net diversification nested within other radiations. This pattern in turn generates a negative relationship between clade age and diversity across both families and orders. We suggest that stochastically changing diversification rates across the phylogeny explain these patterns. Finally, we demonstrate significant statistical support for the WGD radiation lag-time model. Across angiosperms, nested shifts in diversification led to an overall increasing rate of net diversification and declining relative extinction rates through time. These diversification shifts are only rarely perfectly associated with WGD events, but commonly follow them after a lag period. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  7. Whole genomic prediction of growth and carcass traits in a Chinese quality chicken population.

    Science.gov (United States)

    Zhang, Z; Xu, Z-Q; Luo, Y-Y; Zhang, H-B; Gao, N; He, J-L; Ji, C-L; Zhang, D-X; Li, J-Q; Zhang, X-Q

    2017-01-01

    By incorporating high-density markers into breeding value prediction models, the whole genomic prediction (WGP) method can effectively accelerate genetic improvement in livestock breeding. However, the performance of WGP varies across species and populations and is affected by the underlying genetic architecture. In particular, very little is known about the performance of WGP for many chicken breeds. Here we estimate the genetic parameters and evaluate the performance of WGP for 18 growth and carcass traits in a Chinese quality chicken population. In total, 435 chickens were systematically phenotyped and genotyped using a 600K genotyping array. Two variance component estimation scenarios, 3 breeding value prediction methods, and 2 validation procedures were compared. The results showed that the heritability of these 18 traits was medium to high (ranging from 0.28 to 0.60) and that deviations existed between the heritability estimated from pedigrees and markers. Compared with conventional breeding methods, WGP could potentially increase the selection accuracy by 20% or more depending on the prediction model used, the trait under consideration, and the genetic connectedness between the training and validation individuals. Our results showed the potential of implementing genomic selection in small breeding herds.

  8. Kuwaiti population subgroup of nomadic Bedouin ancestry—Whole genome sequence and analysis

    Directory of Open Access Journals (Sweden)

    Sumi Elsa John

    2015-03-01

    The Kuwaiti native population comprises three distinct genetic subgroups of Persian, “city-dwelling” Saudi Arabian tribe, and nomadic “tent-dwelling” Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry; it owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious ‘novel’ variants lie in genes associated with autosomal recessive disorders characteristic of the region.

  9. Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits.

    Science.gov (United States)

    Kessner, Darren; Novembre, John

    2015-04-01

    Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTL) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTL provides qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTL under selection affects the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50-100%) can be explained by detected QTL in as little as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates.

  10. Whole genome data for omics-based research on the self-fertilizing fish Kryptolebias marmoratus.

    Science.gov (United States)

    Rhee, Jae-Sung; Lee, Jae-Seong

    2014-08-30

    Genome resources have advantages for understanding diverse areas such as biological patterns and functioning of organisms. Omics platforms are useful approaches for the study of organs and organisms. These approaches can be powerful screening tools for whole genome, proteome, and metabolome profiling, and can be used to understand molecular changes in response to internal and external stimuli. This methodology has been applied successfully in freshwater model fish such as the zebrafish Danio rerio and the Japanese medaka Oryzias latipes in research areas such as basic physiology, developmental biology, genetics, and environmental biology. However, information is still scarce about model fish that inhabit brackish water or seawater. To develop the self-fertilizing killifish Kryptolebias marmoratus as a potential model species with unique characteristics and research merits, we obtained genomic information about K. marmoratus. We address ways to use these data for genome-based molecular mechanistic studies. We review the current state of genome information on K. marmoratus to initiate omics approaches. We evaluate the potential applications of integrated omics platforms for future studies in environmental science, developmental biology, and biomedical research. We conclude that information about the K. marmoratus genome will provide a better understanding of the molecular functions of genes, proteins, and metabolites that are involved in the biological functions of this species. Omics platforms, particularly combined technologies that make effective use of bioinformatics, will provide powerful tools for hypothesis-driven investigations and discovery-driven discussions on diverse aspects of this species and on fish and vertebrates in general.

  11. Whole genome sequencing of Gir cattle for identifying polymorphisms and loci under selection.

    Science.gov (United States)

    Liao, Xiaoping; Peng, Fred; Forni, Selma; McLaren, David; Plastow, Graham; Stothard, Paul

    2013-10-01

    Genetic variation in Gir cattle (Bos indicus) has so far not been well characterized. In this study, we used whole genome sequencing of three Gir bulls and a pooled sample from another 11 bulls to identify polymorphisms and loci under selection. A total of 9 990 733 single nucleotide polymorphisms (SNPs) and 604 308 insertion/deletions (indels) were discovered in Gir samples, of which 62.34% and 83.62%, respectively, are previously unknown. Moreover, we detected 79 putative selective sweeps using the sequence data of the pooled sample. One of the most striking sweeps harbours several genes belonging to the cathelicidin gene family, such as CAMP, CATHL1, CATHL2, and CATHL3, which are related to pathogen- and parasite-resistance. Another interesting region harbours genes encoding mitogen-activated protein kinases, which are involved in directing cellular responses to a variety of stimuli, such as osmotic stress and heat shock. These findings are particularly interesting because Gir is resistant to hot temperatures and tropical diseases. This initial selective sweep analysis of Gir cattle has revealed a number of loci that could be important for their adaptation to tropical climates.

  12. New Perspectives on Microbial Community Distortion after Whole-Genome Amplification

    Science.gov (United States)

    DeSantis, Todd Z.; Santo Domingo, Jorge W.; Ashbolt, Nicholas

    2015-01-01

    Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in environmental samples with limited biomass; however, potential selective biases during the amplification process are poorly understood. Here, we describe the effects of WGA on 31 different microbial communities from five biotopes that also included low-biomass samples from drinking water and groundwater. Our findings provide evidence that microbiome segregation by biotope was possible despite WGA treatment. Nevertheless, samples from different biotopes revealed different levels of distortion, with genomic GC content significantly correlated with WGA perturbation. Certain phylogenetic clades showed a homogeneous trend across various sample types; for instance, Alpha- and Betaproteobacteria showed a decrease in their abundance after WGA treatment. On the other hand, Enterobacteriaceae, an important biomarker group for fecal contamination in groundwater and drinking water, were strongly affected by WGA treatment without a predictable pattern. These novel results describe the impact of WGA on low-biomass samples and may highlight issues to be aware of when designing future metagenomic studies that require prior WGA treatment. PMID:26010362
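    The abstract reports genomic GC content as a correlate of WGA distortion. For readers unfamiliar with the quantity, a minimal Python sketch of how a GC fraction can be computed for one sequence; it assumes ambiguous bases such as N are excluded from the denominator, which is an illustrative choice and not necessarily how the study computed it.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G/C among the unambiguous A/C/G/T bases of a DNA sequence."""
    seq = seq.upper()
    # Count only unambiguous bases; ambiguity codes such as N are ignored (an assumption).
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


# Hypothetical toy sequence: 4 of its 6 bases are G or C.
print(round(gc_content("ATGCGC"), 3))  # 0.667
```

    Correlating per-genome GC fractions with a per-taxon distortion measure would then be an ordinary correlation analysis.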

  13. Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank; Platt, Darren

    2006-02-06

    The DOE Joint Genome Institute has sequenced over 50 eukaryotic genomes, ranging in size from 15 Mb to 1.6 Gb, across a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contains bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, they are sometimes symbionts, and so can be of biological interest. Therefore, it is desirable to assemble the bonus organisms along with the main genome. This transforms the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which is difficult to handle in most WGS assemblers. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present. For example, one JGI project contained at least two different prokaryotic contaminants, plus a 145 kb plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms, but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing for the removal of, cloning artifacts such as E. coli and spurious vector inclusions.

  14. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach to plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interaction effects among genes and higher-order nonlinear effects. A deep belief network (DBN), one of the architectures used in deep learning, can model data at a high level of abstraction that captures nonlinear effects. This study implemented a DBN to develop a GP model using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was obtained from the Global Maize program of CIMMYT (International Maize and Wheat Improvement Center). Based on Pearson correlation, the DBN outperformed the other methods, reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for traits presumed to be non-additive. The DBN achieved a correlation of 0.579 (on a scale from -1 to 1).
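    The accuracy figure above is Pearson's correlation between predicted and observed phenotypes in the test set. A minimal Python sketch of that evaluation metric alone; the DBN, RKHS, BL and BLUP models are not shown, and the trait values in the example are hypothetical.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between predicted and observed phenotype values."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    if ss_p == 0 or ss_o == 0:
        raise ValueError("correlation is undefined when either input has zero variance")
    return cov / math.sqrt(ss_p * ss_o)


# Hypothetical predicted vs. observed trait values for four validation lines.
r = pearson_r([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(round(r, 2))  # 0.8
```

    Because the metric is bounded by -1 and 1, a value such as 0.579 can be compared directly across models evaluated on the same validation individuals.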

  15. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Margaret Staton

    Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence.

  16. RepARK--de novo creation of repeat libraries from whole-genome NGS reads.

    Science.gov (United States)

    Koch, Philipp; Platzer, Matthias; Downie, Bryan R

    2014-05-01

    Generation of repeat libraries is a critical step for analysis of complex genomes. In the era of next-generation sequencing (NGS), such libraries are usually produced using a whole-genome shotgun (WGS) derived reference sequence whose completeness greatly influences the quality of derived repeat libraries. We describe here a de novo repeat assembly method--RepARK (Repetitive motif detection by Assembly of Repetitive K-mers)--which avoids potential biases by using abundant k-mers of NGS WGS reads without requiring a reference genome. For validation, repeat consensuses derived from simulated and real Drosophila melanogaster NGS WGS reads were compared to repeat libraries generated by four established methods. RepARK is orders of magnitude faster than the other methods and generates libraries that are: (i) composed almost entirely of repetitive motifs, (ii) more comprehensive and (iii) almost completely annotated by TEclass. Additionally, we show that the RepARK method is applicable to complex genomes like human and can even serve as a diagnostic tool to identify repetitive sequences contaminating NGS datasets.
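    The central step described above, selecting abundant k-mers directly from WGS reads without a reference, can be illustrated with a simplified Python sketch. Assumptions: reverse complements are not merged, the cutoff `min_count` is a hypothetical fixed parameter rather than RepARK's own threshold rule, and the later assembly of the selected k-mers into repeat consensuses is not shown.

```python
from collections import Counter


def abundant_kmers(reads, k, min_count):
    """Count every k-mer in the reads and keep those occurring at least min_count times."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip k-mers containing ambiguous bases
                continue
            counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Hypothetical toy reads; real inputs would be millions of WGS reads and a larger k.
reads = ["ACGTACGT", "ACGTTT"]
print(abundant_kmers(reads, k=4, min_count=2))  # {'ACGT': 3}
```

    In a single-copy region each k-mer appears at roughly the sequencing depth, so k-mers counted far above that level are candidates for repeats; the sketch only shows this selection, not the assembly.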

  17. A novel strategy for clustering major depression individuals using whole-genome sequencing variant data

    Science.gov (United States)

    Yu, Chenglong; Baune, Bernhard T.; Licinio, Julio; Wong, Ma-Li

    2017-01-01

    Major depressive disorder (MDD) is highly prevalent, resulting in an exceedingly high disease burden. The identification of genetic risk factors could lead to advances in prevention and therapeutics. Current approaches examine genotyping data to identify specific variations between cases and controls. Compared to genotyping, whole-genome sequencing (WGS) allows for the detection of private mutations. In this proof-of-concept study, we establish a conceptually novel computational approach that clusters subjects based on the entirety of their WGS. Those clusters predicted MDD diagnosis. This strategy yielded encouraging results: depressed Mexican-American participants were grouped closer together, whereas ethnically matched controls grouped away from MDD patients. This implies that within the same ancestry, the WGS data of an individual can be used to check whether this individual is within or closer to MDD subjects or to controls. We propose a novel strategy to apply WGS data to clinical medicine by facilitating diagnosis through genetic clustering. Further studies utilising our method should examine larger WGS datasets from other ethnic groups. PMID:28287625

  18. Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.

    Science.gov (United States)

    McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael

    2014-08-01

    Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the Guanine+Cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species suggest that this process acts as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event.

  19. Whole genome grey and white matter DNA methylation profiles in dorsolateral prefrontal cortex.

    Science.gov (United States)

    Sanchez-Mut, Jose Vicente; Heyn, Holger; Vidal, Enrique; Delgado-Morales, Raúl; Moran, Sebastian; Sayols, Sergi; Sandoval, Juan; Ferrer, Isidre; Esteller, Manel; Gräff, Johannes

    2017-01-20

    The brain's neocortex is anatomically organized into grey and white matter, which are mainly composed of neuronal and glial cells, respectively. The neocortex can be further divided into different Brodmann areas according to their cytoarchitectural organization, which are associated with distinct cortical functions. There is increasing evidence that brain development and function are governed by epigenetic processes, yet their contribution to the functional organization of the neocortex remains incompletely understood. Herein, we determined the DNA methylation patterns of grey and white matter of dorsolateral prefrontal cortex (Brodmann area 9), an important region for higher cognitive skills that is particularly affected in various neurological diseases. To avoid interindividual differences, we analyzed white and grey matter from the same donor using whole genome bisulfite sequencing, and to validate their biological significance, we used Infinium HumanMethylation450 BeadChip and pyrosequencing in ten and twenty independent samples, respectively. The combination of these analyses indicated robust grey-white matter differences in DNA methylation. Moreover, cell type-specific markers were enriched among the most differentially methylated genes. Interestingly, we also found a striking number of grey-white matter differentially methylated genes that have previously been associated with Alzheimer's, Parkinson's, and Huntington's disease, as well as multiple sclerosis and amyotrophic lateral sclerosis. The data presented here thus constitute an important resource for future studies, not only to gain insight into brain regional as well as grey and white matter differences, but also to unmask epigenetic alterations that might underlie neurological and neurodegenerative diseases.

  20. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    Science.gov (United States)

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads from various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences, which are built into the pipeline and contain prevalent microbial genomes and genes of the human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module, which contains methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org), allowing their comparative analysis together with user samples. MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, and Python, with all major browsers supported.

  1. Whole-Genome Sequencing of Native Sheep Provides Insights into Rapid Adaptations to Extreme Environments.

    Science.gov (United States)

    Yang, Ji; Li, Wen-Rong; Lv, Feng-Hua; He, San-Gang; Tian, Shi-Lin; Peng, Wei-Feng; Sun, Ya-Wei; Zhao, Yong-Xin; Tu, Xiao-Long; Zhang, Min; Xie, Xing-Long; Wang, Yu-Tao; Li, Jin-Quan; Liu, Yong-Gang; Shen, Zhi-Qiang; Wang, Feng; Liu, Guang-Jian; Lu, Hong-Feng; Kantanen, Juha; Han, Jian-Lin; Li, Meng-Hua; Liu, Ming-Jun

    2016-10-01

    Global climate change has a significant effect on extreme environments and a profound influence on species survival. However, little is known of the genome-wide pattern of livestock adaptations to extreme environments over a short time frame following domestication. Sheep (Ovis aries) have become well adapted to a diverse range of agroecological zones, including certain extreme environments (e.g., plateaus and deserts), during their post-domestication (approximately 8-9 kya) migration and differentiation. Here, we generated whole-genome sequences from 77 native sheep, with an average effective sequencing depth of ∼5× for 75 samples and ∼42× for 2 samples. Comparative genomic analyses among sheep in contrasting environments, that is, plateau (>4,000 m above sea level) versus lowland (1500 m) versus low-altitude region (600 mm), and arid zone (400 mm), detected a novel set of candidate genes as well as pathways and GO categories that are putatively associated with hypoxia responses at high altitudes and water reabsorption in arid environments. In addition, candidate genes and GO terms functionally related to energy metabolism and body size variations were identified. This study offers novel insights into rapid genomic adaptations to extreme environments in sheep and other animals, and provides a valuable resource for future research on livestock breeding in response to climate change.

  2. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

    Science.gov (United States)

    Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J.; Szatkiewicz, Jin P.

    2015-01-01

    Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151

  3. Sensitive and specific KRAS somatic mutation analysis on whole-genome amplified DNA from archival tissues.

    Science.gov (United States)

    van Eijk, Ronald; van Puijenbroek, Marjo; Chhatta, Amiet R; Gupta, Nisha; Vossen, Rolf H A M; Lips, Esther H; Cleton-Jansen, Anne-Marie; Morreau, Hans; van Wezel, Tom

    2010-01-01

    Kirsten RAS (KRAS) is a small GTPase that plays a key role in Ras/mitogen-activated protein kinase signaling; somatic mutations in KRAS are frequently found in many cancers. The most common KRAS mutations result in a constitutively active protein. Accurate detection of KRAS mutations is pivotal to the molecular diagnosis of cancer and may guide proper treatment selection. Here, we describe a two-step KRAS mutation screening protocol that combines whole-genome amplification (WGA), high-resolution melting analysis (HRM) as a prescreen method for mutation carrying samples, and direct Sanger sequencing of DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, from which limited amounts of DNA are available. We developed target-specific primers, thereby avoiding amplification of homologous KRAS sequences. The addition of herring sperm DNA facilitated WGA in DNA samples isolated from as few as 100 cells. KRAS mutation screening using high-resolution melting analysis on wgaDNA from formalin-fixed, paraffin-embedded tissue is highly sensitive and specific; additionally, this method is feasible for screening of clinical specimens, as illustrated by our analysis of pancreatic cancers. Furthermore, PCR on wgaDNA does not introduce genotypic changes, as opposed to unamplified genomic DNA. This method can, after validation, be applied to virtually any potentially mutated region in the genome.

  4. Whole-Genome Enrichment Provides Deep Insights into Vibrio cholerae Metagenome from an African River.

    Science.gov (United States)

    Vezzulli, L; Grande, C; Tassistro, G; Brettar, I; Höfle, M G; Pereira, R P A; Mushi, D; Pallavicini, A; Vassallo, P; Pruzzo, C

    2017-04-01

    The detection and typing of Vibrio cholerae in natural aquatic environments encounter major methodological challenges related to the fact that the bacterium is often present in environmental matrices at very low abundance in a nonculturable state. This study applied, for the first time to our knowledge, a whole-genome enrichment (WGE) and next-generation sequencing (NGS) approach for direct genotyping and metagenomic analysis of low-abundance V. cholerae DNA (cholerae metagenomic DNA via hybridization. An enriched V. cholerae metagenome library was generated and sequenced on an Illumina MiSeq platform. Up to 1.8 × 10^7 bp (4.5× mean read depth) were found to map against V. cholerae reference genome sequences, representing an increase of about 2500 times in target DNA coverage compared to theoretical calculations of performance for shotgun metagenomics. Analysis of metagenomic data revealed the presence of several V. cholerae virulence and virulence-associated genes in river water, including major virulence regions (e.g. CTX prophage and Vibrio pathogenicity island-1) and genetic markers of epidemic strains (e.g. O1-antigen biosynthesis gene cluster) that were not detectable by standard culture and molecular techniques. Overall, besides providing a powerful tool for direct genotyping of V. cholerae in complex environmental matrices, this study provides a 'proof of concept' on the methodological gap that might currently preclude a more comprehensive understanding of toxigenic V. cholerae emergence from natural aquatic environments.

  5. Whole genome amplification and de novo assembly of single bacterial cells.

    Directory of Open Access Journals (Sweden)

    Sébastien Rodrigue

    BACKGROUND: Single-cell genome sequencing has the potential to allow the in-depth exploration of the vast genetic diversity found in uncultured microbes. We used the marine cyanobacterium Prochlorococcus as a model system for addressing important challenges facing high-throughput whole genome amplification (WGA) and complete genome sequencing of individual cells. METHODOLOGY/PRINCIPAL FINDINGS: We describe a pipeline that enables single-cell WGA on hundreds of cells at a time while virtually eliminating non-target DNA from the reactions. We further developed a post-amplification normalization procedure that mitigates extreme variations in sequencing coverage associated with multiple displacement amplification (MDA), and demonstrated that the procedure increased sequencing efficiency and facilitated genome assembly. We report genome recovery as high as 99.6% with reference-guided assembly, and 95% with de novo assembly starting from a single cell. We also analyzed the impact of chimera formation during MDA on de novo assembly, and discuss strategies to minimize the presence of incorrectly joined regions in contigs. CONCLUSIONS/SIGNIFICANCE: The methods described in this paper will be useful for sequencing genomes of individual cells from a variety of samples.

  6. Novel Altered Region for Biomarker Discovery in Hepatocellular Carcinoma (HCC) Using Whole Genome SNP Array

    Directory of Open Access Journals (Sweden)

    Esraa M. Hashem

    2016-04-01

    Cancer represents one of the greatest medical causes of mortality. The majority of hepatocellular carcinomas arise from the accumulation of genetic abnormalities, possibly induced by external etiological factors, especially HCV and HBV infections. There is a need for new tools to analyze the large amount of data and to reveal relevant genetic changes that may be critical both for understanding how cancers develop and for determining how they could ultimately be treated. Gene expression profiling may lead to new biomarkers that may help improve diagnostic accuracy for detecting hepatocellular carcinoma. In this work, a statistical technique (the discrete stationary wavelet transform) is proposed for detecting copy number alterations in high-density single-nucleotide polymorphism array data from 30 cell lines, on specific chromosomes where alterations are frequently detected in hepatocellular carcinoma. The results demonstrate the feasibility of whole-genome fine mapping of copy number alterations via high-density single-nucleotide polymorphism genotyping. The results also revealed a novel altered chromosomal region: amplification of region 4q22.1 was detected in 22 of the 30 hepatocellular carcinoma cell lines (73%). This region contains the tumor suppressor genes AFF1 and DSPP. This region has not previously been reported to be involved in liver carcinogenesis; it could be used to discover a new HCC biomarker, which would help in a better understanding of hepatocellular carcinoma.

  7. Molecular analysis of single oocyst of Eimeria by whole genome amplification (WGA) based nested PCR.

    Science.gov (United States)

    Wang, Yunzhou; Tao, Geru; Cui, Yujuan; Lv, Qiyao; Xie, Li; Li, Yuan; Suo, Xun; Qin, Yinghe; Xiao, Lihua; Liu, Xianyong

    2014-09-01

    PCR-based molecular tools are widely used for the identification and characterization of protozoa. Here we report the molecular analysis of Eimeria species using combined methods of whole genome amplification (WGA) and nested PCR. Single oocysts of Eimeria stiedai or Eimeria media were used directly for random amplification of the genomic DNA with either primer extension preamplification (PEP) or multiple displacement amplification (MDA), and the WGA product was then used as template in nested PCR with species-specific primers for ITS-1, 18S rDNA and 23S rDNA of E. stiedai and E. media. WGA-based PCR successfully amplified these genes from single oocysts. For species identification of single oocysts isolated from mixed E. stiedai and E. media samples, the results from WGA-based PCR were exactly in accordance with those from morphological identification, suggesting the suitability of this method for molecular analysis of eimerian parasites at the single-oocyst level. The WGA-based PCR method can also be applied to the identification and genetic characterization of other protists.

  8. Paired tumor and normal whole genome sequencing of metastatic olfactory neuroblastoma.

    Directory of Open Access Journals (Sweden)

    Glen J Weiss

    Full Text Available BACKGROUND: Olfactory neuroblastoma (ONB) is a rare cancer of the sinonasal tract with little molecular characterization. We performed whole genome sequencing (WGS) on paired normal and tumor DNA from a patient with metastatic ONB to identify the somatic alterations that might be drivers of tumorigenesis and/or metastatic progression. METHODOLOGY/PRINCIPAL FINDINGS: Genomic DNA was isolated from fresh frozen tissue from a metastatic lesion and from whole blood, followed by WGS at >30X depth, alignment and mapping, and mutation analyses. Sanger sequencing was used to confirm selected mutations. Sixty-two somatic short nucleotide variants (SNVs) and five deletions were identified inside coding regions, each causing a non-synonymous DNA sequence change. We selected seven SNVs and validated them by Sanger sequencing. In the metastatic ONB samples collected several months prior to WGS, all seven mutations were present. However, in the original surgical resection specimen (prior to evidence of metastatic disease), mutations in the KDR, MYC, SIN3B, and NLRC4 genes were not present, suggesting that these were acquired with disease progression and/or as a result of post-treatment effects. CONCLUSIONS/SIGNIFICANCE: This work provides insight into the evolution of ONB cancer cells and a window into more complex factors, including tumor clonality and multiple driver mutations.

  9. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

    Directory of Open Access Journals (Sweden)

    Ruibang Luo

    2014-06-01

    Full Text Available This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of the GPU and intricate memory management to give fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes such as alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of downstream analysis applications. BALSA is available at: http://sourceforge.net/p/balsa.
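
    The 50-fold figure can be sanity-checked with the usual mean-depth estimate, depth = total sequenced bases / genome size. The sketch below assumes that "∼750 million" counts read pairs (two 100 bp reads each) and that the genome is about 3.1 Gb (human); both are assumptions, not values stated in the abstract, and the function name is illustrative.

```python
def mean_depth(read_pairs, read_length_bp, genome_size_bp):
    """Mean sequencing depth: total sequenced bases divided by genome size.

    Assumes each pair contributes two reads of read_length_bp bases and
    ignores duplicates, unmapped reads and adapter trimming.
    """
    total_bases = read_pairs * 2 * read_length_bp
    return total_bases / genome_size_bp


# Assumed inputs: 750 million pairs of 100 bp reads over a ~3.1 Gb genome.
depth = mean_depth(750_000_000, 100, 3_100_000_000)
print(f"{depth:.1f}x")
```

    Under these assumptions the estimate is roughly 48-fold, close to the stated 50-fold; if the figure instead counted single reads, the estimate would halve.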

  10. Mycobacterial DNA extraction for whole-genome sequencing from early positive liquid (MGIT) cultures.

    Science.gov (United States)

    Votintseva, Antonina A; Pankhurst, Louise J; Anson, Luke W; Morgan, Marcus R; Gascoyne-Binzi, Deborah; Walker, Timothy M; Quan, T Phuong; Wyllie, David H; Del Ojo Elias, Carlos; Wilcox, Mark; Walker, A Sarah; Peto, Tim E A; Crook, Derrick W

    2015-04-01

    We developed a low-cost and reliable method of DNA extraction from as little as 1 ml of early positive mycobacterial growth indicator tube (MGIT) cultures that is suitable for whole-genome sequencing to identify mycobacterial species and predict antibiotic resistance in clinical samples. The DNA extraction method is based on ethanol precipitation supplemented by pretreatment steps with a MolYsis kit or saline wash for the removal of human DNA and a final DNA cleanup step with solid-phase reversible immobilization beads. The protocol yielded ≥0.2 ng/μl of DNA for 90% (MolYsis kit) and 83% (saline wash) of positive MGIT cultures. A total of 144 (94%) of the 154 samples sequenced on the MiSeq platform (Illumina) achieved the target of 1 million reads, with 90% coverage achieved. The DNA extraction protocol, therefore, will facilitate fast and accurate identification of mycobacterial species and resistance using a range of bioinformatics tools. Copyright © 2015, Votintseva et al.

  11. Digital Droplet Multiple Displacement Amplification (ddMDA) for Whole Genome Sequencing of Limited DNA Samples.

    Directory of Open Access Journals (Sweden)

    Minsoung Rhee

    Full Text Available Multiple displacement amplification (MDA) is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples) before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram amounts of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA) technique in which partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase, thereby substantially reducing amplification bias. Consequently, the ddMDA approach enabled more uniform amplification coverage over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent to ~3/1000 of an E. coli genome per droplet), ddMDA achieves a 65-fold increase in coverage in de novo assembly, and a more than 20-fold increase in specificity (percentage of reads mapping to E. coli) compared to conventional tube MDA. ddMDA offers a powerful method for many applications, including medical diagnostics, forensics, and environmental microbiology.
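
    The "~3/1000 of an E. coli genome per droplet" figure follows from converting a mass concentration into genome copies. A minimal sketch, assuming a 4.64 Mb E. coli genome, the common approximation of 650 g/mol per double-stranded base pair, and a hypothetical droplet volume of 0.15 nL (the abstract only says "sub-nanoliter"); the function names are illustrative.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # common approximation for one double-stranded base pair


def genome_mass_g(genome_size_bp):
    """Mass of one genome copy in grams."""
    return genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO


def genomes_per_droplet(conc_pg_per_ul, droplet_volume_nl, genome_size_bp):
    """Average genome equivalents per droplet for a given DNA concentration."""
    grams_per_ul = conc_pg_per_ul * 1e-12             # pg -> g
    genomes_per_ul = grams_per_ul / genome_mass_g(genome_size_bp)
    droplet_volume_ul = droplet_volume_nl * 1e-3      # nL -> uL
    return genomes_per_ul * droplet_volume_ul


# Assumed: 4.64 Mb genome, hypothetical 0.15 nL droplets, 0.1 pg/uL input.
print(round(genomes_per_droplet(0.1, 0.15, 4_640_000), 4))
```

    With these assumed values one genome copy weighs about 5 fg, and the calculation gives about 0.003 genome equivalents per droplet; a different droplet volume scales the result proportionally.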

  12. Whole Genome Sequencing of Field Isolates Reveals Extensive Genetic Diversity in Plasmodium vivax from Colombia.

    Science.gov (United States)

    Winter, David J; Pacheco, M Andreína; Vallejo, Andres F; Schwartz, Rachel S; Arevalo-Herrera, Myriam; Herrera, Socrates; Cartwright, Reed A; Escalante, Ananias A

    2015-12-01

    Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations it affects. The control and eventual elimination of P. vivax are global health priorities. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole-genome data from eight field samples from a region in Córdoba, Colombia, where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this species. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first-line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved with drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information, we developed 18 polymorphic and easy-to-score microsatellite loci that can be used in epidemiological investigations in South America.

  13. Whole-genome duplications spurred the functional diversification of the globin gene superfamily in vertebrates.

    Science.gov (United States)

    Hoffmann, Federico G; Opazo, Juan C; Storz, Jay F

    2012-01-01

    It has been hypothesized that two successive rounds of whole-genome duplication (WGD) in the stem lineage of vertebrates provided genetic raw materials for the evolutionary innovation of many vertebrate-specific features. However, it has seldom been possible to trace such innovations to specific functional differences between paralogous gene products that derive from a WGD event. Here, we report genomic evidence for a direct link between WGD and key physiological innovations in the vertebrate oxygen transport system. Specifically, we demonstrate that key globin proteins that evolved specialized functions in different aspects of oxidative metabolism (hemoglobin, myoglobin, and cytoglobin) represent paralogous products of two WGD events in the vertebrate common ancestor. Analysis of conserved macrosynteny between the genomes of vertebrates and amphioxus (subphylum Cephalochordata) revealed that homologous chromosomal segments defined by myoglobin + globin-E, cytoglobin, and the α-globin gene cluster each descend from the same linkage group in the reconstructed proto-karyotype of the chordate common ancestor. The physiological division of labor between the oxygen transport function of hemoglobin and the oxygen storage function of myoglobin played a pivotal role in the evolution of aerobic energy metabolism, supporting the hypothesis that WGDs helped fuel key innovations in vertebrate evolution.

  14. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  15. Inference of gorilla demographic and selective history from whole-genome sequence data.

    Science.gov (United States)

    McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Great Ape Genome Project; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F

    2015-03-01

    Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.

  16. Whole-genome duplication and molecular evolution in Cornus L. (Cornaceae) – Insights from transcriptome sequences

    Science.gov (United States)

    Yu, Yan; Xiang, Qiuyun; Manos, Paul S.; Soltis, Douglas E.; Soltis, Pamela S.; Song, Bao-Hua; Cheng, Shifeng; Liu, Xin; Wong, Gane

    2017-01-01

    The pattern and rate of genome evolution have profound consequences in organismal evolution. Whole-genome duplication (WGD), or polyploidy, has been recognized as an important evolutionary mechanism of plant diversification. However, in non-model plants the molecular signals of genome duplications have remained largely unexplored. High-throughput transcriptome data from next-generation sequencing have set the stage for novel investigations of genome evolution using new bioinformatic and methodological tools in a phylogenetic framework. Here we compare ten de novo-assembled transcriptomes representing the major lineages of the angiosperm genus Cornus (dogwood) and relevant outgroups using a customized pipeline for analyses. Using three distinct approaches (molecular dating of orthologous genes, analyses of the distribution of synonymous substitutions between paralogous genes, and examination of substitution rates through time), we detected a shared WGD event in the late Cretaceous across all taxa sampled. The inferred doubling event coincides temporally with the paleoclimatic changes associated with the initial divergence of the genus into three major lineages. Analyses also showed an acceleration of rates of molecular evolution after WGD. The highest rates of molecular evolution were observed in the transcriptome of the herbaceous lineage, C. canadensis, a species commonly found at higher latitudes, including the Arctic. Our study demonstrates the value of transcriptome data for understanding genome evolution in closely related species. The results suggest that a dramatic increase in sea surface temperature in the late Cretaceous may have contributed to the evolution and diversification of flowering plants. PMID:28225773

  17. Whole-genome analyses of Korean native and Holstein cattle breeds by massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Jung-Woo Choi

    Full Text Available A main goal of cattle genomics is to identify DNA differences that account for variations in economically important traits. In this study, we performed whole-genome analyses of three important cattle breeds in Korea (Hanwoo, Jeju Heugu, and Korean Holstein) using the Illumina HiSeq 2000 sequencing platform. We achieved 25.5-, 29.6-, and 29.5-fold coverage of the Hanwoo, Jeju Heugu, and Korean Holstein genomes, respectively, and identified a total of 10.4 million single nucleotide polymorphisms (SNPs), of which 54.12% were found to be novel. We also detected 1,063,267 insertions-deletions (InDels) across the genomes (78.92% novel). Annotations of the datasets identified a total of 31,503 nonsynonymous SNPs and 859 frameshift InDels that could affect phenotypic variations in traits of interest. Furthermore, genome-wide copy number variation regions (CNVRs) were detected by comparing the Hanwoo, Jeju Heugu, and previously published Chikso genomes against that of Korean Holstein. A total of 992, 284, and 1881 CNVRs, respectively, were detected throughout the genome. Moreover, 53, 65, 45, and 82 putative regions of homozygosity (ROH) were identified in Hanwoo, Jeju Heugu, Chikso, and Korean Holstein, respectively. The results of this study provide a valuable foundation for further investigations to dissect the molecular mechanisms underlying variation in economically important traits in cattle and to develop genetic markers for use in cattle breeding.

  18. Whole genome sequencing reveals a de novo SHANK3 mutation in familial autism spectrum disorder.

    Directory of Open Access Journals (Sweden)

    Sergio I Nemirovsky

    Full Text Available Clinical genomics promises to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD in whom we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. We identified a family segregating ASD of unidentified cause in three siblings. We performed WGS in the three probands, used a state-of-the-art comprehensive bioinformatic analysis pipeline, and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. Three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy, with a negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). We report an infrequent form of familial ASD in which WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder.

  19. Whole genome nucleosome sequencing identifies novel types of forensic markers in degraded DNA samples

    Science.gov (United States)

    Dong, Chun-nan; Yang, Ya-dong; Li, Shu-jin; Yang, Ya-ran; Zhang, Xiao-jing; Fang, Xiang-dong; Yan, Jiang-wei; Cong, Bin

    2016-01-01

    In the case of mass disasters, missing persons and forensic casework, highly degraded biological samples are often encountered. It can be a challenge to analyze and interpret the DNA profiles from these samples. Here we provide a new strategy to solve the problem by taking advantage of the intrinsic structural properties of DNA. We assessed the in vivo positions of more than 35 million putative nucleosome cores in human leukocytes using high-throughput whole genome sequencing, and identified 2,462 single nucleotide variations (SNVs) and 128 insertion-deletion polymorphisms (indels). After comparing the sequence reads with 44 STR loci commonly used in forensics, five STRs (TH01, TPOX, D18S51, DYS391, and D10S1248) were matched. We compared these "nucleosome protected STRs" (NPSTRs) with five other non-NPSTRs using mini-STR primer design, real-time PCR, and capillary gel electrophoresis on artificially degraded DNA. Moreover, genotyping performance of the five NPSTRs and five non-NPSTRs was also tested with real casework samples. All results show that loci located in nucleosomes are more likely to be successfully genotyped in degraded samples. In conclusion, after further strict validation, these markers could be incorporated into future forensic and paleontology identification kits, resulting in higher discriminatory power for certain degraded sample types. PMID:27189082

  20. The "most wanted" taxa from the human microbiome for whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Anthony A Fodor

    Full Text Available The goal of the Human Microbiome Project (HMP) is to generate a comprehensive catalog of human-associated microorganisms, including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S) sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets to several reference 16S collections to create a 'most wanted' list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, few novel taxa are represented by these OTUs, and many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the 'most wanted', and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the 'most wanted' organisms are poorly represented among culture collections, suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.

  1. Use of bacterial whole-genome sequencing to investigate local persistence and spread in bovine tuberculosis

    Directory of Open Access Journals (Sweden)

    Hannah Trewby

    2016-03-01

    Full Text Available Mycobacterium bovis is the causal agent of bovine tuberculosis, one of the most important diseases currently facing the UK cattle industry. Here, we use high-density whole genome sequencing (WGS) in a defined sub-population of M. bovis in 145 cattle across 66 herd breakdowns to gain insights into local spread and persistence. We show that despite low divergence among isolates, WGS can in principle expose contributions of under-sampled host populations to M. bovis transmission. However, we demonstrate that in our data such a signal is due to molecular type switching, which had been previously undocumented for M. bovis. Isolates from farms with a known history of direct cattle movement between them did not show a statistical signal of higher genetic similarity. Despite an overall signal of genetic isolation by distance, genetic distances also showed no apparent relationship with spatial distance among affected farms over distances <5 km. Using simulations, we find that even over the brief evolutionary timescale covered by our data, Bayesian phylogeographic approaches are feasible. Applying such approaches showed that M. bovis dispersal in this system is heterogeneous but slow overall, averaging 2 km/year. These results confirm that widespread application of WGS to M. bovis will bring novel and important insights into the dynamics of M. bovis spread and persistence, but that the current questions most pertinent to control will be best addressed using approaches that more directly integrate WGS with additional epidemiological data.

  2. Assembling the 20 Gb white spruce (Picea glauca) genome from whole-genome shotgun sequencing data.

    Science.gov (United States)

    Birol, Inanc; Raymond, Anthony; Jackman, Shaun D; Pleasance, Stephen; Coope, Robin; Taylor, Greg A; Yuen, Macaire Man Saint; Keeling, Christopher I; Brand, Dana; Vandervalk, Benjamin P; Kirk, Heather; Pandoh, Pawan; Moore, Richard A; Zhao, Yongjun; Mungall, Andrew J; Jaquish, Barry; Yanchuk, Alvin; Ritland, Carol; Boyle, Brian; Bousquet, Jean; Ritland, Kermit; Mackay, John; Bohlmann, Jörg; Jones, Steven J M

    2013-06-15

    White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and providing genomics resources for this commercially valuable tree will help improve forest management and conservation efforts. Sequencing and assembling the large and highly repetitive spruce genome, however, pushes the boundaries of current technology. Here, we describe a whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software. We report a 20.8 gigabase-pair draft genome in 4.9 million scaffolds, with a scaffold N50 of 20,356 bp. We demonstrate how recent improvements in sequencing technology, especially increasing read lengths and paired-end reads from longer fragments, have a major impact on assembly contiguity. We also note that scalable bioinformatics tools are instrumental in providing rapid draft assemblies. The Picea glauca genome sequencing and assembly data are available through NCBI (Accession#: ALWZ0100000000 PID: PRJNA83435). http://www.ncbi.nlm.nih.gov/bioproject/83435.
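
    Scaffold N50, quoted above, is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal sketch of that standard definition (the function name is illustrative, and the toy lengths below are not spruce data):

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer comparison avoids rounding issues
            return length


# Toy example: 54 bases in total; 10 + 9 + 8 = 27 reaches half, so N50 = 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

    Because N50 is weighted by bases rather than by scaffold count, an assembly can have millions of short scaffolds, as here, and still report an N50 far above its median scaffold length.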

  3. Homoeologous chromosomes of Xenopus laevis are highly conserved after whole-genome duplication.

    Science.gov (United States)

    Uno, Y; Nishida, C; Takagi, C; Ueno, N; Matsuda, Y

    2013-11-01

    It has been suggested that whole-genome duplication (WGD) occurred twice during the evolution of vertebrates, around 450 and 500 million years ago, contributing to an increase in the genomic and phenotypic complexity of vertebrates. However, little is known about the evolution of homoeologous chromosomes after WGD because many duplicate genes have been lost. Xenopus laevis (2n=36) and Xenopus (Silurana) tropicalis (2n=20) are good animal models for studying the process of genomic and chromosomal reorganization after WGD, because X. laevis is an allotetraploid species that resulted from WGD after the interspecific hybridization of diploid species closely related to X. tropicalis. We constructed a comparative cytogenetic map of X. laevis using 60 complementary DNA (cDNA) clones that covered the entire chromosomal regions of 10 pairs of X. tropicalis chromosomes. We consequently identified all nine homoeologous chromosome groups of X. laevis. Hybridization signals on two pairs of X. laevis homoeologous chromosomes were detected for 50 of 60 (83%) genes, and the genetic linkage is highly conserved between X. tropicalis and X. laevis chromosomes, except for one fusion and one inversion, and also between X. laevis homoeologous chromosomes, except for two inversions. These results indicate that the loss of duplicated genes and inter- and/or intrachromosomal rearrangements occurred much less frequently in this lineage, suggesting that these events were not essential for diploidization of the allotetraploid genome in X. laevis after WGD.

  4. Whole-genome transcriptional analysis of heavy metal stresses in Caulobacter crescentus

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Ping; Brodie, Eoin L.; Suzuki, Yohey; McAdams, Harley H.; Andersen, Gary L.

    2005-09-21

    The bacterium Caulobacter crescentus and related stalked bacterial species are known for their distinctive ability to live in low-nutrient environments, a characteristic of most heavy-metal-contaminated sites. Caulobacter crescentus is a model organism for studying cell cycle regulation, with well-developed genetics. We have identified the pathways responding to heavy metal toxicity in C. crescentus to provide insights for possible application of Caulobacter to environmental restoration. We exposed C. crescentus cells to four heavy metals (chromium, cadmium, selenium and uranium) and analyzed genome-wide transcriptional activities post exposure using an Affymetrix GeneChip microarray. C. crescentus showed surprisingly high tolerance to uranium, a possible mechanism for which may be the formation of extracellular calcium-uranium-phosphate precipitates. The principal response to these metals was protection against oxidative stress (up-regulation of manganese-dependent superoxide dismutase, sodA). Glutathione S-transferase, thioredoxin, glutaredoxins and DNA repair enzymes responded most strongly to cadmium and chromate. The cadmium and chromium stress response also focused on reducing the intracellular metal concentration, with multiple efflux pumps employed to remove cadmium while a sulfate transporter was down-regulated to reduce non-specific uptake of chromium. Membrane proteins were also up-regulated in response to most of the metals tested. A two-component signal transduction system involved in the uranium response was identified. Several differentially regulated transcripts from regions not previously known to encode proteins were identified, demonstrating the advantage of evaluating the transcriptome using whole genome microarrays.

  5. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits.

    Science.gov (United States)

    Tachmazidou, Ioanna; Süveges, Dániel; Min, Josine L; Ritchie, Graham R S; Steinberg, Julia; Walter, Klaudia; Iotchkova, Valentina; Schwartzentruber, Jeremy; Huang, Jie; Memari, Yasin; McCarthy, Shane; Crawford, Andrew A; Bombieri, Cristina; Cocca, Massimiliano; Farmaki, Aliki-Eleni; Gaunt, Tom R; Jousilahti, Pekka; Kooijman, Marjolein N; Lehne, Benjamin; Malerba, Giovanni; Männistö, Satu; Matchan, Angela; Medina-Gomez, Carolina; Metrustry, Sarah J; Nag, Abhishek; Ntalla, Ioanna; Paternoster, Lavinia; Rayner, Nigel W; Sala, Cinzia; Scott, William R; Shihab, Hashem A; Southam, Lorraine; St Pourcain, Beate; Traglia, Michela; Trajanoska, Katerina; Zaza, Gialuigi; Zhang, Weihua; Artigas, María S; Bansal, Narinder; Benn, Marianne; Chen, Zhongsheng; Danecek, Petr; Lin, Wei-Yu; Locke, Adam; Luan, Jian'an; Manning, Alisa K; Mulas, Antonella; Sidore, Carlo; Tybjaerg-Hansen, Anne; Varbo, Anette; Zoledziewska, Magdalena; Finan, Chris; Hatzikotoulas, Konstantinos; Hendricks, Audrey E; Kemp, John P; Moayyeri, Alireza; Panoutsopoulou, Kalliope; Szpak, Michal; Wilson, Scott G; Boehnke, Michael; Cucca, Francesco; Di Angelantonio, Emanuele; Langenberg, Claudia; Lindgren, Cecilia; McCarthy, Mark I; Morris, Andrew P; Nordestgaard, Børge G; Scott, Robert A; Tobin, Martin D; Wareham, Nicholas J; Burton, Paul; Chambers, John C; Smith, George Davey; Dedoussis, George; Felix, Janine F; Franco, Oscar H; Gambaro, Giovanni; Gasparini, Paolo; Hammond, Christopher J; Hofman, Albert; Jaddoe, Vincent W V; Kleber, Marcus; Kooner, Jaspal S; Perola, Markus; Relton, Caroline; Ring, Susan M; Rivadeneira, Fernando; Salomaa, Veikko; Spector, Timothy D; Stegle, Oliver; Toniolo, Daniela; Uitterlinden, André G; Barroso, Inês; Greenwood, Celia M T; Perry, John R B; Walker, Brian R; Butterworth, Adam S; Xue, Yali; Durbin, Richard; Small, Kerrin S; Soranzo, Nicole; Timpson, Nicholas J; Zeggini, Eleftheria

    2017-06-01

    Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader allelic architecture of 12 anthropometric traits associated with height, body mass, and fat distribution in up to 267,616 individuals. We report 106 genome-wide significant signals that have not been previously identified, including 9 low-frequency variants pointing to functional candidates. Of the 106 signals, 6 are in genomic regions that have not previously been implicated in related traits, 28 are independent signals at previously reported regions, and 72 represent previously reported signals for a different anthropometric trait. Overall, 71% of signals reside within genes, and fine mapping resolves 23 signals to one or two likely causal variants. We confirm genetic overlap between human monogenic and polygenic anthropometric traits and find signal enrichment in cis expression QTLs in relevant tissues. Our results highlight the potential of WGS strategies to enhance biologically relevant discoveries across the frequency spectrum. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  6. Comprehensive Analysis of Prokaryotes in Environmental Water Using DNA Microarray Analysis and Whole Genome Amplification

    Directory of Open Access Journals (Sweden)

    Norihisa Ishii

    2013-10-01

    The microflora in environmental water consists of a high density and diversity of bacterial species that form the foundation of the water ecosystem. Because the majority of these species cannot be cultured in vitro, a different approach is needed to identify prokaryotes in environmental water. A novel DNA microarray was developed to simplify detection. Multiple DNA probes were designed against each of the 97,927 sequences in the DNA Data Bank of Japan and mounted on a glass chip in duplicate. The microarray was evaluated using DNA extracted from one liter of environmental water samples collected from seven sites in Japan. The extracted DNA was uniformly amplified using whole genome amplification (WGA), labeled with Cy3-conjugated 16S rRNA-specific primers, and hybridized to the microarray. The microarray successfully identified soil bacteria and environment-specific bacterial clusters. The DNA microarray described herein can be a useful tool for evaluating the diversity of prokaryotes and assessing environmental changes such as global warming.

  7. Multidrug-resistant Escherichia coli soft tissue infection investigated with bacterial whole genome sequencing

    Science.gov (United States)

    Buchanan, Ruaridh; Stoesser, Nicole; Crook, Derrick; Bowler, Ian C J W

    2014-01-01

    A 45-year-old man with dilated cardiomyopathy presented with acute leg pain and erythema suggestive of necrotising fasciitis. Initial surgical exploration revealed no necrosis, and treatment for a soft tissue infection was started. Blood and tissue cultures unexpectedly grew a Gram-negative bacillus, subsequently identified by an automated broth microdilution phenotyping system as an extended-spectrum β-lactamase-producing Escherichia coli. The patient was treated with a 3-week course of antibiotics (ertapenem followed by ciprofloxacin) and debridement of small areas of necrosis, followed by skin grafting. The presence of E. coli triggered investigation of both host and pathogen. The patient was found to have previously undiagnosed liver disease, a risk factor for E. coli soft tissue infection. Whole genome sequencing of isolates from all specimens confirmed that they were clonal, of sequence type ST131, and carried a likely plasmid-associated AmpC (CMY-2), several other resistance genes and a number of virulence factors. PMID:25331151

  8. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group comprising many species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined from single-gene trees. Herein, we analyzed the phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of the 16S rRNA, hsp60, rpoB, sodA, and tuf genes and from sequences of these and numerous other single genes/proteins. All phylogenies were constructed with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS trees clearly separated bovine NAS species into five coherent monophyletic clades. Furthermore, interspecies relationships within clades were consistent across all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. Single gene/protein trees varied widely in clade assignment and interspecies relationships across tree-construction methods, highlighting the limitations of using single genes to determine bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by the method or model of evolutionary reconstruction. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  9. Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data.

    Science.gov (United States)

    Tsuji, Junko; Weng, Zhiping

    2016-11-01

    Cytosine methylation regulates many biological processes such as gene expression, chromatin structure and chromosome stability. The whole genome bisulfite sequencing (WGBS) technique measures the methylation level at each cytosine throughout the genome. An increasing number of publicly available pipelines exist for analyzing WGBS data, reflecting the many choices of read mapping algorithms as well as preprocessing and postprocessing methods. We simulated single-end and paired-end reads based on three experimental data sets, and comprehensively evaluated 192 combinations of three preprocessing, five postprocessing and five widely used read mapping algorithms. We also compared paired-end data with single-end data at the same sequencing depth for performance of read mapping and methylation level estimation. Bismark and LAST were the most robust mapping algorithms. We found that Mott trimming and quality filtering individually improved the performance of both read mapping and methylation level estimation, but combining them did not lead to further improvement. Furthermore, we confirmed that paired-end sequencing reduced error rate and enhanced sensitivity for both read mapping and methylation level estimation, especially for short reads and in repetitive regions of the human genome.
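
    As an illustration of the per-cytosine measurement described above, the sketch below computes a methylation level from bisulfite read counts, assuming the standard readout in which unmethylated cytosines read as T and methylated cytosines remain C. The pileup format, the `min_depth` filter and the example counts are hypothetical and are not taken from any of the evaluated pipelines.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (chromosome, position) of a cytosine; hypothetical key format


def methylation_level(methylated: int, unmethylated: int) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine.

    After bisulfite conversion an unmethylated C is read as T, while a
    methylated C is still read as C. `methylated` counts reads showing C
    at the site and `unmethylated` counts reads showing T (forward-strand
    view; reverse-strand reads would show G/A and are not handled here).
    Returns None when no read covers the site.
    """
    depth = methylated + unmethylated
    if depth == 0:
        return None
    return methylated / depth


def call_levels(pileup: Dict[Site, Tuple[int, int]],
                min_depth: int = 5) -> Dict[Site, float]:
    """Methylation level for every site covered by at least `min_depth` reads."""
    levels: Dict[Site, float] = {}
    for site, (c_reads, t_reads) in pileup.items():
        if c_reads + t_reads >= min_depth:
            level = methylation_level(c_reads, t_reads)
            if level is not None:
                levels[site] = level
    return levels


if __name__ == "__main__":
    # Invented counts: (reads showing C, reads showing T) per cytosine.
    example = {("chr1", 100): (8, 2), ("chr1", 150): (1, 9), ("chr1", 300): (1, 1)}
    for site, level in sorted(call_levels(example).items()):
        print(site, round(level, 2))
```

    In the invented example, the site at position 300 is skipped because its two reads fall below the assumed depth filter.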

  10. Whole-genome sequencing of a laboratory-evolved yeast strain

    Directory of Open Access Journals (Sweden)

    Dunham Maitreya J

    2010-02-01

    Background: Experimental evolution of microbial populations provides a unique opportunity to study evolutionary adaptation in response to controlled selective pressures. However, until recently it has been difficult to identify the precise genetic changes underlying adaptation at a genome-wide scale. New DNA sequencing technologies now allow the genomes of parental and evolved strains of microorganisms to be determined rapidly. Results: We sequenced >93.5% of the genome of a laboratory-evolved strain of the yeast Saccharomyces cerevisiae and its ancestor at >28× depth. Both single nucleotide polymorphisms and copy number amplifications were found, with specific gains over the array-based methodologies previously used to analyze these genomes. By applying a segmentation algorithm to quantify structural changes, we determined the approximate genomic boundaries of a 5× gene amplification. These boundaries guided the recovery of breakpoint sequences, which provide insights into the nature of a complex genomic rearrangement. Conclusions: This study suggests that whole-genome sequencing can provide a rapid approach to uncovering the genetic basis of evolutionary adaptations, with further applications in the study of laboratory selections and mutagenesis screens. In addition, we show how single-end, short-read sequencing data can provide detailed information about structural rearrangements and generate predictions about the genomic features and processes that underlie genome plasticity.

  11. Whole genome sequence analysis of Cryptococcus gattii from the Pacific Northwest reveals unexpected diversity.

    Directory of Open Access Journals (Sweden)

    John D Gillece

    A recent emergence of Cryptococcus gattii in the Pacific Northwest involves strains that fall into three primarily clonal molecular subtypes: VGIIa, VGIIb and VGIIc. Multilocus sequence typing (MLST) and variable number tandem repeat analysis appear to identify little diversity within these molecular subtypes. Given the apparent expansion of these subtypes into new geographic areas and their ability to cause disease in immunocompetent individuals, differentiating isolates belonging to these subtypes could be very important from a public health perspective. We used whole genome sequence typing (WGST) to perform fine-scale phylogenetic analysis on 20 C. gattii isolates, 18 of which are from the VGII molecular type largely responsible for the Pacific Northwest emergence. Analyses both including and excluding isolates of molecular types VGI and VGIII (289,586 SNPs and 56,845 SNPs, respectively) resulted in phylogenetic reconstructions consistent, for the most part, with MLST analysis but with far greater resolution among isolates. The WGST analysis presented here identified over 100 SNPs among eight VGIIc isolates as well as unique genotypes for each of the VGIIa, VGIIb and VGIIc isolates. Similar levels of genetic diversity were found within each of the molecular subtypes, despite the fact that the VGIIb clade is thought to have emerged much earlier. The analysis presented here is the first multi-genome WGST study to focus on the C. gattii molecular subtypes involved in the Pacific Northwest emergence and describes tools that will further our understanding of this emerging pathogen.

  12. Multidrug-resistant Escherichia coli soft tissue infection investigated with bacterial whole genome sequencing.

    Science.gov (United States)

    Buchanan, Ruaridh; Stoesser, Nicole; Crook, Derrick; Bowler, Ian C J W

    2014-10-19

    A 45-year-old man with dilated cardiomyopathy presented with acute leg pain and erythema suggestive of necrotising fasciitis. Initial surgical exploration revealed no necrosis, and treatment for a soft tissue infection was started. Blood and tissue cultures unexpectedly grew a Gram-negative bacillus, subsequently identified by an automated broth microdilution phenotyping system as an extended-spectrum β-lactamase-producing Escherichia coli. The patient was treated with a 3-week course of antibiotics (ertapenem followed by ciprofloxacin) and debridement of small areas of necrosis, followed by skin grafting. The presence of E. coli triggered investigation of both host and pathogen. The patient was found to have previously undiagnosed liver disease, a risk factor for E. coli soft tissue infection. Whole genome sequencing of isolates from all specimens confirmed that they were clonal, of sequence type ST131, and carried a likely plasmid-associated AmpC (CMY-2), several other resistance genes and a number of virulence factors. 2014 BMJ Publishing Group Ltd.

  13. Whole-genome fingerprint of the DNA methylome during human B-cell differentiation

    Science.gov (United States)

    Kulis, Marta; Merkel, Angelika; Heath, Simon; Queirós, Ana C.; Schuyler, Ronald P.; Castellano, Giancarlo; Beekman, Renée; Raineri, Emanuele; Esteve, Anna; Clot, Guillem; Verdaguer-Dot, Nuria; Duran-Ferrer, Martí; Russiñol, Nuria; Vilarrasa-Blasi, Roser; Ecker, Simone; Pancaldi, Vera; Rico, Daniel; Agueda, Lidia; Blanc, Julie; Richardson, David; Clarke, Laura; Datta, Avik; Pascual, Marien; Agirre, Xabier; Prosper, Felipe; Alignani, Diego; Paiva, Bruno; Caron, Gersende; Fest, Thierry; Muench, Marcus O.; Fomin, Marina E.; Lee, Seung-Tae; Wiemels, Joseph L.; Valencia, Alfonso; Gut, Marta; Flicek, Paul; Stunnenberg, Hendrik G.; Siebert, Reiner; Küppers, Ralf; Gut, Ivo G.; Campo, Elías; Martín-Subero, José I.

    2017-01-01

    We analyzed the DNA methylome of ten subpopulations spanning the entire B-cell differentiation program by whole-genome bisulfite sequencing and high-density microarrays. We observed that non-CpG methylation disappeared upon B-cell commitment, whereas CpG methylation changed extensively during B-cell maturation, showing an accumulative pattern and affecting around 30% of all measured CpGs. Early differentiation stages mainly displayed enhancer demethylation, which was associated with upregulation of key B-cell transcription factors and affected multiple genes involved in B-cell biology. Late differentiation stages, in contrast, showed extensive demethylation of heterochromatin and methylation gain of polycomb-repressed areas, and did not affect genes with apparent functional impact in B cells. This signature, which has previously been linked to aging and cancer, was particularly widespread in mature cells with an extended life span. Comparing B-cell neoplasms with their normal counterparts, we found that they frequently acquire methylation changes in regions already undergoing dynamic methylation during normal B-cell differentiation. PMID:26053498

  14. Whole-genome amplification of single-cell genomes for next-generation sequencing.

    Science.gov (United States)

    Korfhage, Christian; Fisch, Evelyn; Fricke, Evelyn; Baedker, Silke; Loeffert, Dirk

    2013-10-11

    DNA sequence analysis and genotyping of biological samples using next-generation sequencing (NGS), microarrays, or real-time PCR is often limited by the small amount of sample available. A single cell contains only one to four copies of its genomic DNA, depending on the organism's ploidy (haploid or diploid) and the cell-cycle phase. The DNA content of a single cell ranges from a few femtograms in bacteria to picograms in mammals. In contrast, deep analysis of a genome currently requires a few hundred nanograms up to micrograms of genomic DNA for NGS library preparation or labeling protocols (e.g., microarrays). Consequently, accurate whole-genome amplification (WGA) of single-cell DNA is required for reliable genetic analysis (e.g., NGS) and is particularly important when genomic DNA is limited. Single-cell WGA has enabled the analysis of genomic heterogeneity among individual cells (e.g., somatic genomic variation in tumor cells). This unit describes how the genomes of single cells can be amplified by WGA for further genomic studies, such as NGS. Recommendations for isolating single cells are given, and common sources of error are discussed.
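
    To make the femtogram-to-nanogram gap concrete, the sketch below estimates single-cell DNA mass from genome length and the fold amplification needed to reach a library input. It assumes an average of about 650 g/mol per double-stranded base pair, rounded genome sizes (about 4.6 Mb for one E. coli chromosome, about 6.2 Gb for a diploid human cell) and a hypothetical 200 ng input requirement; none of these numbers come from the protocol unit itself.

```python
AVOGADRO = 6.022e23             # molecules per mole
MEAN_BP_MASS_G_PER_MOL = 650.0  # approximate mass of one double-stranded base pair

# Illustrative genome sizes (assumptions, rounded).
ECOLI_HAPLOID_BP = 4.6e6   # one E. coli chromosome
HUMAN_DIPLOID_BP = 6.2e9   # two copies of a ~3.1 Gb human genome


def dna_mass_grams(base_pairs: float) -> float:
    """Approximate mass in grams of double-stranded DNA of the given length."""
    return base_pairs * MEAN_BP_MASS_G_PER_MOL / AVOGADRO


def fold_amplification(input_grams: float, required_grams: float) -> float:
    """How many-fold the input must be amplified to reach the required mass."""
    return required_grams / input_grams


if __name__ == "__main__":
    ecoli = dna_mass_grams(ECOLI_HAPLOID_BP)
    human = dna_mass_grams(HUMAN_DIPLOID_BP)
    target = 200e-9  # 200 ng, a hypothetical library-input requirement
    print(f"E. coli cell: {ecoli * 1e15:.1f} fg")
    print(f"human diploid cell: {human * 1e12:.1f} pg")
    print(f"fold needed (human, 200 ng): {fold_amplification(human, target):,.0f}")
```

    Under these assumptions the E. coli estimate is about 5 fg and the human diploid estimate about 6.7 pg, so a 200 ng input implies roughly 30,000-fold amplification of a single human cell's DNA.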

  15. Genome management and mismanagement--cell-level opportunities and challenges of whole-genome duplication.

    Science.gov (United States)

    Yant, Levi; Bomblies, Kirsten

    2015-12-01

    Whole-genome duplication (WGD) doubles the DNA content in the nucleus and leads to polyploidy. In whole-organism polyploids, WGD has been implicated in adaptability and the evolution of increased genome complexity, but polyploidy can also arise in somatic cells of otherwise diploid plants and animals, where it plays important roles in development and likely in environmental responses. As with whole organisms, WGD can also promote adaptability and diversity in proliferating cell lineages, although whether WGD is beneficial is clearly context-dependent. WGD is also sometimes associated with aging and disease and may facilitate dangerous genetic and karyotypic diversity in tumorigenesis. Scaling changes can affect cell physiology, but the problems associated with WGD seem to arise in large part from difficulties with chromosome segregation in polyploid cells. Here we discuss both the adaptive potential and the problems associated with WGD, focusing primarily on cellular effects. We see value in recognizing polyploidy as a key player in generating diversity in development and cell lineage evolution, with intriguing parallels across kingdoms.

  16. Using whole-genome sequencing to determine appropriate streptomycin epidemiological cutoffs for Salmonella and Escherichia coli.

    Science.gov (United States)

    Tyson, Gregory H; Li, Cong; Ayers, Sherry; McDermott, Patrick F; Zhao, Shaohua

    2016-02-01

    For Enterobacteriaceae such as Salmonella spp. and Escherichia coli, no unified interpretive resistance criteria exist for streptomycin, an epidemiologically important antibiotic. As part of the National Antimicrobial Resistance Monitoring System, we had previously used a minimum inhibitory concentration (MIC) of ≥ 64 μg/mL as an epidemiological cutoff value (ECV) to define non-wild-type isolates. To determine whether this ECV correlated with genetic determinants of resistance, we performed whole-genome sequencing of 463 Salmonella and E. coli isolates to identify streptomycin resistance genotypes. From this analysis, we found that a streptomycin resistance breakpoint of ≥ 64 μg/mL classified over 20% of strains possessing aadA or strA/strB resistance genes as wild-type. Therefore, to improve the concordance between genotypic and phenotypic data, we propose reducing the phenotypic cutoff value to ≥ 32 μg/mL for both Salmonella and E. coli, to be used widely as the ECV to categorize non-wild-type isolates.
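
    The ECV logic can be sketched as a simple threshold test: an isolate is called non-wild-type when its MIC is at or above the cutoff, and concordance is checked by asking how many gene-positive isolates the cutoff still calls wild-type. The isolates and MIC values below are invented for illustration and do not reproduce the study data; `has_resistance_gene` is a hypothetical stand-in for detection of aadA or strA/strB by WGS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Isolate:
    name: str
    mic_ug_per_ml: float       # streptomycin MIC in μg/mL
    has_resistance_gene: bool  # stand-in for aadA or strA/strB detected by WGS


def is_non_wild_type(isolate: Isolate, ecv: float) -> bool:
    """Phenotypic call: an MIC at or above the ECV means non-wild-type."""
    return isolate.mic_ug_per_ml >= ecv


def fraction_missed(isolates: List[Isolate], ecv: float) -> float:
    """Share of gene-positive isolates that the ECV still calls wild-type."""
    positives = [i for i in isolates if i.has_resistance_gene]
    if not positives:
        return 0.0
    missed = sum(1 for i in positives if not is_non_wild_type(i, ecv))
    return missed / len(positives)


# Invented isolates on a doubling-dilution MIC scale (not study data).
EXAMPLE_ISOLATES = [
    Isolate("A", 8, False),
    Isolate("B", 16, False),
    Isolate("C", 32, True),
    Isolate("D", 32, True),
    Isolate("E", 64, True),
    Isolate("F", 128, True),
]

if __name__ == "__main__":
    for ecv in (64, 32):
        share = fraction_missed(EXAMPLE_ISOLATES, ecv)
        print(f"ECV >= {ecv} ug/mL: {share:.0%} of gene-positive isolates called wild-type")
```

    In this toy set, the ≥ 64 μg/mL cutoff calls half of the gene-positive isolates wild-type, while ≥ 32 μg/mL misses none and still calls both gene-negative isolates wild-type, mirroring the direction of the proposed change.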

  17. Whole-genome sequencing reveals the effect of vaccination on the evolution of Bordetella pertussis.

    Science.gov (United States)

    Xu, Yinghua; Liu, Bin; Gröndahl-Yli-Hannuksila, Kirsi; Tan, Yajun; Feng, Lu; Kallonen, Teemu; Wang, Lichan; Peng, Ding; He, Qiushui; Wang, Lei; Zhang, Shumin

    2015-08-18

    Herd immunity can potentially induce a change in circulating viruses. However, how bacterial pathogens adapt to vaccination remains largely unknown. In this study, Bordetella pertussis, the causative agent of whooping cough, was selected as an example to explore the possible effect of vaccination on a bacterial pathogen. We sequenced and analysed the complete genomes of 40 B. pertussis strains from Finland and China, as well as 11 previously sequenced strains from the Netherlands, where different vaccination strategies have been used over the past 50 years. The results showed that the molecular clock moved at different rates in these countries and in distinct periods, which suggested that the evolution of the B. pertussis population was closely associated with each country's vaccination coverage. Comparative whole-genome analyses indicated that evolution in this human-restricted pathogen was mainly characterised by ongoing genetic shift and gene loss. Furthermore, 116 SNPs were specifically detected in currently circulating ptxP3-containing strains. This finding might explain the successful emergence of this lineage and its worldwide spread. Collectively, our results suggest that the immune pressure of vaccination is one major driving force for the evolution of B. pertussis, and these findings should facilitate further exploration of the pathogenicity of B. pertussis.

  18. Whole-Genome Sequencing Uncovers the Genetic Basis of Chronic Mountain Sickness in Andean Highlanders

    Science.gov (United States)

    Zhou, Dan; Udpa, Nitin; Ronen, Roy; Stobdan, Tsering; Liang, Junbin; Appenzeller, Otto; Zhao, Huiwen W.; Yin, Yi; Du, Yuanping; Guo, Lixia; Cao, Rui; Wang, Yu; Jin, Xin; Huang, Chen; Jia, Wenlong; Cao, Dandan; Guo, Guangwu; Gamboa, Jorge L.; Villafuerte, Francisco; Callacondo, David; Xue, Jin; Liu, Siqi; Frazer, Kelly A.; Li, Yingrui; Bafna, Vineet; Haddad, Gabriel G.

    2013-01-01

    The hypoxic conditions at high altitudes present a challenge for survival, causing pressure for adaptation. Interestingly, many high-altitude denizens (particularly in the Andes) are maladapted, with a condition known as chronic mountain sickness (CMS) or Monge disease. To decode the genetic basis of this disease, we sequenced and compared the whole genomes of 20 Andean subjects (10 with CMS and 10 without). We discovered 11 regions genome-wide with significant differences in haplotype frequencies consistent with selective sweeps. In these regions, two genes (an erythropoiesis regulator, SENP1, and an oncogene, ANP32D) had a higher transcriptional response to hypoxia in individuals with CMS relative to those without. We further found that downregulating the orthologs of these genes in flies dramatically enhanced survival rates under hypoxia, demonstrating that suppression of SENP1 and ANP32D plays an essential role in hypoxia tolerance. Our study provides an unbiased framework to identify and validate the genetic basis of adaptation to high altitudes and identifies potentially targetable mechanisms for CMS treatment. PMID:23954164

  19. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA

    DEFF Research Database (Denmark)

    Poulsen, Jesper Buchhave; Lescai, Francesco; Grove, Jakob

    2016-01-01

    Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries, they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject ... _ref and a WB_ref replica sharing DNA extract with the WB_ref sample. Pilot 3: DBS_2x3.2, WB_ref, wgaDNA of 2x1.6 mm neonatal DBSs and wgaDNA of the WB reference sample. Following sequencing and data analysis, we compared pairwise variant calls to obtain a measure of similarity: the concordance rate ...

  20. Use of whole genome shotgun metagenomics: a practical guide for the microbiome-minded physician scientist.

    Science.gov (United States)

    Ma, Jun; Prince, Amanda; Aagaard, Kjersti M

    2014-01-01

    Whole genome shotgun sequencing (WGS) has been increasingly recognized as the most comprehensive and robust approach for metagenomics research. Compared with 16S-based metagenomics, it offers the advantages of species-level taxonomic identification and estimation of metabolic pathway activities from human and environmental samples. Several large-scale metagenomic projects utilizing WGS have recently been conducted or are currently underway. With the generation of vast amounts of data, bioinformatic and computational analysis of WGS results has become vital to the success of a metagenomics study. However, each step in WGS data analysis, including metagenome assembly, gene prediction, taxonomy identification, function annotation, and pathway analysis, is complicated by the sheer amount of data. Algorithms and tools have been developed specifically to handle WGS-generated metagenomics data, with the aim of reducing computational time and storage requirements. Here, we present an overview of the current state of metagenomics through WGS, challenges frequently encountered, and up-to-date solutions. Several applications that are uniquely applicable to microbiome studies in reproductive and perinatal medicine are also discussed.

  1. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic

    Directory of Open Access Journals (Sweden)

    Samantha B. Foley

    2015-01-01

    Despite the potential of whole-genome sequencing (WGS) to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176) and those without (n = 82). Initial analysis of potentially pathogenic variants (PPVs), defined as nonsynonymous variants with allele frequency < 1% in ESP6500, in 163 clinically relevant genes suggested that WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS). Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve the accuracy of rare-variant interpretation. While loss-of-function (LoF) variants represented only a small fraction of PPVs, WGS identified additional cancer-risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after we expanded our analysis to 3209 ClinVar genes. These data illustrate how WGS can improve our ability to discover patients' cancer genetic risks.
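
    The PPV definition above amounts to a three-part filter: gene in the panel, nonsynonymous consequence, and ESP6500 allele frequency below 1%. The sketch below is a minimal illustration, assuming each variant has already been annotated with a gene symbol, a consequence label and an ESP6500 allele frequency. The consequence labels treated as nonsynonymous, the treatment of variants absent from ESP6500 as frequency 0, and the three-gene panel are all assumptions, not the study's 163-gene list or annotation pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Assumed consequence labels counted as nonsynonymous; real annotation
# tools use their own vocabularies.
NONSYNONYMOUS = {"missense", "stop_gained", "frameshift"}


@dataclass
class Variant:
    gene: str
    consequence: str
    esp_allele_freq: Optional[float]  # None = not seen in ESP6500


def is_ppv(variant: Variant, panel: Set[str], max_af: float = 0.01) -> bool:
    """Potentially pathogenic variant: in the panel, nonsynonymous, AF < 1%."""
    if variant.gene not in panel:
        return False
    if variant.consequence not in NONSYNONYMOUS:
        return False
    # Assumption: a variant absent from ESP6500 is treated as frequency 0 (novel).
    af = 0.0 if variant.esp_allele_freq is None else variant.esp_allele_freq
    return af < max_af


# Illustrative three-gene panel and invented variants.
PANEL = {"BRCA1", "BRCA2", "TP53"}
EXAMPLE_VARIANTS = [
    Variant("TP53", "missense", None),       # novel missense in panel -> PPV
    Variant("BRCA2", "synonymous", 0.0001),  # not nonsynonymous
    Variant("BRCA1", "missense", 0.05),      # too common (5%)
    Variant("APC", "stop_gained", None),     # gene outside this panel
    Variant("BRCA2", "stop_gained", 0.002),  # rare LoF in panel -> PPV
]

if __name__ == "__main__":
    for v in EXAMPLE_VARIANTS:
        print(v.gene, v.consequence, is_ppv(v, PANEL))
```

    Expanding the analysis to a larger gene list, as the abstract describes for the ClinVar genes, would only change the `panel` argument in this sketch.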

  2. A field guide to whole-genome sequencing, assembly and annotation.

    Science.gov (United States)

    Ekblom, Robert; Wolf, Jochen B W

    2014-11-01

    Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects.

  3. Rapid Whole-Genome Sequencing for Genetic Disease Diagnosis in Neonatal Intensive Care Units

    Science.gov (United States)

    Saunders, Carol Jean; Miller, Neil Andrew; Soden, Sarah Elizabeth; Dinwiddie, Darrell Lee; Noll, Aaron; Alnadi, Noor Abu; Andraws, Nevene; Patterson, Melanie LeAnn; Krivohlavek, Lisa Ann; Fellis, Joel; Humphray, Sean; Saffrey, Peter; Kingsbury, Zoya; Weir, Jacqueline Claire; Betley, Jason; Grocock, Russell James; Margulies, Elliott Harrison; Farrow, Emily Gwendolyn; Artman, Michael; Safina, Nicole Pauline; Petrikin, Joshua Erin; Hall, Kevin Peter; Kingsmore, Stephen Francis

    2014-01-01

    Monogenic diseases are frequent causes of neonatal morbidity and mortality, and disease presentations are often undifferentiated at birth. More than 3500 monogenic diseases have been characterized, but clinical testing is available for only some of them, and many feature clinical and genetic heterogeneity. Hence, an immense unmet need exists for improved molecular diagnosis in infants. Because disease progression is extremely rapid, albeit heterogeneous, in newborns, molecular diagnoses must occur quickly to be relevant for clinical decision-making. We describe 50-hour differential diagnosis of genetic disorders by whole-genome sequencing (WGS) that features automated bioinformatic analysis and is intended to be a prototype for use in neonatal intensive care units. Retrospective 50-hour WGS identified known molecular diagnoses in two children. Prospective WGS disclosed potential molecular diagnoses of a severe GJB2-related skin disease in one neonate and of BRAT1-related lethal neonatal rigidity and multifocal seizure syndrome in another infant, identified BCL9L as a novel, recessive visceral heterotaxy gene (HTX6) in a pedigree, and ruled out known candidate genes in one infant. Sequencing of parents or affected siblings expedited the identification of disease genes in prospective cases. Thus, rapid WGS can potentially broaden and foreshorten differential diagnosis, resulting in fewer empirical treatments and faster progression to genetic and prognostic counseling. PMID:23035047

  4. Examining phylogenetic relationships of Erwinia and Pantoea species using whole genome sequence data.

    Science.gov (United States)

    Zhang, Yucheng; Qiu, Sai

    2015-11-01

    The genera Erwinia and Pantoea contain species that are devastating plant pathogens, non-pathogenic epiphytes, and opportunistic human pathogens. However, some controversies persist in the taxonomic classification of these two closely related genera. We investigated the phylogenomics of these two genera through a comprehensive analysis of 25 Erwinia genomes and 23 Pantoea genomes. Single-copy orthologs were extracted from the Erwinia/Pantoea core genome to reconstruct the Erwinia/Pantoea phylogeny. This tree has strong bootstrap support for almost all branches. We also estimated the in silico DNA-DNA hybridization (isDDH) and average nucleotide identity (ANI) values between each pair of genomes; strains from the same species showed ANI values ≥96% and isDDH values >70%. These data confirm that whole genome sequence data provide a powerful tool to resolve the complex taxonomic questions of Erwinia/Pantoea; e.g., Pantoea agglomerans 299R was not clustered into a single group with other P. agglomerans strains, and the ANI values and isDDH values between them were … Erwinia/Pantoea phylogeny.
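
    The species-level thresholds reported above (ANI ≥96% and isDDH >70%) can be applied to pairwise genome comparisons as sketched below. The strain names and values are hypothetical, and grouping strains by single linkage (any qualifying pair joins two groups) is an assumption of this sketch rather than the authors' stated method.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str, float, float]  # (strain_a, strain_b, ANI %, isDDH %)


def same_species(ani: float, isddh: float,
                 ani_cutoff: float = 96.0, isddh_cutoff: float = 70.0) -> bool:
    """True when a genome pair meets both thresholds: ANI >= 96% and isDDH > 70%."""
    return ani >= ani_cutoff and isddh > isddh_cutoff


def cluster_species(pairs: List[Pair]) -> List[List[str]]:
    """Group strains by single linkage over pairs that pass `same_species`."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, ani, isddh in pairs:
        root_a, root_b = find(a), find(b)  # registers both strains
        if same_species(ani, isddh):
            parent[root_a] = root_b

    groups: Dict[str, List[str]] = {}
    for strain in list(parent):
        groups.setdefault(find(strain), []).append(strain)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise comparisons (not values from the study).
EXAMPLE_PAIRS: List[Pair] = [
    ("A", "B", 98.7, 88.0),
    ("A", "C", 95.1, 61.0),
    ("B", "C", 96.0, 70.0),
    ("C", "D", 97.2, 75.3),
]

if __name__ == "__main__":
    print(cluster_species(EXAMPLE_PAIRS))
```

    The B/C pair sits exactly at 96.0% ANI but only 70.0% isDDH, so it fails the strict isDDH inequality and the two groups stay separate.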

  5. Whole-genome copy number variation analysis in anophthalmia and microphthalmia.

    Science.gov (United States)

    Schilter, K F; Reis, L M; Schneider, A; Bardakjian, T M; Abdul-Rahman, O; Kozel, B A; Zimmerman, H H; Broeckel, U; Semina, E V

    2013-11-01

    Anophthalmia/microphthalmia (A/M) represent severe developmental ocular malformations. Currently, mutations in known genes explain fewer than 40% of A/M cases. We performed whole-genome copy number variation analysis in 60 patients affected with isolated or syndromic A/M. Pathogenic deletions of 3q26 (SOX2) were identified in four independent patients with syndromic microphthalmia. Other variants of interest included regions with a known role in human disease (likely pathogenic) as well as novel rearrangements (uncertain significance). We identified a 2.2-Mb duplication of 3q29 in a patient with non-syndromic anophthalmia, as well as an 877-kb duplication of 11p13 (PAX6) and a 1.4-Mb deletion of 17q11.2 (NF1) in two independent probands with syndromic microphthalmia and other ocular defects. Although ocular anomalies have previously been associated with 3q29 duplications, PAX6 duplications, and NF1 mutations in some cases, the ocular phenotypes observed here are more severe than previously reported. Three novel regions of possible interest included a 2q14.2 duplication that cosegregated with microphthalmia/microcornea and congenital cataracts in one family, and 2q21 and 15q26 duplications in two additional cases; each of these regions contains genes that are active during vertebrate ocular development. Overall, this study identified causative copy number mutations and regions with a possible role in ocular disease in 17% of A/M cases.

  6. Whole genome sequence and analysis of the Marwari horse breed and its genetic origin.

    Science.gov (United States)

    Jun, JeHoon; Cho, Yun Sung; Hu, Haejin; Kim, Hak-Min; Jho, Sungwoong; Gadhvi, Priyvrat; Park, Kyung Mi; Lim, Jeongheui; Paek, Woon Kee; Han, Kyudong; Manica, Andrea; Edwards, Jeremy S; Bhak, Jong

    2014-01-01

    The horse (Equus ferus caballus) is one of the earliest domesticated species and has played an important role in the development of human societies over the past 5,000 years. In this study, we characterized the genome of the Marwari horse, a rare breed with unique phenotypic characteristics, including inwardly turned ear tips. It is thought to have originated from the crossbreeding of local Indian ponies with Arabian horses beginning in the 12th century. We generated 101 Gb (~30 × coverage) of whole genome sequences from a Marwari horse using the Illumina HiSeq2000 sequencer. The sequences were mapped to the horse reference genome at a mapping rate of ~98% and with ~95% of the genome having at least 10 × coverage. A total of 5.9 million single nucleotide variations, 0.6 million small insertions or deletions, and 2,569 copy number variation blocks were identified. We confirmed a strong Arabian and Mongolian component in the Marwari genome. Novel variants from the Marwari sequences were annotated, and were found to be enriched in olfactory functions. Additionally, we suggest a potential functional genetic variant in the TSHZ1 gene (p.Ala344>Val) associated with the inward-turning ear tip shape of the Marwari horses. Here, we present an analysis of the Marwari horse genome. This is the first genomic data for an Asian breed, and is an invaluable resource for future studies of genetic variation associated with phenotypes and diseases in horses.

  7. Unique features of a Japanese 'Candidatus Liberibacter asiaticus' strain revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Hiroshi Katoh

    Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely 'Candidatus Liberibacter asiaticus', 'Ca. L. americanus', and 'Ca. L. africanus'. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative 'Ca. L. asiaticus' Japanese isolate Ishi-1 was determined by metagenomic analysis of DNA extracted from 'Ca. L. asiaticus'-infected psyllids and leaf midribs. The 1.19-Mb genome has an average GC content of 36.32%. Annotation revealed 13 operons encoding rRNA and 44 tRNA genes, but, as in the Floridian psy62 and Chinese gxpsy strains, no typical bacterial pathogenesis-related genes were located within the genome. In contrast to other 'Ca. L. asiaticus' strains, the genome of the Japanese Ishi-1 strain lacks a prophage-related region.
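
    The 36.32% figure is an average GC content. As a minimal sketch, assuming the assembled genome is available as a plain nucleotide string, GC content can be computed as below. Excluding ambiguous symbols such as 'N' from the denominator is a common convention but an assumption here; the example sequence is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return G+C as a percentage of unambiguous (A/C/G/T) bases.

    Ambiguous symbols such as 'N' are excluded from the denominator; this
    convention is an assumption of the sketch.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / (gc + at)


print(f"{gc_content('ATGCNNatgc'):.2f}%")  # 50.00%
```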

  8. Whole genome association study of rheumatoid arthritis using 27 039 microsatellites.

    Science.gov (United States)

    Tamiya, Gen; Shinya, Minori; Imanishi, Tadashi; Ikuta, Tomoki; Makino, Satoshi; Okamoto, Koichi; Furugaki, Koh; Matsumoto, Toshiko; Mano, Shuhei; Ando, Satoshi; Nozaki, Yasuyuki; Yukawa, Wataru; Nakashige, Ryo; Yamaguchi, Daisuke; Ishibashi, Hideo; Yonekura, Manabu; Nakami, Yuu; Takayama, Seiken; Endo, Takaho; Saruwatari, Takuya; Yagura, Masaru; Yoshikawa, Yoko; Fujimoto, Kei; Oka, Akira; Chiku, Suenori; Linsen, Samuel E V; Giphart, Marius J; Kulski, Jerzy K; Fukazawa, Toru; Hashimoto, Hiroshi; Kimura, Minoru; Hoshina, Yuuichi; Suzuki, Yasuo; Hotta, Tomomitsu; Mochida, Joji; Minezaki, Takatoshi; Komai, Koichiro; Shiozawa, Shunichi; Taniguchi, Atsuo; Yamanaka, Hisashi; Kamatani, Naoyuki; Gojobori, Takashi; Bahram, Seiamak; Inoko, Hidetoshi

    2005-08-15

    A major goal of current human genome-wide studies is to identify the genetic basis of complex disorders. However, an unbiased, reliable, cost-efficient and comprehensive methodology for analyzing the entire genome for complex-disease association is still largely lacking or problematic. We have therefore developed a practical and efficient strategy for whole genome association studies of complex diseases by charting the human genome at 100 kb intervals using a collection of 27,039 microsatellites and the DNA pooling method in three successive genomic screens of independent case-control populations. The final step in our methodology consists of fine mapping of the candidate susceptibility regions by single nucleotide polymorphism (SNP) analysis. This approach was validated by applying it to rheumatoid arthritis, a destructive joint disease affecting up to 1% of the population. A total of 47 candidate regions were identified. The top seven loci, which withstood the most stringent statistical tests, were dissected down to individual genes and/or SNPs on four chromosomes, including the previously known 6p21.3-encoded Major Histocompatibility Complex gene, HLA-DRB1. Hence, microsatellite-based genome-wide association analysis complemented by end-stage SNP typing provides a new tool for the genetic dissection of multifactorial pathologies, including common diseases.

  9. Single Cell Analysis of Dystrophin and SRY Gene by Using Whole Genome Amplification

    Institute of Scientific and Technical Information of China (English)

    徐晨明; 金帆; 黄荷凤; 陶冶; 叶英辉

    2001-01-01

    Objective: To develop a reliable and sensitive method for detecting sex and multiple loci of the Duchenne muscular dystrophy (DMD) gene in single cells. Materials and methods: The whole genome of each single cell was amplified using 15-base random primers (primer extension preamplification, PEP), and a small aliquot of the PEP product was then analyzed by locus-specific nested PCR. The procedure was evaluated by detecting dystrophin exons 8, 17, 19, 44, 45 and 48 and the human testis-determining gene (SRY) in single lymphocytes from known sources and in single blastomeres from couples with no family history of DMD. Results: The amplification efficiency for the six dystrophin exons was 97.2% (175/180) in single lymphocytes and 100% (60/60) in single blastomeres. SRY was amplified in 100% (15/15) of single male-derived lymphocytes and in 0% (0/15) of single female-derived lymphocytes. Conclusion: Single-cell PEP-nested PCR for dystrophin exons 8, 17, 19, 44, 45 and 48 and SRY is highly specific, and PEP-nested PCR is suitable for preimplantation genetic diagnosis (PGD) of DMD at the single-cell level.

  10. Whole genome sequence of Staphylococcus saprophyticus reveals the pathogenesis of uncomplicated urinary tract infection.

    Science.gov (United States)

    Kuroda, Makoto; Yamashita, Atsushi; Hirakawa, Hideki; Kumano, Miyuki; Morikawa, Kazuya; Higashide, Masato; Maruyama, Atsushi; Inose, Yumiko; Matoba, Kimio; Toh, Hidehiro; Kuhara, Satoru; Hattori, Masahira; Ohta, Toshiko

    2005-09-13

    Staphylococcus saprophyticus is a uropathogenic Staphylococcus frequently isolated from young female outpatients presenting with uncomplicated urinary tract infections. We sequenced the whole genome of S. saprophyticus type strain ATCC 15305, which harbors a circular chromosome of 2,516,575 bp with 2,446 ORFs and two plasmids. Comparative genomic analyses with strains of two other species, Staphylococcus aureus and Staphylococcus epidermidis, together with experimental data, revealed the following characteristics of the S. saprophyticus genome. S. saprophyticus does not possess any of the virulence factors found in S. aureus, such as coagulase, enterotoxins, exoenzymes, and extracellular matrix-binding proteins, although it does show a remarkable paralog expansion of transport systems related to the highly variable ion content of the urinary environment. A further unique feature is that only a single ORF is predicted to encode a cell wall-anchored protein; this protein shows positive hemagglutination and adherence to human bladder cells, which is associated with initial colonization of the urinary tract. S. saprophyticus also shows significantly high urease activity. The uropathogenicity of S. saprophyticus can thus be attributed to genomic features needed for survival in the human urinary tract: a novel cell wall-anchored adhesin, redundant uro-adaptive transport systems, and urease.

  11. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure, by resolving the phase in sequenced parent-child trios and by imputing variants into multiple unsequenced siblings. Non-parametric and parametric linkage analyses of the entire pedigree, as well as of smaller clusters of families, identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.

  12. The delivery of therapeutic oligonucleotides.

    Science.gov (United States)

    Juliano, Rudolph L

    2016-08-19

    The oligonucleotide therapeutics field has seen remarkable progress over the last few years with the approval of the first antisense drug and with promising developments in late stage clinical trials using siRNA or splice switching oligonucleotides. However, effective delivery of oligonucleotides to their intracellular sites of action remains a major issue. This review will describe the biological basis of oligonucleotide delivery including the nature of various tissue barriers and the mechanisms of cellular uptake and intracellular trafficking of oligonucleotides. It will then examine a variety of current approaches for enhancing the delivery of oligonucleotides. This includes molecular scale targeted ligand-oligonucleotide conjugates, lipid- and polymer-based nanoparticles, antibody conjugates and small molecules that improve oligonucleotide delivery. The merits and liabilities of these approaches will be discussed in the context of the underlying basic biology. © The Author 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Whole genome scan to detect quantitative trait loci for bovine milk protein composition.

    Science.gov (United States)

    Schopen, G C B; Koks, P D; van Arendonk, J A M; Bovenhuis, H; Visker, M H P W

    2009-08-01

    The objective of this study was to perform a whole genome scan to detect quantitative trait loci (QTL) for milk protein composition in 849 Holstein-Friesian cows originating from seven sires. One morning milk sample was analysed for the major milk proteins using capillary zone electrophoresis. A genetic map was constructed with 1341 single nucleotide polymorphisms, covering 2829 centimorgans (cM) and 95% of the cattle genome. The chromosomal regions most significantly related to milk protein composition (P(genome) […]) […] casein, alpha(S2)-casein, beta-casein and kappa-casein. The QTL on BTA11 was found at 124 cM and affected beta-lactoglobulin, and the QTL on BTA14 was found at 0 cM and affected protein percentage. The proportion of phenotypic variance explained by the QTL was 3.6% for beta-casein and 7.9% for kappa-casein on BTA6, 28.3% for beta-lactoglobulin on BTA11, and 8.6% for protein percentage on BTA14. The QTL affecting alpha(S2)-casein on BTA6 and 17 showed a significant interaction. We investigated the extent to which the detected QTL affecting milk protein composition could be explained by known polymorphisms in the beta-casein, kappa-casein, beta-lactoglobulin and DGAT1 genes. Correction for these polymorphisms decreased the proportion of phenotypic variance explained by the QTL previously found on BTA6, 11 and 14. Thus, several significant QTL affecting milk protein composition were found, some of which could partially be explained by polymorphisms in milk protein genes.

  14. Utility of Whole-Genome Sequencing in Characterizing Acinetobacter Epidemiology and Analyzing Hospital Outbreaks.

    Science.gov (United States)

    Fitzpatrick, Margaret A; Ozer, Egon A; Hauser, Alan R

    2016-03-01

    Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence type (MLST) analysis, and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts.
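
    The two core-SNP thresholds above (2,500 SNPs between clonal lineages, 2.5 SNPs between outbreak and nonoutbreak strains) imply pairwise SNP distances over a core-genome alignment. The sketch below assumes such an alignment already exists as equal-length strings; treating "fewer SNPs than the threshold" as "related" and skipping gap or 'N' sites are assumptions of the sketch, not details given in the abstract. The toy isolates are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Thresholds reported in the abstract (core-genome SNPs).
LINEAGE_THRESHOLD = 2500   # assumed: pairs below this share a clonal lineage
OUTBREAK_THRESHOLD = 2.5   # assumed: pairs below this belong to the outbreak


def snp_distance(a: str, b: str) -> int:
    """Count differing sites between two aligned core-genome sequences.

    Sites with a gap ('-') or ambiguous base ('N') in either sequence are
    skipped; this masking rule is an assumption of the sketch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in "-N" and y not in "-N"
    )


def pairs_below(alignment: Dict[str, str], threshold: float) -> List[Tuple[str, str, int]]:
    """Return (isolate_a, isolate_b, distance) for pairs closer than `threshold`."""
    hits = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(alignment.items()), 2):
        d = snp_distance(seq_a, seq_b)
        if d < threshold:
            hits.append((name_a, name_b, d))
    return hits


# Toy alignment (hypothetical isolates, not study data).
toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "TTTTACGA"}
print(pairs_below(toy, OUTBREAK_THRESHOLD))  # [('iso1', 'iso2', 1)]
```

    With a non-integer threshold such as 2.5, "below the threshold" simply means at most 2 SNPs apart.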

  15. Whole-genome SNP association analysis of reproduction traits in the Finnish Landrace pig breed

    Directory of Open Access Journals (Sweden)

    Uimari Pekka

    2011-12-01

    Abstract Background: Good genetic progress for pig reproduction traits has been achieved using a quantitative genetics-based multi-trait BLUP evaluation system. Whole-genome single nucleotide polymorphism (SNP) panels now provide a new tool for pig selection. The purpose of this study was to identify SNP associated with reproduction traits in the Finnish Landrace pig breed using the Illumina PorcineSNP60 BeadChip. Methods: Association of each SNP with different traits was tested with a weighted linear model, using SNP genotype as a covariate and animal as a random variable. Deregressed estimated breeding values of the progeny-tested boars were used as the dependent variable, and weights were based on their reliabilities. Statistical significance of the associations was based on Bonferroni-corrected P-values. Results: Deregressed estimated breeding values were available for 328 genotyped boars. Of the 62,163 SNP on the chip, 57,868 SNP had a call rate > 0.9 and 7,632 SNP were monomorphic. Statistically significant results (P-value […] P-value […] P-value = 1.69E-08) more than unfavourable double homozygote animals. A region on chromosome 9 (66 Mb) was statistically significant for piglet mortality between birth and weaning in later parities (0.44 piglets between homozygotes, P-value = 6.94E-08). Conclusions: Three separate regions on chromosome 9 gave significant results for litter size and pig mortality. The frequencies of the favourable alleles of the significant SNP are moderate in the Finnish Landrace population, and these SNP are thus valuable candidates for possible marker-assisted selection.
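
    The abstract states that significance was based on Bonferroni-corrected P-values but does not give the number of tests or the family-wise error rate. The minimal sketch below assumes that only the polymorphic SNP passing the call-rate filter (57,868 − 7,632 = 50,236) were tested and that α = 0.05; both figures are assumptions, so the resulting cutoff is illustrative only.

```python
# Counts reported in the abstract.
N_CALL_RATE_PASS = 57_868   # SNP with call rate > 0.9
N_MONOMORPHIC = 7_632       # SNP with no variation

# Assumption: only polymorphic SNP passing the call-rate filter were tested,
# and the family-wise error rate is 0.05. Neither is stated in the abstract.
N_TESTS = N_CALL_RATE_PASS - N_MONOMORPHIC  # 50,236
ALPHA = 0.05


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that controls the family-wise error rate at `alpha`."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


cutoff = bonferroni_threshold(ALPHA, N_TESTS)
print(f"per-SNP cutoff: {cutoff:.2e}")  # per-SNP cutoff: 9.95e-07
print(1.69e-08 < cutoff, bonferroni_adjust(1.69e-08, N_TESTS))
```

    Under these assumptions, both reported P-values (1.69E-08 and 6.94E-08) fall below the per-SNP cutoff.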

  16. Whole-Genome Saliva and Blood DNA Methylation Profiling in Individuals with a Respiratory Allergy.

    Science.gov (United States)

    Langie, Sabine A S; Szarc Vel Szic, Katarzyna; Declerck, Ken; Traen, Sophie; Koppen, Gudrun; Van Camp, Guy; Schoeters, Greet; Vanden Berghe, Wim; De Boever, Patrick

    2016-01-01

    The etiology of respiratory allergies (RA) can be partly explained by DNA methylation changes caused by adverse environmental and lifestyle factors experienced early in life. Longitudinal, prospective studies can help unravel the epigenetic mechanisms involved in disease development. High compliance rates can be expected in such studies when data are collected using non-invasive and convenient procedures, and saliva is an attractive biofluid for analyzing changes in DNA methylation patterns. In a pilot study, we investigated differential methylation in saliva of RA cases (n = 5) compared with healthy controls (n = 5) using the Illumina Methylation 450K BeadChip platform, and compared the results with those obtained in mononuclear blood cells from the same individuals. Methylation patterns from saliva and mononuclear blood cells were clearly distinguishable (PAdj […] 0.2), although the methylation status of about 96% of the cg-sites was comparable between peripheral blood mononuclear cells and saliva. When comparing RA cases with healthy controls, the numbers of differentially methylated sites (DMS) in saliva and blood were 485 and 437, respectively (P […] 0.1), of which 216 were in common. The methylation levels of these sites were significantly correlated between blood and saliva. The absolute methylation levels in blood and saliva were confirmed by pyrosequencing for 3 selected DMS in the PM20D1, STK32C, and FGFR2 genes, but the differential methylation in saliva could be confirmed only for the DMS in PM20D1 and STK32C. We show that saliva can be used for genome-wide methylation analysis and that DMS can be identified when comparing RA cases and healthy controls. The results were replicated in blood cells of the same individuals and confirmed by pyrosequencing analysis. This study provides proof of concept for saliva-based whole-genome methylation analysis in the field of respiratory allergy.

  17. The American cranberry: first insights into the whole genome of a species adapted to bog habitat

    Science.gov (United States)

    2014-01-01

    Background: The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely cultivated fruit crops native to North America; the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). Taxonomically, cranberries belong to the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Although next-generation sequencing technology is advancing whole-genome sequencing, heterozygosity remains a major obstacle to assembling complex diploid (and higher-ploidy) genomes from short-read sequence data. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived from five generations of selfing of the cultivar Ben Lear. Results: The genome size of V. macrocarpon has been estimated at about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The 36,364 predicted genes represent 17.7% of the assembled genome. Of these, 30,090 were assigned to candidate genes based on homology, and 13,170 (36%) were supported by transcriptome data. Conclusions: Shotgun sequencing of the cranberry genome at an average coverage of 20X allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection for further study of important biochemical pathways and cellular processes, and for marker development in breeding and in the study of horticultural characteristics such as disease resistance. PMID:24927653
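
    The assembly is summarized by an N50 of 4,237 bp. The sketch below shows the standard N50 calculation on a list of scaffold lengths: sort from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The input lengths are hypothetical, and some tools handle the exact-half tie slightly differently.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Sort lengths from longest to shortest and walk down the list; N50 is the
    length at which the running total first reaches half of all bases.
    """
    ordered = sorted((n for n in lengths if n > 0), reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable: running total always reaches half")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```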

  18. Microbiota present in cystic fibrosis lungs as revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Philippe M Hauser

    Determining the precise composition and variation of the microbiota in cystic fibrosis lungs is crucial, since chronic inflammation due to microorganisms leads to lung damage and, ultimately, death. However, this constitutes a major technical challenge. Culturing of microorganisms does not provide a complete representation of a microbiota, even when using culturomics (high-throughput culture). So far, only PCR-based metagenomic methods have been investigated, and these are biased towards certain microbial groups and suffer from uncertain quantification of the different microbial domains. We explored whole genome sequencing (WGS) using Illumina high-throughput technology applied directly to DNA extracted from sputa obtained from two cystic fibrosis patients. To detect all microorganism groups, we used four DNA extraction procedures, each with a different lysis protocol. Owing to the high efficiency of current Illumina technology, we avoided the biases introduced by whole-DNA amplification. Phylogenomic classification of the reads by three different methods produced similar results. Our results suggest that WGS provides, in a single analysis, a better qualitative and quantitative assessment of microbiota composition than cultures and PCRs. WGS identified large amounts of Haemophilus spp. (patient 1) or Staphylococcus spp. plus Streptococcus spp. (patient 2), together with small amounts of anaerobic (Veillonella, Prevotella, Fusobacterium) and aerobic bacteria (Gemella, Moraxella, Granulicatella). WGS suggested that fungi represented very low proportions of the microbiota; they were detected by cultures and PCRs because of the selectivity of those methods. Future increases in read length and decreases in cost should ensure the usefulness of WGS for the characterisation of microbiota.

  19. Whole Genome Duplications Shaped the Receptor Tyrosine Kinase Repertoire of Jawed Vertebrates.

    Science.gov (United States)

    Brunet, Frédéric G; Volff, Jean-Nicolas; Schartl, Manfred

    2016-06-03

    The receptor tyrosine kinase (RTK) gene family, involved primarily in cell growth and differentiation, comprises proteins with a common enzymatic tyrosine kinase intracellular domain adjacent to a transmembrane region. The amino-terminal portion of RTKs is extracellular and made of different domains, the combination of which characterizes each of the 20 RTK subfamilies among mammals. We analyzed a total of 7,376 RTK sequences from 143 vertebrate species to provide the first comprehensive census of the jawed vertebrate repertoire. We confirmed the 58 genes previously described in the human and mouse genomes and established their phylogenetic relationships. We also identified five additional RTKs, bringing the total to 63 genes in jawed vertebrates. We found that the vertebrate RTK gene family has been shaped by the two successive rounds of whole genome duplication (WGD), called 1R and 2R (1R/2R), that occurred at the base of the vertebrates. In addition, the Vegfr and Ephrin receptor subfamilies were expanded by single gene duplications. In teleost fish, 23 additional RTK genes have been retained after a further expansion through the fish-specific third round (3R) of WGD. Several lineage-specific gene losses were observed; for instance, birds have lost three RTKs, and different genes are missing in several fish sublineages. The RTK gene family shows an unusually high gene retention rate after the vertebrate WGDs (58.75% after 1R/2R, 64.4% after 3R), resulting in an expansion that might be correlated with the evolution of complexity in vertebrate cellular communication and intracellular signaling.

  20. Whole genome analysis of Mycobacterium tuberculosis isolates from recurrent episodes of tuberculosis, Finland, 1995-2013.

    Science.gov (United States)

    Korhonen, V; Smit, P W; Haanperä, M; Casali, N; Ruutu, P; Vasankari, T; Soini, H

    2016-06-01

    Recurrent tuberculosis (TB) is caused either by endogenous reactivation of the same strain of Mycobacterium tuberculosis (relapse) or by exogenous infection with a new strain (re-infection). Recurrence of TB in Finland was analysed in a population-based, 19-year study, and genotyping was used to distinguish relapse from re-infection. The M. tuberculosis isolates from patients with suspected relapse were further analysed by whole genome sequencing (WGS) to determine the number and type of mutations occurring in the bacterial genome between the first and second disease episodes. In addition, publicly available tools (PhyResSE and SpolPred) were used to predict drug resistance and spoligotype profiles from the WGS data. Of the 8299 notified TB cases, 48 (0.6%) patients had episodes classified as recurrent. Forty-two patients had more than one culture-confirmed TB episode, and isolates from two episodes in 21 patients were available for genotyping. In 18 patients, the M. tuberculosis isolates obtained from the first and second TB episodes had identical spoligotypes. WGS analysis of the 36 M. tuberculosis isolates from the 18 suspected relapse patients (average time between isolates 2.8 years) revealed 0 to 38 single nucleotide polymorphisms (median 1, mean 3.78) between the first and second isolate. There seemed to be no direct relation between the number of single nucleotide polymorphisms and either the time between the two isolates or treatment outcome. The results suggest that the mutation rate may depend on multiple host-, strain- and treatment-related factors.

  1. Whole genome evaluation of horizontal transfers in the pathogenic fungus Aspergillus fumigatus

    Directory of Open Access Journals (Sweden)

    Deschavanne Patrick

    2010-03-01

    Abstract Background: Numerous cases of horizontal transfer (HT) have been described in eukaryote genomes, but, in contrast to prokaryote genomes, no whole genome evaluation of HTs has been carried out. This is mainly due to a lack of parametric methods specifically designed to take the intrinsic heterogeneity of eukaryote genomes into account. We applied a simple, already tested method based on local variations of genomic signatures to analyze the genome of the pathogenic fungus Aspergillus fumigatus. Results: We detected 189 atypical regions containing 214 genes, accounting for about 1 Mb of DNA sequence. However, the fraction of atypical DNA detected was smaller than the average amount detected under the same conditions in prokaryote genomes (3.1% vs 5.6%). About one third of these regions contained no annotated genes, a proportion far greater than in prokaryote genomes. When we analyzed the origin of these HTs by comparing their signatures to an in-house database of species signatures, three groups of donor species emerged: bacteria (40%), fungi (25%), and viruses (22%). Notably, although inter-domain exchanges were confirmed, we found evidence of very few exchanges between eukaryotic kingdoms. Conclusions: HTs are not negligible in eukaryote genomes, although they occur to a lesser extent than in prokaryote genomes; under our stringent conditions, the detected amount is a lower bound. The biological mechanisms underlying these transfers remain to be elucidated, as do the biological functions of the transferred genes.
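
    The abstract describes the detection method only as "based on local variations of genomic signatures". The sketch below is a generic stand-in under stated assumptions, not the authors' method: the signature is the tetranucleotide frequency vector, windows are compared with the whole-genome signature by Euclidean distance, and windows whose distance has a z-score above 3 are flagged. Word length, window size, step, distance and cutoff are all assumptions, and the input sequence is synthetic.

```python
import math
from collections import Counter
from itertools import product
from typing import List, Tuple

K = 4  # tetranucleotide signature; the word length is an assumption


def kmer_profile(seq: str, k: int = K) -> List[float]:
    """Relative frequencies of all 4**k DNA words; words with non-ACGT symbols are ignored."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words) or 1
    return [counts[w] / total for w in words]


def distance(p: List[float], q: List[float]) -> float:
    """Euclidean distance between two signature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def atypical_windows(genome: str, window: int, step: int,
                     z_cutoff: float = 3.0) -> List[Tuple[int, int, float]]:
    """Return (start, end, distance) for windows whose signature is unusually
    far from the whole-genome signature (distance z-score > z_cutoff)."""
    genome = genome.upper()
    reference = kmer_profile(genome)
    scored = [
        (s, s + window, distance(kmer_profile(genome[s:s + window]), reference))
        for s in range(0, len(genome) - window + 1, step)
    ]
    if len(scored) < 2:
        return []
    dists = [d for _, _, d in scored]
    mu = sum(dists) / len(dists)
    sd = math.sqrt(sum((d - mu) ** 2 for d in dists) / (len(dists) - 1))
    if sd == 0:
        return []  # all windows look alike: nothing is atypical
    return [w for w in scored if (w[2] - mu) / sd > z_cutoff]


# Synthetic example: a GC-only insert in an otherwise uniform sequence.
genome = "ACGT" * 5000 + "GC" * 1000 + "ACGT" * 5000
print([(s, e) for s, e, _ in atypical_windows(genome, window=2000, step=2000)])
# [(20000, 22000)]
```

    A single global z-score cutoff ignores the intrinsic heterogeneity of eukaryote genomes that the abstract emphasizes; a method designed for eukaryotes would presumably model that heterogeneity more carefully.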

  2. Computational analysis of whole-genome differential allelic expression data in human.

    Science.gov (United States)

    Wagner, James R; Ge, Bing; Pokholok, Dmitry; Gunderson, Kevin L; Pastinen, Tomi; Blanchette, Mathieu

    2010-07-08

    Allelic imbalance (AI) is a phenomenon in which the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles or because of genetic variation in regulatory regions. Recently, Bing et al. described the use of genotyping arrays to assay AI at high resolution (approximately 750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze these data and identify genomic regions with AI in an unbiased and statistically robust manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as with permutation testing. When applied to whole genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI include not only complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression, such as alternative promoters and alternative 3' ends. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help in mapping the genetic causes of allelic expression and in identifying cases where this variation may be linked to diseases.
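
    The abstract names a z-score approach without giving its formula. A minimal per-SNP sketch, under assumptions that are not from the paper, is shown below: allele signals are summarized as log2 ratios (with a pseudocount), genomic-DNA ratios at heterozygous SNPs serve as the balanced baseline, and cDNA ratios with |z| > 3 are flagged. Region-level aggregation and the Hidden Markov Model approach are not shown, and all input values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def log2_allele_ratio(allele_a: float, allele_b: float, pseudo: float = 1.0) -> float:
    """log2(A/B) of two allele signal intensities; `pseudo` avoids division by zero."""
    return math.log2((allele_a + pseudo) / (allele_b + pseudo))


def ai_zscores(cdna: Sequence[float], gdna: Sequence[float]) -> List[float]:
    """z-score each cDNA log-ratio against the genomic-DNA (balanced) distribution."""
    mu, sd = mean(gdna), stdev(gdna)
    return [(r - mu) / sd for r in cdna]


def flag_imbalanced(cdna: Sequence[float], gdna: Sequence[float],
                    cutoff: float = 3.0) -> List[int]:
    """Indices of SNPs whose |z| exceeds `cutoff` (candidate allelic imbalance)."""
    return [i for i, z in enumerate(ai_zscores(cdna, gdna)) if abs(z) > cutoff]


# Hypothetical log2 ratios at heterozygous SNPs (not study data).
gdna_ratios = [-0.10, 0.00, 0.10, 0.05, -0.05]
cdna_ratios = [0.02, 1.20, -0.90]
print(flag_imbalanced(cdna_ratios, gdna_ratios))  # [1, 2]
```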

  3. Discovery of Gene Sources for Economic Traits in Hanwoo by Whole-genome Resequencing

    Directory of Open Access Journals (Sweden)

    Younhee Shin

    2016-09-01

    Hanwoo, a Korean native cattle breed (Bos taurus coreana), has great economic value due to its high meat quality. The breed also has genetic variations associated with production traits such as health, disease resistance, reproduction and growth, as well as carcass quality. In this study, next-generation sequencing technologies and appropriate reference genomes were used to discover a large number of single nucleotide polymorphisms (SNPs) in ten Hanwoo bulls. Whole-genome resequencing generated a total of 26.5 Gb of data, of which 594,716,859 and 592,990,750 reads covered 98.73% and 93.79% of the bovine reference genomes UMD 3.1 and Btau 4.6.1, respectively. In total, 2,473,884 and 2,402,997 putative SNPs were discovered against UMD 3.1 and Btau 4.6.1, respectively, of which 1,095,922 (44.3%) and 982,674 (40.9%) were novel. Of these, 46,301 (UMD 3.1) and 28,613 (Btau 4.6.1) SNPs were identified as Hanwoo-specific and were located in functional genes that may be involved in the mechanisms of milk production, tenderness, juiciness and marbling of Hanwoo beef, and yellow hair. Most of the Hanwoo-specific SNPs were identified in promoter regions, suggesting that these SNPs influence differential expression of the regulated genes relative to the relevant traits. In particular, the non-synonymous (ns) SNPs found in CORIN, a negative regulator of Agouti, might be causal variants determining the yellow hair of Hanwoo. Our results provide abundant genetic sources of variation to characterize Hanwoo genetics and for subsequent breeding.

  4. Genomic analysis of the basal lineage fungus Rhizopus oryzae reveals a whole-genome duplication.

    Directory of Open Access Journals (Sweden)

    Li-Jun Ma

    2009-07-01

    Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth and an overall mortality rate exceeding 50%. As a representative of the paraphyletic basal group of the fungal kingdom called "zygomycetes," R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99-880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin-proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as the secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target lanosterol 14alpha-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel, R. oryzae-specific diagnostics and therapies.

  5. Whole Genome Analysis of Leptospira licerasiae Provides Insight into Leptospiral Evolution and Pathogenicity

    Science.gov (United States)

    Selengut, Jeremy D.; Harkins, Derek M.; Patra, Kailash P.; Moreno, Angelo; Lehmann, Jason S.; Purushe, Janaki; Sanka, Ravi; Torres, Michael; Webster, Nicholas J.; Vinetz, Joseph M.; Matthias, Michael A.

    2012-01-01

    The whole-genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes, and of 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism, which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010T and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae consisting of a 6-gene cluster, which is unexpectedly short compared with L. interrogans, in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are also present, which further suggests past LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases of its infectiousness.
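
    Once genes have been grouped into orthologous families, the "core genome" and the set of "genes restricted to infectious species" reduce to set operations. The abstract does not name the clustering tool used, so the sketch below takes the families as given. The genome and family names are hypothetical.

```python
from functools import reduce


def core_and_accessory(genomes):
    """genomes: dict mapping genome name -> set of gene-family IDs.
    Core = families present in every genome; accessory = rest of the pan-genome."""
    families = list(genomes.values())
    core = reduce(set.intersection, families)
    pan = reduce(set.union, families)
    return core, pan - core


def restricted_to(genomes, group):
    """Families present in every genome of `group` and absent from all other genomes."""
    inside = reduce(set.intersection, (genomes[name] for name in group))
    outside = reduce(set.union, (genomes[name] for name in genomes if name not in group), set())
    return inside - outside


# Hypothetical gene-family sets for two infectious species and one saprophyte.
genomes = {
    "L_interrogans": {"fam1", "fam2", "fam3", "pathA"},
    "L_licerasiae": {"fam1", "fam2", "pathA"},
    "L_biflexa": {"fam1", "fam2", "sap1"},
}
core, accessory = core_and_accessory(genomes)
print(sorted(core))                                                       # ['fam1', 'fam2']
print(sorted(restricted_to(genomes, ["L_interrogans", "L_licerasiae"])))  # ['pathA']
```

    The "restricted" set requires presence in all infectious genomes and absence from all others. This mirrors the abstract's description of conserved genes limited to infectious species.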

  6. Insight into Shiga toxin genes encoded by Escherichia coli O157 from whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Philip M. Ashton

    2015-02-01

    The ability of Shiga toxin-producing Escherichia coli (STEC) to cause severe illness in humans is determined by multiple host factors and bacterial characteristics, including Shiga toxin (Stx) subtype. Given the link between the Stx2a subtype and disease severity, we sought to identify the stx subtypes present in whole genome sequences (WGS) of 444 isolates of STEC O157. Difficulties in assembling the stx genes in some strains were overcome by using two complementary bioinformatics methods: mapping and de novo assembly. We compared the WGS analysis with the results obtained using a PCR approach and investigated the diversity within and between the subtypes. All strains of STEC O157 in this study had stx1a, stx2a or stx2c, or a combination of these three genes. There was over 99% (442/444) concordance between PCR and WGS. When common source strains were excluded, 236/349 strains of STEC O157 had multiple copies of different Stx subtypes and 54 had multiple copies of the same Stx subtype. Of those strains harbouring multiple copies of the same Stx subtype, 33 had variants between the alleles while 21 had identical copies. Strains harbouring Stx2a only were most commonly found to have multiple alleles of the same subtype (42%). Both the PCR and WGS approaches to stx subtyping provided a good level of sensitivity and specificity. In addition, the WGS data showed that a significant proportion of strains associated with clinical disease in England harboured multiple alleles of the same Stx subtype.
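
    The two comparisons described above can be sketched in a few lines: per-isolate agreement between PCR and WGS subtype calls, and classification of an isolate's stx copies. This is a hedged illustration with hypothetical isolate IDs and calls, not the study's pipeline. PCR is assumed to report subtype presence only, so the two methods are compared as sets, while WGS calls are kept as one entry per gene copy.

```python
from collections import Counter


def classify_stx(alleles):
    """alleles: stx subtype calls for one isolate, one entry per gene copy."""
    counts = Counter(alleles)
    if sum(counts.values()) <= 1:
        return "single copy"
    if len(counts) > 1:
        return "multiple subtypes"
    return "multiple copies of one subtype"


def concordance(pcr_calls, wgs_calls):
    """Return (agreeing isolates, isolates compared), comparing subtype sets
    because PCR detects presence rather than copy number."""
    shared = pcr_calls.keys() & wgs_calls.keys()
    agree = sum(1 for isolate in shared if set(pcr_calls[isolate]) == set(wgs_calls[isolate]))
    return agree, len(shared)


# Hypothetical calls for two isolates.
pcr = {"iso1": {"stx2a"}, "iso2": {"stx1a", "stx2c"}}
wgs = {"iso1": ["stx2a", "stx2a"], "iso2": ["stx2c"]}
print(concordance(pcr, wgs))          # (1, 2)
print(classify_stx(wgs["iso1"]))      # multiple copies of one subtype
print(f"{100 * 442 / 444:.1f}%")      # 99.5%, the reported PCR/WGS concordance
```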

  7. Accuracy of whole-genome prediction using a genetic architecture-enhanced variance-covariance matrix.

    Science.gov (United States)

    Zhang, Zhe; Erbe, Malena; He, Jinlong; Ober, Ulrike; Gao, Ning; Zhang, Hao; Simianer, Henner; Li, Jiaqi

    2015-02-09

    Obtaining accurate predictions of unobserved genetic or phenotypic values for complex traits in animal, plant, and human populations is possible through whole-genome prediction (WGP), a combined analysis of genotypic and phenotypic data. Because the underlying genetic architecture of the trait of interest is an important factor affecting model selection, we propose a new strategy, termed BLUP|GA (BLUP-given genetic architecture), which can use genetic architecture information within the dataset at hand rather than from public sources. This is achieved by using a trait-specific covariance matrix (T), which is a weighted sum of a genetic architecture part (S matrix) and the realized relationship matrix (G). The algorithm of BLUP|GA is provided and illustrated with real and simulated datasets. The predictive ability of BLUP|GA was validated with three model traits in a dairy cattle dataset and 11 traits in three public datasets with a variety of genetic architectures, and compared with GBLUP and other approaches. Results show that BLUP|GA outperformed GBLUP in 20 of 21 scenarios in the dairy cattle dataset and outperformed GBLUP, BayesA, and BayesB in 12 of 13 traits in the analyzed public datasets. Further analyses showed that the differences in accuracy between BLUP|GA and GBLUP correlated significantly with the distance between the T and G matrices. The new strategy applied in BLUP|GA is a favorable and flexible alternative to the standard GBLUP model, allowing the genetic architecture of the quantitative trait under consideration to be accounted for when necessary. This advantage is mainly due to the increased similarity between the trait-specific relationship matrix (T matrix) and the genetic relationship matrix at unobserved causal loci. Applying BLUP|GA in WGP would ease the burden of model selection. Copyright © 2015 Zhang et al.
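
    The abstract describes T only as a weighted sum of S and G. The minimal NumPy sketch below assumes three things: the form T = ωS + (1 − ω)G, a VanRaden-type G, and an S built the same way from a chosen subset of SNPs with per-SNP weights. How the paper actually selects and weights SNPs for S is not reproduced here. `selected_snps`, `snp_weights` and `omega` are therefore hypothetical inputs.

```python
import numpy as np


def vanraden_relationship(genotypes, snp_weights=None):
    """VanRaden-style relationship matrix from a 0/1/2 genotype matrix
    (rows = individuals, columns = SNPs), with optional per-SNP weights."""
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0                      # estimated allele frequencies
    z = m - 2.0 * p                               # centred genotypes
    d = np.ones(m.shape[1]) if snp_weights is None else np.asarray(snp_weights, dtype=float)
    scale = 2.0 * np.sum(d * p * (1.0 - p))       # weighted sum of 2p(1-p)
    return (z * d) @ z.T / scale                  # Z D Z' / scale


def trait_specific_matrix(genotypes, selected_snps, snp_weights, omega):
    """Assumed form T = omega * S + (1 - omega) * G, with S from selected, weighted SNPs."""
    m = np.asarray(genotypes, dtype=float)
    g = vanraden_relationship(m)
    s = vanraden_relationship(m[:, selected_snps], snp_weights)
    return omega * s + (1.0 - omega) * g


# Toy data: 3 individuals x 4 SNPs; SNPs 0 and 2 treated as "architecture" SNPs.
genotypes = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 2, 1, 0]]
t = trait_specific_matrix(genotypes, [0, 2], [1.0, 0.5], omega=0.3)
print(np.round(t, 3))
```

    With `omega=0` the sketch falls back to plain G, which corresponds to the standard GBLUP relationship matrix. Larger `omega` shifts weight toward the architecture-specific part.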

  8. Genome-Wide Association Study of HIV Whole Genome Sequences Validated using Drug Resistance

    Science.gov (United States)

    Power, Robert A.; Davaniah, Siva; Derache, Anne; Wilkinson, Eduan; Tanser, Frank; Pillay, Deenan; de Oliveira, Tulio

    2016-01-01

    Background: Genome-wide association studies (GWAS) have considerably advanced our understanding of human traits and diseases. With the increasing availability of whole genome sequences (WGS) for pathogens, it is important to establish whether GWAS of viral genomes could reveal important biological insights. Here we perform the first proof-of-concept viral GWAS, examining drug resistance (DR), a phenotype with well-understood genetics. Method: We performed a GWAS of DR in a sample of 343 HIV subtype C patients failing first-line antiretroviral treatment in rural KwaZulu-Natal, South Africa. The majority and minority variants within each sequence were called using PILON, and GWAS was performed within PLINK. HIV WGS from patients failing on different antiretroviral treatments were compared with sequences derived from individuals naïve to the respective treatment. Results: The GWAS methodology was validated by identifying five genetic associations that lead to amino acid changes known to cause DR. Further, we highlighted the ability of GWAS to identify epistatic effects, identifying two replicable variants within amino acid 68 of the reverse transcriptase protein previously described as potential fitness-compensatory mutations. A possible additional DR variant within amino acid 91 of the matrix region of the Gag protein was associated with tenofovir failure, highlighting the ability of GWAS to identify variants outside classical candidate genes. Our results also suggest a polygenic component to DR. Conclusions: These results validate the applicability of GWAS to HIV WGS data even in relatively small samples, and emphasise how high-throughput sequencing can provide novel and clinically relevant insights. Further, they suggest that for viruses such as HIV, population structure is only a minor concern compared with bacterial or parasite GWAS. HIV's short genome and the resulting reduced multiple-testing burden make it an ideal candidate for GWAS. PMID:27677172
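
    The study ran its GWAS in PLINK. As a hedged illustration of the kind of per-variant test involved (not PLINK's implementation), the sketch below computes a 2x2 allelic chi-square comparing treatment-failing with treatment-naive sequences and applies a Bonferroni threshold. The counts and the number of tests are hypothetical. It also shows why a short genome helps: fewer variant sites means a smaller m, so the per-test threshold α/m is less stringent.

```python
import math


def allelic_chi2(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table of
    variant vs reference counts in treatment-failing vs treatment-naive sequences."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # an empty margin makes the test uninformative
    chi2 = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


chi2, p = allelic_chi2(30, 10, 10, 30)
print(chi2, p < bonferroni_threshold(0.05, 1000))  # 20.0 True
```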

  9. Identification of Escherichia coli and Shigella Species from Whole-Genome Sequences.

    Science.gov (United States)

    Chattaway, Marie A; Schaefer, Ulf; Tewolde, Rediat; Dallman, Timothy J; Jenkins, Claire

    2017-02-01

    Escherichia coli and Shigella species are closely related and genetically constitute the same species. Differentiating between these two pathogens and accurately identifying the four species of Shigella are therefore challenging. The organism-specific bioinformatics whole-genome sequencing (WGS) typing pipelines at Public Health England depend on the initial identification of the bacterial species by use of a kmer-based approach. Of the 1,982 Escherichia coli and Shigella sp. isolates analyzed in this study, 1,957 (98.7%) had concordant results by both traditional biochemistry and serology (TB&S) and the kmer identification (ID) derived from the WGS data. Of the 25 mismatches identified, 10 were enteroinvasive E. coli isolates that were misidentified as Shigella flexneri or S. boydii by the kmer ID, and 8 were S. flexneri isolates misidentified by TB&S as S. boydii owing to nonfunctional S. flexneri O-antigen biosynthesis genes. Analysis of the population structure based on multilocus sequence typing (MLST) data derived from the WGS data showed that the remaining discrepant results belonged to clonal complex 288 (CC288), comprising both S. boydii and S. dysenteriae strains. Mismatches between the TB&S and kmer ID results were explained by the close phylogenetic relationship between the two species and were resolved with reference to the MLST data. Shigella can be differentiated from E. coli and accurately identified to the species level by use of kmer comparisons and MLST. Analysis of the WGS data provided explanations for the discordant results between TB&S and WGS data, revealed the true phylogenetic relationships between different species of Shigella, and identified emerging pathoadapted lineages. © Crown copyright 2017.
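
    Beyond being kmer-based, the Public Health England pipeline is not described. The toy sketch below shows only the general idea: a query is assigned to the reference that contains the largest fraction of its canonical k-mers. The containment score, the `k=21` default and all names are assumptions. Real references would be whole genomes, so the example uses a tiny `k` purely for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers: the lexicographic minimum of each k-mer and its
    reverse complement, so both DNA strands give the same k-mer."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # skip k-mers spanning ambiguous bases
        reverse_complement = kmer.translate(COMPLEMENT)[::-1]
        kmers.add(min(kmer, reverse_complement))
    return kmers


def identify(query, references, k=21):
    """Return (best reference name, containment score), where containment is the
    fraction of the query's k-mers also present in that reference."""
    query_kmers = canonical_kmers(query, k)
    if not query_kmers:
        return None, 0.0
    best_name, best_score = None, -1.0
    for name, ref_seq in references.items():
        score = len(query_kmers & canonical_kmers(ref_seq, k)) / len(query_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


references = {"species_A": "ACGTACGTTGCA", "species_B": "TTTTGGGGCCCC"}
print(identify("ACGTACGTT", references, k=4))  # ('species_A', 1.0)
```

    The CC288 discrepancies above show the limitation of this idea. When two species are as closely related as S. boydii and S. dysenteriae in that clonal complex, k-mer scores alone may not separate them, and MLST was needed to resolve those cases.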

  10. Validation of Pooled Whole-Genome Re-Sequencing in Arabidopsis lyrata.

    Directory of Open Access Journals (Sweden)

    Marco Fracassetti

    Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).
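
    Agreement between pooled and individual-based frequencies is measured here with a concordance correlation coefficient, presumably Lin's. Unlike Pearson's r, Lin's coefficient also penalizes systematic shifts in location and scale between the two sets of estimates. The minimal sketch below uses population (1/n) moments. The frequency values are invented for illustration.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two equal-length
    sequences of values (e.g. pooled vs individual-based allele frequencies)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


pool_seq = [0.10, 0.25, 0.40, 0.55, 0.80]   # hypothetical Pool-seq frequencies
gbs = [0.12, 0.22, 0.43, 0.50, 0.78]        # hypothetical GBS-derived frequencies
print(round(lins_ccc(pool_seq, gbs), 3))
```

    Perfect agreement gives 1. A constant offset between the two methods lowers the coefficient even if the values are perfectly correlated.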

  11. Clinical application of whole-genome sequencing to inform treatment for multidrug-resistant tuberculosis cases.

    Science.gov (United States)

    Witney, Adam A; Gould, Katherine A; Arnold, Amber; Coleman, David; Delgado, Rachel; Dhillon, Jasvir; Pond, Marcus J; Pope, Cassie F; Planche, Tim D; Stoker, Neil G; Cosgrove, Catherine A; Butcher, Philip D; Harrison, Thomas S; Hinds, Jason

    2015-05-01

    The treatment of drug-resistant tuberculosis cases is challenging, as drug options are limited and the existing diagnostics are inadequate. Whole-genome sequencing (WGS) was used in a clinical setting to investigate six cases of suspected extensively drug-resistant Mycobacterium tuberculosis (XDR-TB) encountered at a London teaching hospital between 2008 and 2014. Sixteen isolates from six suspected XDR-TB cases were sequenced; five cases were analyzed in a clinically relevant time frame, with one case sequenced retrospectively. WGS identified mutations in M. tuberculosis genes associated with antibiotic resistance that are likely to be responsible for the phenotypic resistance. Thus, an evidence base was developed to inform the clinical decisions made around antibiotic treatment over prolonged periods. All strains in this study belonged to the East Asian (Beijing) lineage, and the strain relatedness was consistent with the expectations from the case histories, confirming one contact transmission event. We demonstrate that WGS data can be produced in a clinically relevant time scale some weeks before drug sensitivity testing (DST) data are available, and that these data actively help clinical decision-making through the assessment of whether an isolate (i) has a particular resistance mutation where DST results are absent or contradictory, (ii) has no further resistance markers and therefore is unlikely to be XDR, or (iii) is identical to an isolate of known resistance (i.e., a likely transmission event). A small number of discrepancies between the genotypic predictions and phenotypic DST results are discussed in the wider context of the interpretation and reporting of WGS results.
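
    Assessment (ii) above amounts to mapping detected mutations to predicted drug resistance and then checking the XDR definition. The sketch below is hedged in two ways. Its four-entry catalogue is illustrative only, neither validated nor complete, and not the catalogue used in the study. It also assumes the pre-2021 WHO definitions relevant to the 2008–2014 cases: MDR means resistance to isoniazid and rifampicin, and XDR means MDR plus resistance to a fluoroquinolone and at least one second-line injectable. All names are hypothetical.

```python
# Illustrative catalogue only: a few well-known resistance mutations,
# not a validated or complete catalogue for clinical use.
CATALOGUE = {
    "rpoB_S450L": {"rifampicin"},
    "katG_S315T": {"isoniazid"},
    "gyrA_D94G": {"fluoroquinolone"},
    "rrs_A1401G": {"amikacin", "kanamycin", "capreomycin"},
}
SECOND_LINE_INJECTABLES = {"amikacin", "kanamycin", "capreomycin"}


def predicted_resistance(mutations):
    """Union of drugs to which the detected mutations predict resistance;
    mutations absent from the catalogue contribute nothing."""
    drugs = set()
    for mutation in mutations:
        drugs |= CATALOGUE.get(mutation, set())
    return drugs


def classify(drugs):
    """Pre-2021 WHO-style categories: MDR = isoniazid + rifampicin;
    XDR = MDR + a fluoroquinolone + at least one second-line injectable."""
    mdr = {"isoniazid", "rifampicin"} <= drugs
    if mdr and "fluoroquinolone" in drugs and drugs & SECOND_LINE_INJECTABLES:
        return "XDR"
    if mdr:
        return "MDR"
    return "not MDR"


isolate = ["rpoB_S450L", "katG_S315T", "gyrA_D94G"]
print(classify(predicted_resistance(isolate)))  # MDR
```

    A genotypic "not XDR" call like this one is only as good as the catalogue behind it. That is why the discrepancies with phenotypic DST noted above matter for reporting.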

  12. Whole-genome sequencing reveals complex mechanisms of intrinsic resistance to BRAF inhibition.

    Science.gov (United States)

    Turajlic, S; Furney, S J; Stamp, G; Rana, S; Ricken, G; Oduko, Y; Saturno, G; Springer, C; Hayes, A; Gore, M; Larkin, J; Marais, R

    2014-05-01

    BRAF is mutated in ∼42% of human melanomas (COSMIC, http://www.sanger.ac.uk/genetics/CGP/cosmic/), and pharmacological BRAF inhibitors such as vemurafenib and dabrafenib achieve dramatic responses in patients whose tumours harbour BRAF(V600) mutations. Objective responses occur in ∼50% of patients and disease stabilisation in a further ∼30%, but ∼20% of patients show primary (innate) resistance and do not respond. Here, we investigated the underlying cause of treatment failure in a patient with BRAF-mutant melanoma who showed primary resistance. We carried out whole-genome sequencing and single nucleotide polymorphism (SNP) array analysis of five metastatic tumours from the patient. We validated mechanisms of resistance in a cell line derived from the patient's tumour. We observed that the majority of the single-nucleotide variants identified were shared across all tumour sites, but also saw site-specific copy-number alterations in discrete cell populations at different sites. We found that two ubiquitous mutations mediated resistance to BRAF inhibition in these tumours. A mutation in GNAQ sustained mitogen-activated protein kinase (MAPK) signalling, whereas a mutation in PTEN activated the PI3K/AKT pathway. Inhibition of both pathways synergised to block the growth of the cells. Our analyses show that the five metastases arose from a common progenitor and acquired additional alterations after disease dissemination. We demonstrate that a distinct combination of mutations mediated primary resistance to BRAF inhibition in this patient. These mutations were present in all five tumours and in a tumour sample taken before BRAF inhibitor treatment was administered. Inhibition of both pathways was required to block tumour cell growth, suggesting that combined targeting of these pathways could have been a valid therapeutic approach for this patient.

  13. Colorectal Cancer and the Human Gut Microbiome: Reproducibility with Whole-Genome Shotgun Sequencing.

    Directory of Open Access Journals (Sweden)

    Emily Vogtmann

    Accumulating evidence indicates that the gut microbiota affects colorectal cancer development, but previous studies have varied in population, technical methods, and associations with cancer. Understanding these variations is needed for comparisons and for potential pooling across studies. Therefore, we performed whole-genome shotgun sequencing on fecal samples from 52 pre-treatment colorectal cancer cases and 52 matched controls from Washington, DC. We compared findings from a previously published 16S rRNA study with the metagenomics-derived taxonomy within the same population. In addition, metagenome-predicted genes, modules, and pathways in the Washington, DC cases and controls were compared with those of cases and controls recruited in France whose specimens were processed using the same platform. Associations of colorectal cancer with the presence of fecal Fusobacteria, Fusobacterium, and Porphyromonas detected by 16S rRNA were reproduced by metagenomics, whereas the higher relative abundance of Clostridia in cancer cases based on 16S rRNA was only borderline based on metagenomics. This demonstrates that within the same sample set, most, but not all, taxonomic associations were seen with both methods. Considering significant cancer associations with the relative abundance of genes, modules, and pathways in a recently published French metagenomics dataset, statistically significant associations in the Washington, DC population were detected for four out of 10 genes, three out of nine modules, and seven out of 17 pathways. In total, colorectal cancer status in the Washington, DC study was associated with 39% of the metagenome-predicted genes, modules, and pathways identified in the French study. More within- and between-population comparisons are needed to identify sources of variation and disease associations that can be reproduced despite these variations. Future studies should have larger sample sizes or pool data across studies to have sufficient

  14. Whole Genome Sequencing for Surveillance of Antimicrobial Resistance in Actinobacillus pleuropneumoniae

    Science.gov (United States)

    Bossé, Janine T.; Li, Yanwen; Rogers, Jon; Fernandez Crespo, Roberto; Li, Yinghui; Chaudhuri, Roy R.; Holden, Matthew T. G.; Maskell, Duncan J.; Tucker, Alexander W.; Wren, Brendan W.; Rycroft, Andrew N.; Langford, Paul R.

    2017-01-01

    The aim of this study was to evaluate the correlation between antimicrobial resistance (AMR) profiles of 96 clinical isolates of Actinobacillus pleuropneumoniae, an important porcine respiratory pathogen, and the identification of AMR genes in whole genome sequence (wgs) data. Susceptibility of the isolates to nine antimicrobial agents (ampicillin, enrofloxacin, erythromycin, florfenicol, sulfisoxazole, tetracycline, tilmicosin, trimethoprim, and tylosin) was determined by agar dilution susceptibility testing. Except for the macrolides tested, elevated MICs were highly correlated with the presence of AMR genes identified in wgs data using ResFinder or BLASTn. Of the isolates tested, 57% were resistant to tetracycline [MIC ≥ 4 mg/L; 94.8% with either tet(B) or tet(H)]; 48% to sulfisoxazole (MIC ≥ 256 mg/L or DD = 6; 100% with sul2); 20% to ampicillin (MIC ≥ 4 mg/L; 100% with blaROB-1); 17% to trimethoprim (MIC ≥ 32 mg/L; 100% with dfrA14); and 6% to enrofloxacin (MIC ≥ 0.25 mg/L; 100% with GyrAS83F). Only 33% of the isolates had no detectable AMR genes and were susceptible, by MIC, to all the antimicrobial agents tested. Although 23 isolates had MIC ≥ 32 mg/L for tylosin, all isolates had MIC ≤ 16 mg/L for both erythromycin and tilmicosin, and no macrolide resistance genes or known point mutations were detected. Other than the GyrAS83F mutation, the AMR genes detected were mapped to potential plasmids. In addition to its presence on plasmid(s), the tet(B) gene was also found chromosomally, either as part of a 56 kb integrative conjugative element (ICEApl1) in 21 isolates or as part of a Tn7 insertion in 15 isolates. Our results indicate that, with the exception of macrolides, wgs data can be used to accurately predict resistance of A. pleuropneumoniae to the tested antimicrobial agents and provide added value for routine surveillance.
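
    The genotype-phenotype comparison above can be expressed as a 2x2 tally. An isolate counts as phenotypically resistant when its MIC is at or above the breakpoint, and that call is cross-tabulated against the presence of a resistance gene in the wgs data. The sketch below uses the tetracycline breakpoint quoted in the abstract (MIC ≥ 4 mg/L). The isolate data and the function name are hypothetical.

```python
def genotype_phenotype_agreement(isolates, breakpoint_mg_per_l):
    """isolates: list of (mic_mg_per_l, has_resistance_gene) pairs.
    An isolate is phenotypically resistant when MIC >= breakpoint.
    Returns (sensitivity, specificity) of gene presence for predicting resistance;
    either value is None when its denominator is zero."""
    tp = fp = tn = fn = 0
    for mic, has_gene in isolates:
        resistant = mic >= breakpoint_mg_per_l
        if resistant and has_gene:
            tp += 1
        elif resistant:
            fn += 1   # resistant phenotype without a detected gene
        elif has_gene:
            fp += 1   # gene detected but MIC below the breakpoint
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


# Hypothetical tetracycline MICs with presence of tet(B) or tet(H) from wgs.
data = [(8, True), (16, True), (4, False), (1, False), (0.5, False), (2, True)]
print(genotype_phenotype_agreement(data, breakpoint_mg_per_l=4))
```

    The macrolide results show why the two directions are reported separately. Isolates with elevated tylosin MICs but no detected gene would all fall into the false-negative cell.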

  15. Is gene activity in plant cells affected by UMTS-irradiation? A whole genome approach

    Directory of Open Access Journals (Sweden)

    Julia C Engelmann

    2008-10-01

    Julia C Engelmann(3,*), Rosalia Deeken(1,*), Tobias Müller(3), Günter Nimtz(2), M Rob G Roelfsema(1), Rainer Hedrich(1). (1) Molecular Plant Physiology and Biophysics, Julius-von-Sachs Institute for Biosciences; (2) Institute of Physics II, University of Cologne, Cologne, Germany; (3) Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany; *These authors contributed equally to this work.

    Abstract: Mobile phone technology makes use of radio frequency (RF) electromagnetic fields transmitted through a dense network of base stations in Europe. Possible harmful effects of RF fields on humans and animals are being discussed, but their effect on plants has received little attention. In a search for physiological processes of plant cells that are sensitive to RF fields, cell suspension cultures of Arabidopsis thaliana were exposed for 24 h to an RF field protocol representing typical microwave exposure in an urban environment. mRNA from exposed cultures and controls was hybridized to Affymetrix ATH1 whole-genome microarrays. Differential expression analysis revealed significant changes in the transcription of 10 genes, but none exceeded a fold change of 2.5. Apart from the fact that three of them are dark-inducible, their functions do not point to any known responses of plants to environmental stimuli. The changes in transcription of these genes were compared with published microarray datasets and revealed a weak similarity between the microwave and light-treatment experiments. Considering the large changes described in published experiments, it is questionable whether the small alterations caused by a 24 h continuous microwave exposure would have any impact on the growth and reproduction of whole plants.

    Keywords: suspension cultured plant cells, radio frequency electromagnetic fields, microarrays, Arabidopsis thaliana

  16. Whole-genome sequencing overcomes pseudogene homology to diagnose autosomal dominant polycystic kidney disease.

    Science.gov (United States)

    Mallawaarachchi, Amali C; Hort, Yvonne; Cowley, Mark J; McCabe, Mark J; Minoche, André; Dinger, Marcel E; Shine, John; Furlong, Timothy J

    2016-11-01

    Autosomal dominant polycystic kidney disease (ADPKD) is the most common monogenic kidney disorder and is due to disease-causing variants in PKD1 or PKD2. Although a strong genotype-phenotype correlation exists, diagnostic sequencing is not part of routine clinical practice. This is because PKD1 shares 97.7% sequence similarity with six pseudogenes, requiring laborious and error-prone long-range PCR and Sanger sequencing to overcome. We hypothesised that whole-genome sequencing (WGS) would be able to overcome the problem of this sequence homology because of its 150 bp paired-end reads and its avoidance of the capture bias that arises from targeted sequencing. We prospectively recruited a cohort of 28 unique pedigrees with an ADPKD phenotype. Standard DNA extraction, library preparation and WGS were performed using Illumina HiSeq X, and variants were classified following standard guidelines. A molecular diagnosis was made in 24 patients (86%), with 100% variant confirmation by the current gold standard of long-range PCR and Sanger sequencing. We demonstrated unique alignment of sequencing reads over the pseudogene-homologous region. In addition to identifying function-affecting single-nucleotide variants and indels, we identified single- and multi-exon deletions affecting PKD1 and PKD2, which would have been challenging to identify using exome sequencing. We report the first use of WGS to diagnose ADPKD. This method overcomes pseudogene homology, provides uniform coverage, detects all variant types in a single test and is less labour-intensive than current techniques. This technique is translatable to a diagnostic setting, allows clinicians to make better-informed management decisions and has implications for other disease groups that are challenged by regions of confounding sequence homology.

  17. Systematic pharmacogenomics analysis of a Malay whole genome: proof of concept for personalized medicine.

    Directory of Open Access Journals (Sweden)

    Mohd Zaki Salleh

    BACKGROUND: With its higher throughput and lower cost, second-generation sequencing technology has immense potential for translation into clinical practice and for the realization of pharmacogenomics-based patient care. The systematic analysis of whole genome sequences to assess patient-to-patient variability in pharmacokinetic and pharmacodynamic responses to drugs would be the next step in future medicine, in line with the vision of personalized medicine. METHODS: Genomic DNA obtained from a 55-year-old, self-declared healthy, anonymous male of Malay descent was sequenced. The subject's mother died of lung cancer, and the father had a history of schizophrenia and died at the age of 65. A systematic, intuitive computational workflow/pipeline integrating a custom algorithm with large datasets of variant annotations and gene functions for genetic variations with pharmacogenomic impact was developed. A comprehensive pathway map of drug transport, metabolism and action was used as a template to map non-synonymous variations with potential functional consequences. PRINCIPAL FINDINGS: Over 3 million known variations and 100,898 novel variations in the Malay genome were identified. Further in-depth pharmacogenetic analysis revealed a total of 607 unique variants in 563 proteins, with the eventual identification of 4 drug transport genes, 2 drug-metabolizing enzyme genes and 33 target genes harboring deleterious SNVs involved in pharmacological pathways, which could have a potential role in clinical settings. CONCLUSIONS: The current study demonstrates the potential of personal genome sequencing for understanding functionally relevant variations with potential influence on drug transport, metabolism and differential therapeutic outcomes. These will be essential for realizing personalized medicine through the use of a comprehensive computational pipeline for systematic data mining and analysis.

  18. Whole-genome analysis of mycobacteria from birds at the San Diego Zoo.

    Science.gov (United States)

    Pfeiffer, Wayne; Braun, Josephine; Burchell, Jennifer; Witte, Carmel L; Rideout, Bruce A

    2017-01-01

    Mycobacteria isolated from more than 100 birds diagnosed with avian mycobacteriosis at the San Diego Zoo and its Safari Park were cultured postmortem and had their whole genomes sequenced. Computational workflows were developed and applied to identify the mycobacterial species in each DNA sample, to find single-nucleotide polymorphisms (SNPs) between samples of the same species, to further differentiate SNPs between as many as three different genotypes within a single sample, and to identify which samples are closely clustered genomically. Nine species of mycobacteria were found in 123 samples from 105 birds. The most common species were Mycobacterium avium and Mycobacterium genavense, which were found in 49 and 48 birds, respectively. Most birds contained only a single mycobacterial species, but two birds contained a mixture of two species. The M. avium samples represent diverse strains of M. avium avium and M. avium hominissuis, with many pairs of samples differing by hundreds or thousands of SNPs across their common genome. By contrast, the M. genavense samples are much closer genomically; samples from 46 of 48 birds differ from each other by fewer than 110 SNPs. Some birds contained two, three, or even four genotypes of the same bacterial species. Such infections were found in 4 of 49 birds (8%) with M. avium and in 11 of 48 birds (23%) with M. genavense. Most were mixed infections, in which the bird was infected by multiple mycobacterial strains, but three infections with two genotypes differing by ≤ 10 SNPs were likely the result of within-host evolution. The samples from 31 birds with M. avium can be grouped into nine clusters within which any sample is ≤ 12 SNPs from at least one other sample in the cluster. Similarly, the samples from 40 birds with M. genavense can be grouped into ten such clusters. Information about these genomic clusters is being used in an ongoing, companion study of mycobacterial transmission to help inform management of bird collections.
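
    The clustering rule stated above says that any sample in a cluster is ≤ 12 SNPs from at least one other member. That is single-linkage clustering at a SNP-distance threshold. The study's own workflows are not reproduced here. The sketch below assumes pre-aligned, equal-length consensus sequences, and every name in it is hypothetical. Samples with no close neighbour come back as single-member clusters.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned consensus sequences,
    skipping positions that are ambiguous ('N') in either sequence."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b and "N" not in (a, b))


def single_linkage_clusters(samples, threshold=12):
    """samples: dict name -> aligned sequence. Two samples share a cluster if a
    chain of pairwise distances <= threshold connects them (union-find)."""
    parent = {name: name for name in samples}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(samples, 2):
        if snp_distance(samples[a], samples[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


toy = {"bird1": "AAAAAAAA", "bird2": "AAAAAAAT", "bird3": "AAAAAATT", "bird4": "CCCCCCCC"}
print(single_linkage_clusters(toy, threshold=1))
# [['bird1', 'bird2', 'bird3'], ['bird4']]
```

    In the toy data, bird1 and bird3 differ by 2 SNPs, more than the threshold. They still share a cluster because each is 1 SNP from bird2, which is the chaining behaviour the "at least one other sample" rule implies.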

  19. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    Science.gov (United States)

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, yielding a large amount of valuable data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous single-nucleotide polymorphisms (SNPs), dN/dS, in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of SNPs exceeds 10 million, i.e. 20-fold higher than in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found in allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display lower heterozygosity. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.
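
    A dN/dS ratio normalizes counts of non-synonymous and synonymous differences by the number of sites of each kind. The sketch below shows the classic calculation (proportions of differing sites, each corrected with the Jukes-Cantor formula) under the assumption that site and difference counts have already been tallied; it is a generic illustration, not the pipeline used in the study.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return dN/dS from counts of differences and of available sites."""
    p_n = nonsyn_diffs / nonsyn_sites  # share of non-synonymous sites that differ
    p_s = syn_diffs / syn_sites        # share of synonymous sites that differ
    d_s = jukes_cantor(p_s)
    if d_s == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return jukes_cantor(p_n) / d_s
```

    Values above 1 are conventionally read as a signature of positive selection and values below 1 as purifying selection; comparing a gene set against a random set of genes, as in the abstract, controls for the genome-wide baseline.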

  20. A whole genome analyses of genetic variants in two Kelantan Malay individuals.

    Science.gov (United States)

    Wan Juhari, Wan Khairunnisa; Md Tamrin, Nur Aida; Mat Daud, Mohd Hanif Ridzuan; Isa, Hatin Wan; Mohd Nasir, Nurfazreen; Maran, Sathiya; Abdul Rajab, Nur Shafawati; Ahmad Amin Noordin, Khairul Bariah; Nik Hassan, Nik Norliza; Tearle, Rick; Razali, Rozaimi; Merican, Amir Feisal; Zilfalil, Bin Alwi

    2014-12-01

    The sequencing of the genomes of two members of the Royal Kelantan Malay family provides insights into Kelantan Malay whole genome sequences. The two Kelantan Malay genomes were analyzed for SNP markers associated with thalassemia and Helicobacter pylori infection. Helicobacter pylori infection has been reported to have a low prevalence in the north-east compared with the west coast of Peninsular Malaysia, and beta-thalassemia is known to be one of the most common inherited genetic disorders in Malaysia. By combining SNP information from the literature, GWAS and NCBI ClinVar, 18 unique SNPs were selected for further analysis. Of these 18 SNPs, 10 came from a previous study of Helicobacter pylori infection among Malay patients, 6 were from NCBI ClinVar and 2 were from GWAS. The analysis reveals that both Royal Kelantan Malay genomes shared all 10 SNPs identified by Maran (Single Nucleotide Polymorphisms (SNPs) genotypic profiling of Malay patients with and without Helicobacter pylori infection in Kelantan, 2011) and one SNP from GWAS. In addition, both Royal Kelantan Malay genomes shared 3 SNP markers, HBG1 (rs1061234), HBB (rs1609812) and BCL11A (rs766432), all three of which are associated with beta-thalassemia. Our findings suggest that the Royal Kelantan Malays carry SNPs associated with protection against Helicobacter pylori infection, as well as SNPs associated with beta-thalassemia. These findings are in line with those of other researchers who studied thalassemia and Helicobacter pylori infection in the non-royal Malay population.

  1. Antisense oligonucleotides in cancer.

    Science.gov (United States)

    Castanotto, Daniela; Stein, Cy A

    2014-11-01

    Over the past several decades, despite substantial effort directed toward developing rational oligonucleotide strategies to silence gene expression, antisense oligonucleotide-based cancer therapy has not been successful. This review focuses on the most likely reasons for this lack of success, and on the barriers that must still be overcome to turn the promise of antisense therapy into a clinical cancer treatment. Considerable progress has been made in the design and delivery of nucleic acid fragments. Chemical modifications have considerably improved oligonucleotide absorption, distribution and metabolism while reducing toxicity. Nevertheless, the delivery and cellular uptake of these molecules are still not adequate to provide the desired therapeutic outcome. Recent interventional phase III trials of antisense oligodeoxyribonucleotides for a cancer indication are discussed, along with studies that have markedly improved the scientific understanding of the properties of these molecules. There is still no marketed antisense oligonucleotide for a cancer indication, because critical aspects of the cellular and tumor pharmacology and the delivery properties of these agents are still not well understood.

  2. snpTree - a web-server to identify and construct SNP trees from whole genome sequence data

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Kaas, Rolf Sommer; Thomsen, Martin Christen Frølund

    2012-01-01

    Background: Advances in whole genome sequencing (WGS) and its decreasing cost will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differe... ...from concatenated SNPs using FastTree and a Perl script. The online server was implemented in HTML, Java and Python. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three...
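
    A tree "from concatenated SNPs" starts from an alignment that keeps only the variable positions. The sketch below builds that concatenation, assuming each isolate's calls are a dictionary from 0-based reference position to alternate base (a hypothetical input format, not snpTree's) and assuming an isolate without a call carries the reference base. The FASTA output is the kind of alignment a tree builder such as FastTree accepts; the pairwise counts are only a quick sanity check.

```python
from itertools import combinations
from typing import Dict, Tuple


def concat_snps(ref: str, calls: Dict[str, Dict[int, str]]) -> Dict[str, str]:
    """Build one concatenated SNP string per isolate.

    Only positions variable in at least one isolate are kept; an isolate
    without a call at a position is assumed to carry the reference base.
    """
    positions = sorted({pos for snps in calls.values() for pos in snps})
    return {
        name: "".join(snps.get(pos, ref[pos]) for pos in positions)
        for name, snps in calls.items()
    }


def pairwise_snp_distances(aln: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Count differing columns between every pair of concatenated sequences."""
    return {
        (a, b): sum(x != y for x, y in zip(aln[a], aln[b]))
        for a, b in combinations(sorted(aln), 2)
    }


def to_fasta(aln: Dict[str, str]) -> str:
    """Format the concatenated alignment as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sorted(aln.items()))
```

    Treating missing calls as reference is the simplest choice but can hide low-coverage regions; a production pipeline would more likely mark such positions as unknown or drop them.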

  3. Beyond the Whole-Genome Duplication: Phylogenetic Evidence for an Ancient Interspecies Hybridization in the Baker's Yeast Lineage.

    Directory of Open Access Journals (Sweden)

    Marina Marcet-Houben

    2015-08-01

    Whole-genome duplications have shaped the genomes of several vertebrate, plant, and fungal lineages. Earlier studies have focused on establishing when these events occurred and on elucidating their functional and evolutionary consequences, but we still lack sufficient understanding of how genome duplications first originated. We used phylogenomics to study the ancient genome duplication that occurred in the lineage of the yeast Saccharomyces cerevisiae and found compelling evidence for a contemporaneous interspecies hybridization. We propose that the genome doubling was a direct consequence of this hybridization and that it served to stabilize the recently formed allopolyploid. This scenario provides a mechanism for the origin of this ancient duplication and of the lineage that originated from it, and brings a new perspective to the interpretation of the origin and consequences of whole-genome duplications.

  4. Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea

    Directory of Open Access Journals (Sweden)

    Joon-Hee Han

    2016-06-01

    Colletotrichum acutatum is a destructive fungal pathogen that causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from strain KC05 was sequenced using a PacBio sequencer and the MiSeq system. The KC05 genome is 52,190,760 bp in size, with a G + C content of 51.73% across 27 scaffolds, and contains 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.
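
    The G + C content quoted above is the share of G and C among the assembled bases. A minimal sketch for computing it from FASTA text follows; the parser is deliberately simple, and excluding ambiguous bases such as N from the denominator is an assumption, since the abstract does not say how they were handled.

```python
def gc_content(fasta_text: str) -> float:
    """Return the percentage of G+C among A/C/G/T bases across all FASTA records."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip header lines
        for base in line.strip().upper():
            if base in counts:  # ambiguous bases (N, etc.) are ignored
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * (counts["G"] + counts["C"]) / total
```

    Scaffolded assemblies often contain runs of N in the gaps between contigs, so whether those are counted changes the reported percentage slightly.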

  5. A novel whole genome amplification method using type IIS restriction enzymes to create overhangs with random sequences.

    Science.gov (United States)

    Pan, Xiaoming; Wan, Baihui; Li, Chunchuan; Liu, Yu; Wang, Jing; Mou, Haijin; Liang, Xingguo

    2014-08-20

    Ligation-mediated polymerase chain reaction (LM-PCR) is a whole genome amplification (WGA) method in which genomic DNA is cleaved into numerous fragments, all of which are then amplified by PCR after a universal end sequence is attached. However, these fragments can self-ligate, which may bias amplification and restrict the method's application. To decrease the probability of self-ligation, here we use type IIS restriction enzymes to digest genomic DNA into fragments with 4-5 nt overhangs of random sequence. After these fragments are ligated to an adapter with random end sequences, PCR is carried out and almost all DNA sequences present are amplified. In this study, the whole genome of Vibrio parahaemolyticus was amplified and the amplification efficiency was evaluated by quantitative PCR. The results suggest that our approach can provide sufficient genomic DNA of good quality to meet the requirements of various genetic analyses.
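
    The reason the overhangs are "random" is that a type IIS enzyme cuts outside its recognition site, so the overhang is simply whatever genomic bases lie at the cut. The sketch below illustrates this with FokI as an example enzyme (site GGATG, cutting 9 nt downstream on the top strand and 13 nt on the bottom strand, which leaves a 4-nt 5' overhang). The abstract does not name the enzymes used, and only the top strand is scanned here for brevity.

```python
from typing import List

RECOGNITION = "GGATG"  # FokI site, used only as an example type IIS enzyme
TOP_CUT = 9            # top strand is cut 9 nt past the end of the site
BOTTOM_CUT = 13        # bottom strand is cut 13 nt past the end of the site


def foki_overhangs(seq: str) -> List[str]:
    """Return the 4-nt overhang (as top-strand bases) at each top-strand FokI site."""
    seq = seq.upper()
    overhangs = []
    start = seq.find(RECOGNITION)
    while start != -1:
        site_end = start + len(RECOGNITION)
        left, right = site_end + TOP_CUT, site_end + BOTTOM_CUT
        if right <= len(seq):  # skip sites too close to the end to be cut
            overhangs.append(seq[left:right])
        start = seq.find(RECOGNITION, start + 1)
    return overhangs
```

    Because each overhang copies local genomic sequence, the two ends of a fragment are usually not complementary: for independent, uniformly random bases, a given pair of 4-nt overhangs matches with probability of roughly 1 in 4^4 = 256, which is why self-ligation becomes rare.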

  6. Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea.

    Science.gov (United States)

    Han, Joon-Hee; Chon, Jae-Kyung; Ahn, Jong-Hwa; Choi, Ik-Young; Lee, Yong-Hwan; Kim, Kyoung Su

    2016-06-01

    Colletotrichum acutatum is a destructive fungal pathogen that causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from strain KC05 was sequenced using a PacBio sequencer and the MiSeq system. The KC05 genome is 52,190,760 bp in size, with a G + C content of 51.73% across 27 scaffolds, and contains 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.

  7. Whole-genome sequence of Sunxiuqinia dokdonensis DH1T, isolated from deep sub-seafloor sediment in Dokdo Island

    OpenAIRE

    Sooyeon Lim; Dong-Ho Chang; Byoung-Chan Kim

    2016-01-01

    Sunxiuqinia dokdonensis DH1T was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea, subjected to whole genome sequencing on the HiSeq platform, and annotated with RAST. The nucleotide sequence of this genome was deposited in DDBJ/EMBL/GenBank under the accession LGIA00000000.

  8. Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences.

    Science.gov (United States)

    Ganapathiraju, Madhavi K; Mitchell, Asia D; Thahir, Mohamed; Motwani, Kamiya; Ananthasubramanian, Seshan

    2012-12-01

    Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
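
    The windowed n-gram perplexity idea can be sketched compactly: train an order-n model on a sequence, then score sliding windows, where low perplexity flags predictable (often repetitive) stretches. This minimal Python sketch uses plain dictionaries and add-one smoothing over the DNA alphabet rather than the toolkit's suffix arrays; the function names, smoothing and window parameters are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple

ALPHABET = "ACGT"


def train(seq: str, n: int) -> Tuple[Counter, Counter]:
    """Count n-grams and their (n-1)-base contexts in seq."""
    grams = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    contexts: Counter = Counter()
    for gram, c in grams.items():
        contexts[gram[:-1]] += c
    return grams, contexts


def perplexity(window: str, grams: Counter, contexts: Counter, n: int) -> float:
    """Perplexity of window under the add-one-smoothed order-n model."""
    log_prob, count = 0.0, 0
    for i in range(n - 1, len(window)):
        ctx, gram = window[i - n + 1:i], window[i - n + 1:i + 1]
        p = (grams[gram] + 1) / (contexts[ctx] + len(ALPHABET))
        log_prob += math.log(p)
        count += 1
    if count == 0:
        raise ValueError("window must be at least n bases long")
    return math.exp(-log_prob / count)


def windowed_perplexity(seq: str, grams: Counter, contexts: Counter,
                        n: int, size: int, step: int) -> List[float]:
    """Score consecutive windows of the given size, advancing by step."""
    return [perplexity(seq[s:s + size], grams, contexts, n)
            for s in range(0, len(seq) - size + 1, step)]
```

    With no training data every base has probability 1/4, so perplexity is 4, the maximum for DNA; a tandem repeat scored under a model trained on that repeat drops close to 1.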

  9. Whole-Genome Sequencing of Vibrio cholerae O1 El Tor Strains Isolated in Ukraine (2011) and Russia (2014)

    Science.gov (United States)

    Smirnova, Nina I.; Agafonova, Elena Y.; Shchelkanova, Elena Y.; Alkhova, Zhanna V.; Kutyrev, Vladimir V.

    2017-01-01

    Here, we present the draft whole-genome sequences of Vibrio cholerae O1 El Tor strains 76 and M3265/80, isolated in Mariupol, Ukraine, and Moscow, Russia. The presence of various mutations detected in virulence-associated mobile elements indicates high genetic similarity of the strains reported here to new highly virulent variants of the cholera agent V. cholerae. PMID:28232438

  10. Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project.

    Science.gov (United States)

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Zinberg, Randi; Wasserstein, Melissa; Kasarskis, Andrew; Diaz, George A; Schadt, Eric E

    2017-02-01

    Providing ostensibly healthy individuals with personal results from whole-genome sequencing could lead to improved health and well-being via enhanced disease risk prediction, prevention, and diagnosis, but also poses practical and ethical challenges. Understanding how individuals react psychologically and behaviourally will be key in assessing the potential utility of personal whole-genome sequencing. We conducted an exploratory longitudinal cohort study in which quantitative surveys and in-depth qualitative interviews were conducted before and after personal results were returned to individuals who underwent whole-genome sequencing. The participants were offered a range of interpreted results, including Alzheimer's disease, type 2 diabetes, pharmacogenomics, rare disease-associated variants, and ancestry. They were also offered their raw data. Of the 35 participants at baseline, 29 (82.9%) completed the 6-month follow-up. In the quantitative surveys, test-related distress was low, although it was higher at 1-week than at 6-month follow-up (Z=2.68, P=0.007). In the 6-month qualitative interviews, most participants felt happy or relieved about their results. A few were concerned, particularly about rare disease-associated variants and Alzheimer's disease results. Two of the 29 participants had sought clinical follow-up as a direct or indirect consequence of rare disease-associated variant results. Several had mentioned their results to their doctors. Some participants felt having their raw data might be medically useful to them in the future. The majority reported positive reactions to having their genomes sequenced, but there were notable exceptions to this. The impact and value of returning personal results from whole-genome sequencing when implemented on a larger scale remain to be seen.

  11. Coverage tradeoffs and power estimation in the design of whole-genome sequencing experiments for detecting association

    OpenAIRE

    Shen, Yufeng; Song, Ruijie; Pe'er, Itsik

    2011-01-01

    Motivation: Whole-genome sequencing (WGS) allows direct interrogation of previously undetected uncommon or rare variants, which potentially contribute to the missing heritability of human disease. However, the cost of sequencing large numbers of samples limits its application in case–control association studies. Here, we describe theoretical and empirical design considerations for such sequencing studies, aimed at maximizing the power of detecting association under the constraint of study-wide co...
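
    One ingredient of the coverage tradeoff is how likely a heterozygous variant is to be seen at a given mean depth. The sketch below computes that probability under two textbook idealizations (depth is Poisson with the given mean; given depth d, alternate-allele reads are Binomial(d, f)) and a hypothetical calling rule of "at least m alternate reads". It is a generic illustration, not the model described in this paper.

```python
import math


def het_detection_prob(mean_depth: float, min_alt_reads: int,
                       allele_fraction: float = 0.5) -> float:
    """P(at least min_alt_reads reads carry the alternate allele at a site).

    Assumes depth ~ Poisson(mean_depth) and, given depth d,
    alternate reads ~ Binomial(d, allele_fraction).
    """
    if not 0 < allele_fraction < 1:
        raise ValueError("allele_fraction must be strictly between 0 and 1")
    if mean_depth == 0:
        return 0.0 if min_alt_reads > 0 else 1.0
    # Truncate the Poisson sum where the remaining mass is negligible.
    max_depth = int(mean_depth + 20 * math.sqrt(mean_depth) + 50)
    total = 0.0
    for d in range(min_alt_reads, max_depth + 1):
        log_pois = -mean_depth + d * math.log(mean_depth) - math.lgamma(d + 1)
        tail = 0.0  # P(Binomial(d, f) >= min_alt_reads), summed in log space
        for k in range(min_alt_reads, d + 1):
            log_binom = (math.lgamma(d + 1) - math.lgamma(k + 1)
                         - math.lgamma(d - k + 1)
                         + k * math.log(allele_fraction)
                         + (d - k) * math.log(1 - allele_fraction))
            tail += math.exp(log_binom)
        total += math.exp(log_pois) * tail
    return total
```

    A useful sanity check: under these assumptions the alternate-read count is itself Poisson with mean equal to mean_depth times allele_fraction, so het_detection_prob(10, 2) equals 1 - e^(-5)(1 + 5), about 0.96. Halving depth to spread cost over more samples lowers this per-sample sensitivity, which is the kind of tradeoff a power analysis must weigh.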

  12. Whole Genome Sequencing and Phylogenetic Analysis of a Historical Collection of Bacillus anthracis Strains from Danish Cattle

    DEFF Research Database (Denmark)

    Derzelle, Sylviane; Girault, Guillaume; Kokotovic, Branko

    2015-01-01

    Bacillus anthracis, the causative agent of anthrax, is known as one of the most genetically monomorphic species. Canonical single-nucleotide polymorphism (SNP) typing and whole-genome sequencing were used to investigate the molecular diversity of eleven B. anthracis strains isolated from cattle......-associated strains responsible for outbreaks of injection anthrax in drug users in Europe. Eight novel diagnostic SNPs that specifically discriminate the different sub-groups of Danish strains were identified and developed into PCR-based genotyping assays....

  13. Whole-Genome Sequencing of Vibrio cholerae O1 El Tor Strains Isolated in Ukraine (2011) and Russia (2014).

    Science.gov (United States)

    Smirnova, Nina I; Krasnov, Yaroslav M; Agafonova, Elena Y; Shchelkanova, Elena Y; Alkhova, Zhanna V; Kutyrev, Vladimir V

    2017-02-23

    Here, we present the draft whole-genome sequences of Vibrio cholerae O1 El Tor strains 76 and M3265/80, isolated in Mariupol, Ukraine, and Moscow, Russia. The presence of various mutations detected in virulence-associated mobile elements indicates high genetic similarity of the strains reported here to new highly virulent variants of the cholera agent V. cholerae. Copyright © 2017 Smirnova et al.

  14. Whole-genome sequence of Sunxiuqinia dokdonensis DH1(T), isolated from deep sub-seafloor sediment in Dokdo Island.

    Science.gov (United States)

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-09-01

    Sunxiuqinia dokdonensis DH1(T) was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea, subjected to whole genome sequencing on the HiSeq platform, and annotated with RAST. The nucleotide sequence of this genome was deposited in DDBJ/EMBL/GenBank under the accession LGIA00000000.

  15. Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis

    OpenAIRE

    Ayele, Mulu; Haas, Brian J.; Kumar, Nikhil; Wu, Hank; Xiao, Yongli; Van Aken, Susan; Utterback, Teresa R.; WORTMAN, Jennifer R.; White, Owen R.; Town, Christopher D

    2005-01-01

    Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering 283 Mb (0.44×) of the estimated 650 Mb Brassica genome were searched against the Arabidopsis genome, and conserved Arabidopsis genome sequences (CAGSs) were identified. Of these ...
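
    The 0.44× figure is the total length of shotgun sequence divided by the estimated genome size. The small sketch below reproduces that arithmetic and, as an added assumption (reads placed uniformly at random, the idealized Lander-Waterman model), estimates what share of the genome a survey this light would touch at least once.

```python
import math


def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Lander-Waterman expectation for the fraction of the genome hit by >= 1 read,
    assuming reads land uniformly at random."""
    return 1 - math.exp(-coverage)


c = fold_coverage(283e6, 650e6)
print(f"{c:.2f}x")                            # 0.44x
print(f"{expected_fraction_covered(c):.2f}")  # 0.35
```

    So even a sub-1× survey is expected to sample roughly a third of the Brassica genome, which is enough to find many conserved regions by comparison with Arabidopsis.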

  16. Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project

    Science.gov (United States)

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Zinberg, Randi; Wasserstein, Melissa; Kasarskis, Andrew; Diaz, George A; Schadt, Eric E

    2017-01-01

    Providing ostensibly healthy individuals with personal results from whole-genome sequencing could lead to improved health and well-being via enhanced disease risk prediction, prevention, and diagnosis, but also poses practical and ethical challenges. Understanding how individuals react psychologically and behaviourally will be key in assessing the potential utility of personal whole-genome sequencing. We conducted an exploratory longitudinal cohort study in which quantitative surveys and in-depth qualitative interviews were conducted before and after personal results were returned to individuals who underwent whole-genome sequencing. The participants were offered a range of interpreted results, including Alzheimer's disease, type 2 diabetes, pharmacogenomics, rare disease-associated variants, and ancestry. They were also offered their raw data. Of the 35 participants at baseline, 29 (82.9%) completed the 6-month follow-up. In the quantitative surveys, test-related distress was low, although it was higher at 1-week than at 6-month follow-up (Z=2.68, P=0.007). In the 6-month qualitative interviews, most participants felt happy or relieved about their results. A few were concerned, particularly about rare disease-associated variants and Alzheimer's disease results. Two of the 29 participants had sought clinical follow-up as a direct or indirect consequence of rare disease-associated variant results. Several had mentioned their results to their doctors. Some participants felt having their raw data might be medically useful to them in the future. The majority reported positive reactions to having their genomes sequenced, but there were notable exceptions to this. The impact and value of returning personal results from whole-genome sequencing when implemented on a larger scale remain to be seen. PMID:28051073

  17. Whole-genome sequence of Sunxiuqinia dokdonensis DH1T, isolated from deep sub-seafloor sediment in Dokdo Island

    Directory of Open Access Journals (Sweden)

    Sooyeon Lim

    2016-09-01

    Sunxiuqinia dokdonensis DH1T was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea, subjected to whole genome sequencing on the HiSeq platform, and annotated with RAST. The nucleotide sequence of this genome was deposited in DDBJ/EMBL/GenBank under the accession LGIA00000000.

  18. Colonization with methicillin-resistant Staphylococcus pseudintermedius in multi-dog households: A longitudinal study using whole genome sequencing.

    Science.gov (United States)

    Windahl, Ulrika; Gren, Joakim; Holst, Bodil S; Börjesson, Stefan

    2016-06-30

    Despite a worldwide increase in the presence of methicillin-resistant Staphylococcus pseudintermedius (MRSP) in dogs and its potential to cause serious canine health problems, understanding of the transmission and long-term carriage of MRSP is limited. The objective of this study was to investigate the transmission of MRSP to contact dogs living in multi-dog households where one or more of the dogs had been diagnosed with a clinically apparent MRSP infection. MRSP carriage was investigated over several months in 11 dogs living in four separate multi-dog households where an MRSP infection in a dog had been diagnosed. Whole-genome sequencing was used for genotypic characterization. Contact dogs were MRSP-positive only if the index dog was positive on the same sampling occasion. Three contact dogs were consistently MRSP-negative. The whole genome sequencing data showed similarities between isolates within each family group, indicating that MRSP was transmitted within each family. The results show that the risk of MRSP colonization in dogs living with an MRSP-infected dog is reduced if the index dog becomes MRSP-negative, and that not all contact dogs carry MRSP continuously while the index dog is MRSP-positive. The information yielded by whole genome sequencing showed it to be a promising additional tool in epidemiologic investigations of MRSP transmission.

  19. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data

    Science.gov (United States)

    Roth, Andrew; Khattra, Jaswinder; Ho, Julie; Yap, Damian; Prentice, Leah M.; Melnyk, Nataliya; McPherson, Andrew; Bashashati, Ali; Laks, Emma; Biele, Justina; Ding, Jiarui; Le, Alan; Rosner, Jamie; Shumansky, Karey; Marra, Marco A.; Gilks, C. Blake; Huntsman, David G.; McAlpine, Jessica N.; Aparicio, Samuel

    2014-01-01

    The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN. PMID:25060187
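
    TITAN is a full probabilistic model, but the mixing arithmetic it must account for is simple to state. The sketch below assumes a diploid normal background and a fraction p of cells carrying an event with total copy number c and b copies of the B allele (names and simplifications are illustrative, not TITAN's). It gives the expected copy number and B-allele fraction of the mixture, plus an inversion for p from copy number alone.

```python
def expected_copy_number(p: float, c: int) -> float:
    """Average copy number when a fraction p of cells carry c copies and the rest are diploid."""
    return (1 - p) * 2 + p * c


def expected_baf(p: float, c: int, b: int) -> float:
    """Expected B-allele fraction at a heterozygous germline SNP when a fraction p of cells
    carry c total copies with b B-allele copies (other cells carry 1 of 2)."""
    return ((1 - p) * 1 + p * b) / expected_copy_number(p, c)


def cellular_prevalence(observed_cn: float, c: int) -> float:
    """Solve expected_copy_number(p, c) == observed_cn for p (requires c != 2)."""
    if c == 2:
        raise ValueError("copy-neutral events (e.g. CN-LOH) need allelic data instead")
    return (observed_cn - 2) / (c - 2)
```

    The copy-neutral case shows why both signals matter: with c = 2 and b = 0, average copy number stays at 2 for every p, but the B-allele fraction falls from 0.5 toward 0 as the LOH clone expands, so only allelic data reveal the event and its prevalence.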

  20. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery

    Directory of Open Access Journals (Sweden)

    Stothard Paul

    2011-11-01

    Background: One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow in-depth characterization of the genetic variation present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results: The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the SNPs found in Holstein, in Black Angus, and in both animals, 81%, 81%, and 75%, respectively, are novel. In-depth annotation of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten
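
    The CNV comparison above rests on read depth along the reference. A minimal sketch of a window-based depth comparison between two animals follows, assuming per-window read depths are already available. Normalizing by each animal's median window depth (to absorb the 22-fold versus 19-fold difference) and the 0.5 log2 cutoff are illustrative choices, not the study's parameters.

```python
import math
import statistics
from typing import List, Tuple


def depth_ratio_calls(depth_a: List[float], depth_b: List[float],
                      cutoff: float = 0.5) -> List[Tuple[int, float, str]]:
    """Flag windows whose normalized depth differs between two genomes.

    Returns (window index, log2 ratio A/B, "gain_in_A" or "gain_in_B")
    for windows where the absolute log2 ratio exceeds the cutoff.
    """
    if len(depth_a) != len(depth_b):
        raise ValueError("depth profiles must cover the same windows")
    med_a, med_b = statistics.median(depth_a), statistics.median(depth_b)
    if med_a == 0 or med_b == 0:
        raise ValueError("median depth must be positive")
    calls = []
    for i, (a, b) in enumerate(zip(depth_a, depth_b)):
        if a == 0 or b == 0:
            continue  # log ratio undefined; a real pipeline would handle these separately
        ratio = math.log2((a / med_a) / (b / med_b))
        if ratio > cutoff:
            calls.append((i, ratio, "gain_in_A"))
        elif ratio < -cutoff:
            calls.append((i, ratio, "gain_in_B"))
    return calls
```

    The median is used instead of the mean so that a large CNV does not shift the baseline of every other window; in practice, depth is also corrected for GC content and mappability before such comparisons.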

  1. Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo)

    Directory of Open Access Journals (Sweden)

    Aslam Muhammad L

    2012-08-01

    whole genome SNP discovery study in turkey resulted in the detection of 5.49 million putative SNPs compared to the reference genome. All commercial lines appear to share a common origin. Presence of different alleles/haplotypes in the SM population highlights that specific haplotypes have been selected in the modern domesticated turkey.

  2. Clusters versus affinity-based approaches in F. tularensis whole genome search of CTL epitopes.

    Directory of Open Access Journals (Sweden)

    Anat Zvi

    Deciphering the cellular immunome of a bacterial pathogen is challenging due to the enormous number of putative peptidic determinants. State-of-the-art prediction methods developed in recent years make it possible to significantly reduce the number of peptides to be screened, yet the number of remaining candidates for experimental evaluation is still in the tens of thousands, even for a limited coverage of MHC alleles. We have recently established a resource-efficient approach for down-selecting candidates and enriching true positives, based on selecting predicted MHC binders located in high-density "hotspots" of putative epitopes. This cluster-based approach was applied to an unbiased, whole genome search for Francisella tularensis CTL epitopes and was shown to yield a 17- to 25-fold higher level of responders compared with randomly selected predicted epitopes tested in Kb/Db C57BL/6 mice. In the present study, we further evaluate the cluster-based approach (down to a lower density range) and compare it to the classical affinity-based approach by testing putative CTL epitopes with predicted IC50 values of <10 nM. We demonstrate that while the percentage of responders achieved by both approaches is similar, the profile of responders is different, and the predicted binding affinity of most responders in the cluster-based approach is relatively low (geometric mean of 170 nM), rendering the two approaches complementary. The cluster-based approach is further validated in BALB/c F. tularensis-immunized mice belonging to another allelic restriction (Kd/Dd) group. To date, the cluster-based approach has yielded over 200 novel F. tularensis peptides eliciting a cellular response, all of which were verified as MHC class I binders, thereby substantially expanding the F. tularensis dataset of known CTL epitopes. The generality and power of the high-density cluster-based approach suggest that it can be a valuable tool for identification of novel CTLs in
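
    The cluster-based down-selection can be pictured as a density filter: keep predicted MHC binders that sit in a stretch of protein containing many other predicted binders. A minimal sketch follows; the input format (start positions of predicted binders within one protein), the window length and the density threshold are hypothetical and not the authors' parameters.

```python
from bisect import bisect_right
from typing import List


def hotspot_binders(starts: List[int], window: int = 20, min_count: int = 4) -> List[int]:
    """Return predicted-binder start positions that fall inside a high-density hotspot.

    A binder is kept if some stretch of `window` residues containing it also
    contains at least `min_count` predicted binders.
    """
    starts = sorted(starts)
    keep = set()
    for i, w in enumerate(starts):
        # Any qualifying stretch can be slid right until it begins at a binder,
        # so it suffices to test windows anchored at each binder position.
        hi = bisect_right(starts, w + window - 1)
        if hi - i >= min_count:
            keep.update(starts[i:hi])
    return sorted(keep)
```

    Anchoring windows at binder positions keeps the scan at O(n log n) per protein while still finding every qualifying stretch, since any window with enough binders can be shifted to start at its first binder without losing any.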

  3. Genomic Context of Azole Resistance Mutations in Aspergillus fumigatus Determined Using Whole-Genome Sequencing.

    Science.gov (United States)

    Abdolrasouli, Alireza; Rhodes, Johanna; Beale, Mathew A; Hagen, Ferry; Rogers, Thomas R; Chowdhary, Anuradha; Meis, Jacques F; Armstrong-James, Darius; Fisher, Matthew C

    2015-06-02

    A rapid and global emergence of azole resistance has been observed in the pathogenic fungus Aspergillus fumigatus over the past decade. The dominant resistance mechanism appears to be of environmental origin and involves mutations in the cyp51A gene, which encodes a protein targeted by triazole antifungal drugs. Whole-genome sequencing (WGS) was performed for high-resolution single-nucleotide polymorphism (SNP) analysis of 24 A. fumigatus isolates, including azole-resistant and susceptible clinical and environmental strains obtained from India, the Netherlands, and the United Kingdom, in order to assess the utility of WGS for characterizing the alleles causing resistance. WGS analysis confirmed that TR34/L98H (a mutation comprising a tandem repeat [TR] of 34 bases in the promoter of the cyp51A gene and a leucine-to-histidine change at codon 98) is the sole mechanism of azole resistance among the isolates tested in this panel of isolates. We used population genomic analysis and showed that A. fumigatus was panmictic, with as much genetic diversity found within a country as is found between continents. A striking exception to this was shown in India, where isolates are highly related despite being isolated from both clinical and environmental sources across >1,000 km; this broad occurrence suggests a recent selective sweep of a highly fit genotype that is associated with the TR34/L98H allele. We found that these sequenced isolates are all recombining, showing that azole-resistant alleles are segregating into diverse genetic backgrounds. Our analysis delineates the fundamental population genetic parameters that are needed to enable the use of genome-wide association studies to identify the contribution of SNP diversity to the generation and spread of azole resistance in this medically important fungus. Resistance to azoles in the ubiquitous ascomycete fungus A. fumigatus was first reported from clinical isolates collected in the United States during the late 1980s

  4. Assessment of the Utility of Whole Genome Sequencing of Measles Virus in the Characterisation of Outbreaks.

    Directory of Open Access Journals (Sweden)

    Ana Raquel Penedos

    Measles is a highly infectious disease caused by measles virus (MeV). Despite the availability of a safe and cost-effective vaccine, measles remains one of the leading causes of death in young children worldwide. Within Europe, there is a target for eliminating endemic measles by 2015, with molecular epidemiology required on 80% of cases to include or exclude outbreak transmission chains. Currently, MeV is genotyped on the basis of a 450-nucleotide region of the nucleoprotein gene (N-450) and the hemagglutinin gene (H). However, this is not sufficiently informative for distinguishing endemic from imported MeV. We have developed an amplicon-based method for obtaining whole genome sequences (WGS) using NGS or Sanger methodologies from cell culture isolates or oral fluid specimens, and have sequenced over 60 samples, including 42 from the 2012 outbreak in the UK. Overall, NGS coverage was over 90% for approximately 71% of the samples tested. Analysis of 32 WGS excluding the 3' and 5' termini (WGS-t) obtained from the outbreak indicates that the single nucleotide difference found between the two major groups of N-450 sequences detected during the outbreak is most likely a result of stochastic viral mutation during endemic transmission rather than of multiple importation events: earlier strains appear to have evolved into two distinct strain clusters in 2013, one containing strains with both outbreak-associated N-450 sequences. Additionally, phylogenetic analysis of each genomic region of MeV for the strains in this study suggests that the most information is acquired from the non-coding region located between the matrix and fusion protein genes (M/F NCR) and the N-450 genotyping sequence, an observation supported by entropy analysis across genotypes. We suggest that both M/F NCR and WGS-t could be used to complement the information from classical epidemiology and N-450 sequencing to address specific questions in the context of measles elimination.
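
    The entropy analysis mentioned above can be sketched as per-site Shannon entropy over a multiple sequence alignment: columns where genotypes disagree score higher, so a region such as the M/F NCR would show elevated mean entropy if it carries more variation. This is a minimal sketch assuming pre-aligned, equal-length sequences with gaps and ambiguous bases ignored; the alignment and window coordinates are hypothetical, and the abstract does not describe the authors' exact entropy calculation.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the A/C/G/T characters in one alignment column."""
    bases = [b for b in column.upper() if b in "ACGT"]  # gaps/ambiguities ignored
    if not bases:
        return 0.0
    n = len(bases)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bases).values())


def entropy_profile(alignment: list[str]) -> list[float]:
    """Per-site entropy across equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(s[i] for s in alignment)) for i in range(length)]


def mean_entropy(profile: list[float], start: int, end: int) -> float:
    """Average entropy over the half-open window [start, end), e.g. a candidate region."""
    window = profile[start:end]
    return sum(window) / len(window)


aln = ["ACGTAC", "ACGTTC", "ACCTAG", "ACGTTG"]  # hypothetical aligned genomes
prof = entropy_profile(aln)
print([round(h, 3) for h in prof], mean_entropy(prof, 4, 6))
```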

  5. Whole Genome Sequencing Based Characterization of Extensively Drug-Resistant Mycobacterium tuberculosis Isolates from Pakistan

    KAUST Repository

    Ali, Asho

    2015-02-26

    Improved molecular diagnostic methods for detecting drug resistance in Mycobacterium tuberculosis (MTB) strains are required. Resistance to first- and second-line anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular genes. However, these SNPs can vary between MTB lineages; therefore, local data are required to describe different strain populations. We used whole genome sequencing (WGS) to characterize 37 extensively drug-resistant (XDR) MTB isolates from Pakistan and investigated 40 genes associated with drug resistance. Rifampicin resistance was attributable to SNPs in the rpoB hot-spot region. Isoniazid resistance was most commonly associated with the katG codon 315 mutation (92%), followed by inhA S94A (8%); however, one strain did not have SNPs in katG, inhA or oxyR-ahpC. All strains were pyrazinamide resistant, but only 43% had pncA SNPs. Ethambutol-resistant strains predominantly had embB codon 306 mutations (62%), but additional SNPs at embB codons 406, 378 and 328 were also present. Fluoroquinolone resistance was associated with gyrA codons 91-94 in 81% of strains; four strains had only gyrB mutations, while others did not have SNPs in either gyrA or gyrB. Streptomycin-resistant strains had mutations in rpsL codon 43 (42%), the rrs 500 region (16%), and gidB (34%), while six strains did not have mutations in any of these genes. Amikacin/kanamycin/capreomycin resistance was associated with SNPs in rrs at nt 1401 (78%) and nt 1484 (3%), except in seven (19%) strains. We estimate that if only the common hot-spot region targets of current commercial assays were used, the concordance between phenotypic and genotypic testing for these XDR strains would be 100% for rifampicin, 92% for isoniazid, 81% for fluoroquinolones, 78% for aminoglycosides and 62% for ethambutol, while pncA sequencing would provide genotypic resistance in less than half the isolates. This work highlights the importance of expanded
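
    The concordance estimate above compares a genotypic call, made by checking for mutations at known hot-spot positions, with the phenotypic result. The sketch below shows that logic under clearly stated assumptions: `HOTSPOTS` holds only a few of the targets named in the abstract (katG codon 315, inhA codon 94, embB codon 306, rrs nt 1401), the isolates are hypothetical, and real catalogues and commercial assays check many more positions and specific nucleotide or amino-acid changes.

```python
# Simplified hot-spot catalogue using targets named in the abstract; a real
# catalogue lists many more positions and the specific allowed changes.
HOTSPOTS: dict[str, set[tuple[str, int]]] = {
    "isoniazid": {("katG", 315), ("inhA", 94)},
    "ethambutol": {("embB", 306)},
    "amikacin": {("rrs", 1401)},
}


def predict_resistant(drug: str, mutations: set[tuple[str, int]]) -> bool:
    """Genotypic call: resistant if any observed (gene, position) is a listed hot-spot."""
    return any(m in HOTSPOTS[drug] for m in mutations)


def concordance(drug: str, isolates: list[tuple[set[tuple[str, int]], bool]]) -> float:
    """Fraction of isolates whose genotypic call agrees with the phenotype."""
    agree = sum(predict_resistant(drug, muts) == resistant for muts, resistant in isolates)
    return agree / len(isolates)


# Hypothetical isolates: (observed mutations, phenotypically isoniazid-resistant?)
isolates = [
    ({("katG", 315)}, True),
    ({("inhA", 94)}, True),
    (set(), True),            # resistant without a hot-spot SNP: discordant
    ({("embB", 306)}, True),  # no isoniazid hot-spot either: discordant
]
print(concordance("isoniazid", isolates))
```

    Resistant isolates without a catalogued hot-spot mutation, like the last two hypothetical isolates, are exactly what pulls genotype-phenotype concordance below 100%.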

  6. Whole genome transcript profiling from fingerstick blood samples: a comparison and feasibility study

    Directory of Open Access Journals (Sweden)

    Williams Adam R

    2009-12-01

    Abstract Background Whole genome gene expression profiling has revolutionized research in the past decade, especially with the advent of microarrays. Recently, there have been significant improvements in whole blood RNA isolation techniques which, through stabilization of RNA at the time of sample collection, avoid bias and artifacts introduced during sample handling. Despite these improvements, current human whole blood RNA stabilization/isolation kits are limited by the requirement of a venous blood sample of at least 2.5 mL. While fingerstick blood collection has been used for many different assays, no kit has yet been developed to isolate high-quality RNA for use in gene expression studies from such small human samples. The clinical and field testing advantages of obtaining reliable and reproducible gene expression data from a fingerstick are many: it is less invasive, saves time, is more mobile, and eliminates the need for a trained phlebotomist. Furthermore, this method could also be employed in small animal studies, e.g., mice, where larger sample collections often require sacrificing the animal. In this study, we offer a rapid and simple method to extract sufficient amounts of high-quality total RNA from approximately 70 μL of whole blood collected via a fingerstick, using a modified protocol of the commercially available Qiagen PAXgene RNA Blood Kit. Results From two sets of fingerstick collections, about 70 μL of whole blood collected via finger lancet and capillary tube, we recovered an average of 252.6 ng total RNA with an average RIN of 9.3. The post-amplification yield for 50 ng of total RNA averaged 7.0 μg cDNA. The cDNA hybridized to Affymetrix HG-U133 Plus 2.0 GeneChips had an average % Present call of 52.5%. Both fingerstick collections were highly correlated with each other, with r² values ranging from 0.94 to 0.97. Similarly, both fingerstick collections were highly correlated with the venous collection, with r² values ranging from 0.88 to 0
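
    The r² values above quantify how closely expression measurements from two collections agree. A minimal sketch of the squared Pearson correlation follows; it assumes the inputs are per-probe-set signal values (often log2-transformed) listed in the same order for both collections, and the numbers in the example are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination for a simple linear relationship."""
    return pearson_r(x, y) ** 2


# Hypothetical log2 signals for five probe sets from two fingerstick collections.
finger_1 = [5.1, 7.8, 9.2, 6.4, 11.0]
finger_2 = [5.3, 7.5, 9.4, 6.1, 10.8]
print(round(r_squared(finger_1, finger_2), 3))
```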

  7. High resolution measurement of DUF1220 domain copy number from whole genome sequence data.

    Science.gov (United States)

    Astling, David P; Heft, Ilea E; Jones, Kenneth L; Sikela, James M

    2017-08-14

    DUF1220 protein domains, found primarily in Neuroblastoma BreakPoint Family (NBPF) genes, show the greatest human lineage-specific increase in copy number of any coding region in the genome. There are 302 haploid copies of DUF1220 in hg38 (~160 of which are human-specific), and the majority of these can be divided into 6 different subtypes (referred to as clades). Copy number changes of specific DUF1220 clades have been associated in a dose-dependent manner with brain size variation (both evolutionarily and within the human population), cognitive aptitude, autism severity, and schizophrenia severity. However, no published methods can directly measure copies of DUF1220 with high accuracy, and no method can distinguish between domains within a clade. Here we describe a novel method for measuring copies of DUF1220 domains, and of the NBPF genes in which they are found, from whole genome sequence data. We have characterized the effect that various sequencing and alignment parameters and strategies have on the accuracy and precision of the method and defined the parameters that lead to optimal DUF1220 copy number measurement and resolution. We show that copy number estimates obtained using our read depth approach are highly correlated with those generated by ddPCR for three representative DUF1220 clades. By simulation, we demonstrate that our method provides sufficient resolution to analyze DUF1220 copy number variation at three levels: (1) DUF1220 clade copy number within individual genes and groups of genes (gene-specific clade groups); (2) genome-wide DUF1220 clade copies; and (3) gene copy number for DUF1220-encoding genes. To our knowledge, this is the first method to accurately measure copies of all six DUF1220 clades and the first method to provide gene-specific resolution of these clades. This allows one to discriminate among the ~300 haploid human DUF1220 copies to an extent not possible with any other method. The result is a greatly enhanced capability to analyze the
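
    The read-depth idea behind such copy number estimates can be sketched simply: if sequence present once per haploid genome is covered at depth D on a diploid genome, a region whose reads pile up at k times D is present in roughly 2k copies. The sketch below assumes per-base depths are already available (for example from an aligner plus a depth-reporting tool), uses a hypothetical depth vector and region, and ignores the mapping, GC and alignment-parameter effects the authors characterise.

```python
def mean_depth(depths: list[int], start: int, end: int) -> float:
    """Mean per-base read depth over the half-open interval [start, end)."""
    window = depths[start:end]
    if not window:
        raise ValueError("empty region")
    return sum(window) / len(window)


def estimate_copy_number(depths: list[int], region: tuple[int, int],
                         baseline_depth: float, ploidy: int = 2) -> float:
    """Copy number = ploidy * (region depth / single-copy baseline depth).

    `baseline_depth` is the mean depth over regions assumed to occur once per
    haploid genome; reads from every copy of a repeated domain that map to the
    same reference region add up, so depth scales with total copy number."""
    start, end = region
    return ploidy * mean_depth(depths, start, end) / baseline_depth


# Hypothetical depths: 30x over single-copy sequence, 90x over a repeated domain.
depths = [30] * 10 + [90] * 5
print(estimate_copy_number(depths, (10, 15), baseline_depth=30.0))
```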

  8. Whole genome analysis of p38 SAPK-mediated gene expression upon stress

    Directory of Open Access Journals (Sweden)

    Lopez-Bigas Nuria

    2010-03-01

    Abstract Background Cells have the ability to respond and adapt to environmental changes through activation of stress-activated protein kinases (SAPKs). Although p38 SAPK signalling is known to participate in the regulation of gene expression, little is known about the molecular mechanisms used by this SAPK to regulate stress-responsive genes or about the overall set of genes regulated by p38 in response to different stimuli. Results Here, we report a whole-genome expression analysis of mouse embryonic fibroblasts (MEFs) treated with three different p38 SAPK-activating stimuli, namely osmostress, the cytokine TNFα and the protein synthesis inhibitor anisomycin. We found that the activation kinetics of p38α SAPK in response to these insults differ and lead to a complex, stress-specific gene expression pattern with a restricted set of overlapping genes. In addition, we analysed the contribution of p38α, the major p38 family member present in MEFs, to the overall stress-induced transcriptional response by using both a chemical inhibitor (SB203580) and p38α-deficient (p38α-/-) MEFs. We show that p38 SAPK dependency ranged between 60% and 88% depending on the treatment, and that there is a very good overlap between the inhibitor treatment and the knockout cells. Furthermore, we found that the dependency on p38 SAPK varies with the time the cells are subjected to osmostress. Conclusions Our genome-wide transcriptional analyses show a selective response to specific stimuli and a restricted common response of up to 20% of the stress up-regulated early genes that involves an important set of transcription factors, which might be critical for either cell adaptation or preparation for continuous extracellular changes. Interestingly, up to 85% of the up-regulated genes are under the transcriptional control of p38 SAPK. Thus, activation of p38 SAPK is critical to elicit the early gene expression program required for cell

  9. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA.

    Directory of Open Access Journals (Sweden)

    Jesper Buchhave Poulsen

    Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries, they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA from neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject we analysed a neonatal DBS sample and a corresponding adult whole-blood (WB) reference sample. Different DNA sample types were prepared for each of the subjects. Pilot 1: wgaDNA of 2×3.2 mm neonatal DBSs (DBS_2x3.2) and raw DNA extract of the WB reference sample (WB_ref). Pilot 2: DBS_2x3.2, WB_ref and a WB_ref replica sharing DNA extract with the WB_ref sample. Pilot 3: DBS_2x3.2, WB_ref, wgaDNA of 2×1.6 mm neonatal DBSs and wgaDNA of the WB reference sample. Following sequencing and data analysis, we compared pairwise variant calls to obtain a measure of similarity, the concordance rate. Before filtering of the variant calls, concordance rates were slightly lower when comparing DBS vs WB sample types than for any two WB sample types of the same subject. The overall concordance rates depended on the variant type, with SNPs performing best. After filtering, the comparisons of DBS vs WB and WB vs WB sample types yielded similar concordance rates, with values close to 100%. WgaDNA of neonatal DBS samples performs with great accuracy and efficiency in exome sequencing. Based on concordance rates calculated from variant calls, the wgaDNA performed similarly to matched high-quality reference whole-blood DNA. No differences were observed when substituting 2×3.2 mm with 2×1.6 mm discs, allowing for additional reduction of sample material in future projects.
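
    The abstract does not give a formula for its concordance rate, so the sketch below assumes one common choice: the number of variants called identically in both samples divided by the number of distinct variants called in either. Variants are keyed by hypothetical (chromosome, position, reference, alternate) tuples, and SNPs are scored separately from indels to mirror the observation that concordance depended on variant type.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def concordance_rate(calls_a: list[Variant], calls_b: list[Variant]) -> float:
    """Shared calls divided by all distinct calls (1.0 if both sets are empty)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_snp(v: Variant) -> bool:
    """Single-base reference and alternate alleles."""
    return len(v[2]) == 1 and len(v[3]) == 1


def concordance_by_type(calls_a: list[Variant], calls_b: list[Variant]) -> dict[str, float]:
    """Concordance computed separately for SNPs and for other variants (indels)."""
    result = {}
    for label, keep in (("snp", is_snp), ("indel", lambda v: not is_snp(v))):
        result[label] = concordance_rate([v for v in calls_a if keep(v)],
                                         [v for v in calls_b if keep(v)])
    return result


# Hypothetical calls from a DBS sample and the matching whole-blood sample.
dbs = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "AT", "A")]
wb = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "ATT", "A")]
print(concordance_rate(dbs, wb), concordance_by_type(dbs, wb))
```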

  10. Rapid, comprehensive, and affordable mycobacterial diagnosis with whole-genome sequencing: a prospective study

    Science.gov (United States)

    Pankhurst, Louise J; del Ojo Elias, Carlos; Votintseva, Antonina A; Walker, Timothy M; Cole, Kevin; Davies, Jim; Fermont, Jilles M; Gascoyne-Binzi, Deborah M; Kohl, Thomas A; Kong, Clare; Lemaitre, Nadine; Niemann, Stefan; Paul, John; Rogers, Thomas R; Roycroft, Emma; Smith, E Grace; Supply, Philip; Tang, Patrick; Wilcox, Mark H; Wordsworth, Sarah; Wyllie, David; Xu, Li; Crook, Derrick W

    2016-01-01

    Summary Background Slow and cumbersome laboratory diagnostics for Mycobacterium tuberculosis complex (MTBC) risk delayed treatment and poor patient outcomes. Whole-genome sequencing (WGS) could potentially provide a rapid and comprehensive diagnostic solution. In this prospective study, we compare real-time WGS with routine MTBC diagnostic workflows. Methods We compared sequencing mycobacteria from all newly positive liquid cultures with routine laboratory diagnostic workflows across eight laboratories in Europe and North America for diagnostic accuracy, processing times, and cost between Sept 6, 2013, and April 14, 2014. We sequenced specimens once using local Illumina MiSeq platforms and processed data centrally using a semi-automated bioinformatics pipeline. We identified species or complex using gene presence or absence, predicted drug susceptibilities from resistance-conferring mutations identified from reference-mapped MTBC genomes, and calculated genetic distance to previously sequenced UK MTBC isolates to detect outbreaks. WGS data processing and analysis were done by staff masked to routine reference laboratory and clinical results. We also did a microcosting analysis to assess the financial viability of WGS-based diagnostics. Findings Compared with routine results, WGS predicted species with 93% (95% CI 90–96; 322 of 345 specimens; 356 mycobacteria specimens submitted) accuracy and drug susceptibility also with 93% (91–95; 628 of 672 specimens; 168 MTBC specimens identified) accuracy, with one sequencing attempt. WGS linked 15 (16% [95% CI 10–26]) of 91 UK patients to an outbreak. WGS diagnosed a case of multidrug-resistant tuberculosis before routine diagnosis was completed and discovered a new multidrug-resistant tuberculosis cluster. Full WGS diagnostics could be generated in a median of 9 days (IQR 6–10), a median of 21 days (IQR 14–32) faster than final reference laboratory reports were produced (median of 31 days [IQR 21–44]), at a cost
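
    Accuracy figures such as 322 of 345 specimens are binomial proportions reported with a confidence interval. The abstract does not say which interval method was used; the sketch below assumes the Wilson score interval, a common choice, and only shows how such a range can be computed from the counts.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


lo, hi = wilson_interval(322, 345)
print(f"{322 / 345:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

    For 322 of 345 this gives roughly 90% to 96%. Other methods, such as Clopper-Pearson, give slightly different bounds, so this should not be read as the authors' calculation.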

  11. Whole-genome sequence and analysis of Xanthomonas euvesicatoria strains and reassessment of the species

    Directory of Open Access Journals (Sweden)

    Jeri D. Barak

    2016-12-01

    Multiple species of Xanthomonas cause bacterial spot of tomato (BST) and pepper. We sequenced five Xanthomonas euvesicatoria strains isolated from three continents (Africa, Asia, and South America) to provide a set of representative genomes with temporal and geographic diversity. LMG strains 667, 905, 909, and 933 were pathogenic on tomato and pepper, whereas LMG 918 was pathogenic on pepper but elicited a hypersensitive reaction (HR) on tomato. Furthermore, LMG 667, 909, and 918 elicited an HR on Early Cal Wonder 30R containing Bs3. We examined pectolytic activity and starch hydrolysis, two tests that are useful in differentiating X. euvesicatoria from X. perforans, both causal agents of BST. LMG strains 905, 909, 918, and 933 were nonpectolytic, while only LMG 918 was amylolytic. These results suggest that these strains are all atypical of both X. euvesicatoria and X. perforans. Sequence analysis of all the publicly available X. euvesicatoria and X. perforans strains comparing seven housekeeping genes identified seven haplotypes with few polymorphisms. Whole genome comparison by average nucleotide identity (ANI) resulted in values of >99% among the LMG strains 667, 905, 909, 918, and 933 and X. euvesicatoria strains, and >99.6% among the LMG strains and a subset of X. perforans strains. These results suggest that X. euvesicatoria and X. perforans should be considered a single species. ANI values between strains of X. euvesicatoria, X. perforans, X. allii, X. alfalfae subsp. citrumelonis, X. dieffenbachiae, and a recently described pathogen of rose were >97.8%, suggesting these pathogens should be a single species and recognized as X. euvesicatoria as well. Analysis of the newly sequenced X. euvesicatoria strains revealed interesting findings among the type 3 (T3) effectors: relatively ancient stepwise erosion of some T3 effectors, additional X. euvesicatoria-specific T3 effectors among the causal agents of BST, orthologs of avrBs3 and avrBs4, and
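
    Average nucleotide identity (ANI) summarises whole-genome similarity as the mean identity of aligned genome fragments. BLAST-based ANI (ANIb) typically cuts the query genome into 1,020-nt fragments and searches each against the other genome; the sketch below replaces that search with an assumption that the two genomes are already globally aligned, applies the commonly cited filters of at least 30% identity over at least 70% of the fragment, and uses toy sequences. It illustrates the arithmetic only and is not how the authors computed ANI.

```python
import math


def fragment_scores(query: str, reference: str, frag_len: int = 1020) -> list[tuple[float, float]]:
    """(percent identity, fraction aligned) for consecutive fragments of `query`.

    Assumes `query` and `reference` are already globally aligned (same length,
    '-' for gaps); a real ANI tool would locate each fragment with BLAST instead."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(query) - frag_len + 1, frag_len):
        q = query[start:start + frag_len]
        r = reference[start:start + frag_len]
        pairs = [(a, b) for a, b in zip(q, r) if a != "-" and b != "-"]
        if pairs:
            identity = 100.0 * sum(a == b for a, b in pairs) / len(pairs)
            scores.append((identity, len(pairs) / frag_len))
    return scores


def ani(query: str, reference: str, frag_len: int = 1020,
        min_identity: float = 30.0, min_aligned: float = 0.7) -> float:
    """Mean identity of fragments passing the identity and alignment-length filters."""
    kept = [pid for pid, aligned in fragment_scores(query, reference, frag_len)
            if pid >= min_identity and aligned >= min_aligned]
    return sum(kept) / len(kept) if kept else math.nan


# Toy 20-nt "genomes" split into 10-nt fragments: 100% and 90% identical.
print(ani("A" * 20, "A" * 19 + "C", frag_len=10))
```

    ANI values of roughly 95-96% are widely used as a species boundary, which is why values above 97.8% between the listed taxa support treating them as one species.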

  12. Whole-genome sequencing of individuals from a founder population identifies candidate genes for asthma.

    Science.gov (United States)

    Campbell, Catarina D; Mohajeri, Kiana; Malig, Maika; Hormozdiari, Fereydoun; Nelson, Benjamin; Du, Gaixin; Patterson, Kristen M; Eng, Celeste; Torgerson, Dara G; Hu, Donglei; Herman, Catherine; Chong, Jessica X; Ko, Arthur; O'Roak, Brian J; Krumm, Niklas; Vives, Laura; Lee, Choli; Roth, Lindsey A; Rodriguez-Cintron, William; Rodriguez-Santana, Jose; Brigino-Buenaventura, Emerita; Davis, Adam; Meade, Kelley; LeNoir, Michael A; Thyne, Shannon; Jackson, Daniel J; Gern, James E; Lemanske, Robert F; Shendure, Jay; Abney, Mark; Burchard, Esteban G; Ober, Carole; Eichler, Evan E

    2014-01-01

    Asthma is a complex genetic disease caused by a combination of genetic and environmental risk factors. We sought to test classes of genetic variants largely missed by genome-wide association studies (GWAS), including copy number variants (CNVs) and low-frequency variants, by performing whole-genome sequencing (WGS) on 16 individuals from asthma-enriched and asthma-depleted families. The samples were obtained from an extended 13-generation Hutterite pedigree with reduced genetic heterogeneity due to a small founding gene pool and reduced environmental heterogeneity as a result of a communal lifestyle. We sequenced each individual to an average depth of 13-fold, generated a comprehensive catalog of genetic variants, and tested the most severe mutations for association with asthma. We identified and validated 1960 CNVs, 19 nonsense or splice-site single nucleotide variants (SNVs), and 18 insertions or deletions that were out of frame. As follow-up, we performed targeted sequencing of 16 genes in 837 cases and 540 controls of Puerto Rican ancestry and found that controls carry a significantly higher burden of mutations in IL27RA (2.0% of controls; 0.23% of cases; nominal p = 0.004; Bonferroni p = 0.21). We also genotyped 593 CNVs in 1199 Hutterite individuals. We identified a nominally significant association (p = 0.03; Odds ratio (OR) = 3.13) between a 6 kbp deletion in an intron of NEDD4L and increased risk of asthma. We genotyped this deletion in an additional 4787 non-Hutterite individuals (nominal p = 0.056; OR = 1.69). NEDD4L is expressed in bronchial epithelial cells, and conditional knockout of this gene in the lung in mice leads to severe inflammation and mucus accumulation. Our study represents one of the early instances of applying WGS to complex disease with a large environmental component and demonstrates how WGS can identify risk variants, including CNVs and low-frequency variants, largely untested in GWAS.
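
    A carrier-burden comparison between controls and cases is often tested with Fisher's exact test on a 2×2 table, followed by a multiple-testing correction such as Bonferroni. The abstract does not state which test produced its p-values, so the sketch below is a generic illustration using only the Python standard library, with a hypothetical table rather than the study's counts.

```python
from math import comb


def table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of a 2x2 table with top-left cell `a` and fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is no more
    likely than the observed one (small tolerance for floating-point ties)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, row1, col1, n)
    p_value = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = table_probability(x, row1, col1, n)
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical 2x2 table (not the study's counts) and 16 tests for the correction.
p = fisher_exact_two_sided(1, 9, 11, 3)
print(p, bonferroni(p, 16))
```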

  13. Genetics professionals' opinions of whole-genome sequencing in the newborn period.

    Science.gov (United States)

    Ulm, Elizabeth; Feero, W Gregory; Dineen, Richard; Charrow, Joel; Wicklund, Catherine

    2015-06-01

    Newborn screening (NBS) programs have been successful in identifying infants with rare, treatable, congenital conditions. While current programs rely largely on biochemical analysis, some predict that in the future, genome sequencing may be used as an adjunct. The purpose of this exploratory pilot study was to begin to characterize genetics professionals' opinions of the use of whole-genome sequencing (WGS) in NBS. We surveyed members of the American College of Medical Genetics and Genomics (ACMG) via an electronic survey distributed through email. The survey included questions about results disclosure, the current NBS paradigm, and the current criteria for adding a condition to the screening panel. The response rate was 7.3 % (n = 113/1549). The majority of respondents (85 %, n = 96/113) felt that WGS should not be currently used in NBS, and that if it were used, it should not be mandatory (86.5 %, n = 96/111). However, 75.7 % (n = 84/111) foresee it as a future use of WGS. Respondents felt that accurate interpretation of results (86.5 %, n = 83/96), a more extensive consent process (72.6 %, n = 69/95), pre- (79.2 %, n = 76/96) and post-test (91.6 %, n = 87/95) counseling, and comparable costs (70.8 %, n = 68/96) and turn-around-times (64.6 %, n = 62/96) to current NBS would be important for using WGS in NBS. Participants were in favor of disclosing most types of results at some point in the lifetime. However, the majority (87.3 %, n = 96/110) also indicated that parents should be able to choose what results are disclosed. Overall, respondents foresee NBS as a future use of WGS, but indicated that WGS should not occur within the framework of traditional NBS. They agreed with the current criteria for including a condition on the recommended uniform screening panel (RUSP). Further discussion about these criteria is needed in order to better understand how they could be utilized if WGS is incorporated into NBS.

  14. Whole-Genome Analysis of Antimicrobial-Resistant and Extraintestinal Pathogenic Escherichia coli in River Water.

    Science.gov (United States)

    Gomi, Ryota; Matsuda, Tomonari; Matsumura, Yasufumi; Yamamoto, Masaki; Tanaka, Michio; Ichiyama, Satoshi; Yoneda, Minoru

    2017-03-01

    Contamination of surface waters by antimicrobial-resistant bacteria and pathogenic bacteria is of great concern. In this study, 531 Escherichia coli isolates obtained from the Yamato River in Japan were evaluated phenotypically for resistance to 25 antimicrobials. Seventy-six isolates (14.3%) were multidrug resistant (MDR), 66 (12.4%) were nonsusceptible to one or two classes of agents, and 389 (73.3%) were susceptible. We performed whole-genome sequencing of selected strains by using Illumina technology. In total, the genome sequences of 155 strains were analyzed for antibiotic resistance determinants and phylogenetic characteristics. More than 50 different resistance determinants, including acquired resistance genes and chromosomal resistance mutations, were detected. Among the sequenced MDR strains (n = 66), sequence type 155 (ST155) complex (n = 9), ST10 complex (n = 9), and ST69 complex (n = 7) were prevalent. Among extraintestinal pathogenic E. coli (ExPEC) strains (n = 58), clinically important clonal groups, namely, ST95 complex (n = 18), ST127 complex (n = 8), ST12 complex (n = 6), ST14 complex (n = 6), and ST131 complex (n = 6), were prevalent, demonstrating the clonal distribution of environmental ExPEC strains. Typing of the fimH (type 1 fimbrial adhesin) gene revealed that ST131 complex strains carried fimH22 or fimH41, and no strains belonging to the fimH30 subgroup were detected. Fine-scale phylogenetic analysis and virulence gene content analysis of strains belonging to the ST95 complex (one of the major clonal ExPEC groups causing community-onset infections) revealed no significant differences between environmental and clinical strains. The results indicate contamination of surface waters by E. coli strains belonging to clinically important clonal groups. IMPORTANCE The prevalence of antimicrobial-resistant and pathogenic E. coli strains in surface waters is a concern because surface waters are used as sources for drinking water, irrigation, and
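
    Sorting isolates into multidrug resistant, nonsusceptible to one or two classes, and susceptible amounts to counting the distinct antimicrobial classes to which each isolate is nonsusceptible. The sketch assumes the widely used definition of MDR as nonsusceptibility to at least one agent in three or more classes; the abstract does not spell out its definition, and `AGENT_CLASS` is a small illustrative mapping rather than the study's 25-agent panel.

```python
from collections import Counter

# Illustrative agent-to-class mapping (a small subset, not the study's panel).
AGENT_CLASS = {
    "ampicillin": "penicillins",
    "cefotaxime": "cephalosporins",
    "ciprofloxacin": "fluoroquinolones",
    "gentamicin": "aminoglycosides",
    "tetracycline": "tetracyclines",
}


def classify(nonsusceptible_agents: list[str], mdr_min_classes: int = 3) -> str:
    """Resistance category from the agents an isolate is nonsusceptible to."""
    classes = {AGENT_CLASS[agent] for agent in nonsusceptible_agents}
    if len(classes) >= mdr_min_classes:
        return "MDR"
    if classes:
        return "nonsusceptible to 1-2 classes"
    return "susceptible"


def summarise(isolates: list[list[str]]) -> dict[str, float]:
    """Percentage of isolates falling into each category."""
    counts = Counter(classify(agents) for agents in isolates)
    return {category: 100.0 * k / len(isolates) for category, k in counts.items()}


# Hypothetical isolates, each listed by the agents it is nonsusceptible to.
isolates = [
    ["ampicillin", "ciprofloxacin", "tetracycline"],  # three classes
    ["ampicillin", "cefotaxime"],                     # penicillin + cephalosporin
    [],
    [],
]
print(summarise(isolates))
```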

  15. Whole Genome Association Studies of Residual Feed Intake and Related Traits in the Pig.

    Directory of Open Access Journals (Sweden)

    Suneel K Onteru

    Residual feed intake (RFI), a measure of feed efficiency, is the difference between observed feed intake and the expected feed requirement predicted from growth and maintenance. Pigs with low RFI have reduced feed costs without compromising their growth. Identification of genes or genetic markers associated with RFI will be useful for marker-assisted selection at an early age of animals with improved feed efficiency. Whole genome association studies (WGAS) for RFI, average daily feed intake (ADFI), average daily gain (ADG), back fat (BF) and loin muscle area (LMA) were performed on 1,400 pigs from the divergently selected ISU-RFI lines, using the Illumina PorcineSNP60 BeadChip. Various statistical methods were applied to find SNPs and genomic regions associated with the traits, including a Bayesian approach using GenSel software, and frequentist approaches such as allele frequency differences between lines, single SNP and haplotype analyses using PLINK software. Single SNP and haplotype analyses showed no significant associations (except for LMA) after genomic control and FDR. Bayesian analyses found at least 2 associations for each trait at a false positive probability of 0.5. At generation 8, the RFI selection lines mainly differed in allele frequencies for SNPs near (<0.05 Mb) genes that regulate insulin release and leptin functions. The Bayesian approach identified associations of genomic regions containing insulin release genes (e.g., GLP1R, CDKAL, SGMS1) with RFI and ADFI, of regions with energy homeostasis (e.g., MC4R, PGM1, GPR81) and muscle growth related genes (e.g., TGFB1) with ADG, and of fat metabolism genes (e.g., ACOXL, AEBP1) with BF. Specifically, a very highly significantly associated QTL for LMA on SSC7 with skeletal myogenesis genes (e.g., KLHL31) was identified for subsequent fine mapping. Important genomic regions associated with RFI related traits were identified for future validation studies prior to their incorporation in marker
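
    Genomic control, one of the corrections applied to the single-SNP results, rescales association statistics by an inflation factor λ estimated from the genome-wide median. A minimal sketch for 1-degree-of-freedom χ² statistics follows; it assumes λ is clamped at 1 so statistics are never inflated, and the example statistics are hypothetical. PLINK reports genomic-control-adjusted p-values through its `--adjust` option; the code below does not call PLINK.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail p-value of a 1-df chi-square statistic (same as a two-sided z-test)."""
    return math.erfc(math.sqrt(x / 2))


def genomic_control(chi2_stats: list[float]) -> tuple[float, list[float]]:
    """Inflation factor lambda and genomic-control-adjusted p-values."""
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [chi2_1df_pvalue(x / lam) for x in chi2_stats]


# Hypothetical statistics whose median is about twice the null median.
stats = [0.2, CHI2_1DF_MEDIAN * 2, 3.0, 9.0, 0.5]
lam, pvals = genomic_control(stats)
print(lam, [round(p, 4) for p in pvals])
```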

  16. Whole Genome Association Studies of Residual Feed Intake and Related Traits in the Pig.

    Science.gov (United States)

    Onteru, Suneel K; Gorbach, Danielle M; Young, Jennifer M; Garrick, Dorian J; Dekkers, Jack C M; Rothschild, Max F

    2013-01-01

    Residual feed intake (RFI), a measure of feed efficiency, is the difference between observed feed intake and the expected feed requirement predicted from growth and maintenance. Pigs with low RFI have reduced feed costs without compromising their growth. Identification of genes or genetic markers associated with RFI will be useful for marker-assisted selection at an early age of animals with improved feed efficiency. Whole genome association studies (WGAS) for RFI, average daily feed intake (ADFI), average daily gain (ADG), back fat (BF) and loin muscle area (LMA) were performed on 1,400 pigs from the divergently selected ISU-RFI lines, using the Illumina PorcineSNP60 BeadChip. Various statistical methods were applied to find SNPs and genomic regions associated with the traits, including a Bayesian approach using GenSel software, and frequentist approaches such as allele frequency differences between lines, single SNP and haplotype analyses using PLINK software. Single SNP and haplotype analyses showed no significant associations (except for LMA) after genomic control and FDR. Bayesian analyses found at least 2 associations for each trait at a false positive probability of 0.5. At generation 8, the RFI selection lines mainly differed in allele frequencies for SNPs near (<0.05 Mb) genes that regulate insulin release and leptin functions. The Bayesian approach identified associations of genomic regions containing insulin release genes (e.g., GLP1R, CDKAL, SGMS1) with RFI and ADFI, of regions with energy homeostasis (e.g., MC4R, PGM1, GPR81) and muscle growth related genes (e.g., TGFB1) with ADG, and of fat metabolism genes (e.g., ACOXL, AEBP1) with BF. Specifically, a very highly significantly associated QTL for LMA on SSC7 with skeletal myogenesis genes (e.g., KLHL31) was identified for subsequent fine mapping. Important genomic regions associated with RFI related traits were identified for future validation studies prior to their incorporation in marker
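
    The FDR step can be illustrated with the Benjamini-Hochberg procedure, the most widely used FDR adjustment. The abstract does not name the specific procedure, so this choice is an assumption, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical single-SNP p-values
qvals = benjamini_hochberg(pvals)
significant = [p for p, q in zip(pvals, qvals) if q <= 0.05]
print(qvals, significant)
```

    SNPs with q ≤ 0.05 would be declared significant at a 5% false discovery rate.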

  17. Characterization of Genomic Variants Associated with Scout and Recruit Behavioral Castes in Honey Bees Using Whole-Genome Sequencing: e0146430

    National Research Council Canada - National Science Library

    Bruce R Southey; Ping Zhu; Morgan K Carr-Markell; Zhengzheng S Liang; Amro Zayed; Ruiqiang Li; Gene E Robinson; Sandra L Rodriguez-Zas

    2016-01-01

    .... Whole-genome sequencing of 44 Apis mellifera scouts and recruits was undertaken to detect variants and further understand the genetic architecture underlying the behavioral differences between scouts and recruits...

  18. A comparison of alternative 60-mer probe designs in an in-situ synthesized oligonucleotide microarray

    Directory of Open Access Journals (Sweden)

    Fairbanks Benjamin D

    2006-04-01

    Abstract Background DNA microarrays have proven powerful for functional genomics studies. Several technologies exist for the generation of whole-genome arrays. It is well documented that 25mer probes directed against different regions of the same gene produce variable signal intensity values. However, the extent to which this is true for probes of greater length (60mers) is not well characterized. Moreover, this information has not previously been reported for whole-genome arrays designed against bacteria, whose genomes may differ substantially in characteristics directly affecting microarray performance. Results We report here an analysis of alternative 60mer probe designs for an in-situ synthesized oligonucleotide array for the GC-rich β-proteobacterium Burkholderia cenocepacia. Probes were designed using the ArrayOligoSel3.5 software package, and whole-genome microarrays were synthesized by Agilent, Inc. using their in-situ ink-jet technology platform. We first validated the quality of the microarrays, as demonstrated by an average signal-to-noise ratio of >1000. Next, we determined that the variance of replicate probes of identical sequence (1178 total probes examined) was 3.8%, whereas the variance of alternative probe designs (558 total alternative probes examined) was 9.5%. We determined that, depending upon the definition, about 2.4% of replicate and 7.8% of alternative probes produced outlier conclusions. Finally, we determined that none of the probe design subscores produced by ArrayOligoSel3.5 (GC content, internal repeat, binding energy and self annealment) were predictive of probes that produced outlier signals. Conclusion Our analysis demonstrated that the use of multiple probes per target sequence is not essential for in-situ synthesized 60mer oligonucleotide arrays designed against bacteria. Although probes producing outlier signals were identified, the use of ratios results in less than 10% of such outlier conclusions. We also determined that
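The abstract compares the spread of replicate probes with that of alternative probe designs and counts "outlier" probes. A minimal sketch of both ideas follows, assuming that the reported percentage "variance" is a coefficient of variation (standard deviation divided by mean) and using a hypothetical outlier rule (a probe's log2 ratio deviating from the gene's median log2 ratio by more than 1, i.e. 2-fold). The signal values are invented for illustration; this is not the authors' exact definition or the ArrayOligoSel3.5 scoring.

```python
import statistics


def coefficient_of_variation(signals):
    """Sample standard deviation divided by the mean, as a percentage."""
    mean = statistics.mean(signals)
    return 100.0 * statistics.stdev(signals) / mean


def flag_outlier_probes(log_ratios, max_deviation=1.0):
    """Flag probes whose log2 ratio deviates from the gene's median log2 ratio
    by more than `max_deviation` (hypothetical threshold; 1.0 means 2-fold)."""
    median = statistics.median(log_ratios)
    return [abs(r - median) > max_deviation for r in log_ratios]


# Hypothetical signal intensities for a single target gene.
replicate_signals = [1020.0, 1000.0, 980.0]    # identical probe sequence
alternative_signals = [1100.0, 900.0, 1000.0]  # different probe designs
print(round(coefficient_of_variation(replicate_signals), 1))    # 2.0
print(round(coefficient_of_variation(alternative_signals), 1))  # 10.0
print(flag_outlier_probes([0.9, 1.1, 1.0, 3.2]))  # [False, False, False, True]
```

Working with log2 ratios rather than raw intensities mirrors the abstract's point that ratios reduce the impact of probe-specific signal differences.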

  19. Whole Genome Analysis of Injectional Anthrax Identifies Two Disease Clusters Spanning More Than 13 Years.

    Science.gov (United States)

    Keim, Paul; Grunow, Roland; Vipond, Richard; Grass, Gregor; Hoffmaster, Alex; Birdsell, Dawn N; Klee, Silke R; Pullan, Steven; Antwerpen, Markus; Bayer, Brittany N; Latham, Jennie; Wiggins, Kristin; Hepp, Crystal; Pearson, Talima; Brooks, Tim; Sahl, Jason; Wagner, David M

    2015-11-01

    Anthrax is a rare disease in humans but elicits great public fear because of its past use as an agent of bioterrorism. Injectional anthrax has been occurring sporadically for more than ten years in heroin consumers across multiple European countries and this outbreak has been difficult to trace back to a source. We took a molecular epidemiological approach in understanding this disease outbreak, including whole genome sequencing of Bacillus anthracis isolates from the anthrax victims. We also screened two large strain repositories for closely related strains to provide context to the outbreak. Analyzing 60 Bacillus anthracis isolates associated with injectional anthrax cases and closely related reference strains, we identified 1071 Single Nucleotide Polymorphisms (SNPs). The synapomorphic SNPs (350) were used to reconstruct phylogenetic relationships, infer likely epidemiological sources and explore the dynamics of evolving pathogen populations. Injectional anthrax genomes separated into two tight clusters: one group was exclusively associated with the 2009-10 outbreak and located primarily in Scotland, whereas the second comprised more recent (2012-13) cases but also a single Norwegian case from 2000. Genome-based differentiation of injectional anthrax isolates argues for at least two separate disease events spanning > 12 years. The genomic similarity of the two clusters makes it likely that they are caused by separate contamination events originating from the same geographic region and perhaps the same site of drug manufacturing or processing. Pathogen diversity within single patients challenges assumptions concerning population dynamics of infecting B. anthracis and host defensive barriers for injectional anthrax. This work was supported by the United States Department of Homeland Security grant no. HSHQDC-10-C-00,139 and via a binational cooperative agreement between the United States Government and the Government of Germany. This work was supported by funds
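The study separated injectional anthrax genomes into two tight clusters using phylogenetic reconstruction from synapomorphic SNPs. As a simpler, swapped-in illustration (not the authors' method), the sketch below groups isolates by single-linkage clustering on pairwise SNP distances: any two isolates within a chosen SNP threshold end up in the same cluster. The isolate names, distances, and threshold are hypothetical.

```python
from itertools import combinations


def single_linkage_clusters(names, distance, threshold):
    """Union-find single linkage: isolates within `threshold` SNPs of each
    other (directly or through a chain) share a cluster."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if distance[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical isolates: few SNPs within a group, many between groups.
groups = {"A1": "x", "A2": "x", "A3": "x", "B1": "y", "B2": "y"}
dist = {
    frozenset(pair): (2 if groups[pair[0]] == groups[pair[1]] else 40)
    for pair in combinations(groups, 2)
}
print(single_linkage_clusters(list(groups), dist, threshold=5))
# [['A1', 'A2', 'A3'], ['B1', 'B2']]
```

Single linkage is sensitive to the threshold chosen, which is one reason tree-based reconstruction is usually preferred for inferring epidemiological sources.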

  20. Construction of white spot syndrome virus (WSSV) whole genome phage display library

    Institute of Scientific and Technical Information of China (English)

    ZHU Yanbing; YANG Feng

    2007-01-01

    A rebuilt vector, pCANTAB 5 EE, was obtained by inserting a 34 bp double-stranded oligonucleotide containing an EcoRV recognition sequence into pCANTAB 5 E. White spot syndrome virus (WSSV) genomic DNA was fragmented by sonication to isolate fragments mainly in the range of 0.8 to 2.0 kb; the fragments were then blunt-ended with T4 DNA polymerase and cloned into the EcoRV site of pCANTAB 5 EE. The primary library size was 3.0 × 10^5 recombinant clones. Colony PCR of randomly selected recombinants showed that the inserts ranged from 0.12 to 1.77 kb. After recombinant phages from the whole library infected Escherichia coli HB2151 cells, the extracellular and periplasmic extracts were spotted onto PVDF membranes for dot blotting, using polyclonal mouse anti-VP24, anti-WSV026, anti-WSV063, anti-WSV069, anti-WSV112, anti-WSV238, anti-WSV303 and anti-VP26 sera as the primary antibodies, respectively. The results showed that the display library could express the viral proteins.

  1. Acne vulgaris.

    Science.gov (United States)

    Moradi Tuchayi, Sara; Makrantonaki, Evgenia; Ganceviciene, Ruta; Dessinioti, Clio; Feldman, Steven R; Zouboulis, Christos C

    2015-09-17

    Acne vulgaris is a chronic inflammatory disease of the pilosebaceous unit (comprising the hair follicle, hair shaft and sebaceous gland), rather than a natural part of the life cycle as colloquially viewed, and is among the most common dermatological conditions worldwide. Some of the key mechanisms involved in the development of acne include disturbed sebaceous gland activity associated with hyperseborrhoea (that is, increased sebum production) and alterations in sebum fatty acid composition, dysregulation of the hormone microenvironment, interaction with neuropeptides, follicular hyperkeratinization, induction of inflammation and dysfunction of the innate and adaptive immunity. Grading of acne involves lesion counting and photographic methods. However, there is a lack of consensus on the exact grading criteria, which hampers the conduct and comparison of randomized controlled clinical trials evaluating treatments. Prevention of acne relies on the successful management of modifiable risk factors, such as underlying systemic diseases and lifestyle factors. Several treatments are available, but guidelines suffer from a lack of data to make evidence-based recommendations. In addition, the complex combination treatment regimens required to target different aspects of acne pathophysiology lead to poor adherence, which undermines treatment success. Acne commonly causes scarring and reduces the quality of life of patients. New treatment options with a shift towards targeting the early processes involved in acne development instead of suppressing the effects of end products will enhance our ability to improve the outcomes for patients with acne.

  2. Within-host whole genome analysis of an antibiotic resistant Pseudomonas aeruginosa strain sub-type in cystic fibrosis.

    Science.gov (United States)

    Sherrard, Laura J; Tai, Anna S; Wee, Bryan A; Ramsay, Kay A; Kidd, Timothy J; Ben Zakour, Nouri L; Whiley, David M; Beatson, Scott A; Bell, Scott C

    2017-01-01

    A Pseudomonas aeruginosa AUST-02 strain sub-type (M3L7) has been identified in Australia, infects the lungs of some people with cystic fibrosis and is associated with antibiotic resistance. Multiple clonal lineages may emerge during treatment, with mutations in chromosomally encoded antibiotic resistance genes commonly observed. Here we describe the within-host diversity and antibiotic resistance of M3L7 during and after antibiotic treatment of an acute pulmonary exacerbation using whole genome sequencing and show both variation and shared mutations in important genes. Eleven isolates from an M3L7 population (n = 134) isolated over 3 months from an individual with cystic fibrosis underwent whole genome sequencing. A phylogeny based on core genome SNPs identified three distinct phylogenetic groups comprising two groups with higher rates of mutation (hypermutators) and one non-hypermutator group. Genomes were screened for acquired antibiotic resistance genes with the result suggesting that M3L7 resistance is principally driven by chromosomal mutations as no acquired mechanisms were detected. Small genetic variations, shared by all 11 isolates, were found in 49 genes associated with antibiotic resistance including frame-shift mutations (mexA, mexT), premature stop codons (oprD, mexB) and mutations in quinolone-resistance determining regions (gyrA, parE). However, whole genome sequencing also revealed mutations in 21 genes that were acquired following divergence of groups, which may also impact the activity of antibiotics and multi-drug efflux pumps. Comparison of mutations with minimum inhibitory concentrations of anti-pseudomonal antibiotics could not easily explain all resistance profiles observed. These data further demonstrate the complexity of chronic and antibiotic resistant P. aeruginosa infection, where a multitude of genotypically diverse sub-lineages might co-exist during and after intravenous antibiotic treatment.

  3. Toward a Taxonomy for Multi-Omics Science? Terminology Development for Whole Genome Study Approaches by Omics Technology and Hierarchy.

    Science.gov (United States)

    Pirih, Nina; Kunej, Tanja

    2017-01-01

    Omics is a form of high-throughput systems science. However, taxonomies for omics studies are limited, inviting us to rethink new ways in which we classify, prioritize, and rank various omics systems science studies. In this overarching context, the genome-wide study approaches have proliferated in number and popularity over the past decade. However, their hierarchy is not well organized and the development of attendant terminology is not controlled. In the present study, we searched the literature in PubMed and the Web of Science databases published from March 1999 to September 2016 using the keywords, including genome-wide, association, whole genome, transcriptome-wide, metabolome, epigenome, and phenome. We identified the whole genome study approaches and sorted them according to the omics technology types (genomics, proteomics, and so on) and hierarchy. Thirty-four studies from over 90 publications were sorted into 10 omics groups: DNA level, transcriptomics, proteomics, interactomics, metabolomics, epigenomics, miRNomics/ncRNomics, phenomics, environmental omics, and pharmacogenomics. We suggest here modifications of terminology for study approaches, which share the same acronyms such as EWAS for epigenome-wide association and environment-wide association studies, and MWAS for methylome-wide association and metabolome-wide association studies. Taken together, our study presented here provides the first systematic review and analyses of whole genome approaches and presents a baseline for further controlled terminology development, with a view to a new taxonomy for omics and multi-omics studies in the future. Finally, we call for greater dialogue and collaboration across diverse omics knowledge domains and applications, for example, across plants, animals, clinical medicine, and ecology.

  4. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Science.gov (United States)

    Wilson, Mark R; Brown, Eric; Keys, Chris; Strain, Errol; Luo, Yan; Muruvanda, Tim; Grim, Christopher; Jean-Gilles Beaubrun, Junia; Jarvis, Karen; Ewing, Laura; Gopinath, Gopal; Hanes, Darcy; Allard, Marc W; Musser, Steven

    2016-01-01

    Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future
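The SNP counts quoted above (more than sixteen thousand SNPs to the outgroups, only a few SNPs between closely related food and clinical isolates) are pairwise SNP distances. A minimal sketch of that computation over an aligned core genome follows, assuming sequences are already aligned to equal length and that ambiguous bases (`N`) and gaps (`-`) are skipped. The isolate names and sequences are hypothetical toy data; the study itself used a maximum likelihood phylogenetic approach on top of its SNP calls.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ, skipping any
    position where either sequence has an ambiguous base or a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def distance_matrix(alignment):
    """Pairwise SNP distances keyed by (name_x, name_y) in sorted name order."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical 10 bp aligned core-genome fragments.
alignment = {
    "clinical_1": "ACGTACGTAC",
    "food_1":     "ACGTACGTAT",
    "outgroup":   "TCGAACCTNT",
}
for pair, snps in distance_matrix(alignment).items():
    print(pair, snps)
# ('clinical_1', 'food_1') 1
# ('clinical_1', 'outgroup') 4
# ('food_1', 'outgroup') 3
```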

  5. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Directory of Open Access Journals (Sweden)

    Mark R Wilson

    Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts

  6. Whole genome sequencing identifies circulating Beijing-lineage Mycobacterium tuberculosis strains in Guatemala and an associated urban outbreak.

    Science.gov (United States)

    Saelens, Joseph W; Lau-Bonilla, Dalia; Moller, Anneliese; Medina, Narda; Guzmán, Brenda; Calderón, Maylena; Herrera, Raúl; Sisk, Dana M; Xet-Mull, Ana M; Stout, Jason E; Arathoon, Eduardo; Samayoa, Blanca; Tobin, David M

    2015-12-01

    Limited data are available regarding the molecular epidemiology of Mycobacterium tuberculosis (Mtb) strains circulating in Guatemala. Beijing-lineage Mtb strains have gained prevalence worldwide and are associated with increased virulence and drug resistance, but there have been only a few cases reported in Central America. Here we report the first whole genome sequencing of Central American Beijing-lineage strains of Mtb. We find that multiple Beijing-lineage strains, derived from independent founding events, are currently circulating in Guatemala, but overall still represent a relatively small proportion of disease burden. Finally, we identify a specific Beijing-lineage outbreak centered on a poor neighborhood in Guatemala City.

  7. Monitoring meticillin resistant Staphylococcus aureus and its spread in Copenhagen, Denmark, 2013, through routine whole genome sequencing

    DEFF Research Database (Denmark)

    Bartels, M D; Larner-Svensson, H; Meiniche, H;

    2015-01-01

    Typing of meticillin resistant Staphylococcus aureus (MRSA) by whole genome sequencing (WGS) has been performed routinely in Copenhagen since January 2013. We describe the relatedness, based on WGS data and epidemiological data, of 341 MRSA isolates. These comprised all MRSA (n = 300) identified...... in Copenhagen in the first five months of 2013. Moreover, because MRSA of staphylococcal protein A (spa)-type 304 (t304), sequence type (ST) 6 had been associated with a continuous neonatal ward outbreak in Copenhagen starting in 2011, 41 t304 isolates collected in the city between 2010 and 2012 were also...

  8. Draft whole-genome sequence of the Diaporthe helianthi 7/96 strain, causal agent of sunflower stem canker

    Directory of Open Access Journals (Sweden)

    Riccardo Baroncelli

    2016-12-01

    Diaporthe helianthi is a fungus pathogenic to sunflower. Virulent strains of this fungus cause stem canker with important yield losses and reduction of oil content. Here we present the first draft whole-genome sequence of the highly virulent isolate D. helianthi strain 7/96, thus providing a useful platform for future research on stem canker of sunflower and fungal genomics. The genome sequence of the D. helianthi isolate 7/96 was deposited at DDBJ/ENA/GenBank under the accession number MAVT00000000 (BioProject PRJNA327798).

  9. Whole Genome Comparison of Campylobacter jejuni Human Isolates Using a Low-Cost Microarray Reveals Extensive Genetic Diversity

    OpenAIRE

    2001-01-01

    Campylobacter jejuni is the leading cause of bacterial food-borne diarrhoeal disease throughout the world, and yet is still a poorly understood pathogen. Whole genome microarray comparisons of 11 C. jejuni strains of diverse origin identified genes in up to 30 NCTC 11168 loci ranging from 0.7 to 18.7 kb that are either absent or highly divergent in these isolates. Many of these regions are associated with the biosynthesis of surface structures including flagella, lipo-oligosaccharide, and the...

  10. Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods.

    Science.gov (United States)

    Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik; Pedersen, Anders Gorm; Aarestrup, Frank Møller; Lund, Ole

    2017-01-05

    Whole genome sequencing (WGS) is increasingly used in diagnostics and surveillance of infectious diseases. A major application for WGS is to use the data for identifying outbreak clusters, and there is therefore a need for methods that can accurately and efficiently infer phylogenies from sequencing reads. In the present study we describe a new dataset that we have created for the purpose of benchmarking such WGS-based methods for epidemiological data, and also present an analysis where we use the data to compare the performance of some current methods. Our aim was to create a benchmark data set that mimics sequencing data of the sort that might be collected during an outbreak of an infectious disease. This was achieved by letting an E. coli hypermutator strain grow in the lab for 8 consecutive days, each day splitting the culture in two while also collecting samples for sequencing. The result is a data set consisting of 101 whole genome sequences with known phylogenetic relationship. Among the sequenced samples, 51 correspond to internal nodes in the phylogeny because they are ancestral, while the remaining 50 correspond to leaves. We also used the newly created data set to compare three different online available methods that infer phylogenies from whole-genome sequencing reads: NDtree, CSI Phylogeny and REALPHY. One complication when comparing the output of these methods with the known phylogeny is that phylogenetic methods typically build trees where all observed sequences are placed as leaves, even though some of them are in fact ancestral. We therefore devised a method for post-processing the inferred trees by collapsing short branches (thus relocating some leaves to internal nodes), and also present two new measures of tree similarity that take into account the identity of both internal and leaf nodes. Based on this analysis we find that, among the investigated methods, CSI Phylogeny had the best performance, correctly identifying 73% of all branches in the
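The post-processing step above relocates sampled ancestors from leaves to internal nodes by collapsing short branches. A minimal sketch of one way to do this follows, under stated assumptions: a leaf whose branch length is at most a hypothetical cutoff `eps` is treated as the ancestral sample at its parent, so its label moves onto that (unlabeled) internal node and the leaf is dropped. The `Node` class, the cutoff, and the toy tree are illustrative only and are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A rooted tree node; `length` is the branch leading to this node."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def collapse_short_leaf_branches(node: Node, eps: float) -> Node:
    """Post-order pass: if an unlabeled internal node has a leaf child whose
    branch length is at most `eps`, move that leaf's label onto the internal
    node (treating the sample as ancestral) and drop the leaf."""
    for child in node.children:
        collapse_short_leaf_branches(child, eps)
    if node.name is None:
        for i, child in enumerate(node.children):
            if not child.children and child.length <= eps:
                node.name = child.name
                del node.children[i]
                break  # stop right after mutating the list
    return node


# Hypothetical tree: sample s1 sits on a zero-length branch under an internal node.
tree = Node(children=[
    Node(length=0.1, children=[Node("s1", 0.0), Node("s2", 0.3), Node("s3", 0.2)]),
    Node("s4", 0.5),
])
collapse_short_leaf_branches(tree, eps=0.01)
internal = tree.children[0]
print(internal.name, [c.name for c in internal.children])  # s1 ['s2', 's3']
```

In practice the cutoff would need to reflect the expected branch-length noise of the inference method, since zero-length branches are rarely reported exactly as zero.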

  11. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

    Science.gov (United States)

    van der Weide, Robin H; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored, even though they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods, and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.

  12. epiG: statistical inference and profiling of DNA methylation from whole-genome bisulfite sequencing data.

    Science.gov (United States)

    Vincent, Martin; Mundbjerg, Kamilla; Skou Pedersen, Jakob; Liang, Gangning; Jones, Peter A; Ørntoft, Torben Falck; Dalsgaard Sørensen, Karina; Wiuf, Carsten

    2017-02-21

    The study of epigenetic heterogeneity at the level of individual cells and in whole populations is the key to understanding cellular differentiation, organismal development, and the evolution of cancer. We develop a statistical method, epiG, to infer and differentiate between different epi-allelic haplotypes, annotated with CpG methylation status and DNA polymorphisms, from whole-genome bisulfite sequencing data, and nucleosome occupancy from NOMe-seq data. We demonstrate the capabilities of the method by inferring allele-specific methylation and nucleosome occupancy in cell lines, and colon and tumor samples, and by benchmarking the method against independent experimental data.

  13. Whole-Genome Sequencing of Invasion-Resistant Cells Identifies Laminin α2 as a Host Factor for Bacterial Invasion

    DEFF Research Database (Denmark)

    van Wijk, Xander M.; Döhrmann, Simon; Hallstrom, Bjorn

    2017-01-01

    To understand the role of glycosaminoglycans in bacterial cellular invasion, xylosyltransferase-deficient mutants of Chinese hamster ovary (CHO) cells were created using clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated gene 9 (CRISPR-cas9) gene targeting. When...... cells. Whole-genome sequencing and transcriptome sequencing (RNA-Seq) uncovered a deletion in the gene encoding the laminin subunit α2 (Lama2) that eliminated much of domain L4a. Silencing of the long Lama2 isoform in wild-type cells strongly reduced bacterial invasion, whereas transfection with human...

  14. Whole-Genome Sequence of a blaOXA-48-Harboring Raoultella ornithinolytica Clinical Isolate from Lebanon.

    Science.gov (United States)

    Al-Bayssari, Charbel; Olaitan, Abiola Olumuyiwa; Leangapichart, Thongpan; Okdah, Liliane; Dabboussi, Fouad; Hamze, Monzer; Rolain, Jean-Marc

    2016-04-01

    We analyzed the whole-genome sequence of a blaOXA-48-harboring Raoultella ornithinolytica clinical isolate from a patient in Lebanon. The size of the Raoultella ornithinolytica CMUL058 genome was 5,622,862 bp, with a G+C content of 55.7%. We deciphered all the molecular mechanisms of antibiotic resistance, and we compared our genome to other available R. ornithinolytica genomes in GenBank. The resistome consisted of 9 antibiotic resistance genes, including a plasmidic blaOXA-48 gene whose genetic organization is also described.
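Genome statistics such as the reported size and G+C content are simple summaries of the assembled sequence. A minimal sketch follows, assuming the assembly is available as FASTA lines and that G+C content is computed over unambiguous A/C/G/T bases only (some tools divide by total length instead, which gives slightly different values when ambiguous bases are present). The toy record is hypothetical; the 55.7% figure above comes from the authors' analysis, not from this sketch.

```python
def read_fasta_sequence(lines):
    """Concatenate all sequence lines, skipping '>' header lines."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def gc_content(sequence):
    """Percentage of G+C among unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Hypothetical toy record; a real genome would be read from a FASTA file.
record = [">toy_contig", "ATGCGC", "nnAT"]
seq = read_fasta_sequence(record)
print(len(seq), gc_content(seq))  # 10 50.0
```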

  15. A flexible whole-genome microarray for transcriptomics in three-spine stickleback (Gasterosteus aculeatus)

    Directory of Open Access Journals (Sweden)

    Primmer Craig R

    2009-09-01

    Abstract Background The use of microarray technology for describing changes in mRNA expression to address ecological and evolutionary questions is becoming increasingly popular. Since three-spine stickleback are an important ecological and evolutionary model-species as well as an emerging model for eco-toxicology, the ability to have a functional and flexible microarray platform for transcriptome studies will greatly enhance the research potential in these areas. Results We designed 43,392 unique oligonucleotide probes representing 19,274 genes (93% of the estimated total gene number), and tested the hybridization performance of both DNA and RNA from different populations to determine the efficacy of probe design for transcriptome analysis using the Agilent array platform. The majority of probes were functional as evidenced by the DNA hybridization success, and 30,946 probes (14,615 genes) had a signal that was significantly above background for RNA isolated from liver tissue. Genes identified as being expressed in liver tissue were grouped into functional categories for each of the three Gene Ontology groups: biological process, molecular function, and cellular component. As expected, the highest proportions of functional categories belonged to those associated with metabolic functions: metabolic process, binding, catabolism, and organelles. Conclusion The probe and microarray design presented here provides an important step facilitating transcriptomics research for this important research organism by providing a set of over 43,000 probes whose hybridization success and specificity to liver expression has been demonstrated. Probes can easily be added or removed from the current design to tailor the array to specific experiments, and additional flexibility lies in the ability to perform either one-color or two-color hybridizations.

  16. Evaluation of a Single-reaction Method for Whole Genome Sequencing of Influenza A Virus using Next Generation Sequencing

    Institute of Scientific and Technical Information of China (English)

    ZOU Xiao Hui; CHEN Wen Bing; ZHAO Xiang; ZHU Wen Fei; YANG Lei; WANG Da Yan; SHU Yue Long

    2016-01-01

    Objective To evaluate a single-reaction genome amplification method, the multisegment reverse transcription-PCR (M-RTPCR), for its sensitivity for full genome sequencing of influenza A virus and its ability to differentiate mixed-subtype viruses, using the next generation sequencing (NGS) platform. Methods Virus genome copies were quantified and serially diluted to different titers, followed by amplification with the M-RTPCR method and sequencing on the NGS platform. Furthermore, we manually mixed two subtype viruses at different titer ratios and amplified the mixed virus with the M-RTPCR protocol, followed by whole genome sequencing on the NGS platform. We also used clinical samples to test the method's performance. Results The M-RTPCR method obtained the complete genome of the test virus at 125 copies/reaction and determined the virus subtype at a titer of 25 copies/reaction. Moreover, the two subtypes in the mixed virus could be discriminated even when the two virus copy numbers differed by 200-fold using this amplification protocol. The sensitivity determined with viral RNA was also confirmed with clinical samples containing low-titer virus. Conclusion The M-RTPCR is a robust and sensitive amplification method for whole genome sequencing of influenza A virus using the NGS platform.

  17. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  18. An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform

    Directory of Open Access Journals (Sweden)

    Zhang Tongwu

    2011-11-01

    Abstract Motivation Complete organellar genome sequences (chloroplasts and mitochondria) provide valuable resources and information for studying plant molecular ecology and evolution. As high-throughput sequencing technology advances, it has become the norm to use a shotgun approach to obtain complete genome sequences. Therefore, assembling organellar sequences from whole-genome shotgun reads is unavoidable. However, associated techniques are often cumbersome, time-consuming, and difficult, because true organellar DNA is difficult to separate efficiently from nuclear copies, which have been transferred to the nucleus through the course of evolution. Results We report a new, rapid procedure for plant chloroplast and mitochondrial genome sequencing and assembly using the Roche/454 GS FLX platform. Plant cells can contain multiple copies of the organellar genomes, and there is a significant correlation between the depth of sequence reads in contigs and the number of copies of the genome. Without isolating organellar DNA from the mixture of nuclear and organellar DNA for sequencing, we retrospectively extracted assembled contigs of either chloroplast or mitochondrial sequences from the whole genome shotgun data. Moreover, the contig connection graph property of Newbler (a platform-specific sequence assembler) ensures an efficient final assembly. Using this procedure, we assembled both chloroplast and mitochondrial genomes of a resurrection plant, Boea hygrometrica, with high fidelity. We also present information and a minimal sequence dataset as a reference for the assembly of other plant organellar genomes.
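The procedure relies on the correlation between contig read depth and genome copy number: multi-copy organellar contigs should sit well above the depth of single-copy nuclear contigs. A minimal sketch of that depth filter follows, assuming the median contig depth approximates nuclear depth and using a hypothetical 10-fold cutoff; contig names and depths are invented. This illustrates only the depth idea, not the authors' full procedure, which also used Newbler's contig connection graph, and a single cutoff may miss lower-copy mitochondrial contigs.

```python
import statistics


def candidate_organellar_contigs(depths, fold=10.0):
    """Return contigs whose mean read depth is at least `fold` times the
    median contig depth (used here as a stand-in for nuclear depth)."""
    baseline = statistics.median(depths.values())
    return sorted(name for name, depth in depths.items() if depth >= fold * baseline)


# Hypothetical mean read depths per assembled contig.
depths = {"ctg1": 12.0, "ctg2": 15.0, "ctg3": 10.0, "ctg4": 480.0, "ctg5": 95.0}
print(candidate_organellar_contigs(depths))  # ['ctg4']
```

Lowering `fold` (for example to 5) would also pick up `ctg5`, which shows how sensitive the candidate set is to the chosen cutoff.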

  19. Genotyping performance assessment of whole genome amplified DNA with respect to multiplexing level of assay and its period of storage.

    Directory of Open Access Journals (Sweden)

    Daniel W H Ho

    Whole genome amplification can faithfully amplify genomic DNA (gDNA) with minimal bias and substantial genome coverage. Whole genome amplified DNA (wgaDNA) has been shown to work with high-throughput genotyping arrays. However, whether wgaDNA decreases genotyping performance at increasing multiplexing levels, and whether the storage period of wgaDNA reduces genotyping performance, have not been examined. Using the Sequenom MassARRAY iPLEX Gold assays, we investigated 174 single nucleotide polymorphisms for 3 groups of matched samples: group 1 of 20 gDNA samples, group 2 of 20 freshly prepared wgaDNA samples, and group 3 of 20 stored wgaDNA samples that had been kept frozen at -70°C for 18 months. MassARRAY is a medium-throughput genotyping platform with reaction chemistry different from those of high-throughput genotyping arrays. The results showed that genotyping performance (efficiency and accuracy) of freshly prepared wgaDNA was similar to that of gDNA at various multiplexing levels (17-plex, 21-plex, 28-plex and 36-plex) of the MassARRAY assays. However, compared with gDNA or freshly prepared wgaDNA, stored wgaDNA was found to give diminished genotyping performance (efficiency and accuracy), potentially due to inferior quality. Consequently, no matter whether gDNA or wgaDNA was used, better genotyping efficiency would tend to have better genotyping accuracy.
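The abstract compares genotyping "efficiency" and "accuracy" across matched sample groups without defining them precisely here. A minimal sketch follows, assuming efficiency means call rate (fraction of assays that returned a genotype) and accuracy means concordance with the matched gDNA genotype at assays where both samples were called. Genotypes are plain strings, `None` marks a failed call, and the data are hypothetical.

```python
def call_rate(calls):
    """Fraction of assays that returned a genotype (None = no call)."""
    if not calls:
        raise ValueError("no assays given")
    return sum(c is not None for c in calls) / len(calls)


def concordance(test_calls, reference_calls):
    """Fraction of matching genotypes among assays called in both samples."""
    pairs = [
        (t, r)
        for t, r in zip(test_calls, reference_calls)
        if t is not None and r is not None
    ]
    if not pairs:
        raise ValueError("no assays called in both samples")
    return sum(t == r for t, r in pairs) / len(pairs)


# Hypothetical genotypes for five SNP assays in a matched sample pair.
gdna = ["AA", "AG", "GG", "AG", "AA"]
wga = ["AA", "AG", None, "AA", "AA"]
print(call_rate(wga))          # 0.8
print(concordance(wga, gdna))  # 0.75
```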

  20. Whole-genome resequencing analyses of five pig breeds, including Korean wild and native, and three European origin breeds.

    Science.gov (United States)

    Choi, Jung-Woo; Chung, Won-Hyong; Lee, Kyung-Tai; Cho, Eun-Seok; Lee, Si-Woo; Choi, Bong-Hwan; Lee, Sang-Heon; Lim, Wonjun; Lim, Dajeong; Lee, Yun-Gyeong; Hong, Joon-Ki; Kim, Doo-Wan; Jeon, Hyeon-Jeong; Kim, Jiwoong; Kim, Namshin; Kim, Tae-Hun

    2015-08-01

    Pigs have been one of the most important sources of meat for humans, and their productivity has been substantially improved by recent strong selection. Here, we present whole-genome resequencing analyses of 55 pigs of five breeds representing Korean native pigs, wild boar and three European origin breeds. 1,673.1 Gb of sequence reads were mapped to the Swine reference assembly, covering ∼99.2% of the reference genome, at an average of ∼11.7-fold coverage. We detected 20,123,573 single-nucleotide polymorphisms (SNPs), of which 25.5% were novel. We extracted 35,458 non-synonymous SNPs in 9,904 genes, which may contribute to traits of interest. The whole SNP sets were further used to assess the population structures of the breeds using multiple methodologies, including phylogenetic, similarity matrix, and population structure analyses. These analyses showed clear population clusters corresponding to each breed. Furthermore, we scanned the whole genomes to identify signatures of selection throughout the genome. The results revealed several promising loci that might underlie economically important traits in pigs, such as the CLDN1 and TWIST1 genes. These discoveries provide useful genomic information for further study of the discrete genetic mechanisms associated with economically important traits in pigs.

  1. Identification of molecular phenotypic descriptors of breast capsular contracture formation using informatics analysis of the whole genome transcriptome.

    Science.gov (United States)

    Kyle, Daniel J T; Harvey, Alison G; Shih, Barbara; Tan, Kian T; Chaudhry, Iskander H; Bayat, Ardeshir

    2013-01-01

    Breast capsular contracture formation following silicone implant augmentation/reconstruction is a common complication that remains poorly understood. The aim of this study was to identify potential biomarkers implicated in breast capsular contracture formation by using, for the first time, whole genome arrays. Biopsy samples were taken from 18 patients (23 breast capsules) with Baker Grade I-II (Control) and Baker Grade III-IV (Contracted). Whole genome microarray analysis was performed, and six significantly dysregulated genes were selected for further validation with quantitative reverse transcriptase polymerase chain reaction and immunohistochemistry. Hematoxylin and eosin staining was also carried out to compare the histological characteristics of control and contracted samples. Microarray results showed that aggrecan, tissue inhibitor of metalloproteinase 4 (TIMP4), and tumor necrosis factor superfamily (ligand) member 11 were significantly down-regulated in contracted capsules, while matrix metallopeptidase 12, serum amyloid A 1, and interleukin 8 (IL8) were significantly up-regulated. The dysregulation of aggrecan, tumor necrosis factor superfamily (ligand) member 11, TIMP4, and IL8 was validated by quantitative reverse transcriptase polymerase chain reaction (p contracture formation. IL8 and TIMP4 may serve as potential key diagnostic, therapeutic, and prognostic biomarkers in capsular contracture formation. © 2013 by the Wound Healing Society.

  2. Efficient Haplotype Inference Algorithms in One Whole Genome Scan for Pedigree Data with Non-genotyped Founders

    Institute of Scientific and Technical Information of China (English)

    Yongxi Cheng; Hadi Sabaa; Zhipeng Cai; Randy Goebel; Guohui Lin

    2009-01-01

    An efficient rule-based algorithm is presented for haplotype inference from general pedigree genotype data, with the assumption of no recombination. This algorithm generalizes previous algorithms to handle the cases where some pedigree founders are not genotyped, provided that for each nuclear family at least one parent is genotyped and each non-genotyped founder appears in exactly one nuclear family. This generalization is important because such cases frequently occur in real data, since some founders may have passed away and their genotype data can no longer be collected. The algorithm runs in O(m^3 n^3) time, where m is the number of single nucleotide polymorphism (SNP) loci under consideration and n is the number of genotyped members in the pedigree. This zero-recombination haplotyping algorithm is extended to a maximum parsimony haplotyping algorithm in one whole genome scan to minimize the total number of breakpoint sites, or equivalently, the number of maximal zero-recombination chromosomal regions. We show that such a whole genome scan haplotyping algorithm can be implemented in O(m^3 n^3) time in a novel incremental fashion, where m denotes the total number of SNP loci along the chromosome.
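The two objectives are equivalent because, for each transmitted haplotype, the number of maximal zero-recombination regions is its breakpoint count plus one. The totals therefore differ only by the fixed number of haplotypes. The hypothetical Python helpers below only score a given phasing in this way. They do not implement the authors' O(m^3 n^3) inference algorithm.

```python
def count_breakpoints(origins):
    """Count switches in parental origin along one transmitted haplotype.

    origins: per-SNP labels in chromosome order (e.g. "P" for the paternal
    and "M" for the maternal grandparental haplotype); None marks loci whose
    origin cannot be resolved, and these are skipped.
    """
    known = [o for o in origins if o is not None]
    return sum(1 for a, b in zip(known, known[1:]) if a != b)


def count_regions(origins):
    """Maximal zero-recombination regions = breakpoints + 1 (if any locus is resolved)."""
    if all(o is None for o in origins):
        return 0
    return count_breakpoints(origins) + 1


# Two hypothetical transmitted haplotypes.
haplotypes = [["P", "P", "M", "M", "P"], ["M", None, "M", "P"]]
print(sum(count_breakpoints(h) for h in haplotypes))  # 3
print(sum(count_regions(h) for h in haplotypes))      # 5
```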

  3. From Wet‐Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing

    Science.gov (United States)

    Laurie, Steve; Fernandez‐Callejo, Marcos; Marco‐Sola, Santiago; Trotta, Jean‐Remi; Camps, Jordi; Chacón, Alejandro; Espinosa, Antonio; Gut, Marta; Gut, Ivo; Heath, Simon

    2016-01-01

    ABSTRACT As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next‐generation sequencing as standard practice in research and diagnostics. However, computing cost–performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state‐of‐the‐art read aligners (BWA‐MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results have been compared between them and against the NIST Genome in a Bottle (GIAB) variants reference dataset. We report differences in speed of up to 20 times in some steps of the process and have observed that SNV, and to a lesser extent InDel, detection is highly consistent in 70% of the genome. SNV, and especially InDel, detection is less reliable in 20% of the genome, and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools bearing in mind objectives, workload, and computing infrastructure available. PMID:27604516
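Comparing pipelines against a reference set such as GIAB comes down to set overlap between variant calls. The minimal Python sketch below uses exact matching of (chrom, pos, ref, alt) tuples and assumes the calls are already normalized. Dedicated benchmarking tools such as hap.py additionally reconcile different representations of the same variant, and the variants shown are hypothetical.

```python
def compare_calls(calls, truth):
    """Precision, recall and F1 of a call set against a truth set.

    Variants are (chrom, pos, ref, alt) tuples; exact matching assumes both
    sets were normalized the same way beforehand.
    """
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def jaccard(a, b):
    """Overlap between two pipelines' call sets (1.0 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
truth = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"),
         ("chr3", 10, "T", "C"), ("chr3", 20, "A", "C")}
print([round(x, 3) for x in compare_calls(calls, truth)])  # [0.667, 0.5, 0.571]
print(jaccard(calls, truth))  # 0.4
```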

  4. Enterobacter asburiae Strain L1: Complete Genome and Whole Genome Optical Mapping Analysis of a Quorum Sensing Bacterium

    Directory of Open Access Journals (Sweden)

    Yin Yin Lau

    2014-07-01

    Full Text Available Enterobacter asburiae L1 is a quorum sensing bacterium isolated from lettuce leaves. In this study, for the first time, the complete genome of E. asburiae L1 was sequenced using the single molecule real time sequencer (PacBio RSII), and the whole genome sequence was verified using optical genome mapping (OpGen technology). In our previous study, E. asburiae L1 was reported to produce AHLs, suggesting the possibility of quorum sensing-dependent regulation of virulence factors. This prompted us to study the genome of this bacterium, and here we present the complete genome of E. asburiae L1, which carries the virulence factor gene virK, the N-acyl homoserine lactone-based QS transcriptional regulator gene luxR, and the N-acyl homoserine lactone synthase gene, which we have named easI. The availability of the whole genome sequence of E. asburiae L1 will pave the way for the study of QS-mediated gene expression in this bacterium. Hence, the importance and functions of these signaling molecules can be further studied in the hope of elucidating the mechanisms of QS regulation in E. asburiae. To the best of our knowledge, this is the first documentation of both a complete genome sequence and the establishment of the molecular basis of the QS properties of E. asburiae.

  5. Mycobacterium tuberculosis Whole Genome Sequences From Southern India Suggest Novel Resistance Mechanisms and the Need for Region-Specific Diagnostics.

    Science.gov (United States)

    Manson, Abigail L; Abeel, Thomas; Galagan, James E; Sundaramurthi, Jagadish Chandrabose; Salazar, Alex; Gehrmann, Thies; Shanmugam, Siva Kumar; Palaniyandi, Kannan; Narayanan, Sujatha; Swaminathan, Soumya; Earl, Ashlee M

    2017-06-01

    India is home to 25% of all tuberculosis cases and the second highest number of multidrug resistant cases worldwide. However, little is known about the genetic diversity and resistance determinants of Indian Mycobacterium tuberculosis, particularly for the primary lineages found in India, lineages 1 and 3. We whole genome sequenced 223 randomly selected M. tuberculosis strains from 196 patients within the Tiruvallur and Madurai districts of Tamil Nadu in Southern India. Using comparative genomics, we examined genetic diversity, transmission patterns, and evolution of resistance. Genomic analyses revealed (1) a prevalence of strains from lineages 1 and 3, (2) recent transmission of strains among patients from the same treatment centers, (3) emergence of drug resistance within patients over time, (4) resistance gained in an order typical of strains from different lineages and geographies, (5) underperformance of known resistance-conferring mutations to explain phenotypic resistance in Indian strains relative to studies focused on other geographies, and (6) the possibility that resistance arose through mutations not previously implicated in resistance, or through infections with multiple strains that confound genotype-based prediction of resistance. In addition to substantially expanding the genomic perspectives of lineages 1 and 3, sequencing and analysis of M. tuberculosis whole genomes from Southern India highlight challenges of infection control and rapid diagnosis of resistant tuberculosis using current technologies. Further studies are needed to fully explore the complement of diversity and resistance determinants within endemic M. tuberculosis populations.

  6. Lessons learned from the application of whole-genome analysis to the treatment of patients with advanced cancers.

    Science.gov (United States)

    Laskin, Janessa; Jones, Steven; Aparicio, Samuel; Chia, Stephen; Ch'ng, Carolyn; Deyell, Rebecca; Eirew, Peter; Fok, Alexandra; Gelmon, Karen; Ho, Cheryl; Huntsman, David; Jones, Martin; Kasaian, Katayoon; Karsan, Aly; Leelakumari, Sreeja; Li, Yvonne; Lim, Howard; Ma, Yussanne; Mar, Colin; Martin, Monty; Moore, Richard; Mungall, Andrew; Mungall, Karen; Pleasance, Erin; Rassekh, S Rod; Renouf, Daniel; Shen, Yaoqing; Schein, Jacqueline; Schrader, Kasmintan; Sun, Sophie; Tinker, Anna; Zhao, Eric; Yip, Stephen; Marra, Marco A

    2015-10-01

    Given the success of targeted agents in specific populations it is expected that some degree of molecular biomarker testing will become standard of care for many, if not all, cancers. To facilitate this, cancer centers worldwide are experimenting with targeted "panel" sequencing of selected mutations. Recent advances in genomic technology enable the generation of genome-scale data sets for individual patients. Recognizing the risk, inherent in panel sequencing, of failing to detect meaningful somatic alterations, we sought to establish processes to integrate data from whole-genome analysis (WGA) into routine cancer care. Between June 2012 and August 2014, 100 adult patients with incurable cancers consented to participate in the Personalized OncoGenomics (POG) study. Fresh tumor and blood samples were obtained and used for whole-genome and RNA sequencing. Computational approaches were used to identify candidate driver mutations, genes, and pathways. Diagnostic and drug information were then sought based on these candidate "drivers." Reports were generated and discussed weekly in a multidisciplinary team setting. Other multidisciplinary working groups were assembled to establish guidelines on the interpretation, communication, and integration of individual genomic findings into patient care. Of 78 patients for whom WGA was possible, results were considered actionable in 55 cases. In 23 of these 55 cases, the patients received treatments motivated by WGA. Our experience indicates that a multidisciplinary team of clinicians and scientists can implement a paradigm in which WGA is integrated into the care of late stage cancer patients to inform systemic therapy decisions.

  7. SynMap2 and SynMap3D: web-based whole-genome synteny browsers.

    Science.gov (United States)

    Haug-Baltzell, Asher; Stephens, Sean A; Davey, Sean; Scheidegger, Carlos E; Lyons, Eric

    2017-07-15

    Current synteny visualization tools either focus on small regions of sequence and do not illustrate genome-wide trends, or are complicated to use and create visualizations that are difficult to interpret. To address this challenge, The Comparative Genomics Platform (CoGe) has developed two web-based tools to visualize synteny across whole genomes. SynMap2 and SynMap3D allow researchers to explore whole genome synteny patterns (across two or three genomes, respectively) in responsive, web-based visualization and virtual reality environments. Both tools have access to the extensive CoGe genome database (containing over 30 000 genomes) as well as the option for users to upload their own data. By leveraging modern web technologies there is no installation required, making the tools widely accessible and easy to use. Both tools are open source (MIT license) and freely available for use online through CoGe ( https://genomevolution.org ). SynMap2 and SynMap3D can be accessed at http://genomevolution.org/coge/SynMap.pl and http://genomevolution.org/coge/SynMap3D.pl , respectively. Source code is available: https://github.com/LyonsLab/coge . ericlyons@email.arizona.edu. Supplementary data are available at Bioinformatics online.

  8. Whole genome sequencing and phylogenetic characterization of brown bullhead (Ameiurus nebulosus) origin ranavirus strains from independent disease outbreaks.

    Science.gov (United States)

    Fehér, Enikő; Doszpoly, Andor; Horváth, Balázs; Marton, Szilvia; Forró, Barbara; Farkas, Szilvia L; Bányai, Krisztián; Juhász, Tamás

    2016-11-01

    Ranaviruses are emerging pathogens associated with high-mortality diseases in fish, amphibians and reptiles. Here we describe the whole genome sequences of two ranavirus isolates from brown bullhead (Ameiurus nebulosus) specimens collected in 2012 at two different locations in Hungary during independent mass mortality events. The two Hungarian isolates were highly similar to each other at the genome sequence level (99.9% nucleotide identity) and to a European sheatfish (Silurus glanis) origin ranavirus (ESV, 99.7%-99.9% nucleotide identity). The coding potential of the genomes of both Hungarian isolates, comprising 136 putative proteins, was shared with that of ESV. The core genes commonly used in phylogenetic analysis of ranaviruses were not useful for differentiating the two brown bullhead ESV strains. However, the genome-wide distribution of point mutations and structural variations, observed mainly in the non-coding regions of the genome, suggested that the ranavirus disease outbreaks in Hungary were caused by different virus strains. At present, due to limited whole genome sequence data for ESV, it is unclear whether these genomic changes are useful in the molecular epidemiological monitoring of ranavirus disease outbreaks. Therefore, complete genome sequencing of further isolates will be needed to identify adequate genetic markers, if any, and to demonstrate their utility in disease control and prevention.
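Nucleotide identity figures like those above are computed from a pairwise alignment. The Python sketch below shows one common convention: identical positions divided by aligned positions, excluding columns with a gap in either sequence. It assumes the sequences are already aligned, and the abstract does not say which convention the authors used.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Columns containing a gap in either sequence are excluded, which is one
    common convention; other tools divide by the full alignment length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # 90.0
print(percent_identity("ACG-T", "ACGAT"))            # 100.0
```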

  9. Construction of whole genome radiation hybrid panels and map of chromosome 5A of wheat using asymmetric somatic hybridization.

    Directory of Open Access Journals (Sweden)

    Chuanen Zhou

    Full Text Available To explore the feasibility of constructing a whole genome radiation hybrid (WGRH) map in plant species with large genomes, asymmetric somatic hybridization between wheat (Triticum aestivum L.) and Bupleurum scorzonerifolium Willd. was performed. The protoplasts of wheat were irradiated with ultraviolet light (UV) and gamma rays and rescued by protoplast fusion using B. scorzonerifolium as the recipient. Assessment of SSR markers showed that the radiation hybrids had an average marker retention frequency of 15.5%. Two RH panels (RHPWI and RHPWII) containing 92 and 184 radiation hybrids, respectively, were developed and used to map 68 SSR markers on chromosome 5A of wheat. A total of 1557 and 2034 breaks were detected in the two panels, respectively. An RH map of chromosome 5A was constructed based on RHPWII. The length of the comprehensive map was 2103 cR, and the approximate resolution was estimated to be ∼501.6 kb/break. The RH panels evaluated in this study enabled us to order the ESTs within a single deletion bin or across multiple bins along the chromosome. These results demonstrate that RH mapping via protoplast fusion is feasible at the whole genome level in wheat, and they show the potential value of this mapping approach for plant species with large genomes.
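Retention frequency, break counts and centiray (cR) distances can be illustrated with the Python sketch below. The distance function uses the textbook two-point model for haploid radiation hybrids: after a break, each fragment is retained independently with probability r, and r is assumed equal for both markers. This is an illustration, not necessarily the model used to build the 2103 cR map. The marker scores are hypothetical.

```python
import math


def retention_frequency(scores):
    """Fraction of hybrids retaining a marker (1 = present, 0 = absent, None = not scored)."""
    typed = [s for s in scores if s is not None]
    return sum(typed) / len(typed)


def count_breaks(marker_a, marker_b):
    """Hybrids in which two adjacent markers have discordant retention (an inferred break)."""
    return sum(1 for a, b in zip(marker_a, marker_b)
               if a is not None and b is not None and a != b)


def distance_cR(marker_a, marker_b):
    """Two-point distance under the textbook haploid RH model.

    P(discordant) = 2 * theta * r * (1 - r), with a shared retention r;
    distance in Rays is -ln(1 - theta), and 1 Ray = 100 cR.
    """
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]
    n = len(pairs)
    discordance = sum(a != b for a, b in pairs) / n
    r = sum(a + b for a, b in pairs) / (2 * n)
    if not 0 < r < 1:
        raise ValueError("retention must be strictly between 0 and 1")
    theta = discordance / (2 * r * (1 - r))
    if theta >= 1:
        return math.inf  # markers effectively unlinked at this resolution
    return -100.0 * math.log(1 - theta)


# Hypothetical scores for two adjacent markers across ten hybrids.
m1 = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
m2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(retention_frequency(m1), count_breaks(m1, m2), round(distance_cR(m1, m2), 1))  # 0.2 1 49.8
```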

  10. Whole Genome Sequencing of the Braconid Parasitoid Wasp Fopius arisanus, an Important Biocontrol Agent of Pest Tephritid Fruit Flies

    Directory of Open Access Journals (Sweden)

    Scott M. Geib

    2017-08-01

    Full Text Available The braconid wasp Fopius arisanus (Sonan) is an important biological control agent of tropical and subtropical pest fruit flies, including two important global pests, the Mediterranean fruit fly (Ceratitis capitata) and the oriental fruit fly (Bactrocera dorsalis). The goal of this study was to develop foundational genomic resources for this species to provide tools that can be used to answer questions exploring the multitrophic interactions between the host and parasitoid in this important research system. Here, we present a whole genome assembly of F. arisanus, derived from a pool of haploid offspring from a single unmated female. The genome is ∼154 Mb in size, with an N50 contig and scaffold size of 51,867 bp and 0.98 Mb, respectively. Using existing RNA-Seq data for this species, as well as publicly available peptide sequences from related Hymenoptera, we generated a high-quality gene annotation set that includes 10,991 protein-coding genes. Prior to this assembly submission, no RefSeq proteins were present for this species. Parasitic wasps play an important role in diverse ecosystems as well as in the biological control of agricultural pests. These whole genome assembly and annotation data represent the first genome-scale assembly for this species or any closely related Opiine, and are publicly available in the National Center for Biotechnology Information Genome and RefSeq databases, providing a much-needed genomic resource for this hymenopteran group.
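N50, quoted above for both contigs and scaffolds, is the length L such that sequences of length L or longer together contain at least half of the assembled bases. The contig and scaffold values come from the same calculation applied to the two sequence sets. A minimal Python version, run on made-up lengths, is:

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids halving the total
            return length
    return 0  # empty input


print(n50([100, 80, 60, 40, 20]))     # 80
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```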

  11. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Alexander C Outhred

    Full Text Available Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster.
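A distance such as "13 variants between the most distant members" is the largest entry of a pairwise SNP distance matrix. The Python sketch below represents each isolate as a set of hypothetical variant identifiers called against one reference. It treats a site missing from a set as carrying the reference allele, so uncalled or low-quality sites would have to be handled before this step.

```python
from itertools import combinations


def snp_distance(variants_a, variants_b):
    """Number of variant sites at which two isolates differ (symmetric difference)."""
    return len(variants_a ^ variants_b)


def distance_matrix(isolates):
    """Pairwise distances keyed by (name_a, name_b)."""
    return {(a, b): snp_distance(isolates[a], isolates[b])
            for a, b in combinations(sorted(isolates), 2)}


# Hypothetical variant calls relative to a shared reference genome.
isolates = {
    "iso1": {"100A>G", "200C>T"},
    "iso2": {"100A>G"},
    "iso3": {"100A>G", "300G>A", "400T>C"},
}
dist = distance_matrix(isolates)
print(max(dist.items(), key=lambda kv: kv[1]))  # (('iso1', 'iso3'), 3)
```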

  12. Validation of whole genome amplification for analysis of the p53 tumor suppressor gene in limited amounts of tumor samples.

    Science.gov (United States)

    Hasmats, Johanna; Green, Henrik; Solnestam, Beata Werne; Zajac, Pawel; Huss, Mikael; Orear, Cedric; Validire, Pierre; Bjursell, Magnus; Lundeberg, Joakim

    2012-08-24

    Personalized cancer treatment requires molecular characterization of individual tumor biopsies. These samples are frequently available only in limited quantities, hampering genomic analysis. Several whole genome amplification (WGA) protocols have been developed, with reportedly varying representation of genomic regions after amplification. In this study we investigate region dropout using a φ29 polymerase-based WGA approach. DNA from 123 lung cancer specimens and corresponding normal tissue was used and evaluated by Sanger sequencing of p53 exons 5-8. To enable comparative analysis of this scarce material, WGA samples were compared with unamplified material using a pooling strategy of the 123 samples. In addition, a more detailed analysis of exon 7 amplicons was performed, followed by extensive cloning and Sanger sequencing. Interestingly, by comparing data from the pooled samples to the individually sequenced exon 7, we demonstrate that mutations are more easily recovered from WGA pools, which was also supported by simulations of different sequencing coverage. Overall, these data indicate a limited random loss of genomic regions, supporting the use of whole genome amplification for genomic analysis.
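The effect of sequencing coverage on recovering a mutation from a pool can be explored with a simple binomial sampling model. In the Python sketch below, each read carries the mutant allele with probability equal to its fraction in the pool. This is a generic model, not the authors' simulation. The equal DNA contribution per sample and the minimum of three supporting reads are hypothetical.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least `min_alt_reads` mutant reads) when each read is mutant with prob. allele_fraction."""
    p_below = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_below


# Heterozygous mutation carried by 1 of 123 pooled samples, assuming
# equal DNA contribution per sample: mutant allele fraction = 1 / 246.
f = 1 / (2 * 123)
for depth in (100, 1000, 10000):
    print(depth, round(detection_probability(depth, f, min_alt_reads=3), 3))
```

Under this model the probability of seeing enough mutant reads rises steeply with depth, which is why the outcome of pooled designs depends so strongly on coverage.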

  13. swDMR: A Sliding Window Approach to Identify Differentially Methylated Regions Based on Whole Genome Bisulfite Sequencing.

    Directory of Open Access Journals (Sweden)

    Zhen Wang

    Full Text Available DNA methylation is a widespread epigenetic modification that plays an essential role in gene expression through transcriptional regulation and chromatin remodeling. The emergence of whole genome bisulfite sequencing (WGBS) represents an important milestone in the detection of DNA methylation. Characterization of differentially methylated regions (DMRs) is also fundamental for further functional analysis. In this study, we present swDMR (http://sourceforge.net/projects/swDMR/) for the comprehensive analysis of DMRs from whole genome methylation profiles using a sliding window approach. It is an integrated tool designed for WGBS data that implements accessible statistical methods to perform hypothesis tests for two or more samples without replicates, and it controls the false discovery rate by multiple testing correction. Downstream analysis tools are also provided, including clustering, annotation and visualization modules. In summary, swDMR can produce abundant information on differentially methylated regions from WGBS data. We believe that swDMR, as a convenient and flexible tool, will bring us closer to unveiling the potential functional regions involved in epigenetic regulation.
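The abstract does not spell out swDMR's statistics, so the Python sketch below shows only the general sliding-window idea under stated assumptions. Inputs are per-CpG (methylated, total) read counts for two samples on one chromosome, and reads are pooled within each window. A two-proportion z-test stands in for whatever test the tool applies, and Benjamini-Hochberg adjustment controls the false discovery rate. Window size, step and thresholds are hypothetical defaults.

```python
import math


def window_starts(first, last, step):
    """Start coordinates of windows stepping across [first, last]."""
    start = first
    while start <= last:
        yield start
        start += step


def two_proportion_pvalue(m1, n1, m2, n2):
    """Two-sided z-test comparing methylated fractions m1/n1 and m2/n2."""
    pooled = (m1 + m2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (m1 / n1 - m2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sliding_window_dmrs(sample1, sample2, size=1000, step=500,
                        min_cpgs=3, min_diff=0.2, max_q=0.05):
    """Report windows whose pooled methylation level differs between two samples.

    sample1, sample2: dict CpG position -> (methylated_reads, total_reads).
    Returns (start, end, n_cpgs, methylation_difference, q_value) tuples.
    """
    shared = sorted(set(sample1) & set(sample2))
    if not shared:
        return []
    tested = []
    for start in window_starts(shared[0], shared[-1], step):
        sites = [p for p in shared if start <= p < start + size]
        if len(sites) < min_cpgs:
            continue
        m1 = sum(sample1[p][0] for p in sites)
        n1 = sum(sample1[p][1] for p in sites)
        m2 = sum(sample2[p][0] for p in sites)
        n2 = sum(sample2[p][1] for p in sites)
        if n1 == 0 or n2 == 0:
            continue
        diff = m1 / n1 - m2 / n2
        tested.append((start, start + size, len(sites), diff,
                       two_proportion_pvalue(m1, n1, m2, n2)))
    qvalues = benjamini_hochberg([t[4] for t in tested])
    return [(s, e, k, d, q) for (s, e, k, d, _), q in zip(tested, qvalues)
            if abs(d) >= min_diff and q <= max_q]
```

Pooling reads across neighboring CpGs ignores their correlation, so this sketch is only a starting point compared with dedicated tools.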

  14. Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    William P. Gilks

    2016-11-01

    Full Text Available As part of a study into the molecular genetics of sexually dimorphic complex traits, we used next-generation sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6) and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (accession number SRP058502). We used HaplotypeCaller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200 bp). Additionally, we detected and genotyped 167 large structural variants (1-100 kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).

  15. Spatiotemporal characterizations of dengue virus in mainland China: insights into the whole genome from 1978 to 2011.

    Science.gov (United States)

    Zhang, Hao; Zhang, Yanru; Hamoudi, Rifat; Yan, Guiyun; Chen, Xiaoguang; Zhou, Yuanping

    2014-01-01

    Spatiotemporal analyses of dengue virus (DENV) have been performed in previous epidemiological studies in mainland China, but few studies have examined the whole genome of DENV. Herein, 40 whole genome sequences of DENVs isolated from mainland China were downloaded from GenBank. Phylogenetic analyses were performed, and evolutionary distances of dengue serotypes 1 and 2 were calculated, using 14 maximum likelihood trees constructed from individual genes and the whole genome. Amino acid variations were also analyzed in the 40 sequences, which included dengue serotypes 1, 2, 3 and 4, and the sequences were grouped according to temporal and spatial differences. The results showed that none of the phylogenetic trees constructed from individual genes were similar to the trees constructed using the complete genome, and the evolutionary distances varied between individual genes. The number of amino acid variations differed significantly (p = 0.015) between DENV-1 and DENV-2 after 2001; seven mutations in DENV-1 (N290D, L402F and A473T in the E gene region, and R101K, G105R, D340E and L349M in the NS1 region) showed significant substitutions compared with the corresponding amino acids of DENV-2. Based on the spatial distribution, with Guangzhou (including Foshan) as the indigenous area and the other regions as expanding areas, significant differences in the number of amino acid variations were found in the NS3 (p = 0.03) and NS1 (p = 0.024) regions of DENV-1 and in the NS2B (p = 0.016) and NS3 (p = 0.042) regions of DENV-2. Recombination analysis showed no inter-serotype recombination events between DENV-1 and DENV-2, while six and seven breakpoints were found in DENV-1 and DENV-2, respectively. In conclusion, individual genes might not be suitable for analyzing the evolution and selection pressure of DENV isolated in mainland China, and mutations in the amino acid residues in the E, NS1 and NS3 regions may play important roles in DENV-1 and DENV-2 epidemics.
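Substitutions such as N290D combine the reference residue, its position and the observed residue. The hypothetical Python helper below lists differences in that notation for two protein sequences that are already aligned position by position without gaps. Gapped alignments would require tracking the reference numbering separately.

```python
def list_substitutions(ref, query, first_position=1):
    """Amino-acid differences in N290D-style notation.

    ref, query: ungapped, equal-length protein sequences aligned
    position-by-position; first_position is the residue number of ref[0].
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{r}{pos}{q}"
            for pos, (r, q) in enumerate(zip(ref, query), start=first_position)
            if r != q]


print(list_substitutions("MNLAG", "MDLAT"))                # ['N2D', 'G5T']
print(list_substitutions("AN", "AD", first_position=289))  # ['N290D']
```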

  16. Spatiotemporal characterizations of dengue virus in mainland China: insights into the whole genome from 1978 to 2011.

    Directory of Open Access Journals (Sweden)

    Hao Zhang

    Full Text Available Spatiotemporal analyses of dengue virus (DENV) have been performed in previous epidemiological studies in mainland China, but few studies have examined the whole genome of DENV. Herein, 40 whole genome sequences of DENVs isolated from mainland China were downloaded from GenBank. Phylogenetic analyses were performed, and evolutionary distances of dengue serotypes 1 and 2 were calculated, using 14 maximum likelihood trees constructed from individual genes and the whole genome. Amino acid variations were also analyzed in the 40 sequences, which included dengue serotypes 1, 2, 3 and 4, and the sequences were grouped according to temporal and spatial differences. The results showed that none of the phylogenetic trees constructed from individual genes were similar to the trees constructed using the complete genome, and the evolutionary distances varied between individual genes. The number of amino acid variations differed significantly (p = 0.015) between DENV-1 and DENV-2 after 2001; seven mutations in DENV-1 (N290D, L402F and A473T in the E gene region, and R101K, G105R, D340E and L349M in the NS1 region) showed significant substitutions compared with the corresponding amino acids of DENV-2. Based on the spatial distribution, with Guangzhou (including Foshan) as the indigenous area and the other regions as expanding areas, significant differences in the number of amino acid variations were found in the NS3 (p = 0.03) and NS1 (p = 0.024) regions of DENV-1 and in the NS2B (p = 0.016) and NS3 (p = 0.042) regions of DENV-2. Recombination analysis showed no inter-serotype recombination events between DENV-1 and DENV-2, while six and seven breakpoints were found in DENV-1 and DENV-2, respectively. In conclusion, individual genes might not be suitable for analyzing the evolution and selection pressure of DENV isolated in mainland China, and mutations in the amino acid residues in the E, NS1 and NS3 regions may play important roles in DENV-1 and DENV-2 epidemics.

  17. An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella

    Directory of Open Access Journals (Sweden)

    James B. Pettengill

    2014-10-01

    Full Text Available Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms, with the MiSeq having the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. 
The methods supported by
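The core versus non-core distinction above comes down to how many missing calls a SNP site may carry before it is dropped. The sketch below is a minimal illustration, assuming a SNP matrix stored as one aligned string per isolate with `N` marking a missing call; `filter_sites` and `pairwise_differences` are hypothetical helper names, not functions of any tool named in the abstract. A threshold of 0.0 yields a core matrix; a larger threshold yields a non-core matrix that tolerates some missing data.

```python
from itertools import combinations

MISSING = "N"  # assumed placeholder for a missing base call


def filter_sites(matrix, max_missing_fraction):
    """Keep SNP columns whose fraction of missing calls is <= max_missing_fraction.

    max_missing_fraction=0.0 keeps only sites called in every isolate (a core matrix).
    """
    names = list(matrix)
    n_sites = len(matrix[names[0]])
    keep = [
        j for j in range(n_sites)
        if sum(matrix[name][j] == MISSING for name in names) / len(names)
        <= max_missing_fraction
    ]
    return {name: "".join(matrix[name][j] for j in keep) for name in names}


def pairwise_differences(matrix):
    """Count sites where both isolates have a call and the calls differ."""
    return {
        (a, b): sum(
            x != y for x, y in zip(matrix[a], matrix[b])
            if x != MISSING and y != MISSING
        )
        for a, b in combinations(sorted(matrix), 2)
    }


# Hypothetical toy matrix: three isolates, five SNP sites.
snps = {
    "isolate_A": "ACGTN",
    "isolate_B": "ACGAT",
    "isolate_C": "ATGAN",
}
core = filter_sites(snps, 0.0)      # drops the last site (missing in two isolates)
non_core = filter_sites(snps, 0.7)  # keeps all five sites
print(core)
print(pairwise_differences(core))
```

Raising the threshold retains more sites, which is the trade-off the abstract describes: more phylogenetic signal, but a higher risk of false-positive SNPs when too much missing data is allowed.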

  18. Inability of ‘Whole Genome Amplification’ to Improve Success Rates for the Biomolecular Detection of Tuberculosis in Archaeological Samples

    Science.gov (United States)

    Forst, Jannine; Brown, Terence A.

    2016-01-01

    We assessed the ability of whole genome amplification (WGA) to improve the efficiency of downstream polymerase chain reactions (PCRs) directed at ancient DNA (aDNA) of members of the Mycobacterium tuberculosis complex (MTBC). Using extracts from a variety of bones and a tooth from human skeletons with or without lesions indicative of tuberculosis, from multiple time periods, we obtained inconsistent results. We conclude that WGA does not provide any advantage in studies of MTBC aDNA. The sporadic nature of our results is probably due to the fact that WGA is itself a PCR-based procedure which, although designed to deal with fragmented DNA, might be inefficient with the low concentration of templates in an aDNA extract. As such, WGA is subject to similar, if not the same, restrictions as PCR when applied to aDNA. PMID:27654468

  19. Comparing Whole-Genome Sequencing with Sanger Sequencing for spa Typing of Methicillin-Resistant Staphylococcus aureus

    DEFF Research Database (Denmark)

    Bartels, Mette Damkjaer; Petersen, Andreas; Worning, Peder

    2014-01-01

    spa typing of methicillin-resistant Staphylococcus aureus (MRSA) has traditionally been done by PCR amplification and Sanger sequencing of the spa repeat region. At Hvidovre Hospital, Denmark, whole-genome sequencing (WGS) of all MRSA isolates has been performed routinely since January 2013, and an in-house analysis pipeline determines the spa types. As part of national surveillance, all MRSA isolates are sent to Statens Serum Institut, where the spa type is determined by PCR and Sanger sequencing. The purpose of this study was to evaluate the reliability of the spa types obtained by 150-bp paired-end Illumina WGS. MRSA isolates from new MRSA patients in 2013 (n = 699) in the capital region of Denmark were included. We found a 97% agreement between spa types obtained by the two methods. All isolates achieved a spa type by both methods. Nineteen isolates differed in spa types by the two methods, in most...
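As a quick consistency check on the figures quoted above, and assuming the 19 discordant isolates are the only disagreements among the 699 included isolates, the agreement rate works out to 680/699, about 97.3%, which matches the reported 97%. A minimal sketch of that arithmetic (the helper name `percent_agreement` is hypothetical):

```python
def percent_agreement(total, discordant):
    """Percentage of isolates whose spa type agrees between the two methods."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - discordant) / total


# Figures from the abstract: 699 isolates, 19 with differing spa types.
agreement = percent_agreement(699, 19)
print(f"{agreement:.1f}% agreement")  # 97.3% agreement
```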

  20. Identification of RAN1 orthologue associated with sex determination through whole genome sequencing analysis in fig (Ficus carica L.).

    Science.gov (United States)

    Mori, Kazuki; Shirasawa, Kenta; Nogata, Hitoshi; Hirata, Chiharu; Tashiro, Kosuke; Habu, Tsuyoshi; Kim, Sangwan; Himeno, Shuichi; Kuhara, Satoru; Ikegami, Hidetoshi

    2017-01-25

    With the aim of identifying sex determinants of fig, we generated the first draft genome sequence of fig and conducted subsequent analyses. Linkage analysis with a high-density genetic map established by a restriction-site associated sequencing technique, and a genome-wide association study followed by whole-genome resequencing analysis, identified two missense mutations in an orthologue of RESPONSIVE-TO-ANTAGONIST1 (RAN1), which encodes a copper-transporting ATPase; these mutations were completely associated with the sex phenotypes of the investigated figs. This result suggests that RAN1 is a candidate sex determinant in the fig genome. The genomic resources and genetic findings obtained in this study can contribute to the general understanding of Ficus species and provide insight into the sex determination systems of fig and other plants.

  1. Genetic Diversity and Fingerprint Profiles of Commercial Lentinula edodes Cultivars Based on SSR Markers Developed from the Whole Genome Sequence

    Institute of Scientific and Technical Information of China (English)

    ZHANG Dan; SONG Chunyan; ZHANG Lujun; WU Ping; BAO Dapeng; SHANG Xiaodong; TAN Qi

    2014-01-01

    Lentinula edodes is an important cultivated mushroom in China, and accurate and reliable identification of individual cultivars is a prerequisite for successful cultivation and variety protection. In this study, the whole genome sequence of L. edodes was used to generate 200 simple sequence repeat (SSR) markers for delineating 25 commercial cultivars and for determining their genetic diversity. Our data revealed a relatively high level of genetic similarity among the cultivars, with average, minimum and maximum genetic similarity coefficient values of 0.776, 0.567 and 1.000, respectively. Seven SSR primer pairs delineated eleven of the cultivars (Cr-02, Minfeng-1, Xianggu 241-4, Senyuan-1, Senyuan-8404, Xiang-9, Guangxiang-51, Huaxiang-5, L952, L9319 and L808) based on their unique multilocus SSR fingerprint profiles.
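The abstract reports genetic similarity coefficients but does not say which coefficient was used. As a hedged illustration only, the sketch below assumes SSR alleles are scored as binary band presence/absence vectors and uses the Dice coefficient, 2a / (2a + b + c), a common choice for band data; it also flags cultivars whose multilocus profile matches no other cultivar, the sense in which a fingerprint "delineates" a cultivar. Cultivar names, profiles and function names are hypothetical.

```python
from collections import Counter


def dice_similarity(p, q):
    """Dice coefficient 2a / (2a + b + c) on two binary band vectors."""
    a = sum(1 for x, y in zip(p, q) if x and y)      # bands present in both
    b = sum(1 for x, y in zip(p, q) if x and not y)  # bands only in p
    c = sum(1 for x, y in zip(p, q) if y and not x)  # bands only in q
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 1.0  # two empty profiles treated as identical


def uniquely_fingerprinted(profiles):
    """Names of cultivars whose full band profile matches no other cultivar."""
    counts = Counter(tuple(v) for v in profiles.values())
    return sorted(name for name, v in profiles.items() if counts[tuple(v)] == 1)


# Hypothetical presence/absence scores for four SSR bands.
profiles = {
    "cultivar_X": [1, 0, 1, 1],
    "cultivar_Y": [1, 0, 1, 1],
    "cultivar_Z": [1, 1, 0, 1],
}
print(dice_similarity(profiles["cultivar_X"], profiles["cultivar_Z"]))
print(uniquely_fingerprinted(profiles))
```

A similarity of 1.000 between two cultivars, as in the reported maximum, corresponds to identical profiles such as `cultivar_X` and `cultivar_Y` here, which the marker set cannot tell apart.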

  2. Species-wide whole genome sequencing reveals historical global spread and recent local persistence in Shigella flexneri.

    Science.gov (United States)

    Connor, Thomas R; Barker, Clare R; Baker, Kate S; Weill, François-Xavier; Talukder, Kaisar Ali; Smith, Anthony M; Baker, Stephen; Gouali, Malika; Pham Thanh, Duy; Jahan Azmi, Ishrat; Dias da Silveira, Wanderley; Semmler, Torsten; Wieler, Lothar H; Jenkins, Claire; Cravioto, Alejandro; Faruque, Shah M; Parkhill, Julian; Wook Kim, Dong; Keddy, Karen H; Thomson, Nicholas R

    2015-08-04

    Shigella flexneri is the most common cause of bacterial dysentery in low-income countries. Despite this, S. flexneri remains largely unexplored from a genomic standpoint and is still described using a vocabulary based on serotyping reactions developed over half a century ago. Here we combine whole genome sequencing with geographical and temporal data to examine the natural history of the species. Our analysis subdivides S. flexneri into seven phylogenetic groups (PGs), each containing two or more serotypes and characterised by a distinct virulence gene complement and geographic range. Within the S. flexneri PGs we identify geographically restricted sub-lineages that appear to have persistently colonised regions for many decades to over 100 years. Although we found abundant evidence of antimicrobial resistance (AMR) determinant acquisition, our dataset shows no evidence of subsequent intercontinental spread of antimicrobial resistant strains. The pattern of colonisation and AMR gene acquisition suggests that S. flexneri has a distinct life-cycle involving local persistence.

  3. Whole-genome amplification: a useful approach to characterize new genes in unculturable protozoan parasites such as Bonamia exitiosa.

    Science.gov (United States)

    Prado-Alvarez, Maria; Couraleau, Yann; Chollet, Bruno; Tourbiez, Delphine; Arzul, Isabelle

    2015-10-01

    Bonamia exitiosa is an intracellular parasite (Haplosporidia) that has been associated with mass mortalities in oyster populations in the Southern Hemisphere. This parasite was recently detected in the Northern Hemisphere, including Europe. Some representatives of the Bonamia genus have not yet been well categorized due to the lack of genomic information. In the present work, we applied the whole-genome amplification (WGA) technique to characterize the actin gene in the unculturable protozoan B. exitiosa. This is the first protein-coding gene described in this species. Molecular analysis revealed that B. exitiosa actin is more similar to Bonamia ostreae actin gene-1. Actin phylogeny placed the Bonamia sp. infected oysters in the same clade where the herein described B. exitiosa actin resolved, offering novel information about the classification of the genus. Our results showed that WGA is a promising and valuable technique for unculturable protozoans whose genomic material is limited.

  4. Transmission of Methicillin-Resistant Staphylococcus aureus via Deceased Donor Liver Transplantation Confirmed by Whole Genome Sequencing

    Science.gov (United States)

    Altman, D. R.; Sebra, R.; Hand, J.; Attie, O.; Deikus, G.; Carpini, K. W. D.; Patel, G.; Rana, M.; Arvelakis, A.; Grewal, P.; Dutta, J.; Rose, H.; Shopsin, B.; Daefler, S.; Schadt, E.; Kasarskis, A.; van Bakel, H.; Bashir, A.; Huprikar, S.

    2015-01-01

    Donor-derived bacterial infection is a recognized complication of solid organ transplantation (SOT). The present report describes the clinical details and successful outcome in a liver transplant recipient despite transmission of methicillin-resistant Staphylococcus aureus (MRSA) from a deceased donor with MRSA endocarditis and bacteremia. We further describe whole genome sequencing (WGS) and complete de novo assembly of the donor and recipient MRSA isolate genomes, which confirms that both isolates are genetically 100% identical. We propose that similar application of WGS techniques to future investigations of donor bacterial transmission would strengthen the definition of proven bacterial transmission in SOT, particularly in the presence of highly clonal bacteria such as MRSA. WGS will further improve our understanding of the epidemiology of bacterial transmission in SOT and the risk of adverse patient outcomes when it occurs. PMID:25250641

  5. Information theory-based algorithm for in silico prediction of PCR products with whole genomic sequences as templates

    Directory of Open Access Journals (Sweden)

    He Junjian

    2005-07-01

    Background: A new algorithm for assessing similarity between primer and template has been developed based on the hypothesis that annealing of primer to template is an information transfer process. Results: The primer sequence is converted to a vector of the full potential hydrogen-bond numbers (3 for G or C, 2 for A or T), while the template sequence is converted to a vector of the actual hydrogen-bond numbers formed after primer annealing. The former is treated as the source information and the latter as the destination information. An information coefficient is calculated as a measure of the fidelity of this information transfer process, and thus as a measure of similarity between the primer and a potential annealing site on the template. Conclusion: Successful prediction of PCR products from whole genomic sequences with a computer program based on the algorithm demonstrated the potential of this new algorithm in areas such as in silico PCR and gene finding.
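The encoding step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a mismatched position is scored as forming zero hydrogen bonds, the template site is written 3' to 5' so that position i pairs with primer position i, and the final similarity is a plain ratio of actual to potential bonds. The abstract does not give the formula for the information coefficient, so `bond_ratio` is a hypothetical placeholder, and all function names are illustrative.

```python
# Potential hydrogen bonds per base, as described in the abstract (G/C = 3, A/T = 2).
POTENTIAL = {"G": 3, "C": 3, "A": 2, "T": 2}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def potential_bonds(primer):
    """Source vector: full potential hydrogen-bond number of each primer base."""
    return [POTENTIAL[base] for base in primer]


def actual_bonds(primer, site):
    """Destination vector, assuming a Watson-Crick pair forms all its bonds and a
    mismatch forms none. `site` is the template strand written 3'->5'."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return [POTENTIAL[p] if COMPLEMENT[p] == t else 0 for p, t in zip(primer, site)]


def bond_ratio(primer, site):
    """Hypothetical stand-in score, NOT the paper's information coefficient."""
    return sum(actual_bonds(primer, site)) / sum(potential_bonds(primer))


print(potential_bonds("GATC"))        # [3, 2, 2, 3]
print(actual_bonds("GATC", "CTAA"))   # [3, 2, 2, 0]
print(bond_ratio("GATC", "CTAG"))     # 1.0
print(bond_ratio("GATC", "CTAA"))     # 0.7
```

A real implementation would replace `bond_ratio` with the information-theoretic coefficient and scan every candidate site in the genome; the two vectors are the part the abstract specifies.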

  6. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    Science.gov (United States)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  7. A rural worker infected with a bovine-prevalent genotype of Campylobacter fetus subsp. fetus supports zoonotic transmission and inconsistency of MLST and whole-genome typing.

    Science.gov (United States)

    Iraola, G; Betancor, L; Calleros, L; Gadea, P; Algorta, G; Galeano, S; Muxi, P; Greif, G; Pérez, R

    2015-08-01

    Whole-genome characterisation in clinical microbiology makes it possible to detect trends in infection dynamics and disease transmission. Here, we report a case of bacteraemia due to Campylobacter fetus subsp. fetus in a rural worker under cancer treatment who was diagnosed with cellulitis; the patient was treated with antibiotics and recovered. The routine typing methods were not able to identify the microorganism causing the infection, so it was further analysed by molecular methods and whole-genome sequencing. Multi-locus sequence typing (MLST) revealed the presence of the bovine-associated ST-4 genotype. Whole-genome comparisons with other C. fetus strains revealed an inconsistent phylogenetic position based on the core genome, discordant with previous ST-4 strains. To the best of our knowledge, this is the first C. fetus subsp. fetus ST-4 strain isolated from humans, and it represents a probable case of zoonotic transmission from cattle.

  8. Whole-genome sequences of influenza A(H3N2) viruses isolated from Brazilian patients with mild illness during the 2014 season

    Directory of Open Access Journals (Sweden)

    Paola Cristina Resende

    2015-02-01

    The influenza A(H3N2) virus has circulated worldwide for almost five decades and is the dominant subtype in most seasonal influenza epidemics, as occurred in the 2014 season in South America. In this study we evaluate five whole genome sequences of influenza A(H3N2) viruses detected in patients with mild illness, collected between January and March 2014. To sequence the genomes, a next-generation sequencing (NGS) protocol was performed using the Ion Torrent PGM platform. In addition to analysing the common genes, haemagglutinin, neuraminidase and matrix, our work also comprised internal genes. This is the first report of a whole genome analysis of Brazilian influenza A(H3N2) samples. Considerable amino acid variability was encountered in all gene segments, demonstrating the importance of studying the internal genes. NGS of whole genomes in this study will facilitate deeper virus characterisation, contributing to the improvement of influenza strain surveillance in Brazil.

  9. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Directory of Open Access Journals (Sweden)

    Sathishkumar Natarajan

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other crops of the Cucurbitaceae family in both open-field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them, 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. A