WorldWideScience

Sample records for array-based whole-genome survey

  1. Comparison of buccal and blood-derived canine DNA, either native or whole genome amplified, for array-based genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Lawley Cynthia

    2011-06-01

    Full Text Available Abstract Background The availability of array-based genotyping platforms for single nucleotide polymorphisms (SNPs) for the canine genome has expanded the opportunities to undertake genome-wide association (GWA) studies to identify the genetic basis for Mendelian and complex traits. Whole blood as the source of high quality DNA is undisputed but often proves impractical for collection of the large numbers of samples necessary to discover the loci underlying complex traits. Further, many countries prohibit the collection of blood from dogs unless medically necessary thereby restricting access to critical control samples from healthy dogs. Alternate sources of DNA, typically from buccal cytobrush extractions, while convenient, have been suggested to have low yield and perform poorly in GWA. Yet buccal cytobrushes provide a cost-effective means of collecting DNA, are readily accepted by dog owners, and represent a large resource base in many canine genetics laboratories. To increase the DNA quantities, whole genome amplification (WGA) can be performed. Thus, the present study assessed the utility of buccal-derived DNA as well as whole genome amplification in comparison to blood samples for use on the most recent iteration of the canine HD SNP array (Illumina). Findings In both buccal and blood samples, whether whole genome amplified or not, 97% of the samples had SNP call rates in excess of 80% indicating that the vast majority of the SNPs would be suitable to perform association studies regardless of the DNA source. Similarly, there were no significant differences in marker intensity measurements between buccal and blood samples for copy number variations (CNV) analysis. Conclusions All DNA samples assayed, buccal or blood, native or whole genome amplified, are appropriate for use in array-based genome-wide association studies. The concordance between subsets of dogs for which both buccal and blood samples, or those samples whole genome amplified, was
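
    The call-rate criterion quoted in this abstract can be illustrated with a minimal sketch. It assumes a per-sample call rate is the fraction of SNPs with a non-missing genotype; the sample names, genotype lists, and helper functions below are hypothetical and are not taken from the study.

```python
# Sketch: per-sample SNP call rate on hypothetical toy data.
# None marks a "no call"; the 0.80 threshold mirrors the 80% figure in the abstract.

def call_rate(genotypes):
    """Fraction of SNPs with a non-missing genotype call for one sample."""
    if not genotypes:
        raise ValueError("sample has no SNPs")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def fraction_passing(samples, threshold=0.80):
    """Return (fraction of samples with call rate above threshold, per-sample rates)."""
    rates = {name: call_rate(calls) for name, calls in samples.items()}
    passing = [name for name, rate in rates.items() if rate > threshold]
    return len(passing) / len(samples), rates


# Hypothetical samples: name -> genotype calls at 10 SNPs.
samples = {
    "buccal_native": ["AA"] * 9 + [None],       # 9 of 10 called
    "buccal_wga": ["AB"] * 7 + [None] * 3,      # 7 of 10 called
    "blood_native": ["AA"] * 10,                # all called
}
passing_fraction, rates = fraction_passing(samples)
```

    On this toy input two of the three samples exceed the threshold; real array data would have far more SNPs per sample.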

  2. Whole-genome survey of the putative ATP-binding cassette transporter family genes in Vitis vinifera.

    Science.gov (United States)

    Çakır, Birsen; Kılıçkaya, Ozan

    2013-01-01

    The ATP-binding cassette (ABC) protein superfamily constitutes one of the largest protein families known in plants. In this report, we performed a complete inventory of ABC protein genes in Vitis vinifera, the whole genome of which has been sequenced. By comparison with ABC protein members of Arabidopsis thaliana, we identified 135 putative ABC proteins with 1 or 2 NBDs in V. vinifera. Of these, 120 encode intrinsic membrane proteins, and 15 encode proteins missing TMDs. V. vinifera ABC proteins can be divided into 13 subfamilies with 79 "full-size," 41 "half-size," and 15 "soluble" putative ABC proteins. The main feature of the Vitis ABC superfamily is the presence of 2 large subfamilies, ABCG (pleiotropic drug resistance and white-brown complex homolog) and ABCC (multidrug resistance-associated protein). We identified orthologs of V. vinifera putative ABC transporters in different species. This work represents the first complete inventory of ABC transporters in V. vinifera. The identification of Vitis ABC transporters and their comparative analysis with the Arabidopsis counterparts revealed a strong conservation between the 2 species. This inventory could help elucidate the biological and physiological functions of these transporters in V. vinifera.

  3. Whole-genome survey of the putative ATP-binding cassette transporter family genes in Vitis vinifera.

    Directory of Open Access Journals (Sweden)

    Birsen Çakır

    Full Text Available The ATP-binding cassette (ABC) protein superfamily constitutes one of the largest protein families known in plants. In this report, we performed a complete inventory of ABC protein genes in Vitis vinifera, the whole genome of which has been sequenced. By comparison with ABC protein members of Arabidopsis thaliana, we identified 135 putative ABC proteins with 1 or 2 NBDs in V. vinifera. Of these, 120 encode intrinsic membrane proteins, and 15 encode proteins missing TMDs. V. vinifera ABC proteins can be divided into 13 subfamilies with 79 "full-size," 41 "half-size," and 15 "soluble" putative ABC proteins. The main feature of the Vitis ABC superfamily is the presence of 2 large subfamilies, ABCG (pleiotropic drug resistance and white-brown complex homolog) and ABCC (multidrug resistance-associated protein). We identified orthologs of V. vinifera putative ABC transporters in different species. This work represents the first complete inventory of ABC transporters in V. vinifera. The identification of Vitis ABC transporters and their comparative analysis with the Arabidopsis counterparts revealed a strong conservation between the 2 species. This inventory could help elucidate the biological and physiological functions of these transporters in V. vinifera.

  4. Proficiency testing for bacterial whole genome sequencing: an end-user survey of current capabilities, requirements and priorities

    DEFF Research Database (Denmark)

    Moran-Gilad, Jacob; Sintchenko, Vitali; Karlsmose Pedersen, Susanne

    2015-01-01

    Group 4 among GMI members in order to ascertain NGS end-use requirements and attitudes towards NGS PT. The survey identified the high professional diversity of laboratories engaged in NGS-based public health projects and the wide range of capabilities within institutions, at a notable range of costs...

  5. Copy Number Variation Analysis by Array Analysis of Single Cells Following Whole Genome Amplification.

    Science.gov (United States)

    Dimitriadou, Eftychia; Zamani Esteki, Masoud; Vermeesch, Joris Robert

    2015-01-01

    Whole genome amplification is required to ensure the availability of sufficient material for copy number variation analysis of a genome deriving from an individual cell. Here, we describe the protocols we use for copy number variation analysis of non-fixed single cells by array-based approaches following single-cell isolation and whole genome amplification. We are focusing on two alternative protocols, an isothermal and a PCR-based whole genome amplification method, followed by either comparative genome hybridization (aCGH) or SNP array analysis, respectively.

  6. Pathogenic Mutations in Cancer-Predisposing Genes: A Survey of 300 Patients with Whole-Genome Sequencing and Lifetime Electronic Health Records

    Science.gov (United States)

    He, Karen Y.; McPherson, Elizabeth W.; Li, Quan; Xia, Fan; Weng, Chunhua; Wang, Kai

    2016-01-01

    Background It is unclear whether and how whole-genome sequencing (WGS) data can be used to implement genomic medicine. Our objective is to retrospectively evaluate whether WGS can facilitate improving prevention and care for patients with susceptibility to cancer syndromes. Methods and Findings We analyzed genetic mutations in 60 autosomal dominant cancer-predisposition genes in 300 deceased patients with WGS data and nearly complete long-term (over 30 years) medical records. To infer biological insights from massive amounts of WGS data and comprehensive clinical data in a short period of time, we developed an in-house analysis pipeline within the SeqHBase software framework to quickly identify pathogenic or likely pathogenic variants. The clinical data of the patients who carried pathogenic and/or likely pathogenic variants were further reviewed to assess their clinical conditions using their lifetime EHRs. Among the 300 participants, 5 (1.7%) carried pathogenic or likely pathogenic variants in 5 cancer-predisposing genes: one in APC, BRCA1, BRCA2, NF1, and TP53 each. When assessing the clinical data, each of the 5 patients had one or more different types of cancers, fully consistent with their genetic profiles. Among these 5 patients, 2 died due to cancer while the others had multiple disorders later in their lifetimes; however, they may have benefited from early diagnosis and treatment for healthier lives, had the patients had genetic testing in their earlier lifetimes. Conclusions We demonstrated a case study where the discovery of pathogenic or likely pathogenic germline mutations from population-wide WGS correlates with clinical outcome. The use of WGS may have clinical impacts to improve healthcare delivery. PMID:27930734

  7. Bioinformatics for whole-genome shotgun sequencing of microbial communities.

    Directory of Open Access Journals (Sweden)

    Kevin Chen

    2005-07-01

    Full Text Available The application of whole-genome shotgun sequencing to microbial communities represents a major development in metagenomics, the study of uncultured microbes via the tools of modern genomic analysis. In the past year, whole-genome shotgun sequencing projects of prokaryotic communities from an acid mine biofilm, the Sargasso Sea, Minnesota farm soil, three deep-sea whale falls, and deep-sea sediments have been reported, adding to previously published work on viral communities from marine and fecal samples. The interpretation of this new kind of data poses a wide variety of exciting and difficult bioinformatics problems. The aim of this review is to introduce the bioinformatics community to this emerging field by surveying existing techniques and promising new approaches for several of the most interesting of these computational problems.

  8. Metabolic Adaptation after Whole Genome Duplication

    NARCIS (Netherlands)

    Hoek, M.J.A. van; Hogeweg, P.

    2009-01-01

    Whole genome duplications (WGDs) have been hypothesized to be responsible for major transitions in evolution. However, the effects of WGD and subsequent gene loss on cellular behavior and metabolism are still poorly understood. Here we develop a genome scale evolutionary model to study the dynamics

  9. Whole genome phylogenies for multiple Drosophila species

    Directory of Open Access Journals (Sweden)

    Seetharam Arun

    2012-12-01

    Full Text Available Abstract Background Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between
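
    The vector pipeline described in this abstract (peptide frequency vectors per protein, summed per species, compared by angle) can be sketched in simplified form. This sketch deliberately omits the SVD dimension-reduction step and works on raw peptide counts, so it is not the authors' method; the peptide length `k=3`, the function names, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

# Sketch only: raw peptide counts, with the SVD reduction step left out.

def peptide_counts(protein, k=3):
    """Count overlapping k-residue peptides in one protein sequence."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def species_vector(proteins, k=3):
    """Sum the per-protein peptide count vectors into one species vector."""
    total = Counter()
    for protein in proteins:
        total.update(peptide_counts(protein, k))
    return total


def angle_between(u, v):
    """Angle in radians between two sparse count vectors."""
    dot = sum(count * v[pep] for pep, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compute an angle for an empty vector")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Hypothetical toy proteomes (two short "proteins" per species).
species_a = species_vector(["MKVLAAG", "MKVLSAG"])
species_b = species_vector(["MKVLAAG", "MKILSAG"])
species_c = species_vector(["WWPPHHC", "CCHHPPW"])

angle_ab = angle_between(species_a, species_b)
angle_ac = angle_between(species_a, species_c)
```

    The pairwise angles would then feed a distance-based tree-building method; species that share no peptides at all end up at a right angle to each other.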

  10. Whole genome sequencing analysis of Plasmodium vivax using whole genome capture

    Directory of Open Access Journals (Sweden)

    Bright A

    2012-06-01

    Full Text Available Abstract Background Malaria caused by Plasmodium vivax is an experimentally neglected severe disease with a substantial burden on human health. Because of technical limitations, little is known about the biology of this important human pathogen. Whole genome analysis methods on patient-derived material are thus likely to have a substantial impact on our understanding of P. vivax pathogenesis and epidemiology. For example, it will allow study of the evolution and population biology of the parasite, allow parasite transmission patterns to be characterized, and may facilitate the identification of new drug resistance genes. Because parasitemias are typically low and the parasite cannot be readily cultured, on-site leukocyte depletion of blood samples is typically needed to remove human DNA that may be 1000X more abundant than parasite DNA. These features have precluded the analysis of archived blood samples and require the presence of laboratories in close proximity to the collection of field samples for optimal pre-cryopreservation sample preparation. Results Here we show that in-solution hybridization capture can be used to extract P. vivax DNA from human contaminating DNA in the laboratory without the need for on-site leukocyte filtration. Using a whole genome capture method, we were able to enrich P. vivax DNA from bulk genomic DNA from less than 0.5% to a median of 55% (range 20%-80%). This level of enrichment allows for efficient analysis of the samples by whole genome sequencing and does not introduce any gross biases into the data. With this method, we obtained greater than 5X coverage across 93% of the P. vivax genome for four P. vivax strains from Iquitos, Peru, which is similar to our results using leukocyte filtration (greater than 5X coverage across 96%). Conclusion The whole genome capture technique will enable more efficient whole genome analysis of P. vivax from a larger geographic region and from valuable archived sample collections.

  11. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  12. Strategies and tools for whole genome alignments

    Energy Technology Data Exchange (ETDEWEB)

    Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas; Ishkhanov, Tigran; Ryaboy, Dmitriy; Rubin, Edward; Pachter, Lior; Dubchak, Inna

    2002-11-25

    The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.

  13. Microbial species delineation using whole genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.
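
    The idea of applying hard cut-offs on gANI and AF to form species-level clusters can be sketched as follows. This is a single-linkage grouping chosen for brevity, not necessarily the clustering procedure the authors used, and the cut-off values (95.0 for gANI, 0.5 for AF) are hypothetical placeholders because the abstract does not state the actual thresholds. Genome names and pairwise values are likewise invented.

```python
# Sketch: single-linkage species grouping from pairwise (gANI, AF) values.
# Cut-offs and data are hypothetical placeholders, not values from the study.

def cluster_species(genomes, pairs, ani_cutoff, af_cutoff):
    """Join two genomes when both gANI and AF meet their cut-offs (union-find)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), (gani, af) in pairs.items():
        if gani >= ani_cutoff and af >= af_cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(clusters.values(), key=lambda c: sorted(c))


genomes = ["g1", "g2", "g3", "g4"]
pairs = {
    ("g1", "g2"): (98.7, 0.85),  # passes both cut-offs
    ("g2", "g3"): (97.0, 0.40),  # high identity, but too little of the genome aligned
    ("g3", "g4"): (90.1, 0.90),  # well aligned, but identity too low
}
clusters = cluster_species(genomes, pairs, ani_cutoff=95.0, af_cutoff=0.5)
```

    Requiring both values to pass is the point of the example: a high identity over a small aligned fraction, or the reverse, does not merge two genomes.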

  14. Whole Genome Epidemiological Typing of Salmonella

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas

    Salmonella is one of the most common foodborne pathogens worldwide. In the US alone, salmonellosis was estimated to cause 1.4 million cases, resulting in 17,000 hospitalizations and almost 600 deaths each year. In particular, Salmonella enterica is a common cause of minor and large foodborne outbreaks....... Technological advances and affordable prices in high-throughput genome sequencing are making whole genome sequencing (WGS) available as a routine tool for bacterial typing. Typing of Salmonella, especially sub-typing within the same serotype or even the same clone, the genetic variation of the target genes being...... available Salmonella enterica genomes (accessed in April 2011). A consensus tree based on variation of the core genes gives better resolution than 16S rRNA and MLST, which rarely provide separation between closely related strains. The performance of the pan-genome tree which is based on the presence...

  15. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    Dang Thanh Hai; Nguyen Dai Thanh; Pham Thi Minh Trang; Le Si Quang; Phan Thi Thu Hang; Dang Cao Cuong; Hoang Kim Phuc; Nguyen Huu Duc; Do Duc Dong; Bui Quang Minh; Pham Bao Son; Le Sy Vinh

    2015-03-01

    We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5%) were large indels. There were 6,681 large indels in the range 0.1–100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length ≥ 300 bp. There were 235 contigs from the child genome of which 199 (84.7%) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam.
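
    The Mendelian inheritance filter mentioned in this abstract can be illustrated with a minimal sketch for biallelic autosomal sites: a child genotype is consistent if one allele can come from each parent. The function name, the tuple representation of genotypes, and the toy sites are assumptions for illustration; the sketch ignores de novo mutations, sex chromosomes, and genotyping error, and is not the authors' pipeline.

```python
# Sketch: Mendelian consistency check for a trio at autosomal sites.
# Genotypes are hypothetical (allele, allele) tuples.

def is_mendelian_consistent(child, father, mother):
    """True if the child's two alleles can be drawn one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


# Hypothetical calls: (site, child, father, mother).
calls = [
    ("site1", ("A", "G"), ("A", "A"), ("G", "G")),  # A from father, G from mother
    ("site2", ("A", "A"), ("A", "G"), ("A", "G")),  # A from each parent
    ("site3", ("G", "G"), ("A", "A"), ("A", "G")),  # father cannot supply G
]
consistent_sites = [
    site for site, child, father, mother in calls
    if is_mendelian_consistent(child, father, mother)
]
```

    Sites that fail the check, such as the third one here, would be excluded from a variant set described as satisfying Mendelian inheritance.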

  16. BSMAP: whole genome bisulfite sequence MAPping program

    Directory of Open Access Journals (Sweden)

    Li Wei

    2009-07-01

    Full Text Available Abstract Background Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the increased searching space, reduced complexity of bisulfite sequence, asymmetric cytosine to thymine alignments, and multiple CpG heterogeneous methylation. Results We developed an efficient bisulfite reads mapping algorithm BSMAP to address the above issues. BSMAP combines genome hashing and bitwise masking to achieve fast and accurate bisulfite mapping. Compared with existing bisulfite mapping approaches, BSMAP is faster, more sensitive and more flexible. Conclusion BSMAP is the first general-purpose bisulfite mapping software. It is able to map high-throughput bisulfite reads at whole genome level with feasible memory and CPU usage. It is freely available under GPL v3 license at http://code.google.com/p/bsmap/.
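
    The asymmetric cytosine-to-thymine matching mentioned in this abstract can be shown with a small sketch: a reference C may appear in a bisulfite read as either C (methylated, retained) or T (unmethylated, converted), while a reference T may only appear as T. The code below is an ungapped, forward-strand-only illustration with hypothetical function names and sequences; it does not reproduce BSMAP's genome hashing or bitwise masking.

```python
# Sketch: asymmetric C->T matching for bisulfite reads (forward strand, no gaps).

def bisulfite_match(reference, read):
    """True if `read` could arise from `reference` after bisulfite conversion."""
    if len(reference) != len(read):
        return False
    for ref_base, read_base in zip(reference, read):
        if ref_base == read_base:
            continue
        # Asymmetric rule: a reference C may be read as T, never the reverse.
        if ref_base == "C" and read_base == "T":
            continue
        return False
    return True


def methylation_calls(reference, read):
    """Map each reference C position to True if the read retained the C."""
    if not bisulfite_match(reference, read):
        raise ValueError("read does not match reference under bisulfite rules")
    return {i: read[i] == "C" for i, base in enumerate(reference) if base == "C"}


# Hypothetical example: the C at position 4 is retained, the others converted.
calls = methylation_calls("ACGTCC", "ATGTCT")
```

    A retained C is read here as evidence of methylation, in line with the abstract's statement that bisulfite treatment converts only unmethylated cytosines.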

  17. Small Sample Whole-Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Hara, C A; Nguyen, C P; Wheeler, E K; Sorensen, K J; Arroyo, E S; Vrankovich, G P; Christian, A T

    2005-09-20

    Many challenges arise when trying to amplify and analyze human samples collected in the field due to limitations in sample quantity, and contamination of the starting material. Tests such as DNA fingerprinting and mitochondrial typing require a certain sample size and are carried out in large volume reactions; in cases where insufficient sample is present whole genome amplification (WGA) can be used. WGA allows very small quantities of DNA to be amplified in a way that enables subsequent DNA-based tests to be performed. A limiting step to WGA is sample preparation. To minimize the necessary sample size, we have developed two modifications of WGA: the first allows for an increase in amplified product from small, nanoscale, purified samples with the use of carrier DNA while the second is a single-step method for cleaning and amplifying samples all in one column. Conventional DNA cleanup involves binding the DNA to silica, washing away impurities, and then releasing the DNA for subsequent testing. We have eliminated losses associated with incomplete sample release, thereby decreasing the required amount of starting template for DNA testing. Both techniques address the limitations of sample size by providing ample copies of genomic samples. Carrier DNA, included in our WGA reactions, can be used when amplifying samples with the standard purification method, or can be used in conjunction with our single-step DNA purification technique to potentially further decrease the amount of starting sample necessary for future forensic DNA-based assays.

  18. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  19. Rapid whole genome sequencing and precision neonatology.

    Science.gov (United States)

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical to initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and speed of testing decreases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions to alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care.

  20. Comparative analysis of whole genome structure of Streptococcus suis using whole genome PCR scanning

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An outbreak associated with Streptococcus suis infection in humans emerged in Sichuan province, China in 2005. The outbreak is atypical for the apparent large number of human cases, high fatality rate and geographical spread. To determine whether the bacterium has changed, we compared both human and animal isolates from the Sichuan outbreak with those collected previously within China and in other countries using whole genome PCR scanning (WGPScanning), comparative sequencing of several known virulence factor genes and multilocus sequence typing (MLST) analysis. WGPScanning analysis showed that all primer pairs yielded PCR products of the expected sizes in all four strains tested. The nucleotide sequences of all the detected virulence factor genes are identical in the four strains and MLST results showed that the four isolates studied and reference strain all belonged to the ST1 complex. No new genetic changes were found in the genome structure of the isolates from this Sichuan outbreak.

  2. Multiple Whole Genome Alignments Without a Reference Organism

    Energy Technology Data Exchange (ETDEWEB)

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  3. Whole-genome sequence-based analysis of thyroid function

    OpenAIRE

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 1...

  4. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using eithe...

  5. Optimized design and assessment of whole genome tiling arrays.

    NARCIS (Netherlands)

    Graf, S.; Nielsen, F.G.G.; Kurtz, S.; Huynen, M.A.; Birney, E.; Stunnenberg, H.G.; Flicek, P.

    2007-01-01

    MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling arra

  6. Whole-Genome Sequencing of Two Bartonella bacilliformis Strains

    Science.gov (United States)

    Guillen, Yolanda; Casadellà, Maria; García-de-la-Guarda, Ruth; Espinoza-Culupú, Abraham; Paredes, Roger; Ruiz, Joaquim

    2016-01-01

    Bartonella bacilliformis is the causative agent of Carrion’s disease, a highly endemic human bartonellosis in Peru. We performed a whole-genome assembly of two B. bacilliformis strains isolated from the blood of infected patients in the acute phase of Carrion’s disease from the Cusco and Piura regions in Peru. PMID:27389274

  7. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    KAUST Repository

    Neave, Matthew J.

    2014-08-14

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp.

  8. Whole-Genome Sequences of 26 Vibrio cholerae Isolates

    Science.gov (United States)

    Watve, Samit S.; Chande, Aroon T.; Rishishwar, Lavanya; Jordan, I. King

    2016-01-01

    The human pathogen Vibrio cholerae employs several adaptive mechanisms for environmental persistence, including natural transformation and type VI secretion, creating a reservoir for the spread of disease. Here, we report whole-genome sequences of 26 diverse V. cholerae isolates, significantly increasing the sequence diversity of publicly available V. cholerae genomes. PMID:28007852

  9. Whole genome amplification - Review of applications and advances

    Energy Technology Data Exchange (ETDEWEB)

    Hawkins, Trevor L.; Detter, J.C.; Richardson, Paul

    2001-11-15

    The concept of whole genome amplification has arisen in the past few years as modifications to the polymerase chain reaction (PCR) have been adapted to replicate regions of genomes that are of biological interest. The applications are many: forensics, embryonic disease diagnosis, bioterrorism genome detection, "immortalization" of clinical samples, microbial diversity, and genotyping. The key question is whether DNA can be replicated a genome at a time without bias or nonrandom distribution of the target. Several papers published in the last year and currently in preparation may lead to the conclusion that whole genome amplification may indeed be possible and therefore open up a new avenue to molecular biology.

  10. Whole-Genome Sequence Assembly for Mammalian Genomes: Arachne 2

    OpenAIRE

    Jaffe, David B.; Butler, Jonathan; Gnerre, Sante; Mauceli, Evan; Lindblad-Toh, Kerstin; Mesirov, Jill P.; Zody, Michael C; Lander, Eric S.

    2003-01-01

    We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal changes were simultaneously made and applied to the assembly of the mouse genome, during a six-month period of development: (1) Supercontigs (scaffolds) were iteratively broken and rej...

  11. Whole-genome molecular haplotyping of single cells

    OpenAIRE

    Fan, H. Christina; Wang, Jianbin; Potanina, Anastasia; Quake, Stephen R

    2010-01-01

    Conventional experimental methods of studying the human genome are limited by the inability to independently study the combination of alleles, or haplotype, on each of the homologous copies of the chromosomes. We developed a microfluidic device capable of separating and amplifying homologous copies of each chromosome from a single human metaphase cell. Single-nucleotide polymorphism (SNP) array analysis of amplified DNA enabled us to achieve completely deterministic, whole-genome, personal ha...

  12. Whole genome and transcriptome sequencing of a B3 thymoma.

    Directory of Open Access Journals (Sweden)

    Iacopo Petrini

    Full Text Available Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina) and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X) was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  13. Whole genome sequencing of clinical isolates of Giardia lamblia.

    Science.gov (United States)

    Hanevik, K; Bakken, R; Brattbakk, H R; Saghaug, C S; Langeland, N

    2015-02-01

    Clinical isolates from protozoan parasites such as Giardia lamblia are at present practically impossible to culture. By using simple cyst purification methods, we show that Giardia whole genome sequencing of clinical stool samples is possible. Immunomagnetic separation after sucrose gradient flotation gave superior results compared to sucrose gradient flotation alone. The method enables detailed analysis of a wide range of genes of interest for genotyping, virulence and drug resistance.

  14. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, S. [Univ. Wisc.-Madison; Zhou, S. [Univ. Wisc.-Madison; Place, M. [Univ. Wisc.-Madison; Zhang, Y. [Univ. Wisc.-Madison; Briska, A. [Univ. Wisc.-Madison; Goldstein, S. [Univ. Wisc.-Madison; Churas, C. [Univ. Wisc.-Madison; Runnheim, R. [Univ. Wisc.-Madison; Forrest, D. [Univ. Wisc.-Madison; Lim, A. [Univ. Wisc.-Madison; Lapidus, A. [Univ. Wisc.-Madison; Han, C. S. [Univ. Wisc.-Madison; Roberts, G. P. [Univ. Wisc.-Madison; Schwartz, D. C. [Univ. Wisc.-Madison

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.
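
    As a rough illustration of what an ordered restriction map is, the sketch below digests a sequence in silico and returns the fragment lengths in order, which is the kind of profile a sequence contig would need before it could be compared with an optical map. The recognition sites for XbaI, NheI, and HindIII are the standard ones; the function name `in_silico_map`, the toy sequence, and the treatment of the molecule as linear are assumptions made for illustration and are not taken from the paper.

```python
# Recognition site and cut offset (bases from the start of the site on the
# top strand) for the three enzymes named in the abstract.
SITES = {
    "XbaI": ("TCTAGA", 1),     # T^CTAGA
    "NheI": ("GCTAGC", 1),     # G^CTAGC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def in_silico_map(sequence, enzyme):
    """Return ordered fragment lengths for a linear sequence cut by `enzyme`.

    Hypothetical helper: a real comparison with optical maps would also have
    to model sizing error, missed cuts, and circular chromosomes.
    """
    site, offset = SITES[enzyme]
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


toy = "GGAAGCTTCCCCAAGCTTT"  # made-up sequence with two HindIII sites
print(in_silico_map(toy, "HindIII"))  # [3, 10, 6]
```

    The ordered list of lengths, rather than the set of lengths, is what carries the positional information that lets a map act as a scaffold.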

  15. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    Energy Technology Data Exchange (ETDEWEB)

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest, Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz, David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source of hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (Xba I, Nhe I, and Hind III) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the Hind III map acted as a scaffold for high resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  16. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    Directory of Open Access Journals (Sweden)

    David Koslicki

    Full Text Available With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.
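
    The general idea of estimating taxon proportions from pooled k-mer frequencies, rather than classifying read by read, can be sketched as follows. This is not the WGSQuikr algorithm or its MATLAB code: it uses plain non-negative least squares solved by cyclic coordinate descent, and the function names, the toy reference sequences, and the choice of k = 3 are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequences, k):
    """Pool k-mer counts over all sequences; return frequencies in a fixed ACGT order."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def estimate_proportions(sample, references, sweeps=200):
    """Non-negative least squares by cyclic coordinate descent.

    Minimises ||sum_j x_j * references[j] - sample||^2 subject to x_j >= 0,
    then rescales x so that the estimated proportions sum to one.
    """
    x = [0.0] * len(references)
    residual = [-s for s in sample]  # residual = A x - sample, starting at x = 0
    for _ in range(sweeps):
        for j, ref in enumerate(references):
            norm = sum(v * v for v in ref)
            if norm == 0.0:
                continue
            grad = sum(v * r for v, r in zip(ref, residual))
            new = max(0.0, x[j] - grad / norm)
            delta = new - x[j]
            if delta:
                residual = [r + delta * v for r, v in zip(residual, ref)]
                x[j] = new
    total = sum(x)
    return [v / total for v in x] if total else x


# Two made-up "reference genomes" and a sample profile mixed 70:30 from them.
refs = [kmer_profile(["ACGT" * 50], 3), kmer_profile(["AAGGCCTT" * 25], 3)]
sample = [0.7 * a + 0.3 * b for a, b in zip(refs[0], refs[1])]
print(estimate_proportions(sample, refs))
```

    On this toy input the printed proportions should be close to 0.7 and 0.3. Because the whole sample is summarised once as a single frequency vector, the cost of the solve depends on the number of k-mers and references, not on the number of reads.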

  17. Whole genome microarray analysis, from neonatal blood cards

    Directory of Open Access Journals (Sweden)

    Hogan Michael E

    2009-07-01

    Full Text Available Abstract Background Neonatal blood, obtained from a heel stick and stored dry on paper cards, has been the standard for birth defects screening for 50 years. Such dried blood samples are used, primarily, for analysis of small-molecule analytes. More recently, the DNA complement of such dried blood cards has been used for targeted genetic testing, such as for single nucleotide polymorphism in cystic fibrosis. Expansion of such testing to include polygenic traits, and perhaps whole genome scanning, has been discussed as a formal possibility. However, until now the amount of DNA that might be obtained from such dried blood cards has been limiting, due to inefficient DNA recovery technology. Results A new technology is employed for efficient DNA release from a standard neonatal blood card. Using standard Guthrie cards, stored an average of ten years post-collection, about 1/40th of the air-dried neonatal blood specimen (two 3 mm punches) was processed to obtain DNA that was sufficient in mass and quality for direct use in microarray-based whole genome scanning. Using that same DNA release technology, it is also shown that approximately 1/250th of the original purified DNA (about 1 ng) could be subjected to whole genome amplification, thus yielding an additional microgram of amplified DNA product. That amplified DNA product was then used in microarray analysis and yielded statistical concordance of 99% or greater to the primary, unamplified DNA sample. Conclusion Together, these data suggest that DNA obtained from less than 10% of a standard neonatal blood specimen, stored dry for several years on a Guthrie card, can support a program of genome-wide neonatal genetic testing.

  18. The potential of whole genome NGS for infectious disease diagnosis.

    Science.gov (United States)

    Lecuit, Marc; Eloit, Marc

    2015-01-01

    Non-targeted identification of microbes is now possible directly in biological samples, based on whole-genome-NGS (WG-NGS) techniques that allow deep sequencing of nucleic acids, data mining and sorting out of sequences of pathogens without any a priori hypothesis. WG-NGS was first only used as a research tool due to its cost, complexity and lack of standardization. Recent improvements in sample preparation and bioinformatics pipelines and decrease in cost now allow actionable diagnostics in patients. The potency and limits of WG-NGS and possible future indications are discussed here. WG-NGS will likely soon become a standard procedure in microbiological diagnosis.

  19. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    Energy Technology Data Exchange (ETDEWEB)

    Schutzer, S. E.; Dunn, J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  20. Whole genome comparison of donor and cloned dogs.

    Science.gov (United States)

    Kim, Hak-Min; Cho, Yun Sung; Kim, Hyunmin; Jho, Sungwoong; Son, Bongjun; Choi, Joung Yoon; Kim, Sangsoo; Lee, Byeong Chun; Bhak, Jong; Jang, Goo

    2013-10-21

    Cloning is a process that produces genetically identical organisms. However, the genomic degree of genetic resemblance in clones needs to be determined. In this report, the genomes of a cloned dog and its donor were compared. Compared with a human monozygotic twin, the genome of the cloned dog showed little difference from the genome of the nuclear donor dog in terms of single nucleotide variations, chromosomal instability, and telomere lengths. These findings suggest that cloning by somatic cell nuclear transfer produced an almost identical genome. The whole genome sequence data of donor and cloned dogs can provide a resource for further investigations on epigenetic contributions in phenotypic differences.

  1. Deep whole-genome sequencing of 100 southeast Asian Malays.

    Science.gov (United States)

    Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2013-01-10

    Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

  2. Improving pan-genome annotation using whole genome multiple alignment

    Directory of Open Access Journals (Sweden)

    Salzberg Steven L

    2011-06-01

    Full Text Available Abstract Background Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. Results We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. Conclusions Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.

  3. Whole genome amplification and its impact on CGH array profiles

    Directory of Open Access Journals (Sweden)

    Meldrum Cliff

    2008-07-01

    Full Text Available Abstract Background Some array comparative genomic hybridisation (array CGH) platforms require a minimum of micrograms of DNA for the generation of reliable and reproducible data. For studies where there are limited amounts of genetic material, whole genome amplification (WGA) is an attractive method for generating sufficient quantities of genomic material from minuscule amounts of starting material. A range of WGA methods are available and the multiple displacement amplification (MDA) approach has been shown to be highly accurate, although amplification bias has been reported. In the current study, WGA was used to amplify DNA extracted from whole blood. In total, six array CGH experiments were performed to investigate whether the use of whole genome amplified DNA (wgaDNA) produces reliable and reproducible results. Four experiments were conducted on amplified DNA compared to unamplified DNA and two experiments on unamplified DNA compared to unamplified DNA. Findings All the experiments involving wgaDNA resulted in a high proportion of losses and gains of genomic material. Previously, amplification bias has been overcome by using amplified DNA in both the test and reference DNA. Our data suggest that this approach may not be effective, as the gains and losses introduced by WGA appear to be random and are not reproducible between different experiments using the same DNA. Conclusion In light of these findings, the use of both amplified test and reference DNA on CGH arrays may not provide an accurate representation of copy number variation in the DNA.

  4. Nitrogen regulation in Sinorhizobium meliloti probed with whole genome arrays.

    Science.gov (United States)

    Davalos, Marcela; Fourment, Joëlle; Lucas, Antoine; Bergès, Hélène; Kahn, Daniel

    2004-12-01

    Using whole genome arrays, we systematically investigated nitrogen regulation in the plant symbiotic bacterium Sinorhizobium meliloti. The use of glutamate instead of ammonium as a nitrogen source induced nitrogen catabolic genes independently of the carbon source, including two glutamine synthetase genes, various amino acid transporters and the glnKamtB operon. These responses depended on both the ntrC and glnB nitrogen regulators. Glutamate repressible genes included glutamate synthase and a H+-translocating pyrophosphate synthase. The smc01041-ntrBC operon was negatively autoregulated in a glnB-dependent fashion, indicating an involvement of phosphorylated NtrC. In addition to the nitrogen response, glutamate remodelled expression of carbon metabolism by inhibiting expression of the Entner-Doudoroff and pentose phosphate pathways, and by stimulating gluconeogenetic genes independently of ntrC.

  5. Origin of the Yeast Whole-Genome Duplication.

    Directory of Open Access Journals (Sweden)

    Kenneth H Wolfe

    2015-08-01

    Full Text Available Whole-genome duplications (WGDs) are rare evolutionary events with profound consequences. They double an organism's genetic content, immediately creating a reproductive barrier between it and its ancestors and providing raw material for the divergence of gene functions between paralogs. Almost all eukaryotic genome sequences bear evidence of ancient WGDs, but the causes of these events and the timing of intermediate steps have been difficult to discern. One of the best-characterized WGDs occurred in the lineage leading to the baker's yeast Saccharomyces cerevisiae. Marcet-Houben and Gabaldón now show that, rather than simply doubling the DNA of a single ancestor, the yeast WGD likely involved mating between two different ancestral species followed by a doubling of the genome to restore fertility.

  6. Whole genome sequencing in clinical and public health microbiology.

    Science.gov (United States)

    Kwong, J C; McCallum, N; Sintchenko, V; Howden, B P

    2015-04-01

    Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.

  7. Whole-genome sequencing reveals oncogenic mutations in mycosis fungoides.

    Science.gov (United States)

    McGirt, Laura Y; Jia, Peilin; Baerenwald, Devin A; Duszynski, Robert J; Dahlman, Kimberly B; Zic, John A; Zwerner, Jeffrey P; Hucks, Donald; Dave, Utpal; Zhao, Zhongming; Eischen, Christine M

    2015-07-23

    The pathogenesis of mycosis fungoides (MF), the most common cutaneous T-cell lymphoma (CTCL), is unknown. Although genetic alterations have been identified, none are considered consistently causative in MF. To identify potential drivers of MF, we performed whole-genome sequencing of MF tumors and matched normal skin. Targeted ultra-deep sequencing of MF samples and exome sequencing of CTCL cell lines were also performed. Multiple mutations were identified that affected the same pathways, including epigenetic, cell-fate regulation, and cytokine signaling, in MF tumors and CTCL cell lines. Specifically, interleukin-2 signaling pathway mutations, including activating Janus kinase 3 (JAK3) mutations, were detected. Treatment with a JAK3 inhibitor significantly reduced CTCL cell survival. Additionally, the mutation data identified 2 other potential contributing factors to MF: ultraviolet light and a polymorphism in the tumor suppressor p53 (TP53). Therefore, genetic alterations in specific pathways in MF were identified that may be viable, effective new targets for treatment.

  8. A whole genome RNAi screen identifies replication stress response genes.

    Science.gov (United States)

    Kavanaugh, Gina; Ye, Fei; Mohni, Kareem N; Luzwick, Jessica W; Glick, Gloria; Cortez, David

    2015-11-01

    Proper DNA replication is critical to maintain genome stability. When the DNA replication machinery encounters obstacles to replication, replication forks stall and the replication stress response is activated. This response includes activation of cell cycle checkpoints, stabilization of the replication fork, and DNA damage repair and tolerance mechanisms. Defects in the replication stress response can result in alterations to the DNA sequence causing changes in protein function and expression, ultimately leading to disease states such as cancer. To identify additional genes that control the replication stress response, we performed a three-parameter, high content, whole genome siRNA screen measuring DNA replication before and after a challenge with replication stress as well as a marker of checkpoint kinase signalling. We identified over 200 replication stress response genes and subsequently analyzed how they influence cellular viability in response to replication stress. These data will serve as a useful resource for understanding the replication stress response.

  9. Plantagora: modeling whole genome sequencing and assembly of plant genomes.

    Directory of Open Access Journals (Sweden)

    Roger Barthelson

    Full Text Available BACKGROUND: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. METHODOLOGY/PRINCIPAL FINDINGS: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. CONCLUSIONS/SIGNIFICANCE: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly

  10. Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project.

    Science.gov (United States)

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Zinberg, Randi; Wasserstein, Melissa; Kasarskis, Andrew; Diaz, George A; Schadt, Eric E

    2017-02-01

    Providing ostensibly healthy individuals with personal results from whole-genome sequencing could lead to improved health and well-being via enhanced disease risk prediction, prevention, and diagnosis, but also poses practical and ethical challenges. Understanding how individuals react psychologically and behaviourally will be key in assessing the potential utility of personal whole-genome sequencing. We conducted an exploratory longitudinal cohort study in which quantitative surveys and in-depth qualitative interviews were conducted before and after personal results were returned to individuals who underwent whole-genome sequencing. The participants were offered a range of interpreted results, including Alzheimer's disease, type 2 diabetes, pharmacogenomics, rare disease-associated variants, and ancestry. They were also offered their raw data. Of the 35 participants at baseline, 29 (82.9%) completed the 6-month follow-up. In the quantitative surveys, test-related distress was low, although it was higher at 1-week than 6-month follow-up (Z=2.68, P=0.007). In the 6-month qualitative interviews, most participants felt happy or relieved about their results. A few were concerned, particularly about rare disease-associated variants and Alzheimer's disease results. Two of the 29 participants had sought clinical follow-up as a direct or indirect consequence of rare disease-associated variants results. Several had mentioned their results to their doctors. Some participants felt having their raw data might be medically useful to them in the future. The majority reported positive reactions to having their genomes sequenced, but there were notable exceptions to this. The impact and value of returning personal results from whole-genome sequencing when implemented on a larger scale remains to be seen.

  11. Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project

    Science.gov (United States)

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Zinberg, Randi; Wasserstein, Melissa; Kasarskis, Andrew; Diaz, George A; Schadt, Eric E

    2017-01-01

    Providing ostensibly healthy individuals with personal results from whole-genome sequencing could lead to improved health and well-being via enhanced disease risk prediction, prevention, and diagnosis, but also poses practical and ethical challenges. Understanding how individuals react psychologically and behaviourally will be key in assessing the potential utility of personal whole-genome sequencing. We conducted an exploratory longitudinal cohort study in which quantitative surveys and in-depth qualitative interviews were conducted before and after personal results were returned to individuals who underwent whole-genome sequencing. The participants were offered a range of interpreted results, including Alzheimer's disease, type 2 diabetes, pharmacogenomics, rare disease-associated variants, and ancestry. They were also offered their raw data. Of the 35 participants at baseline, 29 (82.9%) completed the 6-month follow-up. In the quantitative surveys, test-related distress was low, although it was higher at 1-week than 6-month follow-up (Z=2.68, P=0.007). In the 6-month qualitative interviews, most participants felt happy or relieved about their results. A few were concerned, particularly about rare disease-associated variants and Alzheimer's disease results. Two of the 29 participants had sought clinical follow-up as a direct or indirect consequence of rare disease-associated variants results. Several had mentioned their results to their doctors. Some participants felt having their raw data might be medically useful to them in the future. The majority reported positive reactions to having their genomes sequenced, but there were notable exceptions to this. The impact and value of returning personal results from whole-genome sequencing when implemented on a larger scale remains to be seen. PMID:28051073

  12. Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Plant Ramona N

    2006-08-01

    Full Text Available Abstract Background: Whole genome amplification is an increasingly common technique through which minute amounts of DNA can be multiplied to generate quantities suitable for genetic testing and analysis. Questions of amplification-induced error and template bias generated by these methods have previously been addressed through either small scale (SNPs) or large scale (CGH array, FISH) methodologies. Here we utilized whole genome sequencing to assess amplification-induced bias in both coding and non-coding regions of two bacterial genomes. Halobacterium species NRC-1 DNA and Campylobacter jejuni were amplified by several common, commercially available protocols: multiple displacement amplification, primer extension pre-amplification and degenerate oligonucleotide primed PCR. The amplification-induced bias of each method was assessed by sequencing both genomes in their entirety using the 454 Sequencing System technology and comparing the results with those obtained from unamplified controls. Results: All amplification methodologies induced statistically significant bias relative to the unamplified control. For the Halobacterium species NRC-1 genome, assessed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 119 times greater than those from unamplified material, 164.0 times greater for Repli-G, 165.0 times greater for PEP-PCR and 252.0 times greater than the unamplified controls for DOP-PCR. For Campylobacter jejuni, also analyzed at 100 base resolution, the D-statistics from GenomiPhi-amplified material were 15 times greater than those from unamplified material, 19.8 times greater for Repli-G, 61.8 times greater for PEP-PCR and 220.5 times greater than the unamplified controls for DOP-PCR. Conclusion: Of the amplification methodologies examined in this paper, the multiple displacement amplification products generated the least bias, and produced significantly higher yields of amplified DNA.
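
    The abstract above does not define its D-statistic. As a rough illustration only, the sketch below assumes it is a Kolmogorov-Smirnov-style distance between per-bin read-coverage distributions, computed over 100-base bins. The function names (`bin_coverage`, `ks_d_statistic`) and the toy read positions are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def bin_coverage(read_starts, genome_length, bin_size=100):
    """Count reads starting in each fixed-width bin (hypothetical helper)."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def ks_d_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs.

    Assumption: this is one plausible reading of the abstract's
    "D-statistic"; the paper's exact procedure is not given there.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking
    # those points is sufficient to find the maximum difference.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy data: an evenly covered control and a sample with reads piled
# into one region, as amplification bias might produce.
control = bin_coverage([10, 120, 230, 340, 450, 560], genome_length=600)
amplified = bin_coverage([10, 20, 30, 40, 50, 560], genome_length=600)
d_value = ks_d_statistic(control, amplified)
```

    Under this reading, a larger D for amplified material than for an unamplified replicate would indicate a coverage distribution that departs further from the control, which is how the fold differences reported above would be interpreted.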

  13. Bioinformatics tools and databases for whole genome sequence analysis of Mycobacterium tuberculosis.

    Science.gov (United States)

    Faksri, Kiatichai; Tan, Jun Hao; Chaiprasert, Angkana; Teo, Yik-Ying; Ong, Rick Twee-Hee

    2016-11-01

    Tuberculosis (TB) is an infectious disease of global public health importance caused by Mycobacterium tuberculosis complex (MTC) in which M. tuberculosis (Mtb) is the major causative agent. Recent advancements in genomic technologies such as next generation sequencing have enabled high throughput cost-effective generation of whole genome sequence information from Mtb clinical isolates, providing new insights into the evolution, genomic diversity and transmission of the Mtb bacteria, including molecular mechanisms of antibiotic resistance. The large volume of sequencing data generated however necessitated effective and efficient management, storage, analysis and visualization of the data and results through development of novel and customized bioinformatics software tools and databases. In this review, we aim to provide a comprehensive survey of the current freely available bioinformatics software tools and publicly accessible databases for genomic analysis of Mtb for identifying disease transmission in molecular epidemiology and in rapid determination of the antibiotic profiles of clinical isolates for prompt and optimal patient treatment.

  14. Whole-genome landscape of pancreatic neuroendocrine tumours.

    Science.gov (United States)

    Scarpa, Aldo; Chang, David K; Nones, Katia; Corbo, Vincenzo; Patch, Ann-Marie; Bailey, Peter; Lawlor, Rita T; Johns, Amber L; Miller, David K; Mafficini, Andrea; Rusev, Borislav; Scardoni, Maria; Antonello, Davide; Barbi, Stefano; Sikora, Katarzyna O; Cingarlini, Sara; Vicentini, Caterina; McKay, Skye; Quinn, Michael C J; Bruxner, Timothy J C; Christ, Angelika N; Harliwong, Ivon; Idrisoglu, Senel; McLean, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wilson, Peter J; Anderson, Matthew J; Fink, J Lynn; Newell, Felicity; Waddell, Nick; Holmes, Oliver; Kazakoff, Stephen H; Leonard, Conrad; Wood, Scott; Xu, Qinying; Nagaraj, Shivashankar Hiriyur; Amato, Eliana; Dalai, Irene; Bersani, Samantha; Cataldo, Ivana; Dei Tos, Angelo P; Capelli, Paola; Davì, Maria Vittoria; Landoni, Luca; Malpaga, Anna; Miotto, Marco; Whitehall, Vicki L J; Leggett, Barbara A; Harris, Janelle L; Harris, Jonathan; Jones, Marc D; Humphris, Jeremy; Chantrill, Lorraine A; Chin, Venessa; Nagrial, Adnan M; Pajic, Marina; Scarlett, Christopher J; Pinho, Andreia; Rooman, Ilse; Toon, Christopher; Wu, Jianmin; Pinese, Mark; Cowley, Mark; Barbour, Andrew; Mawson, Amanda; Humphrey, Emily S; Colvin, Emily K; Chou, Angela; Lovell, Jessica A; Jamieson, Nigel B; Duthie, Fraser; Gingras, Marie-Claude; Fisher, William E; Dagg, Rebecca A; Lau, Loretta M S; Lee, Michael; Pickett, Hilda A; Reddel, Roger R; Samra, Jaswinder S; Kench, James G; Merrett, Neil D; Epari, Krishna; Nguyen, Nam Q; Zeps, Nikolajs; Falconi, Massimo; Simbolo, Michele; Butturini, Giovanni; Van Buren, George; Partelli, Stefano; Fassan, Matteo; Khanna, Kum Kum; Gill, Anthony J; Wheeler, David A; Gibbs, Richard A; Musgrove, Elizabeth A; Bassi, Claudio; Tortora, Giampaolo; Pederzoli, Paolo; Pearson, John V; Waddell, Nicola; Biankin, Andrew V; Grimmond, Sean M

    2017-03-02

    The diagnosis of pancreatic neuroendocrine tumours (PanNETs) is increasing owing to more sensitive detection methods, and this increase is creating challenges for clinical management. We performed whole-genome sequencing of 102 primary PanNETs and defined the genomic events that characterize their pathogenesis. Here we describe the mutational signatures they harbour, including a deficiency in G:C > T:A base excision repair due to inactivation of MUTYH, which encodes a DNA glycosylase. Clinically sporadic PanNETs contain a larger-than-expected proportion of germline mutations, including previously unreported mutations in the DNA repair genes MUTYH, CHEK2 and BRCA2. Together with mutations in MEN1 and VHL, these mutations occur in 17% of patients. Somatic mutations, including point mutations and gene fusions, were commonly found in genes involved in four main pathways: chromatin remodelling, DNA damage repair, activation of mTOR signalling (including previously undescribed EWSR1 gene fusions), and telomere maintenance. In addition, our gene expression analyses identified a subgroup of tumours associated with hypoxia and HIF signalling.

  15. Evolution after whole-genome duplication: a network perspective.

    Science.gov (United States)

    Zhu, Yun; Lin, Zhenguo; Nakhleh, Luay

    2013-11-06

    Gene duplication plays an important role in the evolution of genomes and interactomes. Elucidating how evolution after gene duplication interplays at the sequence and network level is of great interest. In this work, we analyze a data set of gene pairs that arose through whole-genome duplication (WGD) in yeast. All these pairs have the same duplication time, making them ideal for evolutionary investigation. We investigated the interplay between evolution after WGD at the sequence and network levels and correlated these two levels of divergence with gene expression and fitness data. We find that molecular interactions involving WGD genes evolve at rates that are three orders of magnitude slower than the rates of evolution of the corresponding sequences. Furthermore, we find that divergence of WGD pairs correlates strongly with gene expression and fitness data. Because of the role of gene duplication in determining redundancy in biological systems and particularly at the network level, we investigated the role of interaction networks in elucidating the evolutionary fate of duplicated genes. We find that gene neighborhoods in interaction networks provide a mechanism for inferring these fates, and we developed an algorithm for achieving this task. Further epistasis analysis of WGD pairs categorized by their inferred evolutionary fates demonstrated the utility of these techniques. Finally, we find that WGD pairs and other pairs of paralogous genes of small-scale duplication origin share similar properties, giving good support for generalizing our results from WGD pairs to evolution after gene duplication in general.

  16. Genomic V exons from whole genome shotgun data in reptiles.

    Science.gov (United States)

    Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

    2014-08-01

    Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).

  17. Whole genomes redefine the mutational landscape of pancreatic cancer

    Science.gov (United States)

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K.; Kassahn, Karin S.; Bailey, Peter; Johns, Amber L.; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C. J.; Robertson, Alan J.; Fadlullah, Muhammad Z. H.; Bruxner, Tim J. C.; Christ, Angelika N.; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J.; Fink, J. Lynn; Holmes, Oliver; Kazakoff, Stephen H.; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J.; Lee, Hong C.; Jones, Marc D.; Nagrial, Adnan M.; Humphris, Jeremy; Chantrill, Lorraine A.; Chin, Venessa; Steinmann, Angela M.; Mawson, Amanda; Humphrey, Emily S.; Colvin, Emily K.; Chou, Angela; Scarlett, Christopher J.; Pinho, Andreia V.; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S.; Kench, James G.; Pettitt, Jessica A.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B.; Graham, Janet S.; Niclou, Simone P.; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A.; Gill, Anthony J.; Eshleman, James R.; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A.; Pearson, John V.; Biankin, Andrew V.; Grimmond, Sean M.

    2015-01-01

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded. PMID:25719666

  18. Signatures of selection in tilapia revealed by whole genome resequencing.

    Science.gov (United States)

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, selection signatures, including artificial selection and natural selection, have only been identified at the whole genome level in several genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10 to 100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia.

  19. Cryptococcus gattii in the Age of Whole-Genome Sequencing.

    Science.gov (United States)

    Meyer, Wieland

    2015-11-17

    Cryptococcus gattii, the sister species of Cryptococcus neoformans, is an emerging pathogen which gained importance in connection with the ongoing cryptococcosis outbreak on Vancouver Island. Many molecular studies have divided this species into four major lineages: VGI, VGII, VGIII, and VGIV. This commentary summarizes the whole-genome sequencing (WGS) studies that have been carried out with this species, re-emphasizing the phylogenetic relationships, showing chromosomal rearrangements between those four groups, and identifying VGII as the ancestral population within C. gattii. In addition, WGS specific to VGII, containing the Vancouver Island outbreak genotypes and those from the Pacific Northwest region of the United States, has placed the origin of this lineage within South America and identified specific genes responsible for either brain or lung infection. It also showed that many genotypes are spread across a number of different continents, as has been previously shown by multilocus sequence typing (MLST). In addition, it showed that recombination occurs more frequently between mitochondrial than nuclear genomes.

  20. Current Developments in Prokaryotic Single Cell Whole Genome Amplification

    Energy Technology Data Exchange (ETDEWEB)

    Goudeau, Danielle; Nath, Nandita; Ciobanu, Doina; Cheng, Jan-Fang; Malmstrom, Rex

    2014-03-14

    Our approach to prokaryotic single-cell Whole Genome Amplification at the JGI continues to evolve. To increase both the quality and number of single-cell genomes produced, we explore all aspects of the process from cell sorting to sequencing. For example, we now utilize specialized reagents, acoustic liquid handling, and reduced reaction volumes to eliminate non-target DNA contamination in WGA reactions. More specifically, we use a cleaner commercial WGA kit from Qiagen that employs a UV decontamination procedure initially developed at the JGI, and we use the Labcyte Echo for tip-less liquid transfer to set up 2 µL reactions. Acoustic liquid handling also dramatically reduces reagent costs. In addition, we are exploring new cell lysis methods including treatment with Proteinase K, lysozyme, and other detergents, in order to complement standard alkaline lysis and allow for more efficient disruption of a wider range of cells. Incomplete lysis represents a major hurdle for WGA on some environmental samples, especially rhizosphere, peatland, and other soils. Finding effective lysis strategies that are also compatible with WGA is challenging, and we are currently assessing the impact of various strategies on genome recovery.

  1. Review: Whole genome amplification in preimplantation genetic diagnosis

    Institute of Scientific and Technical Information of China (English)

    Ying-ming ZHENG; Ning WANG; Lei LI; Fan JIN

    2011-01-01

    Preimplantation genetic diagnosis (PGD) refers to a procedure for genetically analyzing embryos prior to implantation, improving the chance of conception for patients at high risk of transmitting specific inherited disorders. This method has been widely used for a large number of genetic disorders since the first successful application in the early 1990s. Polymerase chain reaction (PCR) and fluorescent in situ hybridization (FISH) are the two main methods in PGD, but there are some inevitable shortcomings limiting the scope of genetic diagnosis. Fortunately, different whole genome amplification (WGA) techniques have been developed to overcome these problems. Sufficient DNA can be amplified and multiple tasks which need abundant DNA can be performed. Moreover, WGA products can be analyzed as a template for multi-loci and multi-gene during the subsequent DNA analysis. In this review, we will focus on the currently available WGA techniques and their applications, as well as the new technical trends from WGA products.

  2. Information recovery from low coverage whole-genome bisulfite sequencing.

    Science.gov (United States)

    Libertini, Emanuele; Heath, Simon C; Hamoudi, Rifat A; Gut, Marta; Ziller, Michael J; Czyz, Agata; Ruotti, Victor; Stunnenberg, Hendrik G; Frontini, Mattia; Ouwehand, Willem H; Meissner, Alexander; Gut, Ivo G; Beck, Stephan

    2016-06-27

    The cost of whole-genome bisulfite sequencing (WGBS) remains a bottleneck for many studies and it is therefore imperative to extract as much information as possible from a given dataset. This is particularly important because even at the recommended 30X coverage for reference methylomes, up to 50% of high-resolution features such as differentially methylated positions (DMPs) cannot be called with current methods as determined by saturation analysis. To address this limitation, we have developed a tool that dynamically segments WGBS methylomes into blocks of comethylation (COMETs) from which lost information can be recovered in the form of differentially methylated COMETs (DMCs). Using this tool, we demonstrate recovery of ∼30% of the lost DMP information content as DMCs even at very low (5X) coverage. This constitutes twice the amount that can be recovered using an existing method based on differentially methylated regions (DMRs). In addition, we explored the relationship between COMETs and haplotypes in lymphoblastoid cell lines of African and European origin. Using best fit analysis, we show COMETs to be correlated in a population-specific manner, suggesting that this type of dynamic segmentation may be useful for integrated (epi)genome-wide association studies in the future.

  3. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  4. A whole-genome association study for pig reproductive traits.

    Science.gov (United States)

    Onteru, S K; Fan, B; Du, Z-Q; Garrick, D J; Stalder, K J; Rothschild, M F

    2012-02-01

    A whole-genome association study was performed for reproductive traits in commercial sows using the PorcineSNP60 BeadChip and Bayesian statistical methods. The traits included total number born (TNB), number born alive (NBA), number of stillborn (SB), number of mummified foetuses at birth (MUM) and gestation length (GL) in each of the first three parities. We report the associations of informative QTL and the genes within the QTL for each reproductive trait in different parities. These results provide evidence of gene effects having temporal impacts on reproductive traits in different parities. Many QTL identified in this study are new for pig reproductive traits. Around 48% of total genes located in the identified QTL regions were predicted to be involved in placental functions. The genomic regions containing genes important for foetal developmental (e.g. MEF2C) and uterine functions (e.g. PLSCR4) were associated with TNB and NBA in the first two parities. Similarly, QTL in other foetal developmental (e.g. HNRNPD and AHR) and placental (e.g. RELL1 and CD96) genes were associated with SB and MUM in different parities. The QTL with genes related to utero-placental blood flow (e.g. VEGFA) and hematopoiesis (e.g. MAFB) were associated with GL differences among sows in this population. Pathway analyses using genes within QTL identified some modest underlying biological pathways, which are interesting candidates (e.g. the nucleotide metabolism pathway for SB) for pig reproductive traits in different parities. Further validation studies on large populations are warranted to improve our understanding of the complex genetic architecture for pig reproductive traits.

  5. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. ~12 percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9 percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90 percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  6. Post-Fragmentation Whole Genome Amplification-Based Method

    Science.gov (United States)

    Benardini, James; LaDuc, Myron T.; Langmore, John

    2011-01-01

    This innovation is derived from a proprietary amplification scheme that is based upon random fragmentation of the genome into a series of short, overlapping templates. The resulting shorter DNA strands (genomic hybridization microarray, SNP analysis, and sequencing. The standard reaction can be performed with minimal hands-on time, and can produce amplified DNA in as little as three hours. Post-fragmentation whole genome amplification-based technology provides a robust and accurate method of amplifying femtogram levels of starting material into microgram yields with no detectable allele bias. The amplified DNA also facilitates the preservation of samples (spacecraft samples) by amplifying scarce amounts of template DNA into microgram concentrations in just a few hours. Based on further optimization of this technology, this could be a feasible technology to use in sample preservation for potential future sample return missions. The research and technology development described here can be pivotal in dealing with backward/forward biological contamination from planetary missions. Such efforts rely heavily on an increasing understanding of the burden and diversity of microorganisms present on spacecraft surfaces throughout assembly and testing. The development and implementation of these technologies could significantly improve the comprehensiveness and resolving power of spacecraft-associated microbial population censuses, and are important to the continued evolution and advancement of planetary protection capabilities. Current molecular procedures for assaying spacecraft-associated microbial burden and diversity have inherent sample loss issues at practically every step, particularly nucleic acid extraction. In engineering a molecular means of amplifying nucleic acids directly from single cells in their native state within the sample matrix, this innovation has circumvented entirely the need for DNA extraction regimes in the sample processing scheme.

  7. Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high incidence setting

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Soborg, B; Koch, A;

    2016-01-01

    In East Greenland, a dramatic increase of tuberculosis (TB) incidence has been observed in recent years. Classical genotyping suggests a genetically similar Mycobacterium tuberculosis (Mtb) strain population as cause, however, precise transmission patterns are unclear. We performed whole genome...

  8. Systematic evaluation of bias in microbial community profiles induced by whole genome amplification

    NARCIS (Netherlands)

    Direito, S.O.L.; Zaura, E.; Little, M.; Ehrenfreund, P.; Röling, W.F.M.

    2014-01-01

    Whole genome amplification methods facilitate the detection and characterization of microbial communities in low biomass environments. We examined the extent to which the actual community structure is reliably revealed and factors contributing to bias. One widely used [multiple displacement amplific

  9. New perspectives on microbial community distortion after whole-genome amplification

    Science.gov (United States)

    Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass, however potential selective biases during the amplification processes are poorly understood. Here, we describe the e...

  10. Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data

    NARCIS (Netherlands)

    J.M. Bryant (Josephine); A. Schürch (Anita); H. van Deutekom (Henk); S.R. Harris (Simon); J.L. de Beer (Jessica); V. de Jager (Victor); K. Kremer (Kristin); S.A.F.T. van Hijum (Sacha); R.J. Siezen (Roland); M.W. Borgdorff (Martien ); S.D. Bentley (Stephen); J. Parkhill (Julian); D. van Soolingen (Dick)

    2013-01-01

    Background: Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate kno

  11. Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data

    NARCIS (Netherlands)

    Bryant, J.M.; Schürch, A.C.; Deutekom, van H.; Harris, S.R.; Beer, de J.L.; Jager, de V.C.L.; Kremer, K.; Hijum, van S.A.F.T.; Siezen, R.J.; Borgdorff, M.; Bentley, S.D.; Parkhill, J.; Soolingen, van D.

    2013-01-01

    BACKGROUND: Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate knowledge of th

  12. Inferring patient to patient transmission of Mycobacterium tuberculosis from whole genome sequencing data.

    NARCIS (Netherlands)

    Bryant, J.M.; Schurch, A.C.; Deutekom, H. van; Harris, S.R.; Beer, J.L. de; Jager, V. de; Kremer, K.; Hijum, S.A.F.T. van; Siezen, R.J.; Borgdorff, M.; Bentley, S.D.; Parkhill, J.; Soolingen, D. van

    2013-01-01

    BACKGROUND: Mycobacterium tuberculosis is characterised by limited genomic diversity, which makes the application of whole genome sequencing particularly attractive for clinical and epidemiological investigation. However, in order to confidently infer transmission events, an accurate knowledge of th

  13. A Whole Genome Pairwise Comparative and Functional Analysis of Geobacter sulfurreducens PCA

    OpenAIRE

    2013-01-01

    Geobacter species are involved in electricity production, bioremediations, and various environmental friendly activities. Whole genome comparative analyses of Geobacter sulfurreducens PCA, Geobacter bemidjiensis Bem, Geobacter sp. FRC-32, Geobacter lovleyi SZ, Geobacter sp. M21, Geobacter metallireducens GS-15, Geobacter uraniireducens Rf4 have been made to find out similarities and dissimilarities among them. For whole genome comparison of Geobacter species, an in-house tool, Geobacter Compa...

  14. Whole-Genome Sequence of the Nitrogen-Fixing Symbiotic Rhizobium Mesorhizobium loti Strain TONO

    Science.gov (United States)

    Hirakawa, Hideki; Sato, Shusei; Saeki, Kazuhiko; Hayashi, Makoto

    2016-01-01

    Mesorhizobium loti is the nitrogen-fixing microsymbiont for legumes of the genus Lotus. Here, we report the whole-genome sequence of a Mesorhizobium loti strain, TONO, which is used as a symbiont for the model legume Lotus japonicus. The whole-genome sequence of the strain TONO will be a solid platform for comparative genomics analyses and for the identification of genes responsible for the symbiotic properties of Mesorhizobium species.

  15. High Depth, Whole-Genome Sequencing of Cholera Isolates from Haiti and the Dominican Republic

    Science.gov (United States)

    2012-09-11

    Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of ... We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae ... during an epidemic. Keywords: Whole-genome sequencing, Vibrio cholerae, Haitian cholera epidemic, Microbial evolution. Background: Following the

  16. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia

    Science.gov (United States)

    Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005
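
    The percentages quoted in the abstract above can be recomputed from the counts it gives. The following is a minimal sketch for readers who want to check the arithmetic; the dictionary labels and function names are illustrative assumptions and do not come from the study.

```python
# Counts and percentages as quoted in the abstract above:
# label -> (numerator, denominator, reported percent).
# The labels are illustrative names chosen for this sketch.
REPORTED = {
    "complete demographic and genotyping data": (1692, 1841, 91.9),
    "East-African Indian lineage": (474, 1692, 28.0),
    "Beijing lineage": (470, 1692, 27.8),
    "MIRU-24 clustered (overall)": (340, 1692, 20.1),
    "MIRU-24 clustered (Beijing)": (168, 470, 35.7),
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


def check_reported(records):
    """Map each label to True if the recomputed percentage matches the reported one."""
    return {
        label: percent(num, den) == reported
        for label, (num, den, reported) in records.items()
    }


if __name__ == "__main__":
    for label, matches in check_reported(REPORTED).items():
        print(f"{label}: {'matches' if matches else 'differs'}")
```

    Rounding to one decimal place is an assumption about how the figures were reported; under that assumption all five quoted percentages agree with their counts.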

  17. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly

    Energy Technology Data Exchange (ETDEWEB)

    Shou, S. [Univ. Wisc.-Madison]; Kvikstad, E. [Univ. Wisc.-Madison]; Kile, A. [Univ. Wisc.-Madison]; Severin, J.; Forrest, D. [Univ. Wisc.-Madison]; Runnheim, R. [Univ. Wisc.-Madison]; Churas, C. [Univ. Wisc.-Madison]; Hickman, J. W. [Univ. Wisc.-Madison]; Mackenzie, C. [University of Texas–Houston Medical School]; Choudhary, M. [University of Texas–Houston Medical School]; Donohue, T. [Univ. Wisc.-Madison]; Kaplan, S. [University of Texas–Houston Medical School]; Schwartz, D. C. [Univ. Wisc.-Madison]

    2003-09-01

    Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.

  18. PEMapper and PECaller provide a simplified approach to whole-genome sequencing.

    Science.gov (United States)

    Johnston, H Richard; Chopra, Pankaj; Wingo, Thomas S; Patel, Viren; Epstein, Michael P; Mulle, Jennifer G; Warren, Stephen T; Zwick, Michael E; Cutler, David J

    2017-03-07

    The analysis of human whole-genome sequencing data presents significant computational challenges. The sheer size of datasets places an enormous burden on computational, disk array, and network resources. Here, we present an integrated computational package, PEMapper/PECaller, that was designed specifically to minimize the burden on networks and disk arrays, create output files that are minimal in size, and run in a highly computationally efficient way, with the single goal of enabling whole-genome sequencing at scale. In addition to improved computational efficiency, we implement a statistical framework that allows for a base-by-base error model, allowing this package to perform as well as or better than the widely used Genome Analysis Toolkit (GATK) in all key measures of performance on human whole-genome sequences.

  19. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities.

  20. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    Science.gov (United States)

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  1. Whole genome sequence of Enterobacter ludwigii type strain EN-119T, isolated from clinical specimens.

    Science.gov (United States)

    Li, Gengmi; Hu, Zonghai; Zeng, Ping; Zhu, Bing; Wu, Lijuan

    2015-04-01

    Enterobacter ludwigii strain EN-119(T) is the type strain of E. ludwigii, which belongs to the E. cloacae complex (Ecc). This strain was first reported and named in 2005 and has since been found in many hospitals. In this paper, whole genome sequencing of this strain was carried out. The total genome size of EN-119(T) is 4,952,770 bp, with 4578 coding sequences, 88 tRNAs and 10 rRNAs. The genome sequence of EN-119(T) is the first whole genome sequence of E. ludwigii and will further our understanding of Ecc.

  2. Whole-Genome Sequences of Two Borrelia afzelii and Two Borrelia garinii Lyme Disease Agent Isolates

    Energy Technology Data Exchange (ETDEWEB)

    Casjens, S.R.; Dunn, J.; Mongodin, E. F.; Qiu, W.-G.; Luft, B. J.; Fraser-Liggett, C. M.; Schutzer, S. E.

    2011-12-01

    Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04.

  3. Whole-Genome de novo Sequencing Of Quail And Grey Partridge

    DEFF Research Database (Denmark)

    Holm, Lars-Erik; Panitz, Frank; Burt, Dave;

    2011-01-01

    The development in sequencing methods has made it possible to perform whole genome de novo sequencing of species without large commercial interests. Within the EU-financed QUANTOMICS project (KBBE-2A-222664), we have performed de novo sequencing of quail (Coturnix coturnix) and grey partridge...... comparative studies towards the chicken genome and will aid in identifying evolutionarily conserved sequences within the Galliformes. The obtained sequences from quail and partridge represent a beginning of generating the whole genome sequence for these species. The continuation of establishing the genome...

  4. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools while validated also include a number of parame

  5. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer

    NARCIS (Netherlands)

    Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi

    2014-01-01

    Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and e

  6. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci

    DEFF Research Database (Denmark)

    Rasmusen, L. H.; Dargis, R.; Iversen, Katrine Højholt;

    2016-01-01

    with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were...

  7. Whole-Genome Scans Provide Evidence of Adaptive Evolution in Malawian Plasmodium falciparum Isolates

    DEFF Research Database (Denmark)

    Ocholla, Harold; Preston, Mark D; Mipando, Mwapatsa;

    2014-01-01

    BACKGROUND: Selection by host immunity and antimalarial drugs has driven extensive adaptive evolution in Plasmodium falciparum and continues to produce ever-changing landscapes of genetic variation. METHODS: We performed whole-genome sequencing of 69 P. falciparum isolates from Malawi and used ...

  8. Clinical Application of Whole Genome Sequencing In Patients with Primary Immunodeficiency

    Science.gov (United States)

    Mousallem, Talal; Urban, Thomas J.; McSweeney, K. Melodi; Kleinstein, Sarah E.; Zhu, Mingfu; Adeli, Mehdi; Parrott, Roberta E.; Roberts, Joseph L.; Krueger, Brian; Buckley, Rebecca H.; Goldstein, David B

    2016-01-01

    Summary This report illustrates the value of whole genome sequencing (WGS) in elucidating the genetic cause of disease in patients with primary immunodeficiency (PID). As sequencing costs decline, we predict that utilization of next generation sequencing (NGS) in the clinical setting will increase. PMID:25981738

  9. CViT: “Chromosome Visualization Tool” – A whole-genome viewer

    Science.gov (United States)

    CViT (Chromosome Visualization Tool) is a Perl utility for quickly generating images of features on a whole genome at once. It reads GFF3-format data representing chromosomes (linkage groups or pseudomolecules), and features on those chromosomes. It can display features on any chromosomal unit syste...

  10. Whole genome scan to detect quantitative trait loci for bovine milk protein composition

    NARCIS (Netherlands)

    Schopen, G.C.B.; Koks, P.D.; Arendonk, van J.A.M.; Bovenhuis, H.; Visker, M.H.P.W.

    2009-01-01

    The objective of this study was to perform a whole genome scan to detect quantitative trait loci (QTL) for milk protein composition in 849 Holstein–Friesian cows originating from seven sires. One morning milk sample was analysed for the major milk proteins using capillary zone electrophoresis. A gen

  11. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian;

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate-oligonucleoti...

  12. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin;

    2013-01-01

    BACKGROUND: Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re...

  13. Whole Genome PCR Scanning (WGPS) of C. burnetii strains from ruminants

    NARCIS (Netherlands)

    Sidi-Boumedine, Karim; Adam, Gilbert; Angen, Oysten; Aspán, A.; Bossers, A.; Roest, H.I.J.; Prigent, Myriam; Thiéry, R.; Rousset, Elodie

    2015-01-01

    Coxiella burnetii is the causative agent of Q fever, a zoonosis that spreads from ruminants to humans via the inhalation of aerosols contaminated by livestock's birth products. This study aimed to compare the genomes of strains isolated from ruminants by “Whole Genome PCR Scanning (WGPS)” in order t

  14. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans)

    OpenAIRE

    2015-01-01

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy.

  15. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans).

    Science.gov (United States)

    Tran, Phuong N; Tan, Nicholas E H; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J; Dailey, Lucas K; Hudson, André O; Savka, Michael A

    2015-11-19

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy.

  16. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected wi...

  17. Whole genome analysis of Klebsiella pneumoniae T2-1-1 from human oral cavity

    Directory of Open Access Journals (Sweden)

    Kok-Gan Chan

    2016-03-01

    Full Text Available Klebsiella pneumoniae T2-1-1 was isolated from the human tongue debris and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession JAQL00000000.

  18. Whole-genome characterization and genotyping of global WU polyomavirus strains

    NARCIS (Netherlands)

    Bialasiewicz, Seweryn; Rockett, Rebecca; Whiley, David W.; Abed, Yacine; Allander, Tobias; Binks, Michael; Boivin, Guy; Cheng, Allen C.; Chung, Ju-Young; Ferguson, Patricia E.; Gilroy, Nicole M.; Leach, Amanda J.; Lindau, Cecilia; Rossen, John W.; Sorrell, Tania C.; Nissen, Michael D.; Sloots, Theo P.

    2010-01-01

    Exploration of the genetic diversity of WU polyomavirus (WUV) has been limited in terms of the specimen numbers and particularly the sizes of the genomic fragments analyzed. Using whole-genome sequencing of 48 WUV strains collected in four continents over a 5-year period and 16 publicly available wh

  19. Generation of physical map contig-specific sequences useful for whole genome sequence scaffolding.

    Directory of Open Access Journals (Sweden)

    Yanliang Jiang

    Full Text Available Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.

  20. Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding

    Science.gov (United States)

    Jiang, Yanliang; Ninwichian, Parichart; Liu, Shikai; Zhang, Jiaren; Kucuktas, Huseyin; Sun, Fanyue; Kaltenboeck, Ludmilla; Sun, Luyang; Bao, Lisui; Liu, Zhanjiang

    2013-01-01

    Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge. PMID:24205335

  1. Comparative analysis of copy number detection by whole-genome BAC and oligonucleotide array CGH

    Directory of Open Access Journals (Sweden)

    Bejjani Bassem A

    2010-06-01

    Full Text Available Abstract Background Microarray-based comparative genomic hybridization (aCGH) is a powerful diagnostic tool for the detection of DNA copy number gains and losses associated with chromosome abnormalities, many of which are below the resolution of conventional chromosome analysis. It has been presumed that whole-genome oligonucleotide (oligo) arrays identify more clinically significant copy-number abnormalities than whole-genome bacterial artificial chromosome (BAC) arrays, yet this has not been systematically studied in a clinical diagnostic setting. Results To determine the difference in detection rate between similarly designed BAC and oligo arrays, we developed whole-genome BAC and oligonucleotide microarrays and validated them in a side-by-side comparison of 466 consecutive clinical specimens submitted to our laboratory for aCGH. Of the 466 cases studied, 67 (14.3%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome BAC array, and 73 (15.6%) had a copy-number imbalance of potential clinical significance detectable by the whole-genome oligo array. However, because both platforms identified copy number variants of unclear clinical significance, we designed a systematic method for the interpretation of copy number alterations and tested an additional 3,443 cases by BAC array and 3,096 cases by oligo array. Of those cases tested on the BAC array, 17.6% were found to have a copy-number abnormality of potential clinical significance, whereas the detection rate increased to 22.5% for the cases tested by oligo array. In addition, we validated the oligo array for detection of mosaicism and found that it could routinely detect mosaicism at levels of 30% and greater. Conclusions Although BAC arrays have faster turnaround times, the increased detection rate of oligo arrays makes them attractive for clinical cytogenetic testing.

  2. Whole genome multilocus sequence typing as an epidemiologic tool for Yersinia pestis.

    Science.gov (United States)

    Kingry, Luke C; Rowe, Lori A; Respicio-Kingry, Laurel B; Beard, Charles B; Schriefer, Martin E; Petersen, Jeannine M

    2016-04-01

    Human plague is a severe and often fatal zoonotic disease caused by Yersinia pestis. For public health investigations of human cases, nonintensive whole genome molecular typing tools, capable of defining epidemiologic relationships, are advantageous. Whole genome multilocus sequence typing (wgMLST) is a recently developed methodology that simplifies genomic analyses by transforming millions of base pairs of sequence into character data for each gene. We sequenced 13 US Y. pestis isolates with known epidemiologic relationships. Sequences were assembled de novo, and multilocus sequence typing alleles were assigned by comparison against 3979 open reading frames from the reference strain CO92. Allele-based cluster analysis accurately grouped the 13 isolates, as well as 9 publicly available Y. pestis isolates, by their epidemiologic relationships. Our findings indicate wgMLST is a simplified, sensitive, and scalable tool for epidemiologic analysis of Y. pestis strains.
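
    The allele-based comparison described in the abstract above can be illustrated with a small sketch. It assumes each isolate is represented as a mapping from locus name to allele number, with None marking a locus that was not called; the isolate names, locus names, and allele numbers below are hypothetical and are not data from the study, and the distance used (count of differing loci) is one common choice rather than the authors' stated method.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ between two profiles.

    Loci absent from either profile, or marked None (not called),
    are skipped rather than counted as differences.
    """
    differences = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            differences += 1
    return differences


def pairwise_distances(profiles):
    """Return {(name_a, name_b): distance} for every pair of isolates."""
    return {
        (a, b): allele_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical allele profiles: locus name -> allele number.
PROFILES = {
    "isolate_1": {"gene_0001": 1, "gene_0002": 1, "gene_0003": 2},
    "isolate_2": {"gene_0001": 1, "gene_0002": 1, "gene_0003": 2},
    "isolate_3": {"gene_0001": 1, "gene_0002": 4, "gene_0003": None},
}

if __name__ == "__main__":
    for pair, distance in pairwise_distances(PROFILES).items():
        print(pair, distance)
```

    A distance table of this kind could then be passed to any standard clustering routine; which routine the study used is not stated in the abstract.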

  3. Single Cell HLA Matching Feasibility by Whole Genomic Amplification and Nested PCR

    Institute of Scientific and Technical Information of China (English)

    Xiao-hong Li; Fang-yin Meng

    2004-01-01

    PCR-based single-cell DNA analysis has been widely used in forensic science, preimplantation genetic diagnosis and other fields. However, the original sample cannot be efficiently retrieved following single-cell PCR, so the amount of information gained is limited. The HLA system is so complex that it is very hard to complete HLA typing from a single cell. A Taq polymerase-based method that uses random primers to amplify the whole genome, termed whole genome amplification (WGA), has been demonstrated to be a useful method for increasing the copy number of minimal samples. In this study, we establish a technique to amplify the HLA-A and HLA-B loci at the same time in a single cell using WGA.

  4. Mapping the human genome by using "whole genome" radiation hybrids

    Energy Technology Data Exchange (ETDEWEB)

    Cox, D.R. [Stanford Univ., CA (United States)

    1995-12-31

    An important goal of the Human Genome Project is to construct a map of the human genome at an average resolution of 100 kilobases (kb), which should provide the scientific community with a valuable resource for the localization and isolation of any human DNA sequence of interest. In an effort to complete this map by the projected date of 1998, we have constructed two sets of "whole genome" radiation hybrids. The first set of 83 hamster-human somatic cell hybrids contains human DNA fragments approximately 5 million base pairs in length. Each individual hybrid cell line contains approximately one fifth of the entire human genome. Our mapping results indicate that these whole genome radiation hybrids represent an important resource for constructing the 100 kb map in a timely and cost-effective fashion.

  5. Applications of the double-barreled data in whole-genome shotgun sequence assembly and analysis

    Institute of Scientific and Technical Information of China (English)

    HAN Yujun; WANG Jing; GU Xiaocheng; YU Jun; LI Songgang; NI Peixiang; LÜ Hong; YE Jia; HU Jianfei; CHEN Chen; HUANG Xiangang; CONG Lijuan; LI Guangyuan

    2005-01-01

    Double-barreled (DB) data have been widely used for the assembly of large genomes. Based on the experience of building the whole-genome working draft of Oryza sativa L. ssp. indica, we present here the prevailing and improved uses of DB data in the assembly procedure and report on novel applications during the subsequent data-mining processes, such as acquiring precise insert fragment information for each clone across the genome, and a new kind of low-cost whole-genome microarray. With the increasing number of organisms being sequenced, we believe that DB data will play an important role both in other assembly procedures and in future genomic studies.

  6. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Science.gov (United States)

    Satoh, Soichirou; Mimuro, Mamoru; Tanaka, Ayumi

    2013-01-01

    Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, a reasonable method of using complete genome sequences to construct phylogenetic trees has not yet been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  7. Error rates, PCR recombination, and sampling depth in HIV-1 whole genome deep sequencing.

    Science.gov (United States)

    Zanini, Fabio; Brodin, Johanna; Albert, Jan; Neher, Richard A

    2016-12-27

    Deep sequencing is a powerful and cost-effective tool to characterize the genetic diversity and evolution of virus populations. While modern sequencing instruments readily cover viral genomes many thousand fold and very rare variants can in principle be detected, sequencing errors, amplification biases, and other artifacts can limit sensitivity and complicate data interpretation. For this reason, the number of studies using whole genome deep sequencing to characterize viral quasi-species in clinical samples is still limited. We have previously undertaken a large scale whole genome deep sequencing study of HIV-1 populations. Here we discuss the challenges, error profiles, control experiments, and computational test we developed to quantify the accuracy of variant frequency estimation.

  8. Whole genome sequencing of Mycobacterium tuberculosis SB24 isolated from Sabah, Malaysia

    Directory of Open Access Journals (Sweden)

    Noraini Philip

    2016-09-01

    Full Text Available Mycobacterium tuberculosis (M. tuberculosis) is the causative agent of tuberculosis (TB), which causes millions of deaths every year. We have sequenced the genome of M. tuberculosis isolated from the cerebrospinal fluid (CSF) of a patient diagnosed with tuberculous meningitis (TBM). The isolated strain is referred to as M. tuberculosis SB24. Genomic DNA of M. tuberculosis SB24 was extracted and subjected to whole genome sequencing using the PacBio platform. The draft genome size of M. tuberculosis SB24 was determined to be 4,452,489 bp with a G + C content of 65.6%. The whole genome shotgun project has been deposited in NCBI SRA under the accession number SRP076503.
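
The G + C content quoted above is a simple base-composition statistic. A minimal sketch of how such a percentage can be computed from an assembled sequence follows; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration, not details taken from the record.

```python
def gc_content(sequence):
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are excluded from the denominator.
    """
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "AT")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Toy sequence, not real M. tuberculosis data.
    print(round(gc_content("GGCCGGATGC"), 1))
```

For a multi-contig draft assembly, the contigs would typically be concatenated (or their counts summed) before the percentage is taken.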

  9. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin

    2013-01-01

    BACKGROUND: Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re......-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations were analyzed. RESULTS: A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertion/deletions were identified. Among the SNPs detected, 25.5% were not described previously. We found...... that artificial selection during domestication led to more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genomic regions appear to be affected by artificial selection for preferred agricultural traits...

  10. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    Energy Technology Data Exchange (ETDEWEB)

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are of great importance and challenge due to the extent of evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.

  11. Economic evidence on identifying clinically actionable findings with whole-genome sequencing: a scoping review.

    OpenAIRE

    2016-01-01

    The American College of Medical Genetics and Genomics (ACMG) recommends that mutations in 56 genes for 24 conditions are clinically actionable and should be reported as secondary findings after whole-genome sequencing (WGS). Our aim was to identify published economic evaluations of detecting mutations in these genes among the general population or among targeted/high-risk populations and conditions and identify gaps in knowledge. A targeted PubMed search from 1994 through November 2014 was pe...

  12. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation

    OpenAIRE

    2016-01-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris ...

  13. Whole-Genome Sequence of Rummeliibacillus stabekisii Strain PP9 Isolated from Antarctic Soil.

    Science.gov (United States)

    da Mota, Fábio Faria; Vollú, Renata Estebanez; Jurelevicius, Diogo; Seldin, Lucy

    2016-05-26

    The whole genome of Rummeliibacillus stabekisii PP9, isolated from a soil sample from Antarctica, consists of a circular chromosome of 3,412,092 bp and a circular plasmid of 8,647 bp, with 3,244 protein-coding genes, 12 copies of the 16S-23S-5S rRNA operon, 101 tRNA genes, and 6 noncoding RNAs (ncRNAs).

  14. Evolutionary insight from whole-genome sequencing of Pseudomonas aeruginosa from cystic fibrosis patients

    DEFF Research Database (Denmark)

    Marvig, Rasmus Lykke; Madsen Sommer, Lea Mette; Jelsbak, Lars;

    2015-01-01

    is suggested to be due to the large genetic repertoire of P. aeruginosa and its ability to genetically adapt to the host environment. Here, we review the recent work that has applied whole-genome sequencing to understand P. aeruginosa population genomics, within-host microevolution and diversity, mutational...... mechanisms, genetic adaptation and transmission events. Finally, we summarize the advances in relation to medical applications and laboratory evolution experiments....

  15. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat

    Directory of Open Access Journals (Sweden)

    Huajing Teng

    2016-07-01

    Full Text Available Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches.

  16. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    Science.gov (United States)

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches.

  17. Personalized oncogenomics: clinical experience with malignant peritoneal mesothelioma using whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Brandon S Sheffield

    Full Text Available Peritoneal mesothelioma is a rare and sometimes lethal malignancy that presents a clinical challenge for both diagnosis and management. Recent studies have led to a better understanding of the molecular biology of peritoneal mesothelioma. Translation of the emerging data into better treatments and outcome is needed. From two patients with peritoneal mesothelioma, we derived whole genome sequences, RNA expression profiles, and targeted deep sequencing data. Molecular data were made available for translation into a clinical treatment plan. Treatment responses and outcomes were later examined in the context of molecular findings. Molecular studies presented here provide the first reported whole genome sequences of peritoneal mesothelioma. Mutations in known mesothelioma-related genes NF2, CDKN2A, LATS2, amongst others, were identified. Activation of MET-related signaling pathways was demonstrated in both cases. A hypermutated phenotype was observed in one case (434 vs. 18 single nucleotide variants) and was associated with a favourable outcome despite sarcomatoid histology and multifocal disease. This study represents the first report of whole genome analyses of peritoneal mesothelioma, a key step in the understanding and treatment of this disease.

  18. Personalized oncogenomics: clinical experience with malignant peritoneal mesothelioma using whole genome sequencing.

    Science.gov (United States)

    Sheffield, Brandon S; Tinker, Anna V; Shen, Yaoqing; Hwang, Harry; Li-Chang, Hector H; Pleasance, Erin; Ch'ng, Carolyn; Lum, Amy; Lorette, Julie; McConnell, Yarrow J; Sun, Sophie; Jones, Steven J M; Gown, Allen M; Huntsman, David G; Schaeffer, David F; Churg, Andrew; Yip, Stephen; Laskin, Janessa; Marra, Marco A

    2015-01-01

    Peritoneal mesothelioma is a rare and sometimes lethal malignancy that presents a clinical challenge for both diagnosis and management. Recent studies have led to a better understanding of the molecular biology of peritoneal mesothelioma. Translation of the emerging data into better treatments and outcome is needed. From two patients with peritoneal mesothelioma, we derived whole genome sequences, RNA expression profiles, and targeted deep sequencing data. Molecular data were made available for translation into a clinical treatment plan. Treatment responses and outcomes were later examined in the context of molecular findings. Molecular studies presented here provide the first reported whole genome sequences of peritoneal mesothelioma. Mutations in known mesothelioma-related genes NF2, CDKN2A, LATS2, amongst others, were identified. Activation of MET-related signaling pathways was demonstrated in both cases. A hypermutated phenotype was observed in one case (434 vs. 18 single nucleotide variants) and was associated with a favourable outcome despite sarcomatoid histology and multifocal disease. This study represents the first report of whole genome analyses of peritoneal mesothelioma, a key step in the understanding and treatment of this disease.

  19. Analysis on n-gram statistics and linguistic features of whole genome protein sequences

    Institute of Scientific and Technical Information of China (English)

    DONG Qi-wen; WANG Xiao-long; LIN Lei

    2008-01-01

    To obtain a statistical sequence analysis of the large number of genomic and proteomic sequences available for different organisms, the n-grams of whole genome protein sequences from 20 organisms were extracted. Their linguistic features were analyzed by two tests, the Zipf power law and Shannon entropy, developed for the analysis of natural languages and symbolic sequences. The natural genome proteins and the artificial genome proteins were compared with each other, and some statistical features of n-grams were discovered. The results show that: the n-grams of whole genome protein sequences approximately follow the Zipf law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple unigram model can distinguish different organisms; and there exist organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed analysis of the n-grams of whole genome protein sequences will result in a powerful model for mapping the relationship of protein sequence, structure and function.
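
The two tests named in this abstract can be illustrated with a short sketch. The code below counts overlapping n-grams, computes the Shannon entropy of their empirical distribution, and fits a log-log rank-frequency slope as a rough Zipf check. The function names, the use of overlapping windows, and the plain least-squares fit are assumptions for illustration; the record does not state how the authors defined or estimated these quantities.

```python
import math
from collections import Counter


def ngram_counts(sequence, n):
    """Count overlapping n-grams in a protein sequence (one-letter codes)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical n-gram distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what an ideal Zipf distribution would give.
    Requires at least two distinct n-grams.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy peptide, not a real proteome.
    counts = ngram_counts("MKVLAAGIVGLKVLAAGMKV", 2)
    print(shannon_entropy(counts), zipf_slope(counts))
```

On real proteomes the n-gram table grows quickly with n (up to 20^n entries for the standard amino acids), so a sparse counter like the one above is a natural choice.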

  20. Using Mendelian inheritance errors as quality control criteria in whole genome sequencing data set.

    Science.gov (United States)

    Pilipenko, Valentina V; He, Hua; Kurowski, Brad G; Alexander, Eileen S; Zhang, Xue; Ding, Lili; Mersha, Tesfaye B; Kottyan, Leah; Fardo, David W; Martin, Lisa J

    2014-01-01

    Although the technical and analytic complexity of whole genome sequencing is generally appreciated, best practices for data cleaning and quality control have not been defined. Family based data can be used to guide the standardization of specific quality control metrics in nonfamily based data. Given the low mutation rate, Mendelian inheritance errors are likely the result of erroneous genotype calls. Thus, our goal was to identify the characteristics that determine Mendelian inheritance errors. To accomplish this, we used chromosome 3 whole genome sequencing family based data from the Genetic Analysis Workshop 18. Mendelian inheritance errors were provided as part of the GAW18 data set. Additionally, for binary variants we calculated Mendelian inheritance errors using PLINK. Based on our analysis, nonbinary single-nucleotide variants have an inherently high number of Mendelian inheritance errors. Furthermore, in binary variants, Mendelian inheritance errors are not randomly distributed. Indeed, we identified 3 Mendelian inheritance error peaks that were enriched with repetitive elements. However, these peaks can be lessened with the inclusion of a single filter from the sequencing file. In summary, we demonstrated that erroneous sequencing calls are nonrandomly distributed across the genome and quality control metrics can dramatically reduce the number of Mendelian inheritance errors. Appropriate quality control will allow optimal use of genetic data to realize the full potential of whole genome sequencing.
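
A Mendelian inheritance error, as used in this abstract, is a child genotype that cannot be formed from one allele of each parent. The sketch below shows that check for a single autosomal variant in a father-mother-child trio. It is an illustration only: the function names and the tuple encoding of genotypes are assumptions, and it does not reproduce PLINK's implementation or handle missing calls or sex chromosomes.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child genotype can arise from the two parents.

    Genotypes are pairs of allele codes, e.g. (0, 1) for a heterozygote.
    Assumes an autosomal locus with no missing calls.
    """
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def count_mendelian_errors(trios):
    """Count variants whose trio genotypes violate Mendelian inheritance.

    `trios` is an iterable of (father, mother, child) genotype triples,
    one per variant.
    """
    return sum(1 for f, m, c in trios if not is_mendelian_consistent(f, m, c))


if __name__ == "__main__":
    variants = [
        ((0, 0), (0, 0), (0, 1)),  # error: neither parent carries allele 1
        ((0, 1), (1, 1), (0, 1)),  # consistent
        ((0, 1), (1, 1), (0, 0)),  # error: mother can only transmit allele 1
    ]
    print(count_mendelian_errors(variants))
```

Counting such errors per genomic window is one way the nonrandom distribution described in the abstract could be visualized.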

  1. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Johanna Hasmats

    Full Text Available Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  2. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Science.gov (United States)

    Hasmats, Johanna; Gréen, Henrik; Orear, Cedric; Validire, Pierre; Huss, Mikael; Käller, Max; Lundeberg, Joakim

    2014-01-01

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.
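
The sharing percentages reported here are set-overlap statistics. A minimal sketch of that arithmetic follows, assuming each variant is keyed by a (chromosome, position, reference allele, alternate allele) tuple; the key format and the function name `shared_fraction` are illustrative assumptions rather than the authors' pipeline.

```python
def shared_fraction(query_variants, reference_variants):
    """Fraction of variants in `query_variants` also found in `reference_variants`.

    Variants are hashable keys, here (chrom, pos, ref, alt) tuples.
    """
    query = set(query_variants)
    reference = set(reference_variants)
    if not query:
        raise ValueError("query variant set is empty")
    return len(query & reference) / len(query)


if __name__ == "__main__":
    # Toy call sets, not data from the study.
    wga = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"),
           ("chr3", 300, "G", "A"), ("chr4", 5, "T", "C")}
    unamplified = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"),
                   ("chr3", 300, "G", "A"), ("chr5", 50, "G", "C")}
    print(shared_fraction(wga, unamplified))
```

Note that the fraction is asymmetric: swapping the two arguments answers a different question (how many unamplified-DNA variants are recovered after amplification).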

  3. Whole-genome sequencing of uropathogenic Escherichia coli reveals long evolutionary history of diversity and virulence.

    Science.gov (United States)

    Lo, Yancy; Zhang, Lixin; Foxman, Betsy; Zöllner, Sebastian

    2015-08-01

    Uropathogenic Escherichia coli (UPEC) are phenotypically and genotypically very diverse. This diversity makes it challenging to understand the evolution of UPEC adaptations responsible for causing urinary tract infections (UTI). To gain insight into the relationship between evolutionary divergence and adaptive paths to uropathogenicity, we sequenced at deep coverage (190×) the genomes of 19 E. coli strains from urinary tract infection patients from the same geographic area. Our sample consisted of 14 UPEC isolates and 5 non-UTI-causing (commensal) rectal E. coli isolates. After identifying strain variants using de novo assembly-based methods, we clustered the strains based on pairwise sequence differences using a neighbor-joining algorithm. We examined evolutionary signals on the whole-genome phylogeny and contrasted these signals with those found on gene trees constructed based on specific uropathogenic virulence factors. The whole-genome phylogeny showed that the divergence between UPEC and commensal E. coli strains without known UPEC virulence factors happened over 32 million generations ago. Pairwise diversity between any two strains was also high, suggesting multiple genetic origins of uropathogenic strains in a small geographic region. Contrasting the whole-genome phylogeny with three gene trees constructed from common uropathogenic virulence factors, we detected no selective advantage of these virulence genes over other genomic regions. These results suggest that UPEC acquired uropathogenicity a long time ago and used it opportunistically to cause extraintestinal infections.
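
Clustering by pairwise sequence differences starts from a distance table between strains. The sketch below builds such a table from pre-aligned, equal-length sequences by counting mismatched positions; it covers only the distance step, not neighbor-joining itself, and the function names and input format are assumptions rather than the method used in the study (which derived variants from de novo assemblies).

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_differences(sequences):
    """Map each unordered strain pair to its count of differing sites.

    `sequences` is a dict of strain name -> aligned sequence string.
    Pairs are keyed in sorted name order.
    """
    names = sorted(sequences)
    return {(p, q): hamming(sequences[p], sequences[q])
            for p, q in combinations(names, 2)}


if __name__ == "__main__":
    # Toy alignment, not real E. coli data.
    strains = {"upec_1": "ACGTACGT", "upec_2": "ACGTACGA", "commensal_1": "TCGAACGA"}
    for pair, diff in pairwise_differences(strains).items():
        print(pair, diff)
```

A table like this can then be handed to any neighbor-joining implementation to obtain the tree topology.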

  4. Enhanced de novo assembly of high throughput pyrosequencing data using whole genome mapping.

    Science.gov (United States)

    Onmus-Leone, Fatma; Hang, Jun; Clifford, Robert J; Yang, Yu; Riley, Matthew C; Kuschner, Robert A; Waterman, Paige E; Lesho, Emil P

    2013-01-01

    Despite major advances in next-generation sequencing, assembly of sequencing data, especially data from novel microorganisms or re-emerging pathogens, remains constrained by the lack of suitable reference sequences. De novo assembly is the best approach to achieve an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are often required to achieve full genome coverage. In this study, we demonstrated a method to assemble complete bacterial genome sequences by integrating shotgun Roche 454 pyrosequencing with optical whole genome mapping (WGM). The whole genome restriction map (WGRM) was used as the reference to scaffold de novo assembled sequence contigs through a stepwise process. Large de novo contigs were placed in the correct order and orientation through alignment to the WGRM. De novo contigs that were not aligned to the WGRM were merged into scaffolds using contig branching structure information. These extended scaffolds were then aligned to the WGRM to identify the overlaps to be eliminated and the gaps and mismatches to be resolved with unused contigs. The process was repeated until a sequence with full coverage and alignment with the whole genome map was achieved. Using this method we were able to achieve 100% WGRM coverage without a paired-end library. We assembled complete sequences for three distinct genetic components of a clinical isolate of Providencia stuartii: a bacterial chromosome, a novel bla NDM-1 plasmid, and a novel bacteriophage, without separately purifying them to homogeneity.

  5. Rapid Identification of Potential Drugs for Diabetic Nephropathy Using Whole-Genome Expression Profiles of Glomeruli

    Directory of Open Access Journals (Sweden)

    Jingsong Shi

    2016-01-01

    Full Text Available Objective. To investigate potential drugs for diabetic nephropathy (DN) using whole-genome expression profiles and the Connectivity Map (CMAP). Methodology. Eighteen Chinese Han DN patients and six normal controls were included in this study. Whole-genome expression profiles of microdissected glomeruli were measured using the Affymetrix human U133 plus 2.0 chip. Differentially expressed genes (DEGs) between late stage and early stage DN samples and the CMAP database were used to identify potential drugs for DN using bioinformatics methods. Results. (1) A total of 1065 DEGs (FDR 1.5) were found in late stage DN patients compared with early stage DN patients. (2) Piperlongumine, 15d-PGJ2 (15-delta prostaglandin J2), vorinostat, and trichostatin A were predicted to be the most promising potential drugs for DN, acting as NF-κB inhibitors, histone deacetylase inhibitors (HDACIs), PI3K pathway inhibitors, or PPARγ agonists, respectively. Conclusion. Using whole-genome expression profiles and the CMAP database, we rapidly predicted potential DN drugs, and therapeutic potential was confirmed by previously published studies. Animal experiments and clinical trials are needed to confirm both the safety and efficacy of these drugs in the treatment of DN.

  6. Selective Whole-Genome Amplification Is a Robust Method That Enables Scalable Whole-Genome Sequencing of Plasmodium vivax from Unprocessed Clinical Samples

    Science.gov (United States)

    Loy, Dorothy E.; Sundararaman, Sesh A.; Valdivia, Hugo; Fisch, Kathleen; Lescano, Andres G.; Baldeviano, G. Christian; Durand, Salomon; Gerbasi, Vince; Sutherland, Colin J.; Nolder, Debbie; Vinetz, Joseph M.; Hahn, Beatrice H.

    2017-01-01

    ABSTRACT Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens. PMID:28174312
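
The two coverage figures in this abstract (mean depth, and the share of the core genome covered by at least 5 reads) can be derived from a per-position depth vector. The sketch below shows that calculation under the assumption that depths are already available as a list of integers, for example parsed from a depth report; the function name and the default threshold parameter are illustrative, not taken from the study's pipeline.

```python
def coverage_summary(depths, min_depth=5):
    """Return (mean depth, fraction of positions with depth >= min_depth).

    `depths` holds one read-depth value per genomic position of interest.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("no positions supplied")
    mean_depth = sum(depths) / n
    covered = sum(1 for d in depths if d >= min_depth) / n
    return mean_depth, covered


if __name__ == "__main__":
    # Toy depths for four positions, not real P. vivax data.
    mean_depth, covered = coverage_summary([0, 5, 10, 25])
    print(f"mean depth {mean_depth:.1f}x, {covered:.0%} of positions at >=5 reads")
```

Restricting `depths` to core-genome positions before calling the function would mirror the "core genome" qualifier used in the abstract.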

  7. Selective Whole-Genome Amplification Is a Robust Method That Enables Scalable Whole-Genome Sequencing of Plasmodium vivax from Unprocessed Clinical Samples

    Directory of Open Access Journals (Sweden)

    Annie N. Cowell

    2017-02-01

    Full Text Available Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens.

  8. Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies.

    Directory of Open Access Journals (Sweden)

    Nora Rieber

    Full Text Available The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina's HiSeq2000, Life Technologies' SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics' technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies' platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other

  9. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Bellod Cisneros, Jose Luis;

    2016-01-01

    web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes...... and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https......Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle for using this technology is the availability of simple and automatic bioinformatics tools. Based on previously published and already available...

  10. Whole-genome sequencing identifies emergence of a quinolone resistance mutation in a case of Stenotrophomonas maltophilia bacteremia.

    Science.gov (United States)

    Pak, Theodore R; Altman, Deena R; Attie, Oliver; Sebra, Robert; Hamula, Camille L; Lewis, Martha; Deikus, Gintaras; Newman, Leah C; Fang, Gang; Hand, Jonathan; Patel, Gopi; Wallach, Fran; Schadt, Eric E; Huprikar, Shirish; van Bakel, Harm; Kasarskis, Andrew; Bashir, Ali

    2015-11-01

    Whole-genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole-genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy.

  11. Whole genome sequence analysis suggests intratumoral heterogeneity in dissemination of breast cancer to lymph nodes.

    Directory of Open Access Journals (Sweden)

    Kevin Blighe

    Full Text Available BACKGROUND: Intratumoral heterogeneity may help drive resistance to targeted therapies in cancer. In breast cancer, the presence of nodal metastases is a key indicator of poorer overall survival. The aim of this study was to identify somatic genetic alterations in early dissemination of breast cancer by whole genome next generation sequencing (NGS) of a primary breast tumor, a matched locally-involved axillary lymph node and healthy normal DNA from blood. METHODS: Whole genome NGS was performed on 12 µg (range 11.1-13.3 µg) of DNA isolated from fresh-frozen primary breast tumor, axillary lymph node and peripheral blood following the DNA nanoball sequencing protocol. Single nucleotide variants, insertions, deletions, and substitutions were identified through a bioinformatic pipeline and compared to CIN25, a key set of genes associated with tumor metastasis. RESULTS: Whole genome sequencing revealed overlapping variants between the tumor and node, but also variants that were unique to each. Novel mutations unique to the node included those found in two CIN25 targets, TGIF2 and CCNB2, which are related to transcription cyclin activity and chromosomal stability, respectively, and a unique frameshift in PDS5B, which is required for accurate sister chromatid segregation during cell division. We also identified dominant clonal variants that progressed from tumor to node, including SNVs in TP53 and ARAP3, which mediates rearrangements to the cytoskeleton and cell shape, and an insertion in TOP2A, the expression of which is significantly associated with tumor proliferation and can segregate breast cancers by outcome. CONCLUSION: This case study provides preliminary evidence that primary tumor and early nodal metastasis have largely overlapping somatic genetic alterations. There were very few mutations unique to the involved node. However, significant conclusions regarding early dissemination need analysis of a larger number of patient samples.

  12. Use of whole genome sequencing to determine the microevolution of Mycobacterium tuberculosis during an outbreak.

    Directory of Open Access Journals (Sweden)

    Midori Kato-Maeda

    Full Text Available RATIONALE: Current tools available to study the molecular epidemiology of tuberculosis do not provide information about the directionality and sequence of transmission for tuberculosis cases occurring over a short period of time, such as during an outbreak. Recently, whole genome sequencing has been used to study molecular epidemiology of Mycobacterium tuberculosis over short time periods. OBJECTIVE: To describe the microevolution of M. tuberculosis during an outbreak caused by one drug-susceptible strain. METHOD AND MEASUREMENTS: We included 9 patients with tuberculosis diagnosed during a period of 22 months, from a population-based study of the molecular epidemiology in San Francisco. Whole genome sequencing was performed using Illumina's sequencing by synthesis technology. A custom program written in Python was used to determine single nucleotide polymorphisms, which were confirmed by PCR product Sanger sequencing. MAIN RESULTS: We obtained an average of 95.7% (94.1-96.9%) coverage for each isolate and an average fold read depth of 73 (1 to 250). We found 7 single nucleotide polymorphisms among the 9 isolates. The single nucleotide polymorphism data confirmed all except one known epidemiological link. The outbreak strain resulted in 5 bacterial variants originating from the index case A1 with 0-2 mutations per transmission event that resulted in a secondary case. CONCLUSIONS: Whole genome sequencing analysis from a recent outbreak of tuberculosis enabled us to identify microevolutionary events observable during transmission, to determine 0-2 single nucleotide polymorphisms per transmission event that resulted in a secondary case, and to identify new epidemiologic links in the chain of transmission.
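
    The abstract above does not describe how the authors' custom Python program works, so the following is only a minimal sketch of the general idea of counting SNP differences between isolates, assuming each genome has already been aligned to a common reference so that positions correspond. The function names, the toy sequences, and the isolate labels other than A1 are hypothetical.

```python
from itertools import combinations


def snp_positions(seq_a, seq_b):
    """Return 0-based positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base (N) or a gap (-)
    are skipped, because they do not support a confident SNP call.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and a not in skip and b not in skip
    ]


def pairwise_snp_counts(isolates):
    """Map each pair of isolate names to the number of SNPs separating them."""
    return {
        (x, y): len(snp_positions(isolates[x], isolates[y]))
        for x, y in combinations(sorted(isolates), 2)
    }


# Toy aligned sequences (hypothetical data, not from the study).
isolates = {
    "A1": "ACGTACGTAC",
    "A2": "ACGTACGTAC",
    "A3": "ACGTTCGTAC",
    "A4": "ACGTTCGNAT",
}
counts = pairwise_snp_counts(isolates)
for pair, n in counts.items():
    print(pair, n)
```

    In a real outbreak analysis, small pairwise distances such as the 0-2 SNPs per transmission event reported above would be interpreted together with epidemiological data, and candidate SNPs would still need independent confirmation, as the authors did with Sanger sequencing.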

  13. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    Energy Technology Data Exchange (ETDEWEB)

    Fröhlich, Eleonore, E-mail: eleonore.froehlich@medunigraz.at [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Meindl, Claudia; Wagner, Karin [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Leitinger, Gerd [Center for Medical Research, Medical University of Graz, Stiftingtalstr. 24, 8010 Graz (Austria); Institute for Cell Biology, Histology and Embryology, Medical University of Graz, Harrachgasse 21, 8010 Graz (Austria); Roblegg, Eva [Institute of Pharmaceutical Sciences, Department of Pharmaceutical Technology, Karl-Franzens-University of Graz, Universitätsplatz 1, 8010 Graz (Austria)

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and for short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify other or additional targets than commonly used cell-based assays for NP action. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. - Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions regulated by microarray were confirmed by cell-based assay.

  14. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling.

    Science.gov (United States)

    Meinel, Thomas; Krause, Antje

    2012-01-01

    In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes have led to a huge diversity in published phylogenies. Comparison of those and, moreover, identification of the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on in total 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.

  15. High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic

    Directory of Open Access Journals (Sweden)

    Sealfon Rachel

    2012-09-01

    Full Text Available Abstract Background Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Results Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Conclusions Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
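
    To put the coverage figures above in perspective, mean depth of coverage is commonly estimated as (number of reads × read length) / genome size. The sketch below applies that relation; the genome size of 4,000,000 bp is an illustrative round figure assumed here for V. cholerae, not a value taken from the abstract, and the function names are hypothetical.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Expected mean depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads that reaches the target mean depth."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 4_000_000  # assumption: approximate V. cholerae genome size in bp
READ_LENGTH = 100        # 100 bp reads, as in the abstract

reads_50x = reads_needed(50, READ_LENGTH, GENOME_SIZE)
pairs_50x = math.ceil(reads_50x / 2)  # paired-end: two reads per fragment
reads_2000x = reads_needed(2000, READ_LENGTH, GENOME_SIZE)

print(f"50x needs about {reads_50x:,} reads ({pairs_50x:,} read pairs)")
print(f"2000x needs about {reads_2000x:,} reads")
```

    Under these assumptions the >2000x data set is roughly forty times deeper than the 50x the authors found sufficient, which is what makes downsampling experiments on depth of coverage possible. Real coverage is uneven, so the mean is only a first approximation.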

  16. Whole genome shotgun sequence of Bacillus amyloliquefaciens TF28, a biocontrol entophytic bacterium.

    Science.gov (United States)

    Zhang, Shumei; Jiang, Wei; Li, Jing; Meng, Liqiang; Cao, Xu; Hu, Jihua; Liu, Yushuai; Chen, Jingyu; Sha, Changqing

    2016-01-01

    Bacillus amyloliquefaciens TF28 is a biocontrol endophytic bacterium that is capable of inhibiting a broad range of plant pathogenic fungi. The strain has the potential to be developed into a biocontrol agent for use in agriculture. Here we report the whole-genome shotgun sequence of the strain. The genome size of B. amyloliquefaciens TF28 is 3,987,635 bp, which consists of 3754 protein-coding genes, 65 tandem repeat sequences, 47 minisatellite DNA, 2 microsatellite DNA, 63 tRNA, 7 rRNA, 6 sRNA, 3 prophage and CRISPR domains.

  17. When aging meets microgravity: whole genome promoters and enhancers transcription landscape in zebrafish onboard ISS

    Science.gov (United States)

    Arshanovskii, Kirill; Gusev, Oleg; Sychev, Vladimir; Poddubko, Svetlana; Deviatiiarov, Ruslan

    2016-07-01

    In order to gain new insights into gene regulation changes under conditions of real spaceflight, we conducted a whole-genome analysis of the dynamics of promoter and enhancer transcriptional changes in zebrafish during prolonged exposure to real spaceflight. Within the framework of the Russia-Japan joint experiments "Aquatic Habitat"-"Aquarium" we conducted a Cap Analysis of Gene Expression (CAGE) assay of zebrafish over a range of 7 to 40 days of real spaceflight onboard the ISS. The analysis showed that both gene expression patterns and the architecture of shapes and types of the promoters are affected by the spaceflight environment.

  18. Reflections on the cost of "low-cost" whole genome sequencing: framing the health policy debate.

    Directory of Open Access Journals (Sweden)

    Timothy Caulfield

    2013-11-01

    Full Text Available The cost of whole genome sequencing is dropping rapidly. There has been a great deal of enthusiasm about the potential for this technological advance to transform clinical care. Given the interest and significant investment in genomics, this seems an ideal time to consider what the evidence tells us about potential benefits and harms, particularly in the context of health care policy. The scale and pace of adoption of this powerful new technology should be driven by clinical need, clinical evidence, and a commitment to put patients at the centre of health care policy.

  19. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation.

    Science.gov (United States)

    Sharma, C; Kumar, N; Pandey, R; Meis, J F; Chowdhary, A

    2016-09-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L).

  20. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast

    DEFF Research Database (Denmark)

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L.

    2015-01-01

    interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering for construction of improved strains. Here we used high-throughput microfluidics for the screening of yeast libraries, generated by UV...... to construct efficient cell factories for protein secretion. The combined use of microfluidics screening and whole-genome sequencing to map the mutations associated with the improved phenotype can easily be adapted for other products and cell types to identify novel engineering targets, and this approach could...

  1. A green-cotyledon/stay-green mutant exemplifies the ancient whole-genome duplications in soybean.

    Science.gov (United States)

    Nakano, Michiharu; Yamada, Tetsuya; Masuda, Yu; Sato, Yutaka; Kobayashi, Hideki; Ueda, Hiroaki; Morita, Ryouhei; Nishimura, Minoru; Kitamura, Keisuke; Kusaba, Makoto

    2014-10-01

    The recent whole-genome sequencing of soybean (Glycine max) revealed that soybean experienced whole-genome duplications 59 million and 13 million years ago, and it has an octoploid-like genome in spite of its diploid nature. We analyzed a natural green-cotyledon mutant line, Tenshin-daiseitou. The physiological analysis revealed that Tenshin-daiseitou shows a non-functional stay-green phenotype in senescent leaves, which is similar to that of the mutant of Mendel's green-cotyledon gene I, the ortholog of SGR in pea. The identification of gene mutations and genetic segregation analysis suggested that defects in GmSGR1 and GmSGR2 were responsible for the green-cotyledon/stay-green phenotype of Tenshin-daiseitou, which was confirmed by RNA interference (RNAi) transgenic soybean experiments using GmSGR genes. The characterized green-cotyledon double mutant d1d2 was found to have the same mutations, suggesting that GmSGR1 and GmSGR2 are D1 and D2. Among the examined d1d2 strains, the d1d2 strain K144a showed a lower Chl a/b ratio in mature seeds than other strains but not in senescent leaves, suggesting a seed-specific genetic factor of the Chl composition in K144a. Analysis of the soybean genome sequence revealed four genomic regions with microsynteny to the Arabidopsis SGR1 region, which included the GmSGR1 and GmSGR2 regions. The other two regions contained GmSGR3a/GmSGR3b and GmSGR4, respectively, which might be pseudogenes or genes with a function that is unrelated to Chl degradation during seed maturation and leaf senescence. These GmSGR genes were thought to be produced by the two whole-genome duplications, and they provide a good example of such whole-genome duplication events in the evolution of the soybean genome.

  2. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing

    OpenAIRE

    Bowers, John E.; Pearl, Stephanie A; Burke, John M.

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads f...

  3. The effect of whole genome amplification on samples originating from more than one donor

    DEFF Research Database (Denmark)

    Thacker, C.R.; Balogh, M.K.; Børsting, Claus;

    2006-01-01

    In this study, the GenomiPhi(TM) DNA Amplification Kit (Amersham Biosciences) was used to investigate the potential of whole genome amplification (WGA) when considering samples originating from more than one donor. DNA was extracted from blood samples, quantified and normalised before being mixed...... found to match the expected peak ratios regardless of the starting concentration of DNA. With samples mixed in the ratio of 1:7 and 1:15, and when the concentration of starting material was at the manufacturer's lower limit, too few minor component peaks were found to allow for statistical analysis...

  4. Whole genome sequence analysis of circulating Bluetongue virus serotype 11 strains from the United States including two domestic canine isolates.

    Science.gov (United States)

    Gaudreault, Natasha N; Jasperson, Dane C; Dubovi, Edward J; Johnson, Donna J; Ostlund, Eileen N; Wilson, William C

    2015-07-01

    Bluetongue virus (BTV) is a vector-transmitted pathogen that typically infects and causes disease in domestic and wild ruminants. BTV is also known to infect domestic canines as discovered when dogs were vaccinated with a BTV-contaminated vaccine. Canine BTV infections have been documented through serological surveys, and natural infection by the Culicoides vector has been suggested. The reported isolations of BTV serotype 11 (BTV-11) from 2 separate domestic canine abortion cases, in Texas in 2011 and in Kansas in 2012, were apparently unrelated to BTV-contaminated vaccination or consumption of BTV-contaminated raw meat as had been previously speculated. To elucidate the origin and relationship of these 2 domestic canine BTV-11 isolates, whole genome sequencing was performed. Six additional BTV-11 field isolates from Texas, Florida, and Washington, submitted for diagnostic investigation during 2011 and 2013, were also fully sequenced and analyzed. The phylogenetic analysis indicates that the BTV-11 domestic canine isolates are virtually identical, and both share high identity with 2 BTV-11 isolates identified from white-tailed deer in Texas in 2011. The results of the current study further support the hypothesis that a BTV-11 strain circulating in the Midwestern states could have been transmitted to the dogs by the infected Culicoides vector. Our study also expands the short list of available BTV-11 sequences, which may aid BTV surveillance and epidemiology.

  5. Whole-Genome Sequencing for Routine Pathogen Surveillance in Public Health: a Population Snapshot of Invasive Staphylococcus aureus in Europe

    Directory of Open Access Journals (Sweden)

    David M. Aanensen

    2016-05-01

    Full Text Available The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with epidemiological and resistance data. Geospatial visualization of the data is made possible by a generic software tool designed for public health purposes that is available at the project URL (http://www.microreact.org/project/EkUvg9uY?tt=rc). Our analysis demonstrates that high-risk clones can be identified on the basis of population level properties such as clonal relatedness, abundance, and spatial structuring and by inferring virulence and resistance properties on the basis of gene content. We also show that in silico predictions of antibiotic resistance profiles are at least as reliable as phenotypic testing. We argue that this work provides a comprehensive road map illustrating the three vital components for future molecular epidemiological surveillance: (i) large-scale structured surveys, (ii) WGS, and (iii) community-oriented database infrastructure and analysis tools.

  6. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing.

    Science.gov (United States)

    Ranjan, Ravi; Rani, Asha; Metwally, Ahmed; McGee, Halvor S; Perkins, David L

    2016-01-22

    The human microbiome has emerged as a major player in regulating human health and disease. Translational studies of the microbiome have the potential to indicate clinical applications such as fecal transplants and probiotics. However, one major issue is accurate identification of microbes constituting the microbiota. Studies of the microbiome have frequently utilized sequencing of the conserved 16S ribosomal RNA (rRNA) gene. We present a comparative study of an alternative approach using whole genome shotgun sequencing (WGS). In the present study, we analyzed the human fecal microbiome compiling a total of 194.1 × 10^6 reads from a single sample using multiple sequencing methods and platforms. Specifically, after establishing the reproducibility of our methods with extensive multiplexing, we compared: 1) the 16S rRNA amplicon versus the WGS method, 2) the Illumina HiSeq versus MiSeq platforms, 3) the analysis of reads versus de novo assembled contigs, and 4) the effect of shorter versus longer reads. Our study demonstrates that whole genome shotgun sequencing has multiple advantages compared with the 16S amplicon method including enhanced detection of bacterial species, increased detection of diversity and increased prediction of genes. In addition, increased length, either due to longer reads or the assembly of contigs, improved the accuracy of species detection.

  7. Kernel-based whole-genome prediction of complex traits: a review

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics. PMID:25360145
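
    As an illustration of the kind of kernel regression the review discusses, the sketch below fits a kernel ridge (RKHS-style) regression with a Gaussian kernel to marker genotypes. It is a generic textbook formulation, not a method taken from the review: the genotype coding (0/1/2 allele counts), the bandwidth, the regularization value, the toy data, and all function names are assumptions made for the example.

```python
import math


def gaussian_kernel(x, z, bandwidth):
    """k(x, z) = exp(-bandwidth * squared Euclidean distance between x and z)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-bandwidth * d2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(X, y, bandwidth, lam):
    """Return coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], bandwidth) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        K[i][i] += lam
    return solve(K, y)


def predict(X_train, alpha, x_new, bandwidth):
    """Predicted value: kernel-weighted sum over the training individuals."""
    return sum(a * gaussian_kernel(x, x_new, bandwidth)
               for a, x in zip(alpha, X_train))


# Toy data: 5 individuals, 3 markers coded as allele counts (hypothetical).
X = [[0, 0, 1], [1, 0, 2], [2, 1, 0], [1, 2, 1], [0, 2, 2]]
y = [1.0, 2.5, 1.8, 3.9, 3.1]

alpha = fit(X, y, bandwidth=0.5, lam=1e-6)
print(predict(X, alpha, [1, 1, 1], bandwidth=0.5))
```

    Because the Gaussian kernel depends on the whole genotype vector rather than on each marker separately, a model of this form can in principle pick up interactions among markers without specifying them explicitly. In practice the bandwidth and regularization would be chosen by cross-validation, and predictive performance would be judged on a validation set, which is where the review notes that recovering non-additive variation remains difficult.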

  8. Whole genome analysis of linezolid resistance in Streptococcus pneumoniae reveals resistance and compensatory mutations

    Directory of Open Access Journals (Sweden)

    Légaré Danielle

    2011-10-01

    Full Text Available Abstract Background Several mutations were present in the genome of Streptococcus pneumoniae linezolid-resistant strains but the role of several of these mutations had not been experimentally tested. To analyze the role of these mutations, we reconstituted resistance by serial whole genome transformation of a novel resistant isolate into two strains with sensitive background. We sequenced the parent mutant and two independent transformants exhibiting similar minimum inhibitory concentration to linezolid. Results Comparative genomic analyses revealed that transformants acquired G2576T transversions in every gene copy of 23S rRNA and that the number of altered copies correlated with the level of linezolid resistance and cross-resistance to florfenicol and chloramphenicol. One of the transformants also acquired a mutation present in the parent mutant leading to the overexpression of an ABC transporter (spr1021). The acquisition of these mutations conferred a fitness cost, however, which was further enhanced by the acquisition of a mutation in an RNA methyltransferase implicated in resistance. Interestingly, the fitness of the transformants could be restored in part by the acquisition of altered copies of the L3 and L16 ribosomal proteins and by mutations leading to the overexpression of the spr1887 ABC transporter that were present in the original linezolid-resistant mutant. Conclusions Our results demonstrate the usefulness of whole genome approaches at detecting major determinants of resistance as well as compensatory mutations that alleviate the fitness cost associated with resistance.

  9. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    Energy Technology Data Exchange (ETDEWEB)

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

  10. Comparative whole genome sequence analysis of wild-type and cidofovir-resistant monkeypoxvirus

    Directory of Open Access Journals (Sweden)

    Huggins John

    2010-05-01

    Full Text Available Abstract We performed whole genome sequencing of a cidofovir {[(S)-1-(3-hydroxy-2-phosphonylmethoxy-propyl) cytosine] [HPMPC]}-resistant (CDV-R) strain of Monkeypoxvirus (MPV). Whole-genome comparison with the wild-type (WT) strain revealed 55 single-nucleotide polymorphisms (SNPs) and one tandem-repeat contraction. Over one-third of all identified SNPs were located within genes comprising the poxvirus replication complex, including the DNA polymerase, RNA polymerase, mRNA capping methyltransferase, DNA processivity factor, and poly-A polymerase. Four polymorphic sites were found within the DNA polymerase gene. DNA polymerase mutations observed at positions 314 and 684 in MPV were consistent with CDV-R loci previously identified in Vaccinia virus (VACV). These data suggest the mechanism of CDV resistance may be highly conserved across Orthopoxvirus (OPV) species. SNPs were also identified within virulence genes such as the A-type inclusion protein, serine protease inhibitor-like protein SPI-3, Schlafen ATPase and thymidylate kinase, among others. Aberrant chain extension induced by CDV may lead to diverse alterations in gene expression and viral replication that may result in both adaptive and attenuating mutations. Defining the potential contribution of substitutions in the replication complex and RNA processing machinery reported here may yield further insight into CDV resistance and may augment current therapeutic development strategies.

  11. Whole-Genome Mapping as a Novel High-Resolution Typing Tool for Legionella pneumophila.

    Science.gov (United States)

    Bosch, Thijs; Euser, Sjoerd M; Landman, Fabian; Bruin, Jacob P; IJzerman, Ed P; den Boer, Jeroen W; Schouls, Leo M

    2015-10-01

    Legionella is the causative agent for Legionnaires' disease (LD) and is responsible for several large outbreaks in the world. More than 90% of LD cases are caused by Legionella pneumophila, and studies on the origin and transmission routes of this pathogen rely on adequate molecular characterization of isolates. Current typing of L. pneumophila mainly depends on sequence-based typing (SBT). However, studies have shown that in some outbreak situations, SBT does not have sufficient discriminatory power to distinguish between related and nonrelated L. pneumophila isolates. In this study, we used a novel high-resolution typing technique, called whole-genome mapping (WGM), to differentiate between epidemiologically related and nonrelated L. pneumophila isolates. Assessment of the method by various validation experiments showed highly reproducible results, and WGM was able to confirm two well-documented Dutch L. pneumophila outbreaks. Comparison of whole-genome maps of the two outbreaks together with WGMs of epidemiologically nonrelated L. pneumophila isolates showed major differences between the maps, and WGM yielded a higher discriminatory power than SBT. In conclusion, WGM can be a valuable alternative to perform outbreak investigations of L. pneumophila in real time since the turnaround time from culture to comparison of the L. pneumophila maps is less than 24 h.

  12. Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data.

    Directory of Open Access Journals (Sweden)

    Frederick E Dewey

    2015-10-01

    Full Text Available High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.

  13. Computel: computation of mean telomere length from whole-genome next-generation sequencing data.

    Directory of Open Access Journals (Sweden)

    Lilit Nersisyan

    Full Text Available Telomeres are the ends of eukaryotic chromosomes, consisting of consecutive short repeats that protect chromosome ends from degradation. Telomeres shorten with each cell division, leading to replicative cell senescence. Deregulation of telomere length homeostasis is associated with the development of various age-related diseases and cancers. A number of experimental techniques exist for telomere length measurement; however, until recently, the absence of tools for extracting telomere lengths from high-throughput sequencing data has significantly obscured the association of telomere length with molecular processes in normal and diseased conditions. We have developed Computel, a program in R for computing mean telomere length from whole-genome next-generation sequencing data. Computel is open source, and is freely available at https://github.com/lilit-nersisyan/computel. It utilizes a short-read alignment-based approach and integrates various popular tools for sequencing data analysis. We validated it with synthetic and experimental data, and compared its performance with the previously available software. The results have shown that Computel outperforms existing software in accuracy, independence of results from sequencing conditions, stability against inherent sequencing errors, and better ability to distinguish pure telomeric sequences from interstitial telomeric repeats. By providing a highly reliable methodology for determining telomere lengths from whole-genome sequencing data, Computel should help to elucidate the role of telomeres in cellular health and disease.

  14. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing

    Science.gov (United States)

    Ranjan, Ravi; Rani, Asha; Metwally, Ahmed; McGee, Halvor S.; Perkins, David L.

    2016-01-01

    The human microbiome has emerged as a major player in regulating human health and disease. Translation studies of the microbiome have the potential to indicate clinical applications such as fecal transplants and probiotics. However, one major issue is accurate identification of microbes constituting the microbiota. Studies of the microbiome have frequently utilized sequencing of the conserved 16S ribosomal RNA (rRNA) gene. We present a comparative study of an alternative approach using shotgun whole genome sequencing (WGS). In the present study, we analyzed the human fecal microbiome compiling a total of 194.1 × 10^6 reads from a single sample using multiple sequencing methods and platforms. Specifically, after establishing the reproducibility of our methods with extensive multiplexing, we compared: 1) the 16S rRNA amplicon versus the WGS method, 2) the Illumina HiSeq versus MiSeq platforms, 3) the analysis of reads versus de novo assembled contigs, and 4) the effect of shorter versus longer reads. Our study demonstrates that shotgun whole genome sequencing has multiple advantages compared with the 16S amplicon method including enhanced detection of bacterial species, increased detection of diversity and increased prediction of genes. In addition, increased length, either due to longer reads or the assembly of contigs, improved the accuracy of species detection. PMID:26718401

  15. Construction of a phylogenetic tree of photosynthetic prokaryotes based on average similarities of whole genome sequences.

    Directory of Open Access Journals (Sweden)

    Soichirou Satoh

    Full Text Available Phylogenetic trees have been constructed for a wide range of organisms using gene sequence information, especially through the identification of orthologous genes that have been vertically inherited. The number of available complete genome sequences is rapidly increasing, and many tools for construction of genome trees based on whole genome sequences have been proposed. However, development of a reasonable method of using complete genome sequences for construction of phylogenetic trees has not been established. We have developed a method for construction of phylogenetic trees based on the average sequence similarities of whole genome sequences. We used this method to examine the phylogeny of 115 photosynthetic prokaryotes, i.e., cyanobacteria, Chlorobi, proteobacteria, Chloroflexi, Firmicutes and nonphotosynthetic organisms including Archaea. Although the bootstrap values for the branching order of phyla were low, probably due to lateral gene transfer and saturated mutation, the obtained tree was largely consistent with the previously reported phylogenetic trees, indicating that this method is a robust alternative to traditional phylogenetic methods.

  16. Whole genome phylogeny of Prochlorococcus marinus group of cyanobacteria: genome alignment and overlapping gene approach.

    Science.gov (United States)

    Prabha, Ratna; Singh, Dhananjaya P; Gupta, Shailendra K; Rai, Anil

    2014-06-01

    Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium dominating the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans in the world were assigned to two different groups, a tightly clustered high-light (HL)-adapted clade and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed phylogeny of this genus on the basis of complete genome sequences through genome alignment, overlapping-gene content and gene-order approach. The phylogenetic tree of P. marinus obtained by comparing whole genome sequences, in contrast to that based on the 16S rRNA gene, corresponded well with the HL/LL ecotypic distinction of twelve strains and showed consistency with phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency in the physiological and genetic phylogeny based on whole genome sequence is established. These observations improve our understanding about the adaptation and diversification of these organisms under evolutionary pressure.

  17. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence.

    Directory of Open Access Journals (Sweden)

    Frederick E Dewey

    2011-09-01

    Full Text Available Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.

  18. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Full Text Available Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs that had opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample was in the range of 39.29-49.36 million. Approximately 188 genes were differentially expressed in adipose tissue and were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, metabolism of fatty acids, retinol, caffeine and arachidonic acid, and immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  19. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data.

    Directory of Open Access Journals (Sweden)

    Can Alkan

    2007-09-01

    Full Text Available The major DNA constituent of primate centromeres is alpha satellite DNA. As much as 2%-5% of sequence generated as part of primate genome sequencing projects consists of this material, which is fragmented or not assembled as part of published genome sequences due to its highly repetitive nature. Here, we develop computational methods to rapidly recover and categorize alpha-satellite sequences from previously uncharacterized whole-genome shotgun sequence data. We present an algorithm to computationally predict potential higher-order array structure based on paired-end sequence data and then experimentally validate its organization and distribution by experimental analyses. Using whole-genome shotgun data from the human, chimpanzee, and macaque genomes, we examine the phylogenetic relationship of these sequences and provide further support for a model for their evolution and mutation over the last 25 million years. Our results confirm fundamental differences in the dispersal and evolution of centromeric satellites in the Old World monkey and ape lineages of evolution.

  20. Whole genome duplication affects evolvability of flowering time in an autotetraploid plant.

    Directory of Open Access Journals (Sweden)

    Sara L Martin

    Full Text Available Whole genome duplications have occurred recurrently throughout the evolutionary history of eukaryotes. The resulting genetic and phenotypic changes can influence physiological and ecological responses to the environment; however, the impact of genome copy number on evolvability has rarely been examined experimentally. Here, we evaluate the effect of genome duplication on the ability to respond to selection for early flowering time in lines drawn from naturally occurring diploid and autotetraploid populations of the plant Chamerion angustifolium (fireweed). We contrast this with the result of four generations of selection on synthesized neoautotetraploids, whose genic variability is similar to diploids but genome copy number is similar to autotetraploids. In addition, we examine correlated responses to selection in all three groups. Diploid and both extant tetraploid and neoautotetraploid lines responded to selection with significant reductions in time to flowering. Evolvability, measured as realized heritability, was significantly lower in extant tetraploids ($\hat{b}_T = 0.31$) than diploids ($\hat{b}_T = 0.40$). Neotetraploids exhibited the highest evolutionary response ($\hat{b}_T = 0.55$). The rapid shift in flowering time in neotetraploids was associated with an increase in phenotypic variability across generations, but not with change in genome size or phenotypic correlations among traits. Our results suggest that whole genome duplications, without hybridization, may initially alter evolutionary rate, and that the dynamic nature of neoautopolyploids may contribute to the prevalence of polyploidy throughout eukaryotes.
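
For readers unfamiliar with the term, realized heritability is conventionally defined through the breeder's equation, as the response to selection divided by the selection differential. The abstract does not say which estimator the authors used, so the symbols R (response), S (selection differential) and h^2 below are standard textbook notation assumed here for illustration, not taken from the source.

```latex
R = h^2 S
\quad\Longrightarrow\quad
h^2_{\text{realized}} = \frac{R}{S}
```

Read this way, a value of 0.31 would mean that the population mean shifts by roughly 0.31 units for every unit of selection differential applied. Over several generations the same quantity is often estimated as the slope of cumulative response regressed on cumulative selection differential.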

  1. From days to hours: reporting clinically actionable variants from whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Sumit Middha

    Full Text Available As the cost of whole genome sequencing (WGS) decreases, clinical laboratories will be looking at broadly adopting this technology to screen for variants of clinical significance. To fully leverage this technology in a clinical setting, results need to be reported quickly, as the turnaround rate could potentially impact patient care. The latest sequencers can sequence a whole human genome in about 24 hours. However, depending on the computing infrastructure available, the processing of data can take several days, with the majority of computing time devoted to aligning reads to genomic regions that are to date not clinically interpretable. In an attempt to accelerate the reporting of clinically actionable variants, we have investigated the utility of a multi-step alignment algorithm focused on aligning reads and calling variants in genomic regions of clinical relevance prior to processing the remaining reads on the whole genome. This iterative workflow significantly accelerates the reporting of clinically actionable variants with no loss of accuracy when compared to genotypes obtained with the OMNI SNP platform or to variants detected with a standard workflow that combines Novoalign and GATK.

  2. Computel: computation of mean telomere length from whole-genome next-generation sequencing data.

    Science.gov (United States)

    Nersisyan, Lilit; Arakelyan, Arsen

    2015-01-01

    Telomeres are the ends of eukaryotic chromosomes, consisting of consecutive short repeats that protect chromosome ends from degradation. Telomeres shorten with each cell division, leading to replicative cell senescence. Deregulation of telomere length homeostasis is associated with the development of various age-related diseases and cancers. A number of experimental techniques exist for telomere length measurement; however, until recently, the absence of tools for extracting telomere lengths from high-throughput sequencing data has significantly obscured the association of telomere length with molecular processes in normal and diseased conditions. We have developed Computel, a program in R for computing mean telomere length from whole-genome next-generation sequencing data. Computel is open source, and is freely available at https://github.com/lilit-nersisyan/computel. It utilizes a short-read alignment-based approach and integrates various popular tools for sequencing data analysis. We validated it with synthetic and experimental data, and compared its performance with the previously available software. The results have shown that Computel outperforms existing software in accuracy, independence of results from sequencing conditions, stability against inherent sequencing errors, and better ability to distinguish pure telomeric sequences from interstitial telomeric repeats. By providing a highly reliable methodology for determining telomere lengths from whole-genome sequencing data, Computel should help to elucidate the role of telomeres in cellular health and disease.

  3. Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

    Directory of Open Access Journals (Sweden)

    Roy Michael Robins-Browne

    2016-11-01

    Full Text Available The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes, should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods.

  4. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing

    Directory of Open Access Journals (Sweden)

    John E. Bowers

    2016-07-01

    Full Text Available Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species.
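
The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative choices, and the per-position average assumes SNPs are counted once each.

```python
# Figures quoted in the abstract (variable names are illustrative).
ASSEMBLY_BP = 866_000_000            # draft assembly size
EXPECTED_GENOME_BP = 1_350_000_000   # expected genome size (1.35 Gbp)
MAPPED_SNPS = 2_008_196              # genetically located SNPs
UNIQUE_POSITIONS = 1178              # distinct genetic map positions
ANCHORED_FRACTION = 0.14             # share of expected genome given a map position

# Share of the expected genome covered by the draft assembly.
assembly_fraction = ASSEMBLY_BP / EXPECTED_GENOME_BP

# Average number of SNPs that co-segregate at one map position.
snps_per_position = MAPPED_SNPS / UNIQUE_POSITIONS

# Sequence length assigned to a genetic position, in base pairs.
anchored_bp = ANCHORED_FRACTION * EXPECTED_GENOME_BP

print(f"assembly covers {assembly_fraction:.1%} of the expected genome")
print(f"about {snps_per_position:.0f} SNPs per unique map position")
print(f"about {anchored_bp / 1e6:.0f} Mbp anchored to the map")
```

The first ratio comes out near 64%, which matches the abstract's "two-thirds". The large number of SNPs per position reflects the limited number of recombination events that 96 RILs can resolve.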

  5. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing.

    Science.gov (United States)

    Bowers, John E; Pearl, Stephanie A; Burke, John M

    2016-07-07

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species.

  6. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce.

    Science.gov (United States)

    Reyes-Chin-Wo, Sebastian; Wang, Zhiwen; Yang, Xinhua; Kozik, Alexander; Arikit, Siwaret; Song, Chi; Xia, Liangfeng; Froenicke, Lutz; Lavelle, Dean O; Truco, María-José; Xia, Rui; Zhu, Shilin; Xu, Chunyan; Xu, Huaqin; Xu, Xun; Cox, Kyle; Korf, Ian; Meyers, Blake C; Michelmore, Richard W

    2017-04-12

    Lettuce (Lactuca sativa) is a major crop and a member of the large, highly successful Compositae family of flowering plants. Here we present a reference assembly for the species and family. This was generated using whole-genome shotgun Illumina reads plus in vitro proximity ligation data to create large superscaffolds; it was validated genetically and superscaffolds were oriented in genetic bins ordered along nine chromosomal pseudomolecules. We identify several genomic features that may have contributed to the success of the family, including genes encoding Cycloidea-like transcription factors, kinases, enzymes involved in rubber biosynthesis and disease resistance proteins that are expanded in the genome. We characterize 21 novel microRNAs, one of which may trigger phasiRNAs from numerous kinase transcripts. We provide evidence for a whole-genome triplication event specific but basal to the Compositae. We detect 26% of the genome in triplicated regions containing 30% of all genes that are enriched for regulatory sequences and depleted for genes involved in defence.

  7. Whole-genome single-nucleotide-polymorphism analysis for discrimination of Clostridium botulinum group I strains.

    Science.gov (United States)

    Gonzalez-Escalona, Narjol; Timme, Ruth; Raphael, Brian H; Zink, Donald; Sharma, Shashi K

    2014-04-01

    Clostridium botulinum is a genetically diverse Gram-positive bacterium producing extremely potent neurotoxins (botulinum neurotoxins A through G [BoNT/A-G]). The complete genome sequences of three strains harboring only the BoNT/A1 nucleotide sequence are publicly available. Although these strains contain a toxin cluster (HA(+) OrfX(-)) associated with hemagglutinin genes, little is known about the genomes of subtype A1 strains (termed HA(-) OrfX(+)) that lack hemagglutinin genes in the toxin gene cluster. We sequenced the genomes of three BoNT/A1-producing C. botulinum strains: two strains with the HA(+) OrfX(-) cluster (69A and 32A) and one strain with the HA(-) OrfX(+) cluster (CDC297). Whole-genome phylogenic single-nucleotide-polymorphism (SNP) analysis of these strains along with other publicly available C. botulinum group I strains revealed five distinct lineages. Strains 69A and 32A clustered with the C. botulinum type A1 Hall group, and strain CDC297 clustered with the C. botulinum type Ba4 strain 657. This study reports the use of whole-genome SNP sequence analysis for discrimination of C. botulinum group I strains and demonstrates the utility of this analysis in quickly differentiating C. botulinum strains harboring identical toxin gene subtypes. This analysis further supports previous work showing that strains CDC297 and 657 likely evolved from a common ancestor and independently acquired separate BoNT/A1 toxin gene clusters at distinct genomic locations.

  8. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    Science.gov (United States)

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  9. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

    Directory of Open Access Journals (Sweden)

    Shea N. Gardner

    2014-01-01

    Full Text Available Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.
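
The idea of a degenerate primer can be illustrated with a short sketch: given aligned sequences from a conserved region, each column is collapsed into the IUPAC code that covers every base seen there. This is not the PriMux algorithm or the run_tiled_primers script, whose internals the abstract does not describe; the function names and the toy alignment are hypothetical, and the input is assumed to be gap-free uppercase DNA of equal length.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases covered.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned):
    """Collapse equal-length, gap-free DNA sequences into one IUPAC string."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(IUPAC[frozenset(column)] for column in zip(*aligned))


def degeneracy(primer):
    """Count the distinct concrete sequences an IUPAC primer represents."""
    bases_per_code = {code: len(bases) for bases, code in IUPAC.items()}
    total = 1
    for code in primer:
        total *= bases_per_code[code]
    return total


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ATGT"]  # hypothetical conserved region
    primer = degenerate_consensus(toy_alignment)
    print(primer, degeneracy(primer))
```

Real primer design also has to weigh melting temperature, primer length and total degeneracy, none of which this sketch attempts.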

  10. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota Morota

    2014-10-01

    Full Text Available Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  11. A multi-country outbreak of Salmonella Newport gastroenteritis in Europe associated with watermelon from Brazil, confirmed by whole genome sequencing: October 2011 to January 2012.

    Science.gov (United States)

    Byrne, L; Fisher, I; Peters, T; Mather, A; Thomson, N; Rosner, B; Bernard, H; McKeown, P; Cormican, M; Cowden, J; Aiyedun, V; Lane, C

    2014-08-07

    In November 2011, the presence of Salmonella Newport in a ready-to-eat watermelon slice was confirmed as part of a local food survey in England. In late December 2011, cases of S. Newport were reported in England, Wales, Northern Ireland, Scotland, Ireland and Germany. During the outbreak, 63 confirmed cases of S. Newport were reported across all six countries with isolates indistinguishable by pulsed-field gel electrophoresis from the watermelon isolate. A subset of outbreak isolates were whole-genome sequenced and were identical to, or one single nucleotide polymorphism different from, the watermelon isolate. In total, 46 confirmed cases were interviewed, of which 27 reported watermelon consumption. Further investigations confirmed the outbreak was linked to the consumption of watermelon imported from Brazil. Although numerous Salmonella outbreaks associated with melons have been reported in the United States and elsewhere, this is the first of its kind in Europe. Expansion of the melon import market from Brazil represents a potential threat for future outbreaks. Whole genome sequencing is rapidly becoming more accessible and can provide a compelling level of evidence of linkage between human cases and sources of infection, to support public health interventions in global food markets.

  12. Whole-genome thermodynamic analysis reduces siRNA off-target effects.

    Directory of Open Access Journals (Sweden)

    Xi Chen

    Full Text Available Small interfering RNAs (siRNAs) are important tools for knocking down targeted genes, and have been widely applied to biological and biomedical research. To design siRNAs, two important aspects must be considered: the potency in knocking down target genes and the off-target effect on any nontarget genes. Although many studies have produced useful tools to design potent siRNAs, off-target prevention has mostly been delegated to sequence-level alignment tools such as BLAST. We hypothesize that whole-genome thermodynamic analysis can identify potential off-targets with higher precision and help us avoid siRNAs that may have strong off-target effects. To validate this hypothesis, two siRNA sets were designed to target three human genes IDH1, ITPR2 and TRIM28. They were selected from the output of two popular siRNA design tools, siDirect and siDesign. Both siRNA design tools have incorporated sequence-level screening to avoid off-targets, thus their output is believed to be optimal. However, one of the sets we tested has off-target genes predicted by Picky, a whole-genome thermodynamic analysis tool. Picky can identify off-target genes that may hybridize to a siRNA within a user-specified melting temperature range. Our experiments validated that some off-target genes predicted by Picky can indeed be inhibited by siRNAs. Similar experiments were performed using commercially available siRNAs and a few off-target genes were also found to be inhibited as predicted by Picky. In summary, we demonstrate that whole-genome thermodynamic analysis can identify off-target genes that are missed in sequence-level screening. Because Picky prediction is deterministic according to thermodynamics, if a siRNA candidate has no Picky predicted off-targets, it is unlikely to cause off-target effects. Therefore, we recommend including Picky as an additional screening step in siRNA design.

  13. Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data

    Directory of Open Access Journals (Sweden)

    Fujiyama Asao

    2010-04-01

    Full Text Available Abstract Background Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. Results We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for γ-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacked a polyketide synthesis operon, disrupted the plipastatin production operon, and possesses previously unidentified transposases. 
Conclusions The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B

  14. Whole genome sequencing reveals genomic heterogeneity and antibiotic purification in Mycobacterium tuberculosis isolates

    KAUST Repository

    Black, PA

    2015-10-24

    Background Whole genome sequencing has revolutionised the interrogation of mycobacterial genomes. Recent studies have reported conflicting findings on the genomic stability of Mycobacterium tuberculosis during the evolution of drug resistance. In an age where whole genome sequencing is increasingly relied upon for defining the structure of bacterial genomes, it is important to investigate the reliability of next generation sequencing to identify clonal variants present in a minor percentage of the population. This study aimed to define a reliable cut-off for identification of low frequency sequence variants and to subsequently investigate genetic heterogeneity and the evolution of drug resistance in M. tuberculosis. Methods Genomic DNA was isolated from single colonies from 14 rifampicin mono-resistant M. tuberculosis isolates, as well as the primary cultures and follow up MDR cultures from two of these patients. The whole genomes of the M. tuberculosis isolates were sequenced using either the Illumina MiSeq or Illumina HiSeq platforms. Sequences were analysed with an in-house pipeline. Results Using next-generation sequencing in combination with Sanger sequencing and statistical analysis we defined a read frequency cut-off of 30 % to identify low frequency M. tuberculosis variants with high confidence. Using this cut-off we demonstrated a high rate of genetic diversity between single colonies isolated from one population, showing that by using the current sequencing technology, single colonies are not a true reflection of the genetic diversity within a whole population and vice versa. We further showed that numerous heterogeneous variants emerge and then disappear during the evolution of isoniazid resistance within individual patients. Our findings allowed us to formulate a model for the selective bottleneck which occurs during the course of infection, acting as a genomic purification event. Conclusions Our study demonstrated true levels of genetic diversity
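
    The read frequency cut-off described in this abstract can be illustrated with a simple filter on per-position read counts. This is a sketch only, assuming that "read frequency" means the fraction of reads supporting the variant allele; the function name, the depth guard and the toy counts are hypothetical and do not reproduce the authors' in-house pipeline.

```python
def call_variants(read_counts, min_frequency=0.30, min_depth=1):
    """Keep positions whose variant-allele read fraction reaches the cut-off.

    read_counts maps a genome position to (reference_reads, variant_reads).
    Returns a dict mapping each retained position to its variant frequency.
    """
    calls = {}
    for position, (ref_reads, alt_reads) in read_counts.items():
        depth = ref_reads + alt_reads
        if depth == 0 or depth < min_depth:
            continue  # no usable coverage at this position
        frequency = alt_reads / depth
        if frequency >= min_frequency:
            calls[position] = frequency
    return calls


# Toy read counts at three hypothetical positions (not real data).
toy_counts = {100: (90, 10), 200: (60, 40), 300: (0, 50)}
print(call_variants(toy_counts))
```

    With these toy counts, position 100 (10% variant reads) falls below the 30% cut-off, while positions 200 and 300 are retained.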

  15. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

    Science.gov (United States)

    Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

    2016-10-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant
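
    The third step, choosing the most likely pair of alleles at a locus, can be sketched with a simple mixture likelihood in which each read is assumed to come from either allele with probability one half. This is an illustrative sketch rather than the HLA*PRG implementation: the function name, the allele labels X, Y and Z, and the per-read probabilities are hypothetical, and every probability is assumed to be strictly positive.

```python
import math
from itertools import combinations_with_replacement


def best_allele_pair(read_likelihoods, alleles):
    """Return the unordered allele pair with the highest log-likelihood.

    read_likelihoods is a list with one dict per read, mapping each allele
    name to P(read | allele). Each read is modelled as drawn from either
    allele of the pair with probability 0.5.
    """
    best_pair, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(alleles, 2):
        ll = sum(math.log(0.5 * r[a1] + 0.5 * r[a2]) for r in read_likelihoods)
        if ll > best_ll:
            best_pair, best_ll = (a1, a2), ll
    return best_pair, best_ll


# Toy example: three reads fit allele X well and three fit allele Y well.
alleles = ["X", "Y", "Z"]
reads = [{"X": 0.9, "Y": 0.01, "Z": 0.01}] * 3 + [{"X": 0.01, "Y": 0.9, "Z": 0.01}] * 3
pair, log_likelihood = best_allele_pair(reads, alleles)
print(pair, round(log_likelihood, 2))
```

    The exhaustive search over pairs is quadratic in the number of candidate alleles, which is acceptable for a toy example but would need pruning for thousands of known alleles.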

  16. Bioinformatics Workflow for Clinical Whole Genome Sequencing at Partners HealthCare Personalized Medicine

    Directory of Open Access Journals (Sweden)

    Ellen A. Tsai

    2016-02-01

    Full Text Available Effective implementation of precision medicine will be enhanced by a thorough understanding of each patient’s genetic composition to better treat his or her presenting symptoms or mitigate the onset of disease. This ideally includes the sequence information of a complete genome for each individual. At Partners HealthCare Personalized Medicine, we have developed a clinical process for whole genome sequencing (WGS) with application in both healthy individuals and those with disease. In this manuscript, we will describe our bioinformatics strategy to efficiently process and deliver genomic data to geneticists for clinical interpretation. We describe the handling of data from FASTQ to the final variant list for clinical review for the final report. We will also discuss our methodology for validating this workflow and the cost implications of running WGS.

  17. A strategic stakeholder approach for addressing further analysis requests in whole genome sequencing research.

    Science.gov (United States)

    Thornock, Bradley Steven O

    2016-01-01

    Whole genome sequencing (WGS) can be a cost-effective and efficient means of diagnosis for some children, but it also raises a number of ethical concerns. One such concern is how researchers derive and communicate results from WGS, including future requests for further analysis of stored sequences. The purpose of this paper is to think about what is at stake, and for whom, in any solution that is developed to deal with such requests. To accomplish this task, this paper will utilize stakeholder theory, a common method used in business ethics. Several scenarios that connect stakeholder concerns and WGS will also be posited and analyzed. This paper concludes by developing criteria composed of a series of questions that researchers can answer in order to more effectively address requests for further analysis of stored sequences.

  18. Whole genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing

    Science.gov (United States)

    Harris, Simon R.; Clarke, Ian N.; Seth-Smith, Helena M. B.; Solomon, Anthony W.; Cutcliffe, Lesley T.; Marsh, Peter; Skilton, Rachel J.; Holland, Martin J.; Mabey, David; Peeling, Rosanna W.; Lewis, David A.; Spratt, Brian G.; Unemo, Magnus; Persson, Kenneth; Bjartling, Carina; Brunham, Robert; de Vries, Henry J.C.; Morré, Servaas A.; Speksnijder, Arjen; Bébéar, Cécile M.; Clerc, Maïté; de Barbeyrac, Bertille; Parkhill, Julian; Thomson, Nicholas R.

    2012-01-01

    Chlamydia trachomatis is responsible for both trachoma and sexually transmitted infections causing substantial morbidity and economic cost globally. Despite this, our knowledge of its population and evolutionary genetics is limited. Here we present a detailed whole genome phylogeny from representative strains of both trachoma and lymphogranuloma venereum (LGV) biovars from temporally and geographically diverse sources. Our analysis demonstrates that predicting phylogenetic structure using the ompA gene, traditionally used to classify Chlamydia, is misleading because extensive recombination in this region masks true relationships. We show that in many instances ompA is a chimera that can be exchanged in part or whole, both within and between biovars. We also provide evidence for exchange of, and recombination within, the cryptic plasmid, another important diagnostic target. We have used our phylogenetic framework to show how genetic exchange has manifested itself in ocular, urogenital and LGV C. trachomatis strains, including the epidemic LGV serotype L2b. PMID:22406642

  19. Whole-genome sequence comparisons reveal the evolution of Vibrio cholerae O1.

    Science.gov (United States)

    Kim, Eun Jin; Lee, Chan Hee; Nair, G Balakrish; Kim, Dong Wook

    2015-08-01

    The analysis of the whole-genome sequences of Vibrio cholerae strains from previous and current cholera pandemics has demonstrated that genomic changes and alterations in phage CTX (particularly in the gene encoding the B subunit of cholera toxin) were major features in the evolution of V. cholerae. Recent studies have revealed the genetic mechanisms in these bacteria by which new variants of V. cholerae are generated from type-specific strains; these mechanisms suggest that certain strains are selected by environmental or human factors over time. By understanding the mechanisms and driving forces of historical and current changes in the V. cholerae population, it would be possible to predict the direction of such changes and the evolution of new variants; this has implications for the battle against cholera.

  20. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci.

    Science.gov (United States)

    Rasmussen, L H; Dargis, R; Højholt, K; Christensen, J J; Skovgaard, O; Justesen, U S; Rosenvinge, F S; Moser, C; Lukjancenko, O; Rasmussen, S; Nielsen, X C

    2016-10-01

    Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were observed in single gene analyses. Species identification based on single gene analyses showed limitations when more strains were included. In contrast, analyses incorporating more sequence data, like MLSA, SNPs and core-genome analyses, provided more distinct clustering. The core-genome tree showed the most distinct clustering.

  1. Rapid high resolution genotyping of Francisella tularensis by whole genome sequence comparison of annotated genes ("MLST+").

    Directory of Open Access Journals (Sweden)

    Markus H Antwerpen

    Full Text Available The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.

  2. Methylated DNA is over-represented in whole-genome bisulfite sequencing data

    Directory of Open Access Journals (Sweden)

    Lexiang Ji

    2014-10-01

    Full Text Available The development of whole-genome bisulfite sequencing (WGBS) has led to a number of exciting discoveries about the role of DNA methylation leading to a plethora of novel testable hypotheses. Methods for constructing sodium bisulfite-converted and amplified libraries have recently advanced to the point that the bottleneck for experiments that use WGBS has shifted to data analysis and interpretation. Here we present empirical evidence for an over-representation of reads from methylated DNA in WGBS. This enrichment for methylated DNA is exacerbated by higher cycles of PCR and is influenced by the type of uracil-insensitive DNA polymerase used for amplifying the sequencing library. Future efforts to computationally correct for this enrichment bias will be essential to increasing the accuracy of determining methylation levels for individual cytosines. It is especially critical for studies that seek to accurately quantify DNA methylation levels in populations that may segregate for allelic DNA methylation states.

  3. A Danish Salmonella Bareilly outbreak investigated by the use of whole genome sequencing

    DEFF Research Database (Denmark)

    Torpdahl, M.; Kiil, K.; Litrup, E.

    2013-01-01

    In 2012, we saw an increase of the Salmonella serotype Bareilly isolated from human infections. Bareilly is a rare serotype in Denmark, isolated from human infections between 2 and 9 times annually over the last 10 years. As a routine in rare serotypes, we use PFGE as the molecular method...... and broilers differed by two bands When using PFGE in outbreak investigation there are some interpretative implications that have to be considered. There are differences on how important band changes are when defining clusters of different serotypes. Some outbreaks have been reported to include PFGE profiles...... with several band changes and others are defined by one PFGE profile thereby excluding closely related profiles. We decided to investigate whether whole genome sequencing (WGS) could resolve this issue and be useful in outbreak investigations. Several analyses were performed, including a SNP tree based...

  4. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

    KAUST Repository

    Coll, Francesc

    2015-05-27

    Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data complexity has restricted their clinical application. A library (1,325 mutations) predictive of DR for 15 anti-tuberculosis drugs was compiled and validated for 11 of them using genomic-phenotypic data from 792 strains. A rapid online ‘TB-Profiler’ tool was developed to report DR and strain-type profiles directly from raw sequences. Using our DR mutation library, in silico diagnostic accuracy was superior to some commercial diagnostics and alternative databases. The library will facilitate sequence-based drug-susceptibility testing.

  5. Landscape of somatic mutations in 560 breast cancer whole genome sequences

    Science.gov (United States)

    Nik-Zainal, Serena; Davies, Helen; Staaf, Johan; Ramakrishna, Manasa; Glodzik, Dominik; Zou, Xueqing; Martincorena, Inigo; Alexandrov, Ludmil B.; Martin, Sancha; Wedge, David C.; Van Loo, Peter; Ju, Young Seok; Smid, Marcel; Brinkman, Arie B; Morganella, Sandro; Aure, Miriam R.; Lingjærde, Ole Christian; Langerød, Anita; Ringnér, Markus; Ahn, Sung-Min; Boyault, Sandrine; Brock, Jane E.; Broeks, Annegien; Butler, Adam; Desmedt, Christine; Dirix, Luc; Dronov, Serge; Fatima, Aquila; Foekens, John A.; Gerstung, Moritz; Hooijer, Gerrit KJ; Jang, Se Jin; Jones, David R.; Kim, Hyung-Yong; King, Tari A.; Krishnamurthy, Savitri; Lee, Hee Jin; Lee, Jeong-Yeon; Li, Yilong; McLaren, Stuart; Menzies, Andrew; Mustonen, Ville; O’Meara, Sarah; Pauporté, Iris; Pivot, Xavier; Purdie, Colin A.; Raine, Keiran; Ramakrishnan, Kamna; Rodríguez-González, F. Germán; Romieu, Gilles; Sieuwerts, Anieta M.; Simpson, Peter T; Shepherd, Rebecca; Stebbings, Lucy; Stefansson, Olafur A; Teague, Jon; Tommasi, Stefania; Treilleux, Isabelle; Van den Eynden, Gert G.; Vermeulen, Peter; Vincent-Salomon, Anne; Yates, Lucy; Caldas, Carlos; van’t Veer, Laura; Tutt, Andrew; Knappskog, Stian; Tan, Benita Kiat Tee; Jonkers, Jos; Borg, Åke; Ueno, Naoto T; Sotiriou, Christos; Viari, Alain; Futreal, P. Andrew; Campbell, Peter J; Span, Paul N.; Van Laere, Steven; Lakhani, Sunil R; Eyfjord, Jorunn E.; Thompson, Alastair M.; Birney, Ewan; Stunnenberg, Hendrik G; van de Vijver, Marc J; Martens, John W.M.; Børresen-Dale, Anne-Lise; Richardson, Andrea L.; Kong, Gu; Thomas, Gilles; Stratton, Michael R.

    2016-01-01

    We analysed whole genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. 93 protein-coding cancer genes carried likely driver mutations. Some non-coding regions exhibited high mutation frequencies but most have distinctive structural features probably causing elevated mutation rates and do not harbour driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed 12 base substitution and six rearrangement signatures. Three rearrangement signatures, characterised by tandem duplications or deletions, appear associated with defective homologous recombination based DNA repair: one with deficient BRCA1 function; another with deficient BRCA1 or BRCA2 function; the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operative, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer. PMID:27135926

  6. Determining the repertoire of immunodominant proteins via whole-genome amplification of intracellular pathogens.

    Directory of Open Access Journals (Sweden)

    Michael J Dark

    Full Text Available Culturing many obligate intracellular bacteria is difficult or impossible. However, these organisms have numerous adaptations allowing for infection persistence and immune system evasion, making them some of the most interesting to study. Recent advancements in genome sequencing, pyrosequencing and Phi29 amplification, have allowed for examination of whole-genome sequences of intracellular bacteria without culture. We have applied both techniques to the model obligate intracellular pathogen Anaplasma marginale and the human pathogen Anaplasma phagocytophilum, in order to examine the ability of phi29 amplification to determine the sequence of genes allowing for immune system evasion and long-term persistence in the host. When compared to traditional pyrosequencing, phi29-mediated genome amplification had similar genome coverage, with no additional gaps in coverage. Additionally, all msp2 functional pseudogenes from two strains of A. marginale were detected and extracted from the phi29-amplified genomes, highlighting its utility in determining the full complement of genes involved in immune evasion.

  7. Lysis of a Single Cyanobacterium for Whole Genome Amplification

    Directory of Open Access Journals (Sweden)

    Richard N. Zare

    2013-08-01

    Full Text Available Bacterial species from natural environments, exhibiting a great degree of genetic diversity that has yet to be characterized, pose a specific challenge to whole genome amplification (WGA) from single cells. A major challenge is establishing an effective, compatible, and controlled lysis protocol. We present a novel lysis protocol that can be used to extract genomic information from a single cyanobacterium of Synechocystis sp. PCC 6803 known to have multilayer cell wall structures that resist conventional lysis methods. Simple but effective strategies for releasing genomic DNA from captured cells while retaining cellular identities for single-cell analysis are presented. Successful sequencing of genetic elements from single-cell amplicons prepared by multiple displacement amplification (MDA) is demonstrated for selected genes (15 loci nearly equally spaced throughout the main chromosome).

  8. Small homologous blocks in Phytophthora genomes do not point to an ancient whole-genome duplication.

    Science.gov (United States)

    van Hooff, Jolien J E; Snel, Berend; Seidl, Michael F

    2014-05-01

    Genomes of the plant-pathogenic genus Phytophthora are characterized by small duplicated blocks consisting of two consecutive genes (2HOM blocks) and by an elevated abundance of similarly aged gene duplicates. Both properties, in particular the presence of 2HOM blocks, have been attributed to a whole-genome duplication (WGD) at the last common ancestor of Phytophthora. However, large intraspecies synteny, which would be compelling evidence for a WGD, has not been detected. Here, we revisited the WGD hypothesis by deducing the age of 2HOM blocks. Two independent timing methods reveal that the majority of 2HOM blocks arose after divergence of the Phytophthora lineages. In addition, a large proportion of the 2HOM block copies colocalize on the same scaffold. Therefore, the presence of 2HOM blocks does not support a WGD at the last common ancestor of Phytophthora. Thus, genome evolution of Phytophthora is likely driven by alternative mechanisms, such as bursts of transposon activity.

  9. Draft whole genome sequence of the cyanide-degrading bacterium Pseudomonas pseudoalcaligenes CECT5344.

    Science.gov (United States)

    Luque-Almagro, Víctor M; Acera, Felipe; Igeño, Ma Isabel; Wibberg, Daniel; Roldán, Ma Dolores; Sáez, Lara P; Hennig, Magdalena; Quesada, Alberto; Huertas, Ma José; Blom, Jochen; Merchán, Faustino; Escribano, Ma Paz; Jaenicke, Sebastian; Estepa, Jessica; Guijo, Ma Isabel; Martínez-Luque, Manuel; Macías, Daniel; Szczepanowski, Rafael; Becerra, Gracia; Ramirez, Silvia; Carmona, Ma Isabel; Gutiérrez, Oscar; Manso, Isabel; Pühler, Alfred; Castillo, Francisco; Moreno-Vivián, Conrado; Schlüter, Andreas; Blasco, Rafael

    2013-01-01

    Pseudomonas pseudoalcaligenes CECT5344 is a Gram-negative bacterium able to tolerate cyanide and to use it as the sole nitrogen source. We report here the first draft of the whole genome sequence of a P. pseudoalcaligenes strain that assimilates cyanide. Three aspects are specially emphasized in this manuscript. First, some generalities of the genome are shown and discussed in the context of other Pseudomonadaceae genomes, including genome size, G + C content, core genome and singletons among other features. Second, the genome is analysed in the context of cyanide metabolism, describing genes probably involved in cyanide assimilation, like those encoding nitrilases, and genes related to cyanide resistance, like the cio genes encoding the cyanide insensitive oxidases. Finally, the presence of genes probably involved in other processes with a great biotechnological potential like production of bioplastics and biodegradation of pollutants also is discussed.

  10. Whole-genome sequencing of a malignant granular cell tumor with metabolic response to pazopanib

    Science.gov (United States)

    Wei, Lei; Liu, Song; Conroy, Jeffrey; Wang, Jianmin; Papanicolau-Sengos, Antonios; Glenn, Sean T.; Murakami, Mitsuko; Liu, Lu; Hu, Qiang; Conroy, Jacob; Miles, Kiersten Marie; Nowak, David E.; Liu, Biao; Qin, Maochun; Bshara, Wiam; Omilian, Angela R.; Head, Karen; Bianchi, Michael; Burgher, Blake; Darlak, Christopher; Kane, John; Merzianu, Mihai; Cheney, Richard; Fabiano, Andrew; Salerno, Kilian; Talati, Chetasi; Khushalani, Nikhil I.; Trump, Donald L.; Johnson, Candace S.; Morrison, Carl D.

    2015-01-01

    Granular cell tumors are an uncommon soft tissue neoplasm. Malignant granular cell tumors comprise T transitions, particularly when immediately preceded by a 5′ G. A loss-of-function mutation was detected in a newly recognized tumor suppressor candidate, BRD7. No mutations were found in known targets of pazopanib. However, we identified a receptor tyrosine kinase pathway mutation in GFRA2 that warrants further evaluation. To the best of our knowledge, this is only the second reported case of a malignant granular cell tumor exhibiting a response to pazopanib, and the first whole-genome sequencing of this uncommon tumor type. The findings provide insight into the genetic basis of malignant granular cell tumors and identify potential targets for further investigation. PMID:27148567

  11. Evaluating potential for whole-genome studies in Kosrae, an isolated population in Micronesia.

    Science.gov (United States)

    Bonnen, Penelope E; Pe'er, Itsik; Plenge, Robert M; Salit, Jackie; Lowe, Jennifer K; Shapero, Michael H; Lifton, Richard P; Breslow, Jan L; Daly, Mark J; Reich, David E; Jones, Keith W; Stoffel, Markus; Altshuler, David; Friedman, Jeffrey M

    2006-02-01

    Whole-genome association studies are predicted to be especially powerful in isolated populations owing to increased linkage disequilibrium (LD) and decreased allelic diversity, but this possibility has not been empirically tested. We compared genome-wide data on 113,240 SNPs typed on 30 trios from the Pacific island of Kosrae to the same markers typed in the 270 samples from the International HapMap Project. The extent of LD is longer and haplotype diversity is lower in Kosrae than in the HapMap populations. More than 98% of Kosraen haplotypes are present in HapMap populations, indicating that HapMap will be useful for genetic studies on Kosrae. The long-range LD around common alleles and limited diversity result in improved efficiency in genetic studies in this population and augment the power to detect association of 'hidden SNPs'.
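
    Linkage disequilibrium between two SNPs is commonly summarised by the squared correlation r^2 = D^2 / (pA(1 - pA) pB(1 - pB)), with D = pAB - pA pB. The sketch below computes it from phased haplotypes; it is a generic illustration with hypothetical function names and toy haplotypes, not the analysis pipeline used in the study.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes is a list of (a, b) tuples, where a and b are the 0/1 alleles
    carried at the first and second SNP on the same chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Toy haplotypes: alleles always travel together (complete LD) ...
complete = [(1, 1), (1, 1), (0, 0), (0, 0)]
# ... versus all four allele combinations equally frequent (no LD).
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(complete), ld_r2(independent))
```

    Longer-range LD, as reported for Kosrae, would appear as r^2 values that remain high between SNP pairs separated by larger physical distances.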

  12. Tolerance of Whole-Genome Doubling Propagates Chromosomal Instability and Accelerates Cancer Genome Evolution

    DEFF Research Database (Denmark)

    Dewhurst, Sally M.; McGranahan, Nicholas; Burrell, Rebecca A.;

    2014-01-01

    The contribution of whole-genome doubling to chromosomal instability (CIN) and tumor evolution is unclear. We use long-term culture of isogenic tetraploid cells from a stable diploid colon cancer progenitor to investigate how a genome-doubling event affects genome stability over time. Rare cells that survive genome doubling demonstrate increased tolerance to chromosome aberrations. Tetraploid cells do not exhibit increased frequencies of structural or numerical CIN per chromosome. However, the tolerant phenotype in tetraploid cells, coupled with a doubling of chromosome aberrations per cell, allows chromosome abnormalities to evolve specifically in tetraploids, recapitulating chromosomal changes in genomically complex colorectal tumors. Finally, a genome-doubling event is independently predictive of poor relapse-free survival in early-stage disease in two independent cohorts in multivariate analyses...

  13. Whole-genome linkage analysis in mapping alcoholism genes using single-nucleotide polymorphisms and microsatellites.

    Science.gov (United States)

    Wang, Shuang; Huang, Song; Liu, Nianjun; Chen, Liang; Oh, Cheongeun; Zhao, Hongyu

    2005-12-30

    There is currently a great interest in using single-nucleotide polymorphisms (SNPs) in genetic linkage and association studies because of the abundance of SNPs as well as the availability of high-throughput genotyping technologies. In this study, we compared the performance of whole-genome scans using SNPs with microsatellites on 143 pedigrees from the Collaborative Studies on Genetics of Alcoholism provided by Genetic Analysis Workshop 14. A total of 315 microsatellites and 10,081 SNPs from Affymetrix on 22 autosomal chromosomes were used in our analyses. We found that the results from the two scans had good overall concordance. One region on chromosome 2 and two regions on chromosome 7 showed significant linkage signals (i.e., NPL ≥ 2) for alcoholism from both the SNP and microsatellite scans. The different results observed between the two scans may be explained by the difference observed in information content between the SNPs and the microsatellites.

  14. Clinical Decision Support for Whole Genome Sequence Information Leveraging a Service-Oriented Architecture: a Prototype

    Science.gov (United States)

    Welch, Brandon M.; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time. PMID:25954430

  15. Overview of HBV whole genome data in public repositories and the Chinese HBV reference sequences

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The number of Hepatitis B virus (HBV) whole genomic sequences in public nucleotide databases (GenBank, EMBL, and DDBJ) had reached 866 by January 1, 2007. Coming from 46 countries and regions, these sequences were categorized as eight genotypes (A-H). Based on statistical and phylogenetic analysis of all available complete HBV genomes, we present an overview of HBV sequences in public databases. From all 229 registered HBV genomes from Chinese regions, as well as 59 sequences from our research group, we report the establishment of reference sequences of HBV strains prevailing in China. These analyses provide clues to the effects of HBV genotypes on clinical progression in the host, the geographic distribution of the infection, and the viral evolutionary history. Moreover, the viral reference sequences would be helpful in the identification of various HBV mutations. Based on the analysis of various public databases, we suggest that a Chinese HBV database with clinical information should be constructed.

  16. Clostridium botulinum Group II Isolate Phylogenomic Profiling Using Whole-Genome Sequence Data.

    Science.gov (United States)

    Weedmark, K A; Mabon, P; Hayden, K L; Lambert, D; Van Domselaar, G; Austin, J W; Corbett, C R

    2015-09-01

    Clostridium botulinum group II isolates (n = 163) from different geographic regions, outbreaks, and neurotoxin types and subtypes were characterized in silico using whole-genome sequence data. Two clusters representing a variety of botulinum neurotoxin (BoNT) types and subtypes were identified by multilocus sequence typing (MLST) and core single nucleotide polymorphism (SNP) analysis. While one cluster included BoNT/B4/F6/E9 and nontoxigenic members, the other comprised a wide variety of different BoNT/E subtype isolates and a nontoxigenic strain. In silico MLST and core SNP methods were consistent in terms of clade-level isolate classification; however, core SNP analysis showed higher resolution capability. Furthermore, core SNP analysis correctly distinguished isolates by outbreak and location. This study illustrated the utility of next-generation sequence-based typing approaches for isolate characterization and source attribution and identified discrete SNP loci and MLST alleles for isolate comparison.

  17. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation

    Directory of Open Access Journals (Sweden)

    C. Sharma

    2016-09-01

    Full Text Available Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L).

  18. CCor: A whole genome network-based similarity measure between two genes.

    Science.gov (United States)

    Hu, Yiming; Zhao, Hongyu

    2016-12-01

    Measuring the similarity between genes is often the starting point for building gene regulatory networks. Most similarity measures used in practice consider only pairwise information, with a few also considering network structure. Although the theoretical properties of pairwise measures are well understood in the statistics literature, little is known about the statistical properties of similarity measures based on network structure. In this article, we consider a new whole genome network-based similarity measure, called CCor, that makes use of information from all the genes in the network. We derive a concentration inequality for CCor and compare it with the commonly used Pearson correlation coefficient for inferring network modules. Both theoretical analysis and a real data example demonstrate the advantages of CCor over existing measures for inferring gene modules.

  19. Rapid whole genome sequencing for the detection and characterization of microorganisms directly from clinical samples

    DEFF Research Database (Denmark)

    Hasman, Henrik; Saputra, Dhany; Sicheritz-Pontén, Thomas;

    2014-01-01

    Whole genome sequencing (WGS) is becoming available as a routine tool for clinical microbiology. If applied directly on clinical samples this could further reduce diagnostic time and thereby improve control and treatment. A major bottleneck is the availability of fast and reliable bioinformatics tools. This study was conducted to evaluate the applicability of WGS directly on clinical samples and to develop easy-to-use bioinformatics tools for analysis of the sequencing data. Thirty-five random urine samples from patients with suspected urinary tract infections were examined using conventional... information and drastically reduce diagnostic time. This may prove very useful, but the need for data analysis is still a hurdle to clinical implementation. To overcome this problem a publicly available bioinformatics tool was developed in this study.

  20. Use of Whole Genome Sequencing and Patient Interviews To Link a Case of Sporadic Listeriosis to Consumption of Prepackaged Lettuce.

    Science.gov (United States)

    Jackson, K A; Stroika, S; Katz, L S; Beal, J; Brandt, E; Nadon, C; Reimer, A; Major, B; Conrad, A; Tarr, C; Jackson, B R; Mody, R K

    2016-05-01

    We report on a case of listeriosis in a patient who probably consumed a prepackaged romaine lettuce-containing product recalled for Listeria monocytogenes contamination. Although definitive epidemiological information demonstrating exposure to the specific recalled product was lacking, the patient reported consumption of a prepackaged romaine lettuce-containing product of either the recalled brand or a different brand. A multinational investigation found that patient and food isolates from the recalled product were indistinguishable by pulsed-field gel electrophoresis and were highly related by whole genome sequencing, differing by four alleles by whole genome multilocus sequence typing and by five high-quality single nucleotide polymorphisms, suggesting a common source. To our knowledge, this is the first time prepackaged lettuce has been identified as a likely source for listeriosis. This investigation highlights the power of whole genome sequencing, as well as the continued need for timely and thorough epidemiological exposure data to identify sources of foodborne infections.

  1. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    Energy Technology Data Exchange (ETDEWEB)

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie, E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and "clusters of orthologous groups" (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  2. Phylogenetics and differentiation of Salmonella Newport lineages by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Guojie Cao

    Full Text Available Salmonella Newport has ranked in the top three Salmonella serotypes associated with foodborne outbreaks from 1995 to 2011 in the United States. In the current study, we selected 26 S. Newport strains isolated from diverse sources and geographic locations and then conducted 454 shotgun pyrosequencing procedures to obtain 16-24 × coverage of high quality draft genomes for each strain. Comparative genomic analysis of 28 S. Newport strains (including 2 reference genomes) and 15 outgroup genomes identified more than 140,000 informative SNPs. A resulting phylogenetic tree consisted of four sublineages and indicated that S. Newport had a clear geographic structure. Strains from Asia were divergent from those from the Americas. Our findings demonstrated that analysis using whole genome sequencing data resulted in a more accurate picture of phylogeny compared to that using single genes or small sets of genes. We selected loci around the mutS gene of S. Newport to differentiate distinct lineages, including those between invH and mutS genes at the 3' end of Salmonella Pathogenicity Island 1 (SPI-1), ste fimbrial operon, and Clustered, Regularly Interspaced, Short Palindromic Repeats (CRISPR) associated-proteins (cas). These genes in the outgroup genomes held high similarity with either S. Newport Lineage II or III at the same loci. S. Newport Lineages II and III have different evolutionary histories in this region and our data demonstrated genetic flow and homologous recombination events around mutS. The findings suggested that S. Newport Lineages II and III diverged early in the serotype evolution and have evolved largely independently. Moreover, we identified genes that could delineate sublineages within the phylogenetic tree and that could be used as potential biomarkers for trace-back investigations during outbreaks. Thus, whole genome sequencing data enabled us to better understand the genetic background of pathogenicity and evolutionary history of S

  3. Light whole genome sequence for SNP discovery across domestic cat breeds

    Directory of Open Access Journals (Sweden)

    Driscoll Carlos

    2010-06-01

    Full Text Available Abstract Background The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus, FeLV; feline coronavirus, FECV; feline immunodeficiency virus, FIV) that are homologues to human scourges (cancer, SARS, and AIDS, respectively). However, to realize this bio-medical potential, a high density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. Description To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper-limit was placed on sequence coverage and a high lower-limit on the quality of the discrepant bases at a potential variant site. In all, domestic cats of different breeds (female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese, and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sampling of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. Conclusions These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.

  4. Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs)

    Science.gov (United States)

    Sims, Gregory E.; Kim, Sung-Hou

    2011-01-01

    A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenic distances. We present two phylogenies that accentuate different aspects of E. coli/Shigella genomic evolution: (i) one based on the compositions of all possible features of length l = 24 (∼8.4 million features), which are likely to reveal the phenetic grouping and relationship among the organisms and (ii) the other based on the compositions of core features with low frequency and low variability (∼0.56 million features), which account for ∼69% of all commonly shared features among 38 taxa examined and are likely to have genome-wide lineal evolutionary signal. Shigella appears as a single clade when all possible features are used without filtering of noncore features. However, results using core features show that Shigella consists of at least two distantly related subclades, implying that the subclades evolved into a single clade because of a high degree of convergence influenced by mobile genetic elements and niche adaptation. In both FFP trees, the basal group of the E. coli/Shigella phylogeny is the B2 phylogroup, which contains primarily uropathogenic strains, suggesting that the E. coli/Shigella ancestor was likely a facultative or opportunistic pathogen. The extant commensal strains diverged relatively late and appear to be the result of reductive evolution of genomes. We also identify clade distinguishing features and their associated genomic regions within each phylogroup. Such features may provide useful information for understanding evolution of the groups and for quick diagnostic identification of each phylogroup. PMID:21536867
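
    The alignment-free idea described above, comparing genomes by the frequencies of their l-mer features, can be sketched in a few lines. This is a toy illustration under stated assumptions and not the authors' FFP implementation: the abstract does not name the distance measure, so Jensen-Shannon divergence is assumed here; reverse-complement handling and the core-feature filtering mentioned in the abstract are omitted; and the sequences and the short feature length are made up for readability (the study used l = 24 on whole genomes).

```python
import math
from collections import Counter


def feature_frequency_profile(seq, l):
    """Map each l-mer in seq to its relative frequency (a toy FFP)."""
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two profiles.

    Ranges from 0 (identical profiles) to 1 (no shared features).
    The choice of this measure is an assumption of this sketch.
    """
    div = 0.0
    for kmer in set(p) | set(q):
        pk = p.get(kmer, 0.0)
        qk = q.get(kmer, 0.0)
        mk = 0.5 * (pk + qk)        # mixture distribution
        if pk > 0:
            div += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            div += 0.5 * qk * math.log2(qk / mk)
    return div


if __name__ == "__main__":
    # Hypothetical toy "genomes"; real input would be whole-genome sequences.
    genomes = {
        "taxon1": "ATGCGATACGCTTGAGGCTA",
        "taxon2": "ATGCGATACGCTTGAGGCTT",
        "taxon3": "TTTTCCCCGGGGAAAATTTT",
    }
    profiles = {name: feature_frequency_profile(s, 4) for name, s in genomes.items()}
    names = sorted(profiles)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, b, round(js_divergence(profiles[a], profiles[b]), 3))
```

    The resulting pairwise distance matrix is the kind of input a distance-based tree-building method such as neighbor joining could take.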

  5. A model for carbohydrate metabolism in the diatom Phaeodactylum tricornutum deduced from comparative whole genome analysis.

    Directory of Open Access Journals (Sweden)

    Peter G Kroth

    Full Text Available BACKGROUND: Diatoms are unicellular algae responsible for approximately 20% of global carbon fixation. Their evolution by secondary endocytobiosis resulted in a complex cellular structure and metabolism compared to algae with primary plastids. METHODOLOGY/PRINCIPAL FINDINGS: The whole genome sequence of the diatom Phaeodactylum tricornutum has recently been completed. We identified and annotated genes for enzymes involved in carbohydrate pathways based on extensive EST support and comparison to the whole genome sequence of a second diatom, Thalassiosira pseudonana. Protein localization to mitochondria was predicted based on identified similarities to mitochondrial localization motifs in other eukaryotes, whereas protein localization to plastids was based on the presence of signal peptide motifs in combination with plastid localization motifs previously shown to be required in diatoms. We identified genes potentially involved in a C4-like photosynthesis in P. tricornutum and, on the basis of sequence-based putative localization of relevant proteins, discuss possible differences in carbon concentrating mechanisms and CO2 fixation between the two diatoms. We also identified genes encoding enzymes involved in photorespiration with one interesting exception: glycerate kinase was not found in either P. tricornutum or T. pseudonana. Various Calvin cycle enzymes were found in up to five different isoforms, distributed between plastids, mitochondria and the cytosol. Diatoms store energy either as lipids or as chrysolaminaran (a beta-1,3-glucan) outside of the plastids. We identified various beta-glucanases and large membrane-bound glucan synthases. Interestingly most of the glucanases appear to contain C-terminal anchor domains that may attach the enzymes to membranes. CONCLUSIONS/SIGNIFICANCE: Here we present a detailed synthesis of carbohydrate metabolism in diatoms based on the genome sequences of Thalassiosira pseudonana and Phaeodactylum tricornutum

  6. [Pathological Diagnoses and Whole-genome Sequence Analyses of the Jaagsiekte Sheep Retrovirus in Xinjiang, China].

    Science.gov (United States)

    Yang, Sufang; Liang, Tian; Zhao, Qingliang; Zhang, Dianqing; Si, Junqiang; Zhang, Jing; Yang, Xia; Sheng, Jinliang

    2015-05-01

    To carry out pathologic diagnoses and whole-genome sequence analyses of the Jaagsiekte sheep retrovirus (JSRV) in Xinjiang, China, we first observed sheep suspected to have the JSRV. Then, the extracted virus suspension was observed by transmission electron microscopy (TEM). Total RNAs from lungs of JSRV-infected sheep were extracted and reverse-transcribed using a cDNA synthesis kit. Six pairs of primers were designed according to the exogenous reference virus strain (AF105220). Reverse transcription-polymerase chain reaction was carried out from JSRV-infected tissue, and the whole genome of the JSRV was sequenced. Our results showed: flow of nasal fluid ("wheelbarrow test"); different sizes of adenoma lesions in the lungs; papillary hyperplasia of alveolar epithelial cells; alveolar cavity filled with macrophages; dissolute nuclei in central lesions. TEM revealed JSRV particles with a diameter of 88 nm to 125.4 nm. The full length of the viral genome sequence was 7456 bp. BLAST analyses showed nucleotide homology of 96% and 95% compared with that of the representative strains from the USA (AF105220) and UK (AF357971), respectively. Nucleotide homology was 89.8% and 89.9% compared with the endogenous Jaagsiekte sheep retrovirus, Inner Mongolia strain (DQ838493) and USA strain (EF680300), respectively. The specific pathogenic amino-acid sequence "YXXM" was found in the TM district, similar to the exogenous JSRV: this gene has been reported to be oncogenic. This is the first report of the complete genomic sequence of the exogenous JSRV from Xinjiang, and could lay the foundation for study of the biological characteristics and pathogenic mechanisms of the pulmonary adenomatosis virus in sheep.

  7. Comparison of whole genome sequences from human and non-human Escherichia coli O26 strains

    Directory of Open Access Journals (Sweden)

    Keri N Norman

    2015-03-01

    Full Text Available Shiga toxin-producing Escherichia coli (STEC) O26 is the second leading E. coli serogroup responsible for human illness outbreaks behind E. coli O157:H7. Recent outbreaks have been linked to emerging pathogenic O26:H11 strains harboring stx2 only. Cattle have been recognized as an important reservoir of O26 strains harboring stx1; however the reservoir of these emerging stx2 strains is unknown. The objective of this study was to identify nucleotide polymorphisms in human and cattle-derived strains in order to compare differences in polymorphism-derived genotypes and virulence gene profiles between the two host species. Whole genome sequencing was performed on 182 epidemiologically unrelated O26 strains, including 109 human-derived strains and 73 non-human-derived strains. A panel of 289 O26 strains (241 STEC and 48 non-STEC) was subsequently genotyped using a set of 283 polymorphisms identified by whole genome sequencing, resulting in 64 unique genotypes. Phylogenetic analyses identified seven clusters within the O26 strains. The seven clusters did not distinguish between isolates originating from humans or cattle; however, clusters did correspond with particular virulence gene profiles. Human and non-human-derived strains harboring stx1 clustered separately from strains harboring stx2, strains harboring eae, and non-STEC strains. Strains harboring stx2 were more closely related to non-STEC strains and strains harboring eae than to strains harboring stx1. The finding of human and cattle-derived strains with the same polymorphism-derived genotypes and similar virulence gene profiles provides evidence that similar strains are found in cattle and humans and transmission between the two species may occur.

  8. Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

    Directory of Open Access Journals (Sweden)

    Abiyad Baig

    2015-11-01

    Full Text Available Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but their genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of the core genome from 793 to only 397 genes. Divergence was clear if phylogenetic analysis was performed on reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates and normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six out of nine divergent isolates possessed a S. suis-like capsule region with variation in capsular gene sequences but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of S. suis species but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates is warranted to understand the diversity of the species.

  9. A first generation whole genome RH map of the river buffalo with comparison to domestic cattle

    Directory of Open Access Journals (Sweden)

    Tantia Madhu S

    2008-12-01

    Full Text Available Abstract Background The recently constructed river buffalo whole-genome radiation hybrid panel (BBURH5000) has already been used to generate preliminary radiation hybrid (RH) maps for several chromosomes, and buffalo-bovine comparative chromosome maps have been constructed. Here, we present the first-generation whole genome RH map (WG-RH) of the river buffalo generated from cattle-derived markers. The RH maps aligned to bovine genome sequence assembly Btau_4.0, providing valuable comparative mapping information for both species. Results A total of 3990 markers were typed on the BBURH5000 panel, of which 3072 were cattle-derived SNPs. The remaining 918 were classified as cattle sequence tagged sites (STS), including coding genes, ESTs, and microsatellites. Average retention frequency per chromosome was 27.3% calculated with 3093 scorable markers distributed in 43 linkage groups covering all autosomes (24) and the X chromosome at a LOD ≥ 8. The estimated total length of the WG-RH map is 36,933 cR5000. Fewer than 15% of the markers (472) could not be placed within any linkage group at a LOD score ≥ 8. Linkage group order for each chromosome was determined by incorporation of markers previously assigned by FISH and by alignment with the bovine genome sequence assembly (Btau_4.0). Conclusion We obtained radiation hybrid chromosome maps for the entire river buffalo genome based on cattle-derived markers. The alignments of our RH maps to the current bovine genome sequence assembly (Btau_4.0) indicate regions of possible rearrangements between the chromosomes of both species. The river buffalo represents an important agricultural species whose genetic improvement has lagged behind other species due to limited prior genomic characterization. We present the first-generation RH map which provides a more extensive resource for positional candidate cloning of genes associated with complex traits and also for large-scale physical mapping of the river buffalo
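
    The retention frequency quoted above is, in general RH-mapping usage, the fraction of hybrid cell lines in which a marker is detected. A minimal sketch of that calculation follows, assuming each marker is scored as a string with '1' (retained), '0' (absent) and '?' (ambiguous, excluded from the denominator). The scoring convention, the function names and the example vectors are hypothetical and are not taken from the study.

```python
def retention_frequency(scores):
    """Fraction of unambiguously scored hybrids that retain the marker.

    scores: string with one character per hybrid cell line,
            '1' = retained, '0' = absent, '?' = ambiguous (ignored).
    Returns None when no hybrid was scored unambiguously.
    """
    scored = [s for s in scores if s in "01"]
    if not scored:
        return None
    return scored.count("1") / len(scored)


def mean_retention(markers):
    """Average retention frequency over a dict of marker -> score string."""
    values = [retention_frequency(v) for v in markers.values()]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Hypothetical score vectors for three markers on one chromosome.
    chromosome_markers = {
        "marker_a": "1100?0010",
        "marker_b": "010000100",
        "marker_c": "11?000000",
    }
    for name, vec in chromosome_markers.items():
        print(name, retention_frequency(vec))
    print("mean", mean_retention(chromosome_markers))
```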

  10. The genome BLASTatlas-a GeneWiz extension for visualization of whole-genome homology.

    Science.gov (United States)

    Hallin, Peter F; Binnewies, Tim T; Ussery, David W

    2008-05-01

    The development of fast and inexpensive methods for sequencing bacterial genomes has led to a wealth of data, often with many genomes being sequenced of the same species or closely related organisms. Thus, there is a need for visualization methods that will allow easy comparison of many sequenced genomes to a defined reference strain. The BLASTatlas is one such tool that is useful for mapping and visualizing whole genome homology of genes and proteins within a reference strain compared to other strains or species of one or more prokaryotic organisms. We provide examples of BLASTatlases, including the Clostridium tetani plasmid p88, where homologues for toxin genes can be easily visualized in other sequenced Clostridium genomes, and for a Clostridium botulinum genome, compared to 14 other Clostridium genomes. DNA structural information is also included in the atlas to visualize the DNA chromosomal context of regions. Additional information can be added to these plots, and as an example we have added circles showing the probability of the DNA helix opening up under superhelical tension. The tool is SOAP compliant and WSDL (web services description language) files are located on our website: (http://www.cbs.dtu.dk/ws/BLASTatlas), where programming examples are available in Perl. By providing an interoperable method to carry out whole genome visualization of homology, this service offers bioinformaticians as well as biologists an easy-to-adopt workflow that can be directly called from the programming language of the user, hence enabling automation of repeated tasks. This tool can be relevant in many pangenomic as well as in metagenomic studies, by giving a quick overview of clusters of insertion sites, genomic islands and overall homology between a reference sequence and a data set.

  11. Comparative Whole-Genome Mapping To Determine Staphylococcus aureus Genome Size, Virulence Motifs, and Clonality

    Science.gov (United States)

    Pantrang, Madhulatha; Stahl, Buffy; Briska, Adam M.; Stemper, Mary E.; Wagner, Trevor K.; Zentz, Emily B.; Callister, Steven M.; Lovrich, Steven D.; Henkhaus, John K.; Dykes, Colin W.

    2012-01-01

    Despite being a clonal pathogen, Staphylococcus aureus continues to acquire virulence and antibiotic-resistant genes located on mobile genetic elements such as genomic islands, prophages, pathogenicity islands, and the staphylococcal chromosomal cassette mec (SCCmec) by horizontal gene transfer from other staphylococci. The potential virulence of a S. aureus strain is often determined by comparing its pulsed-field gel electrophoresis (PFGE) or multilocus sequence typing profiles to that of known epidemic or virulent clones and by PCR of the toxin genes. Whole-genome mapping (formerly optical mapping), which is a high-resolution ordered restriction mapping of a bacterial genome, is a relatively new genomic tool that allows comparative analysis across entire bacterial genomes to identify regions of genomic similarities and dissimilarities, including small and large insertions and deletions. We explored whether whole-genome maps (WGMs) of methicillin-resistant S. aureus (MRSA) could be used to predict the presence of methicillin resistance, SCCmec type, and Panton-Valentine leukocidin (PVL)-producing genes on an S. aureus genome. We determined the WGMs of 47 diverse clinical isolates of S. aureus, including well-characterized reference MRSA strains, and annotated the signature restriction pattern in SCCmec types, arginine catabolic mobile element (ACME), and PVL-carrying prophage, PhiSa2 or PhiSa2-like regions on the genome. WGMs of these isolates accurately characterized them as MRSA or methicillin-sensitive S. aureus based on the presence or absence of the SCCmec motif, ACME and the unique signature pattern for the prophage insertion that harbored the PVL genes. Susceptibility to methicillin resistance and the presence of mecA, SCCmec types, and PVL genes were confirmed by PCR. A WGM clustering approach was further able to discriminate isolates within the same PFGE clonal group. These results showed that WGMs could be used not only to genotype S. aureus but also to

  12. Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes

    Directory of Open Access Journals (Sweden)

    Feltus F

    2011-04-01

    Full Text Available Abstract Background We investigate if pooling BAC clones and sequencing the pools can provide for more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, coverage and redundancy scores improving the most. Conclusions BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
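
    The coverage figures in this record follow the usual relation coverage = reads × mean read length / region size. The sketch below only illustrates that arithmetic; the 400 bp mean read length is an assumed value for illustration, not a number taken from the abstract, and the function names are hypothetical.

```python
import math


def coverage(n_reads, mean_read_length_bp, region_size_bp):
    """Average fold coverage of a region by a set of reads."""
    return n_reads * mean_read_length_bp / region_size_bp


def reads_needed(region_size_bp, target_coverage, mean_read_length_bp):
    """Smallest whole number of reads that reaches the target coverage."""
    return math.ceil(region_size_bp * target_coverage / mean_read_length_bp)


# Assumption: 400 bp mean usable read length after pre-processing.
ASSUMED_READ_LENGTH_BP = 400
REGION_BP = 12_000_000  # the 12 Mbp pool size mentioned in the abstract

linear_reads = reads_needed(REGION_BP, 15, ASSUMED_READ_LENGTH_BP)
paired_reads = reads_needed(REGION_BP, 5, ASSUMED_READ_LENGTH_BP)
print(linear_reads, paired_reads)
```

    With a different assumed read length the read counts scale inversely, which is why the plate counts quoted in the abstract depend on the platform's limitations at the time.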

  13. Tumor Touch Imprints as Source for Whole Genome Analysis of Neuroblastoma Tumors

    Science.gov (United States)

    Brunner, Clemens; Brunner-Herglotz, Bettina; Ziegler, Andrea; Frech, Christian; Amann, Gabriele; Ladenstein, Ruth; Ambros, Inge M.; Ambros, Peter F.

    2016-01-01

    Introduction Tumor touch imprints (TTIs) are routinely used for the molecular diagnosis of neuroblastomas by interphase fluorescence in-situ hybridization (I-FISH). However, in order to facilitate a comprehensive, up-to-date molecular diagnosis of neuroblastomas and to identify new markers to refine risk and therapy stratification methods, whole genome approaches are needed. We examined the applicability of an ultra-high density SNP array platform that identifies copy number changes of varying sizes down to a few exons for the detection of genomic changes in tumor DNA extracted from TTIs. Material and Methods DNAs were extracted from TTIs of 46 neuroblastoma and 4 other pediatric tumors. The DNAs were analyzed on the Cytoscan HD SNP array platform to evaluate numerical and structural genomic aberrations. The quality of the data obtained from TTIs was compared to that from randomly chosen fresh or fresh frozen solid tumors (n = 212) and I-FISH validation was performed. Results SNP array profiles were obtained from 48 (out of 50) TTI DNAs of which 47 showed genomic aberrations. The high marker density allowed for single gene analysis, e.g. loss of nine exons in the ATRX gene and the visualization of chromothripsis. Data quality was comparable to fresh or fresh frozen tumor SNP profiles. SNP array results were confirmed by I-FISH. Conclusion TTIs are an excellent source for SNP array processing with the advantage of simple handling, distribution and storage of tumor tissue on glass slides. The minimal amount of tumor tissue needed to analyze whole genomes makes TTIs an economic surrogate source in the molecular diagnostic work up of tumor samples. PMID:27560999

  14. Direct DNA Extraction from Mycobacterium tuberculosis Frozen Stocks as a Reculture-Independent Approach to Whole-Genome Sequencing

    DEFF Research Database (Denmark)

    Bjorn-Mortensen, K; Zallet, J; Lillebaek, T

    2015-01-01

    Culturing before DNA extraction represents a major time-consuming step in whole-genome sequencing of slow-growing bacteria, such as Mycobacterium tuberculosis. We report a workflow to extract DNA from frozen isolates without reculturing. Prepared libraries and sequence data were comparable...

  15. Whole-Genome Sequence of Pseudomonas graminis Strain UASWS1507, a Potential Biological Control Agent and Biofertilizer Isolated in Switzerland

    Science.gov (United States)

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien; Schulz, Torsten

    2016-01-01

    We report here the whole-genome shotgun sequence of the strain UASWS1507 of the species Pseudomonas graminis, isolated in Switzerland from an apple tree. This is the first genome registered for this species, which is considered as a potential and valuable resource of biological control agents and biofertilizers for agriculture.

  16. Draft Whole-Genome Sequence of a Haemophilus quentini Strain Isolated from an Infant in the United Kingdom

    Science.gov (United States)

    Baxter, Laura; Thompson, Sarah; Collery, Mark M.; Hand, Daniel C.; Fink, Colin G.

    2016-01-01

    Haemophilus quentini is a rare and distinct genospecies of Haemophilus that has been suggested as a cause of neonatal bacteremia and urinary tract infections in men. We present the draft whole-genome sequence of H. quentini MP1 isolated from an infant in the United Kingdom, aiding future identification and detection of this pathogen.

  17. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections.

    Science.gov (United States)

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Nielsen, Mette T; Rosenqvist Lund, Birthe S; Ameh, James A; Ambali, Abdul G; Sørensen, Gitte; Le Hello, Simon; Aarestrup, Frank M; Hendriksen, Rene S

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections.

  18. Whole-genome amplified DNA from stored dried blood spots is reliable in high resolution melting curve and sequencing analysis

    DEFF Research Database (Denmark)

    Winkel, Bo G; Hollegaard, Mads Vilhelm; Olesen, Morten S;

    2011-01-01

    The use of dried blood spot (DBS) samples in genomic workup has been limited by the relatively low amounts of genomic DNA (gDNA) they contain. It remains to be proven that whole genome amplified DNA (wgaDNA) from stored DBS samples constitutes a reliable alternative to gDNA. We wanted to compare m...

  19. High-Quality Draft Whole-Genome Sequences of Three Strains of Enterobacter Isolated from Jamaican Dioscorea cayenensis (Yellow Yam)

    OpenAIRE

    Gan, Han Ming; Triassi, Alexander J.; Wheatley, Matthew S.; Savka, Michael A.; Hudson, André O.

    2014-01-01

    Here we report the whole-genome sequences of three endophytic bacteria, Enterobacter sp. strain DC1, Enterobacter sp. strain DC3, and Enterobacter sp. strain DC4, from root tubers of the yellow yam plant, Dioscorea cayenensis. Preliminary analyses suggest that the genomes of the three bacteria contain genes involved in acetoin and indole-3-acetic acid metabolism.

  20. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    Science.gov (United States)

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Nielsen, Mette T.; Rosenqvist Lund, Birthe S.; Ameh, James A.; Ambali, Abdul G.; Sørensen, Gitte; Le Hello, Simon; Aarestrup, Frank M.; Hendriksen, Rene S.

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections. PMID:27228329

  1. A phylogenetic strategy based on a legume-specific whole genome duplication yields symbiotic cytokinin type-A Response Regulators

    NARCIS (Netherlands)

    Op den Camp, R.; De Mita, S.; Lillo, A.; Cao, Q.; Limpens, E.H.M.; Bisseling, T.; Geurts, R.

    2011-01-01

    Legumes host their rhizobium symbiont in novel root organs, called nodules. Nodules originate from differentiated root cortical cells that de-differentiate and subsequently form nodule primordia, a process controlled by cytokinin. A whole genome duplication (WGD) has occurred at the root of the legu

  2. Whole-genome sequences of 13 endophytic bacteria isolated from shrub willow (Salix) grown in Geneva, New York.

    Science.gov (United States)

    Gan, Huan You; Gan, Han Ming; Savka, Michael A; Triassi, Alexander J; Wheatley, Matthew S; Smart, Lawrence B; Fabio, Eric S; Hudson, André O

    2014-05-08

    Shrub willow, Salix spp. and hybrids, is an important bioenergy crop. Here we report the whole-genome sequences and annotation of 13 endophytic bacteria from stem tissues of Salix purpurea grown in nature and from commercial cultivars and Salix viminalis × Salix miyabeana grown in bioenergy fields in Geneva, New York.

  3. Selection of Unique Escherichia coli Clones by Random Amplified Polymorphic DNA (RAPD): Evaluation by Whole Genome Sequencing

    Science.gov (United States)

    Nielsen, Karen L.; Godfrey, Paul A.; Stegger, Marc; Andersen, Paal S.; Feldgarden, Michael; Frimodt-Møller, Niels

    2014-01-01

    Identifying and characterizing clonal diversity is important when analysing fecal flora. We evaluated random amplified polymorphic DNA (RAPD) PCR, applied for selection of Escherichia coli isolates, by whole genome sequencing. RAPD was fast and reproducible as a screening method for selection of distinct E. coli clones in fecal swabs. PMID:24912108

  4. Reliable reconstruction of HIV-1 whole genome haplotypes reveals clonal interference and genetic hitchhiking among immune escape variants

    NARCIS (Netherlands)

    Pandit, Aridaman; de Boer, Rob J

    2014-01-01

    BACKGROUND: Following transmission, HIV-1 evolves into a diverse population, and next generation sequencing enables us to detect variants occurring at low frequencies. Studying viral evolution at the level of whole genomes was hitherto not possible because next generation sequencing delivers relativ

  5. Real-time whole-genome sequencing for routine typing, surveillance, and outbreak detection of verotoxigenic Escherichia coli

    DEFF Research Database (Denmark)

    Joensen, Katrine Grimstrup; Scheutz, Flemming; Lund, Ole;

    2014-01-01

    Fast and accurate identification and typing of pathogens are essential for effective surveillance and outbreak detection. The current routine procedure is based on a variety of techniques, making the procedure laborious, time-consuming, and expensive. With whole-genome sequencing (WGS) becoming c...

  6. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections.

    Directory of Open Access Journals (Sweden)

    Pimlapas Leekitcharoenphon

    Full Text Available Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely source of the human infections.

  7. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Thorup Nielsen, Mette

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolated from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed and camel and cattle were identified as the primary reservoirs and the most likely...

  8. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang;

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  9. Whole-genome pyrosequencing of an epidemic multidrug-resistant Acinetobacter baumannii strain belonging to the European clone II group

    DEFF Research Database (Denmark)

    Iacono, M.; Villa, L.; Fortini, D.

    2008-01-01

    The whole-genome sequence of an epidemic, multidrug-resistant Acinetobacter baumannii strain (strain ACICU) belonging to the European clone II group and carrying the plasmid-mediated bla(OXA-58) carbapenem resistance gene was determined. The A. baumannii ACICU genome was compared with the genomes...

  10. Comparing Whole-Genome Sequencing with Sanger Sequencing for spa Typing of Methicillin-Resistant Staphylococcus aureus

    DEFF Research Database (Denmark)

    Bartels, Mette Damkjaer; Petersen, Andreas; Worning, Peder;

    2014-01-01

    spa typing of methicillin-resistant Staphylococcus aureus (MRSA) has traditionally been done by PCR amplification and Sanger sequencing of the spa repeat region. At Hvidovre Hospital, Denmark, whole-genome sequencing (WGS) of all MRSA isolates has been performed routinely since January 2013...

  11. Bacterial whole genome-based phylogeny: construction of a new benchmarking dataset and assessment of some existing methods

    DEFF Research Database (Denmark)

    Ahrenfeldt, Johanne; Skaarup, Carina; Hasman, Henrik;

    2017-01-01

    , consensus whole-genome sequences, as well as descriptions of the known phylogeny in a variety of formats) publicly available, with the hope that other groups may find this data useful for benchmarking and exploring the performance of epidemiological methods. All data is freely available at: https://cge.cbs.dtu.dk/services/evolution_data.php....

  12. Whole-genome gene expression profiling of formalin-fixed, paraffin-embedded tissue samples.

    Directory of Open Access Journals (Sweden)

    Craig April

    Full Text Available BACKGROUND: We have developed a gene expression assay (Whole-Genome DASL) capable of generating whole-genome gene expression profiles from degraded samples such as formalin-fixed, paraffin-embedded (FFPE) specimens. METHODOLOGY/PRINCIPAL FINDINGS: We demonstrated a similar level of sensitivity in gene detection between matched fresh-frozen (FF) and FFPE samples, with the number and overlap of probes detected in the FFPE samples being approximately 88% and 95% of that in the corresponding FF samples, respectively; 74% of the differentially expressed probes overlapped between the FF and FFPE pairs. The WG-DASL assay is also able to detect 1.3-1.5-fold and 1.5-2-fold changes in intact and FFPE samples, respectively. The dynamic range for the assay is approximately 3 logs. Comparing the WG-DASL assay with an in vitro transcription-based labeling method yielded fold-change correlations of R² approximately 0.83, while fold-change comparisons with quantitative RT-PCR assays yielded R² approximately 0.86 and R² approximately 0.55 for intact and FFPE samples, respectively. Additionally, the WG-DASL assay yielded high self-correlations (R² > 0.98) with low intact RNA inputs ranging from 1 ng to 100 ng; reproducible expression profiles were also obtained with 250 pg total RNA (R² approximately 0.92), with approximately 71% of the probes detected in 100 ng total RNA also detected at the 250 pg level. When FFPE samples were assayed, 1 ng total RNA yielded self-correlations of R² approximately 0.80, while still maintaining a correlation of R² approximately 0.75 with standard FFPE inputs (200 ng). CONCLUSIONS/SIGNIFICANCE: Taken together, these results show that the WG-DASL assay provides a reliable platform for genome-wide expression profiling in archived materials. It also possesses utility within clinical settings where only limited quantities of samples may be available (e.g. microdissected material) or when minimally invasive procedures are performed (e

  13. Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications

    Directory of Open Access Journals (Sweden)

    Asadollahi Mohammad A

    2010-12-01

    Full Text Available Abstract Background Rapid and efficient microbial cell factory design and construction are possible through the enabling technology of metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, where selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood to enable further metabolic engineering. Results In this work we performed whole genome high-throughput sequencing and annotation to identify single nucleotide polymorphisms (SNPs) between Saccharomyces cerevisiae strains S288c and CEN.PK113-7D. The yeast strain S288c was the first eukaryote sequenced, serving as the reference genome for the Saccharomyces Genome Database, while CEN.PK113-7D is a preferred laboratory strain for industrial biotechnology research. A total of 13,787 high-quality SNPs were detected between the two strains (reference strain: S288c). Considering only metabolic genes (782 of 5,596 annotated genes), a total of 219 metabolism-specific SNPs are distributed across 158 metabolic genes, with 85 of the SNPs being nonsynonymous (e.g., encoding amino acid modifications). Amongst metabolic SNPs detected, there was pathway enrichment in the galactose uptake pathway (GAL1, GAL10) and ergosterol biosynthetic pathway (ERG8, ERG9). Physiological characterization confirmed a strong deficiency in galactose uptake and metabolism in S288c compared to CEN.PK113-7D, and similarly, ergosterol content in CEN.PK113-7D was significantly higher in both glucose and galactose supplemented cultivations compared to S288c. Furthermore, DNA microarray profiling of S288c and CEN.PK113-7D in both glucose and galactose batch cultures did not provide a clear hypothesis for major phenotypes observed, suggesting that

  14. The "most wanted" taxa from the human microbiome for whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Anthony A Fodor

    Full Text Available The goal of the Human Microbiome Project (HMP) is to generate a comprehensive catalog of human-associated microorganisms including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S) sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets to several reference 16S collections to create a 'most wanted' list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, few novel taxa are represented by these OTUs and many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the 'most wanted', and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the 'most wanted' organisms are poorly represented among culture collections, suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.

  15. Whole genome sequence analysis of Cryptococcus gattii from the Pacific Northwest reveals unexpected diversity.

    Directory of Open Access Journals (Sweden)

    John D Gillece

    Full Text Available A recent emergence of Cryptococcus gattii in the Pacific Northwest involves strains that fall into three primarily clonal molecular subtypes: VGIIa, VGIIb and VGIIc. Multilocus sequence typing (MLST) and variable number tandem repeat analysis appear to identify little diversity within these molecular subtypes. Given the apparent expansion of these subtypes into new geographic areas and their ability to cause disease in immunocompetent individuals, differentiation of isolates belonging to these subtypes could be very important from a public health perspective. We used whole genome sequence typing (WGST) to perform fine-scale phylogenetic analysis on 20 C. gattii isolates, 18 of which are from the VGII molecular type largely responsible for the Pacific Northwest emergence. Analysis both including and excluding (289,586 SNPs and 56,845 SNPs, respectively) molecular types VGI and VGIII isolates resulted in phylogenetic reconstructions consistent, for the most part, with MLST analysis but with far greater resolution among isolates. The WGST analysis presented here resulted in identification of over 100 SNPs among eight VGIIc isolates as well as unique genotypes for each of the VGIIa, VGIIb and VGIIc isolates. Similar levels of genetic diversity were found within each of the molecular subtype isolates, despite the fact that the VGIIb clade is thought to have emerged much earlier. The analysis presented here is the first multi-genome WGST study to focus on the C. gattii molecular subtypes involved in the Pacific Northwest emergence and describes the tools that will further our understanding of this emerging pathogen.

  16. Whole-genome amplification of single-cell genomes for next-generation sequencing.

    Science.gov (United States)

    Korfhage, Christian; Fisch, Evelyn; Fricke, Evelyn; Baedker, Silke; Loeffert, Dirk

    2013-10-11

    DNA sequence analysis and genotyping of biological samples using next-generation sequencing (NGS), microarrays, or real-time PCR is often limited by the small amount of sample available. A single cell contains only one to four copies of the genomic DNA, depending on the organism (haploid or diploid organism) and the cell-cycle phase. The DNA content of a single cell ranges from a few femtograms in bacteria to picograms in mammalia. In contrast, a deep analysis of the genome currently requires a few hundred nanograms up to micrograms of genomic DNA for library formation necessary for NGS sequencing or labeling protocols (e.g., microarrays). Consequently, accurate whole-genome amplification (WGA) of single-cell DNA is required for reliable genetic analysis (e.g., NGS) and is particularly important when genomic DNA is limited. The use of single-cell WGA has enabled the analysis of genomic heterogeneity of individual cells (e.g., somatic genomic variation in tumor cells). This unit describes how the genome of single cells can be used for WGA for further genomic studies, such as NGS. Recommendations for isolation of single cells are given and common sources of errors are discussed.
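
    The gap between single-cell DNA content and library input described in this record can be checked with a back-of-the-envelope conversion from genome size to mass. The sketch below assumes an average of about 650 g/mol per base pair and uses approximate genome sizes (about 4.6 Mbp for E. coli, about 6.4 Gbp for a diploid human cell); these values and the 1 µg target are illustrative assumptions, not figures from the abstract.

```python
AVOGADRO = 6.022e23          # molecules per mole
GRAMS_PER_MOLE_PER_BP = 650  # assumed average mass of one base pair


def dna_mass_grams(base_pairs):
    """Approximate mass of double-stranded DNA of the given length."""
    return base_pairs * GRAMS_PER_MOLE_PER_BP / AVOGADRO


def fold_amplification(base_pairs, target_grams):
    """How many-fold one genome copy must be amplified to reach a target mass."""
    return target_grams / dna_mass_grams(base_pairs)


# Approximate genome sizes, used only for illustration.
ecoli_fg = dna_mass_grams(4.6e6) * 1e15   # femtograms per bacterial genome
human_pg = dna_mass_grams(6.4e9) * 1e12   # picograms per diploid human cell
needed = fold_amplification(6.4e9, 1e-6)  # fold increase to reach 1 microgram

print(f"E. coli: {ecoli_fg:.1f} fg, human diploid: {human_pg:.1f} pg, "
      f"amplification to 1 ug: {needed:.2e}x")
```

    Under these assumptions a single mammalian cell falls roughly five orders of magnitude short of a microgram-scale input, which is the motivation for WGA given in the abstract.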

  17. Novel Altered Region for Biomarker Discovery in Hepatocellular Carcinoma (HCC) Using Whole Genome SNP Array

    Directory of Open Access Journals (Sweden)

    Esraa M. Hashem

    2016-04-01

    Full Text Available Cancer represents one of the greatest medical causes of mortality. The majority of hepatocellular carcinoma arises from the accumulation of genetic abnormalities, possibly induced by external etiological factors, especially HCV and HBV infections. There is a need for new tools to analyze the large volume of data and reveal relevant genetic changes that may be critical both for understanding how cancers develop and for determining how they could ultimately be treated. Gene expression profiling may lead to new biomarkers that help improve diagnostic accuracy for detecting hepatocellular carcinoma. In this work, a statistical technique (discrete stationary wavelet transform) is proposed for detecting copy number alterations in high-density single-nucleotide polymorphism array data from 30 cell lines, on specific chromosomes that are frequently altered in hepatocellular carcinoma. The results demonstrate the feasibility of whole-genome fine mapping of copy number alterations via high-density single-nucleotide polymorphism genotyping. A novel altered chromosomal region was discovered: amplification of region 4q22.1 was detected in 22 out of 30 hepatocellular carcinoma cell lines (73%). This region affects AFF1 and DSPP, tumor suppressor genes. This finding has not previously been reported to be involved in liver carcinogenesis; it can be used to discover a new HCC biomarker, which helps in a better understanding of hepatocellular carcinoma.
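
    The abstract names a discrete stationary wavelet transform but gives no details of the wavelet, decomposition depth, or calling rule. As a rough illustration of the general idea only, the sketch below implements an undecimated Haar transform in plain Python and flags probes whose smoothed log2 ratio exceeds a threshold. The Haar filter, the two decomposition levels, the 0.3 threshold, and all function names are assumptions made for this example, not the authors' method.

```python
def haar_swt(signal, levels):
    """Undecimated (stationary) Haar transform with circular boundaries.

    Returns (approx, details): approx is the smooth after `levels` passes,
    details[j] holds the detail coefficients of pass j + 1.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for level in range(levels):
        step = 2 ** level  # filter taps spread apart instead of downsampling
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / 2.0 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2.0 for a, b in zip(approx, shifted)]
    return approx, details


def call_gains(log2_ratios, levels=2, threshold=0.3):
    """Probe indices whose smoothed log2 ratio exceeds the assumed threshold."""
    approx, _ = haar_swt(log2_ratios, levels)
    return [i for i, value in enumerate(approx) if value > threshold]


# Toy profile: 16 probes with a copy-number gain spanning probes 4-11.
profile = [0.0] * 4 + [1.0] * 8 + [0.0] * 4
smooth, details = haar_swt(profile, levels=2)
gained = call_gains(profile)
print(gained)
```

    Because nothing is downsampled, every probe keeps a coefficient at every level, and the original profile equals the final smooth plus the sum of the detail layers. The smoothing window here looks forward from each probe, so called segments are shifted slightly toward lower indices; a centered window would be a natural refinement.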

  18. Inference of gorilla demographic and selective history from whole-genome sequence data.

    Science.gov (United States)

    McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Great Ape Genome Project; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F

    2015-03-01

    Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.
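
    The demographic inference in this record is fitted to the genome-wide site frequency spectrum (SFS). The sketch below shows only how that summary statistic is tabulated from per-site derived allele counts; it does not attempt the diffusion approximation itself. The function name and the toy input are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chromosomes, folded=False):
    """Count segregating sites by the number of derived (or minor) alleles.

    derived_counts: derived allele count at each site in the sample.
    Returns a list whose entry k-1 is the number of sites with count k.
    """
    spectrum = Counter()
    for k in derived_counts:
        if k <= 0 or k >= n_chromosomes:
            continue  # site is not polymorphic in this sample
        if folded:
            k = min(k, n_chromosomes - k)  # use the minor allele count
        spectrum[k] += 1
    max_k = n_chromosomes // 2 if folded else n_chromosomes - 1
    return [spectrum.get(k, 0) for k in range(1, max_k + 1)]


# Toy example: 7 sites scored in 3 diploid individuals (6 chromosomes).
counts = [1, 1, 2, 5, 0, 6, 3]
unfolded = site_frequency_spectrum(counts, 6)
folded = site_frequency_spectrum(counts, 6, folded=True)
print(unfolded, folded)
```

    The folded form is the usual choice when the ancestral state of each site is uncertain, since it depends only on minor allele counts.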

  19. Genomic view of bipolar disorder revealed by whole genome sequencing in a genetic isolate.

    Directory of Open Access Journals (Sweden)

    Benjamin Georgi

    2014-03-01

    Full Text Available Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders.

  20. Genome management and mismanagement--cell-level opportunities and challenges of whole-genome duplication.

    Science.gov (United States)

    Yant, Levi; Bomblies, Kirsten

    2015-12-01

    Whole-genome duplication (WGD) doubles the DNA content in the nucleus and leads to polyploidy. In whole-organism polyploids, WGD has been implicated in adaptability and the evolution of increased genome complexity, but polyploidy can also arise in somatic cells of otherwise diploid plants and animals, where it plays important roles in development and likely environmental responses. As with whole organisms, WGD can also promote adaptability and diversity in proliferating cell lineages, although whether WGD is beneficial is clearly context-dependent. WGD is also sometimes associated with aging and disease and may be a facilitator of dangerous genetic and karyotypic diversity in tumorigenesis. Scaling changes can affect cell physiology, but problems associated with WGD in large part seem to arise from problems with chromosome segregation in polyploid cells. Here we discuss both the adaptive potential and problems associated with WGD, focusing primarily on cellular effects. We see value in recognizing polyploidy as a key player in generating diversity in development and cell lineage evolution, with intriguing parallels across kingdoms.

  1. Molecular analysis of single oocyst of Eimeria by whole genome amplification (WGA) based nested PCR.

    Science.gov (United States)

    Wang, Yunzhou; Tao, Geru; Cui, Yujuan; Lv, Qiyao; Xie, Li; Li, Yuan; Suo, Xun; Qin, Yinghe; Xiao, Lihua; Liu, Xianyong

    2014-09-01

    PCR-based molecular tools are widely used for the identification and characterization of protozoa. Here we report the molecular analysis of Eimeria species using combined methods of whole genome amplification (WGA) and nested PCR. A single oocyst of Eimeria stiedai or Eimeria media was directly used for random amplification of the genomic DNA with either primer extension preamplification (PEP) or multiple displacement amplification (MDA), and then the WGA product was used as template in nested PCR with species-specific primers for ITS-1, 18S rDNA and 23S rDNA of E. stiedai and E. media. WGA-based PCR was successful for the amplification of these genes from a single oocyst. For the species identification of single oocysts isolated from mixed E. stiedai or E. media, the results from WGA-based PCR were exactly in accordance with those from morphological identification, suggesting the applicability of this method in molecular analysis of eimerian parasites at the single oocyst level. The WGA-based PCR method can also be applied for the identification and genetic characterization of other protists.

  2. Unique features of a Japanese 'Candidatus Liberibacter asiaticus' strain revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Hiroshi Katoh

    Full Text Available Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely, 'Candidatus Liberibacter asiaticus', 'Ca. L. americanus', and 'Ca. L. africanus'. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative 'Ca. L. asiaticus' Japanese isolate Ishi-1 was determined by metagenomic analysis of DNA extracted from 'Ca. L. asiaticus'-infected psyllids and leaf midribs. The 1.19-Mb genome has an average 36.32% GC content. Annotation revealed 13 operons encoding rRNA and 44 tRNA genes, but no typical bacterial pathogenesis-related genes were located within the genome, similar to the Floridian psy62 and Chinese gxpsy. In contrast to other 'Ca. L. asiaticus' strains, the genome of the Japanese Ishi-1 strain lacks a prophage-related region.

  3. Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank; Platt, Darren

    2006-02-06

    The DOE Joint Genome Institute has sequenced over 50 eukaryotic genomes, ranging in size from 15 MB to 1.6 GB, over a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contains bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, they are sometimes symbionts, and so can be of biological interest. Therefore, it is desirable to assemble the bonus organisms along with the main genome. This transforms the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which is difficult to handle in most WGS assemblers. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present. For example, one JGI project contained at least two different prokaryotic contaminants, plus a 145 KB plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms, but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing for the removal of, cloning artifacts such as E. coli and spurious vector inclusions.

  4. Whole-genome sequencing reveals the effect of vaccination on the evolution of Bordetella pertussis.

    Science.gov (United States)

    Xu, Yinghua; Liu, Bin; Gröndahl-Yli-Hannuksila, Kirsi; Tan, Yajun; Feng, Lu; Kallonen, Teemu; Wang, Lichan; Peng, Ding; He, Qiushui; Wang, Lei; Zhang, Shumin

    2015-08-18

    Herd immunity can potentially induce a change of circulating viruses. However, it remains largely unknown how bacterial pathogens adapt to vaccination. In this study, Bordetella pertussis, the causative agent of whooping cough, was selected as an example to explore the possible effect of vaccination on the bacterial pathogen. We sequenced and analysed the complete genomes of 40 B. pertussis strains from Finland and China, as well as 11 previously sequenced strains from the Netherlands, where different vaccination strategies have been used over the past 50 years. The results showed that the molecular clock moved at different rates in these countries and in distinct periods, which suggested that evolution of the B. pertussis population was closely associated with the country vaccination coverage. Comparative whole-genome analyses indicated that evolution in this human-restricted pathogen was mainly characterised by ongoing genetic shift and gene loss. Furthermore, 116 SNPs were specifically detected in currently circulating ptxP3-containing strains. The finding might explain the successful emergence of this lineage and its spread worldwide. Collectively, our results suggest that the immune pressure of vaccination is one major driving force for the evolution of B. pertussis, which facilitates further exploration of the pathogenicity of B. pertussis.

  5. Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data.

    Science.gov (United States)

    Tsuji, Junko; Weng, Zhiping

    2016-11-01

    Cytosine methylation regulates many biological processes such as gene expression, chromatin structure and chromosome stability. The whole genome bisulfite sequencing (WGBS) technique measures the methylation level at each cytosine throughout the genome. There are an increasing number of publicly available pipelines for analyzing WGBS data, reflecting many choices of read mapping algorithms as well as preprocessing and postprocessing methods. We simulated single-end and paired-end reads based on three experimental data sets, and comprehensively evaluated 192 combinations of three preprocessing, five postprocessing and five widely used read mapping algorithms. We also compared paired-end data with single-end data at the same sequencing depth for performance of read mapping and methylation level estimation. Bismark and LAST were the most robust mapping algorithms. We found that Mott trimming and quality filtering individually improved the performance of both read mapping and methylation level estimation, but combining them did not lead to further improvement. Furthermore, we confirmed that paired-end sequencing reduced error rate and enhanced sensitivity for both read mapping and methylation level estimation, especially for short reads and in repetitive regions of the human genome.
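
    The "methylation level" estimated in this kind of evaluation can be illustrated with a minimal sketch. It assumes the usual bisulfite convention that unmethylated cytosines read as T after conversion while methylated cytosines still read as C, so the level at one site is the fraction of C calls among C and T calls. The function name and the example counts are hypothetical and do not come from any of the pipelines compared in the abstract.

```python
from typing import Optional


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine.

    c_reads: reads showing C at the site (protected from conversion, i.e. methylated)
    t_reads: reads showing T at the site (converted, i.e. unmethylated)
    Returns None when the site has no coverage, so the caller can skip it.
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


# Hypothetical site covered by 10 reads, 8 of which still show C.
example_level = methylation_level(8, 2)
uncovered_site = methylation_level(0, 0)
```

    Returning None for uncovered sites, rather than 0, keeps "no data" distinct from "fully unmethylated" when levels are averaged downstream.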

  6. Homoeologous chromosomes of Xenopus laevis are highly conserved after whole-genome duplication.

    Science.gov (United States)

    Uno, Y; Nishida, C; Takagi, C; Ueno, N; Matsuda, Y

    2013-11-01

    It has been suggested that whole-genome duplication (WGD) occurred twice during the evolutionary process of vertebrates around 450 and 500 million years ago, which contributed to an increase in the genomic and phenotypic complexities of vertebrates. However, little is still known about the evolutionary process of homoeologous chromosomes after WGD because many duplicate genes have been lost. Therefore, Xenopus laevis (2n=36) and Xenopus (Silurana) tropicalis (2n=20) are good animal models for studying the process of genomic and chromosomal reorganization after WGD because X. laevis is an allotetraploid species that resulted from WGD after the interspecific hybridization of diploid species closely related to X. tropicalis. We constructed a comparative cytogenetic map of X. laevis using 60 complementary DNA clones that covered the entire chromosomal regions of 10 pairs of X. tropicalis chromosomes. We consequently identified all nine homoeologous chromosome groups of X. laevis. Hybridization signals on two pairs of X. laevis homoeologous chromosomes were detected for 50 of 60 (83%) genes, and the genetic linkage is highly conserved between X. tropicalis and X. laevis chromosomes except for one fusion and one inversion and also between X. laevis homoeologous chromosomes except for two inversions. These results indicate that the loss of duplicated genes and inter- and/or intrachromosomal rearrangements occurred much less frequently in this lineage, suggesting that these events were not essential for diploidization of the allotetraploid genome in X. laevis after WGD.

  7. Whole genome sequencing reveals a de novo SHANK3 mutation in familial autism spectrum disorder.

    Directory of Open Access Journals (Sweden)

    Sergio I Nemirovsky

    Full Text Available Clinical genomics promise to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD where we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. We identified a family segregating ASD in three siblings with an unidentified cause. We performed WGS in the three probands and used a state-of-the-art comprehensive bioinformatic analysis pipeline and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. Three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy with negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in the exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). We reported an infrequent form of familial ASD where WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder.

  8. Using whole-genome sequencing to determine appropriate streptomycin epidemiological cutoffs for Salmonella and Escherichia coli.

    Science.gov (United States)

    Tyson, Gregory H; Li, Cong; Ayers, Sherry; McDermott, Patrick F; Zhao, Shaohua

    2016-02-01

    For Enterobacteriaceae such as Salmonella spp. and Escherichia coli, no unified interpretive resistance criteria exist for streptomycin, an epidemiologically important antibiotic. As part of the National Antimicrobial Resistance Monitoring System, we had previously used a minimum inhibitory concentration of ≥ 64 μg mL⁻¹ as an epidemiological cutoff value (ECV) to define non-wild-type isolates. To identify whether this ECV correlated with genetic determinants of resistance, we performed whole-genome sequencing of 463 Salmonella and E. coli isolates to identify streptomycin resistance genotypes. From this analysis, we found that using a streptomycin resistance breakpoint of ≥ 64 μg mL⁻¹ classified over 20% of strains possessing aadA or strA/strB resistance genes as wild-type. Therefore, to improve the concordance between genotypic and phenotypic data, we propose reducing the phenotypic cutoff values to ≥ 32 μg mL⁻¹ for both Salmonella and E. coli, to be used widely as ECVs to categorize non-wild-type isolates.

  9. Whole-Genome Sequencing Uncovers the Genetic Basis of Chronic Mountain Sickness in Andean Highlanders

    Science.gov (United States)

    Zhou, Dan; Udpa, Nitin; Ronen, Roy; Stobdan, Tsering; Liang, Junbin; Appenzeller, Otto; Zhao, Huiwen W.; Yin, Yi; Du, Yuanping; Guo, Lixia; Cao, Rui; Wang, Yu; Jin, Xin; Huang, Chen; Jia, Wenlong; Cao, Dandan; Guo, Guangwu; Gamboa, Jorge L.; Villafuerte, Francisco; Callacondo, David; Xue, Jin; Liu, Siqi; Frazer, Kelly A.; Li, Yingrui; Bafna, Vineet; Haddad, Gabriel G.

    2013-01-01

    The hypoxic conditions at high altitudes present a challenge for survival, causing pressure for adaptation. Interestingly, many high-altitude denizens (particularly in the Andes) are maladapted, with a condition known as chronic mountain sickness (CMS) or Monge disease. To decode the genetic basis of this disease, we sequenced and compared the whole genomes of 20 Andean subjects (10 with CMS and 10 without). We discovered 11 regions genome-wide with significant differences in haplotype frequencies consistent with selective sweeps. In these regions, two genes (an erythropoiesis regulator, SENP1, and an oncogene, ANP32D) had a higher transcriptional response to hypoxia in individuals with CMS relative to those without. We further found that downregulating the orthologs of these genes in flies dramatically enhanced survival rates under hypoxia, demonstrating that suppression of SENP1 and ANP32D plays an essential role in hypoxia tolerance. Our study provides an unbiased framework to identify and validate the genetic basis of adaptation to high altitudes and identifies potentially targetable mechanisms for CMS treatment. PMID:23954164

  10. Whole-Genome Enrichment Provides Deep Insights into Vibrio cholerae Metagenome from an African River.

    Science.gov (United States)

    Vezzulli, L; Grande, C; Tassistro, G; Brettar, I; Höfle, M G; Pereira, R P A; Mushi, D; Pallavicini, A; Vassallo, P; Pruzzo, C

    2016-11-25

    The detection and typing of Vibrio cholerae in natural aquatic environments encounter major methodological challenges related to the fact that the bacterium is often present in environmental matrices at very low abundance in nonculturable state. This study applied, for the first time to our knowledge, a whole-genome enrichment (WGE) and next-generation sequencing (NGS) approach for direct genotyping and metagenomic analysis of low abundant V. cholerae DNA (V. cholerae metagenomic DNA via hybridization. An enriched V. cholerae metagenome library was generated and sequenced on an Illumina MiSeq platform. Up to 1.8 × 10⁷ bp (4.5× mean read depth) were found to map against V. cholerae reference genome sequences representing an increase of about 2500 times in target DNA coverage compared to theoretical calculations of performance for shotgun metagenomics. Analysis of metagenomic data revealed the presence of several V. cholerae virulence and virulence associated genes in river water including major virulence regions (e.g. CTX prophage and Vibrio pathogenicity island-1) and genetic markers of epidemic strains (e.g. O1-antigen biosynthesis gene cluster) that were not detectable by standard culture and molecular techniques. Overall, besides providing a powerful tool for direct genotyping of V. cholerae in complex environmental matrices, this study provides a 'proof of concept' on the methodological gap that might currently preclude a more comprehensive understanding of toxigenic V. cholerae emergence from natural aquatic environments.
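
    The relationship between the two coverage figures quoted above can be checked with a short calculation. The sketch assumes the standard definition of mean read depth as mapped bases divided by reference length; the function names are illustrative, and only the two input numbers are taken from the abstract.

```python
def mean_read_depth(mapped_bases: float, reference_length_bp: float) -> float:
    """Mean depth = total bases mapped to the reference / reference length."""
    if reference_length_bp <= 0:
        raise ValueError("reference length must be positive")
    return mapped_bases / reference_length_bp


def implied_reference_length(mapped_bases: float, depth: float) -> float:
    """Invert the same definition to get the reference length a depth implies."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return mapped_bases / depth


# Figures quoted in the abstract: 1.8e7 mapped bp at 4.5x mean read depth.
MAPPED_BP = 1.8e7
DEPTH = 4.5

reference_bp = implied_reference_length(MAPPED_BP, DEPTH)
print(f"implied reference length: {reference_bp / 1e6:.1f} Mb")
```

    Under that definition the two figures imply a reference of about 4 Mb, which is the expected order of magnitude for a bacterial genome.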

  11. Whole genome grey and white matter DNA methylation profiles in dorsolateral prefrontal cortex.

    Science.gov (United States)

    Sanchez-Mut, Jose Vicente; Heyn, Holger; Vidal, Enrique; Delgado-Morales, Raúl; Moran, Sebastian; Sayols, Sergi; Sandoval, Juan; Ferrer, Isidre; Esteller, Manel; Gräff, Johannes

    2017-01-20

    The brain's neocortex is anatomically organized into grey and white matter, which are mainly composed of neuronal and glial cells, respectively. The neocortex can be further divided into different Brodmann areas according to their cytoarchitectural organization, which are associated with distinct cortical functions. There is increasing evidence that brain development and function are governed by epigenetic processes, yet their contribution to the functional organization of the neocortex remains incompletely understood. Herein, we determined the DNA methylation patterns of grey and white matter of dorsolateral prefrontal cortex (Brodmann area 9), an important region for higher cognitive skills that is particularly affected in various neurological diseases. To avoid interindividual differences, we analyzed white and grey matter from the same donor using whole genome bisulfite sequencing, and to validate their biological significance, we used Infinium HumanMethylation450 BeadChip and pyrosequencing in ten and twenty independent samples, respectively. The combination of these analyses indicated robust grey-white matter differences in DNA methylation. What is more, cell type-specific markers were enriched among the most differentially methylated genes. Interestingly, we also found an outstanding number of grey-white matter differentially methylated genes that have previously been associated with Alzheimer's, Parkinson's, and Huntington's disease, as well as multiple sclerosis and amyotrophic lateral sclerosis. The data presented here thus constitute an important resource for future studies not only to gain insight into brain regional as well as grey and white matter differences, but also to unmask epigenetic alterations that might underlie neurological and neurodegenerative diseases.

  12. RepARK--de novo creation of repeat libraries from whole-genome NGS reads.

    Science.gov (United States)

    Koch, Philipp; Platzer, Matthias; Downie, Bryan R

    2014-05-01

    Generation of repeat libraries is a critical step for analysis of complex genomes. In the era of next-generation sequencing (NGS), such libraries are usually produced using a whole-genome shotgun (WGS) derived reference sequence whose completeness greatly influences the quality of derived repeat libraries. We describe here a de novo repeat assembly method--RepARK (Repetitive motif detection by Assembly of Repetitive K-mers)--which avoids potential biases by using abundant k-mers of NGS WGS reads without requiring a reference genome. For validation, repeat consensuses derived from simulated and real Drosophila melanogaster NGS WGS reads were compared to repeat libraries generated by four established methods. RepARK is orders of magnitude faster than the other methods and generates libraries that are: (i) composed almost entirely of repetitive motifs, (ii) more comprehensive and (iii) almost completely annotated by TEclass. Additionally, we show that the RepARK method is applicable to complex genomes like human and can even serve as a diagnostic tool to identify repetitive sequences contaminating NGS datasets.
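
    The idea of selecting abundant k-mers directly from reads, without a reference genome, can be sketched as follows. This is a toy illustration and not the RepARK implementation: the function names, the value of k, the abundance threshold and the example reads are all hypothetical, and the sketch ignores reverse complements and the downstream assembly of the selected k-mers into repeat consensuses.

```python
from collections import Counter
from typing import Dict, Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer across all reads, skipping k-mers with ambiguous bases."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def abundant_kmers(reads: Iterable[str], k: int, min_count: int) -> Dict[str, int]:
    """Keep only k-mers seen at least min_count times (candidate repeat k-mers)."""
    counts = count_kmers(reads, k)
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy reads that share the motif ACGTAC; real input would be WGS reads.
toy_reads = ["ACGTACGTAC", "TTACGTACGG", "GGGACGTACA"]
candidate_repeats = abundant_kmers(toy_reads, k=5, min_count=3)
print(candidate_repeats)
```

    In this toy input only the k-mers drawn from the shared motif pass the threshold. In real data the threshold would have to be chosen relative to sequencing depth, since unique sequence also yields k-mers at roughly the coverage level.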

  13. Automated whole-genome multiple alignment of rat, mouse, and human

    Energy Technology Data Exchange (ETDEWEB)

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  14. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    Energy Technology Data Exchange (ETDEWEB)

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  15. Whole-Genome Sequencing of Native Sheep Provides Insights into Rapid Adaptations to Extreme Environments.

    Science.gov (United States)

    Yang, Ji; Li, Wen-Rong; Lv, Feng-Hua; He, San-Gang; Tian, Shi-Lin; Peng, Wei-Feng; Sun, Ya-Wei; Zhao, Yong-Xin; Tu, Xiao-Long; Zhang, Min; Xie, Xing-Long; Wang, Yu-Tao; Li, Jin-Quan; Liu, Yong-Gang; Shen, Zhi-Qiang; Wang, Feng; Liu, Guang-Jian; Lu, Hong-Feng; Kantanen, Juha; Han, Jian-Lin; Li, Meng-Hua; Liu, Ming-Jun

    2016-10-01

    Global climate change has a significant effect on extreme environments and a profound influence on species survival. However, little is known of the genome-wide pattern of livestock adaptations to extreme environments over a short time frame following domestication. Sheep (Ovis aries) have become well adapted to a diverse range of agroecological zones, including certain extreme environments (e.g., plateaus and deserts), during their post-domestication (approximately 8-9 kya) migration and differentiation. Here, we generated whole-genome sequences from 77 native sheep, with an average effective sequencing depth of ∼5× for 75 samples and ∼42× for 2 samples. Comparative genomic analyses among sheep in contrasting environments, that is, plateau (>4,000 m above sea level) versus lowland (1500 m) versus low-altitude region (600 mm), and arid zone (400 mm), detected a novel set of candidate genes as well as pathways and GO categories that are putatively associated with hypoxia responses at high altitudes and water reabsorption in arid environments. In addition, candidate genes and GO terms functionally related to energy metabolism and body size variations were identified. This study offers novel insights into rapid genomic adaptations to extreme environments in sheep and other animals, and provides a valuable resource for future research on livestock breeding in response to climate change.

  16. Whole-genome analyses of Korean native and Holstein cattle breeds by massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Jung-Woo Choi

    Full Text Available A main goal of cattle genomics is to identify DNA differences that account for variations in economically important traits. In this study, we performed whole-genome analyses of three important cattle breeds in Korea--Hanwoo, Jeju Heugu, and Korean Holstein--using the Illumina HiSeq 2000 sequencing platform. We achieved 25.5-, 29.6-, and 29.5-fold coverage of the Hanwoo, Jeju Heugu, and Korean Holstein genomes, respectively, and identified a total of 10.4 million single nucleotide polymorphisms (SNPs), of which 54.12% were found to be novel. We also detected 1,063,267 insertions-deletions (InDels) across the genomes (78.92% novel). Annotations of the datasets identified a total of 31,503 nonsynonymous SNPs and 859 frameshift InDels that could affect phenotypic variations in traits of interest. Furthermore, genome-wide copy number variation regions (CNVRs) were detected by comparing the Hanwoo, Jeju Heugu, and previously published Chikso genomes against that of Korean Holstein. A total of 992, 284, and 1881 CNVRs, respectively, were detected throughout the genome. Moreover, 53, 65, 45, and 82 putative regions of homozygosity (ROH) were identified in Hanwoo, Jeju Heugu, Chikso, and Korean Holstein, respectively. The results of this study provide a valuable foundation for further investigations to dissect the molecular mechanisms underlying variation in economically important traits in cattle and to develop genetic markers for use in cattle breeding.

  17. Multidrug-resistant Escherichia coli soft tissue infection investigated with bacterial whole genome sequencing

    Science.gov (United States)

    Buchanan, Ruaridh; Stoesser, Nicole; Crook, Derrick; Bowler, Ian C J W

    2014-01-01

    A 45-year-old man with dilated cardiomyopathy presented with acute leg pain and erythema suggestive of necrotising fasciitis. Initial surgical exploration revealed no necrosis and treatment for a soft tissue infection was started. Blood and tissue cultures unexpectedly grew a Gram-negative bacillus, subsequently identified by an automated broth microdilution phenotyping system as an extended-spectrum β-lactamase producing Escherichia coli. The patient was treated with a 3-week course of antibiotics (ertapenem followed by ciprofloxacin) and debridement for small areas of necrosis, followed by skin grafting. The presence of E. coli triggered investigation of both host and pathogen. The patient was found to have previously undiagnosed liver disease, a risk factor for E. coli soft tissue infection. Whole genome sequencing of isolates from all specimens confirmed they were clonal, of sequence type ST131 and associated with a likely plasmid-associated AmpC (CMY-2), several other resistance genes and a number of virulence factors. PMID:25331151

  18. Sensitive and specific KRAS somatic mutation analysis on whole-genome amplified DNA from archival tissues.

    Science.gov (United States)

    van Eijk, Ronald; van Puijenbroek, Marjo; Chhatta, Amiet R; Gupta, Nisha; Vossen, Rolf H A M; Lips, Esther H; Cleton-Jansen, Anne-Marie; Morreau, Hans; van Wezel, Tom

    2010-01-01

    Kirsten RAS (KRAS) is a small GTPase that plays a key role in Ras/mitogen-activated protein kinase signaling; somatic mutations in KRAS are frequently found in many cancers. The most common KRAS mutations result in a constitutively active protein. Accurate detection of KRAS mutations is pivotal to the molecular diagnosis of cancer and may guide proper treatment selection. Here, we describe a two-step KRAS mutation screening protocol that combines whole-genome amplification (WGA), high-resolution melting analysis (HRM) as a prescreen method for mutation carrying samples, and direct Sanger sequencing of DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, from which limited amounts of DNA are available. We developed target-specific primers, thereby avoiding amplification of homologous KRAS sequences. The addition of herring sperm DNA facilitated WGA in DNA samples isolated from as few as 100 cells. KRAS mutation screening using high-resolution melting analysis on wgaDNA from formalin-fixed, paraffin-embedded tissue is highly sensitive and specific; additionally, this method is feasible for screening of clinical specimens, as illustrated by our analysis of pancreatic cancers. Furthermore, PCR on wgaDNA does not introduce genotypic changes, as opposed to unamplified genomic DNA. This method can, after validation, be applied to virtually any potentially mutated region in the genome.

  19. Whole-genome duplication and molecular evolution in Cornus L. (Cornaceae) – Insights from transcriptome sequences

    Science.gov (United States)

    Yu, Yan; Xiang, Qiuyun; Manos, Paul S.; Soltis, Douglas E.; Soltis, Pamela S.; Song, Bao-Hua; Cheng, Shifeng; Liu, Xin; Wong, Gane

    2017-01-01

    The pattern and rate of genome evolution have profound consequences in organismal evolution. Whole-genome duplication (WGD), or polyploidy, has been recognized as an important evolutionary mechanism of plant diversification. However, in non-model plants the molecular signals of genome duplications have remained largely unexplored. High-throughput transcriptome data from next-generation sequencing have set the stage for novel investigations of genome evolution using new bioinformatic and methodological tools in a phylogenetic framework. Here we compare ten de novo-assembled transcriptomes representing the major lineages of the angiosperm genus Cornus (dogwood) and relevant outgroups using a customized pipeline for analyses. Using three distinct approaches (molecular dating of orthologous genes, analyses of the distribution of synonymous substitutions between paralogous genes, and examination of substitution rates through time), we detected a shared WGD event in the late Cretaceous across all taxa sampled. The inferred doubling event coincides temporally with the paleoclimatic changes associated with the initial divergence of the genus into three major lineages. Analyses also showed an acceleration of rates of molecular evolution after WGD. The highest rates of molecular evolution were observed in the transcriptome of the herbaceous lineage, C. canadensis, a species commonly found at higher latitudes, including the Arctic. Our study demonstrates the value of transcriptome data for understanding genome evolution in closely related species. The results suggest that a dramatic increase in sea surface temperature in the late Cretaceous may have contributed to the evolution and diversification of flowering plants. PMID:28225773

  20. Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications.

    Science.gov (United States)

    Tank, David C; Eastman, Jonathan M; Pennell, Matthew W; Soltis, Pamela S; Soltis, Douglas E; Hinchliff, Cody E; Brown, Joseph W; Sessa, Emily B; Harmon, Luke J

    2015-07-01

    Our growing understanding of the plant tree of life provides a novel opportunity to uncover the major drivers of angiosperm diversity. Using a time-calibrated phylogeny, we characterized hot and cold spots of lineage diversification across the angiosperm tree of life by modeling evolutionary diversification using stepwise AIC (MEDUSA). We also tested the whole-genome duplication (WGD) radiation lag-time model, which postulates that increases in diversification tend to lag behind established WGD events. Diversification rates have been incredibly heterogeneous throughout the evolutionary history of angiosperms and reveal a pattern of 'nested radiations' - increases in net diversification nested within other radiations. This pattern in turn generates a negative relationship between clade age and diversity across both families and orders. We suggest that stochastically changing diversification rates across the phylogeny explain these patterns. Finally, we demonstrate significant statistical support for the WGD radiation lag-time model. Across angiosperms, nested shifts in diversification led to an overall increasing rate of net diversification and declining relative extinction rates through time. These diversification shifts are only rarely perfectly associated with WGD events, but commonly follow them after a lag period.

  1. Whole-genome transcriptional analysis of heavy metal stresses in Caulobacter crescentus

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Ping; Brodie, Eoin L.; Suzuki, Yohey; McAdams, Harley H.; Andersen, Gary L.

    2005-09-21

    The bacterium Caulobacter crescentus and related stalk bacterial species are known for their distinctive ability to live in low nutrient environments, a characteristic of most heavy metal contaminated sites. Caulobacter crescentus is a model organism for studying cell cycle regulation with well developed genetics. We have identified the pathways responding to heavy metal toxicity in C. crescentus to provide insights for possible application of Caulobacter to environmental restoration. We exposed C. crescentus cells to four heavy metals (chromium, cadmium, selenium and uranium) and analyzed genome wide transcriptional activities post exposure using an Affymetrix GeneChip microarray. C. crescentus showed surprisingly high tolerance to uranium, a possible mechanism for which may be formation of extracellular calcium-uranium-phosphate precipitates. The principal response to these metals was protection against oxidative stress (up-regulation of manganese-dependent superoxide dismutase, sodA). Glutathione S-transferase, thioredoxin, glutaredoxins and DNA repair enzymes responded most strongly to cadmium and chromate. The cadmium and chromium stress response also focused on reducing the intracellular metal concentration, with multiple efflux pumps employed to remove cadmium while a sulfate transporter was down-regulated to reduce non-specific uptake of chromium. Membrane proteins were also up-regulated in response to most of the metals tested. A two-component signal transduction system involved in the uranium response was identified. Several differentially regulated transcripts from regions previously not known to encode proteins were identified, demonstrating the advantage of evaluating the transcriptome using whole genome microarrays.

  2. Whole genome amplification and de novo assembly of single bacterial cells.

    Directory of Open Access Journals (Sweden)

    Sébastien Rodrigue

    Full Text Available BACKGROUND: Single-cell genome sequencing has the potential to allow the in-depth exploration of the vast genetic diversity found in uncultured microbes. We used the marine cyanobacterium Prochlorococcus as a model system for addressing important challenges facing high-throughput whole genome amplification (WGA) and complete genome sequencing of individual cells. METHODOLOGY/PRINCIPAL FINDINGS: We describe a pipeline that enables single-cell WGA on hundreds of cells at a time while virtually eliminating non-target DNA from the reactions. We further developed a post-amplification normalization procedure that mitigates extreme variations in sequencing coverage associated with multiple displacement amplification (MDA), and demonstrated that the procedure increased sequencing efficiency and facilitated genome assembly. We report genome recovery as high as 99.6% with reference-guided assembly, and 95% with de novo assembly starting from a single cell. We also analyzed the impact of chimera formation during MDA on de novo assembly, and discuss strategies to minimize the presence of incorrectly joined regions in contigs. CONCLUSIONS/SIGNIFICANCE: The methods described in this paper will be useful for sequencing genomes of individual cells from a variety of samples.

  3. Digital Droplet Multiple Displacement Amplification (ddMDA) for Whole Genome Sequencing of Limited DNA Samples.

    Directory of Open Access Journals (Sweden)

    Minsoung Rhee

    Full Text Available Multiple displacement amplification (MDA) is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples) before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram amounts of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA) technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase, thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent of ~3/1000 of an E. coli genome per droplet), ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli) compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology.

  4. Paired tumor and normal whole genome sequencing of metastatic olfactory neuroblastoma.

    Directory of Open Access Journals (Sweden)

    Glen J Weiss

    Full Text Available BACKGROUND: Olfactory neuroblastoma (ONB) is a rare cancer of the sinonasal tract with little molecular characterization. We performed whole genome sequencing (WGS) on paired normal and tumor DNA from a patient with metastatic-ONB to identify the somatic alterations that might be drivers of tumorigenesis and/or metastatic progression. METHODOLOGY/PRINCIPAL FINDINGS: Genomic DNA was isolated from fresh frozen tissue from a metastatic lesion and whole blood, followed by WGS at >30X depth, alignment and mapping, and mutation analyses. Sanger sequencing was used to confirm selected mutations. Sixty-two somatic short nucleotide variants (SNVs) and five deletions were identified inside coding regions, each causing a non-synonymous DNA sequence change. We selected seven SNVs and validated them by Sanger sequencing. In the metastatic ONB samples collected several months prior to WGS, all seven mutations were present. However, in the original surgical resection specimen (prior to evidence of metastatic disease), mutations in KDR, MYC, SIN3B, and NLRC4 genes were not present, suggesting that these were acquired with disease progression and/or as a result of post-treatment effects. CONCLUSIONS/SIGNIFICANCE: This work provides insight into the evolution of ONB cancer cells and provides a window into the more complex factors, including tumor clonality and multiple driver mutations.

  5. Whole Genome Sequencing of Field Isolates Reveals Extensive Genetic Diversity in Plasmodium vivax from Colombia.

    Science.gov (United States)

    Winter, David J; Pacheco, M Andreína; Vallejo, Andres F; Schwartz, Rachel S; Arevalo-Herrera, Myriam; Herrera, Socrates; Cartwright, Reed A; Escalante, Ananias A

    2015-12-01

    Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations it affects. The control and eventual elimination of P. vivax are global health priorities. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole-genome data from eight field samples from a region in Córdoba, Colombia where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this species. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved with drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information we developed 18 polymorphic and easy to score microsatellite loci that can be used in epidemiological investigations in South America.

  6. A novel strategy for clustering major depression individuals using whole-genome sequencing variant data

    Science.gov (United States)

    Yu, Chenglong; Baune, Bernhard T.; Licinio, Julio; Wong, Ma-Li

    2017-01-01

    Major depressive disorder (MDD) is highly prevalent, resulting in an exceedingly high disease burden. The identification of generic risk factors could lead to advance prevention and therapeutics. Current approaches examine genotyping data to identify specific variations between cases and controls. Compared to genotyping, whole-genome sequencing (WGS) allows for the detection of private mutations. In this proof-of-concept study, we establish a conceptually novel computational approach that clusters subjects based on the entirety of their WGS. Those clusters predicted MDD diagnosis. This strategy yielded encouraging results, showing that depressed Mexican-American participants were grouped closer; in contrast ethnically-matched controls grouped away from MDD patients. This implies that within the same ancestry, the WGS data of an individual can be used to check whether this individual is within or closer to MDD subjects or to controls. We propose a novel strategy to apply WGS data to clinical medicine by facilitating diagnosis through genetic clustering. Further studies utilising our method should examine larger WGS datasets on other ethnical groups. PMID:28287625

  7. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    Science.gov (United States)

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is the first metagenomic web service that is capable of processing SOLiD color-space reads, to authors' knowledge. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported.

  8. Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.

    Science.gov (United States)

    McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael

    2014-08-01

    Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the Guanine+Cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species implicate this process as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event.

  9. Whole genome data for omics-based research on the self-fertilizing fish Kryptolebias marmoratus.

    Science.gov (United States)

    Rhee, Jae-Sung; Lee, Jae-Seong

    2014-08-30

    Genome resources have advantages for understanding diverse areas such as biological patterns and functioning of organisms. Omics platforms are useful approaches for the study of organs and organisms. These approaches can be powerful screening tools for whole genome, proteome, and metabolome profiling, and can be used to understand molecular changes in response to internal and external stimuli. This methodology has been applied successfully in freshwater model fish such as the zebrafish Danio rerio and the Japanese medaka Oryzias latipes in research areas such as basic physiology, developmental biology, genetics, and environmental biology. However, information is still scarce about model fish that inhabit brackish water or seawater. To develop the self-fertilizing killifish Kryptolebias marmoratus as a potential model species with unique characteristics and research merits, we obtained genomic information about K. marmoratus. We address ways to use these data for genome-based molecular mechanistic studies. We review the current state of genome information on K. marmoratus to initiate omics approaches. We evaluate the potential applications of integrated omics platforms for future studies in environmental science, developmental biology, and biomedical research. We conclude that information about the K. marmoratus genome will provide a better understanding of the molecular functions of genes, proteins, and metabolites that are involved in the biological functions of this species. Omics platforms, particularly combined technologies that make effective use of bioinformatics, will provide powerful tools for hypothesis-driven investigations and discovery-driven discussions on diverse aspects of this species and on fish and vertebrates in general.

  10. Whole genome sequencing of Gir cattle for identifying polymorphisms and loci under selection.

    Science.gov (United States)

    Liao, Xiaoping; Peng, Fred; Forni, Selma; McLaren, David; Plastow, Graham; Stothard, Paul

    2013-10-01

    Genetic variation in Gir cattle (Bos indicus) has so far not been well characterized. In this study, we used whole genome sequencing of three Gir bulls and a pooled sample from another 11 bulls to identify polymorphisms and loci under selection. A total of 9 990 733 single nucleotide polymorphisms (SNPs) and 604 308 insertion/deletions (indels) were discovered in Gir samples, of which 62.34% and 83.62%, respectively, are previously unknown. Moreover, we detected 79 putative selective sweeps using the sequence data of the pooled sample. One of the most striking sweeps harbours several genes belonging to the cathelicidin gene family, such as CAMP, CATHL1, CATHL2, and CATHL3, which are related to pathogen- and parasite-resistance. Another interesting region harbours genes encoding mitogen-activated protein kinases, which are involved in directing cellular responses to a variety of stimuli, such as osmotic stress and heat shock. These findings are particularly interesting because Gir is resistant to hot temperatures and tropical diseases. This initial selective sweep analysis of Gir cattle has revealed a number of loci that could be important for their adaptation to tropical climates.

  11. Whole Genome Expression Profiling and Signal Pathway Screening of MSCs in Ankylosing Spondylitis

    Directory of Open Access Journals (Sweden)

    Yuxi Li

    2014-01-01

    Full Text Available The pathogenesis of dysfunctional immunoregulation of mesenchymal stem cells (MSCs) in ankylosing spondylitis (AS) is thought to be a complex process that involves multiple genetic alterations. In this study, MSCs derived from both healthy donors and AS patients were cultured in normal media or media mimicking an inflammatory environment. Whole genome expression profiling analysis of 33,351 genes was performed and differentially expressed genes related to AS were analyzed by GO term analysis and KEGG pathway analysis. Our results showed that in normal media 676 genes were differentially expressed in AS, 354 upregulated and 322 downregulated, while in an inflammatory environment 1767 genes were differentially expressed in AS, 1230 upregulated and 537 downregulated. GO analysis showed that these genes were mainly related to cellular processes, physiological processes, biological regulation, regulation of biological processes, and binding. In addition, by KEGG pathway analysis, 14 key genes from the MAPK signaling and 8 key genes from the TLR signaling pathway were identified as differentially regulated. The results of qRT-PCR verified the expression variation of the 9 genes mentioned above. Our study found that in an inflammatory environment ankylosing spondylitis pathogenesis may be related to activation of the MAPK and TLR signaling pathways.

  12. Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits.

    Science.gov (United States)

    Kessner, Darren; Novembre, John

    2015-04-01

    Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTL) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTL provides qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTL under selection affects the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50-100%) can be explained by detected QTL in as little as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates.

  13. Kuwaiti population subgroup of nomadic Bedouin ancestry—Whole genome sequence and analysis

    Directory of Open Access Journals (Sweden)

    Sumi Elsa John

    2015-03-01

    Full Text Available The Kuwaiti native population comprises three distinct genetic subgroups of Persian, “city-dwelling” Saudi Arabian tribe, and nomadic “tent-dwelling” Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry; it owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious ‘novel’ variants lie in genes associated with autosomal recessive disorders characteristic of the region.

  14. Examining phylogenetic relationships of Erwinia and Pantoea species using whole genome sequence data.

    Science.gov (United States)

    Zhang, Yucheng; Qiu, Sai

    2015-11-01

    The genera Erwinia and Pantoea contain species that are devastating plant pathogens, non-pathogen epiphytes, and opportunistic human pathogens. However, some controversies persist in the taxonomic classification of these two closely related genera. The phylogenomic analysis of these two genera was investigated via a comprehensive analysis of 25 Erwinia genomes and 23 Pantoea genomes. Single-copy orthologs could be extracted from the Erwinia/Pantoea core-genome to reconstruct the Erwinia/Pantoea phylogeny. This tree has strong bootstrap support for almost all branches. We also estimated the in silico DNA-DNA hybridization (isDDH) and the average nucleotide identity (ANI) values between each genome; strains from the same species showed ANI values ≥96% and isDDH values >70%. These data confirm that whole genome sequence data provides a powerful tool to resolve the complex taxonomic questions of Erwinia/Pantoea, e.g. Pantoea agglomerans 299R was not clustered into a single group with other P. agglomerans strains, and the ANI values and isDDH values between them were Erwinia/Pantoea phylogeny.
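
    The species-level cutoffs quoted in this abstract (ANI ≥96% and isDDH >70% for strains of the same species) can be sketched as a simple check. This is an illustration only: the function name and the example values are hypothetical and are not taken from the paper.

```python
def same_species(ani_percent: float, isddh_percent: float) -> bool:
    """Return True when both values meet the cutoffs quoted in the abstract.

    Cutoffs: ANI >= 96% and isDDH > 70%. The pair of genomes is treated as
    belonging to one species only if both conditions hold.
    """
    return ani_percent >= 96.0 and isddh_percent > 70.0


# Hypothetical values, for illustration only (not results from the study).
print(same_species(97.5, 82.0))
print(same_species(94.0, 55.0))
```

    Requiring both conditions is an assumption of this sketch; the abstract reports that same-species strains satisfied both, without stating how disagreements between the two measures would be resolved.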

  15. Whole-genome copy number variation analysis in anophthalmia and microphthalmia.

    Science.gov (United States)

    Schilter, K F; Reis, L M; Schneider, A; Bardakjian, T M; Abdul-Rahman, O; Kozel, B A; Zimmerman, H H; Broeckel, U; Semina, E V

    2013-11-01

    Anophthalmia/microphthalmia (A/M) represent severe developmental ocular malformations. Currently, mutations in known genes explain less than 40% of A/M cases. We performed whole-genome copy number variation analysis in 60 patients affected with isolated or syndromic A/M. Pathogenic deletions of 3q26 (SOX2) were identified in four independent patients with syndromic microphthalmia. Other variants of interest included regions with a known role in human disease (likely pathogenic) as well as novel rearrangements (uncertain significance). A 2.2-Mb duplication of 3q29 in a patient with non-syndromic anophthalmia and an 877-kb duplication of 11p13 (PAX6) and a 1.4-Mb deletion of 17q11.2 (NF1) in two independent probands with syndromic microphthalmia and other ocular defects were identified; while ocular anomalies have been previously associated with 3q29 duplications, PAX6 duplications, and NF1 mutations in some cases, the ocular phenotypes observed here are more severe than previously reported. Three novel regions of possible interest included a 2q14.2 duplication which cosegregated with microphthalmia/microcornea and congenital cataracts in one family, and 2q21 and 15q26 duplications in two additional cases; each of these regions contains genes that are active during vertebrate ocular development. Overall, this study identified causative copy number mutations and regions with a possible role in ocular disease in 17% of A/M cases.

  16. Use of whole genome shotgun metagenomics: a practical guide for the microbiome-minded physician scientist.

    Science.gov (United States)

    Ma, Jun; Prince, Amanda; Aagaard, Kjersti M

    2014-01-01

    Whole genome shotgun sequencing (WGS) has been increasingly recognized as the most comprehensive and robust approach for metagenomics research. When compared with 16S-based metagenomics, it offers the advantage of identification of species level taxonomy and the estimation of metabolic pathway activities from human and environmental samples. Several large-scale metagenomic projects have been recently conducted or are currently underway utilizing WGS. With the generation of vast amounts of data, the bioinformatics and computational analysis of WGS results become vital for the success of a metagenomics study. However, each step in the WGS data analysis, including metagenome assembly, gene prediction, taxonomy identification, function annotation, and pathway analysis, is complicated by the sheer amount of data. Algorithms and tools have been developed specifically to handle WGS-generated metagenomics data with the hope of reducing the requirement on computational time and storage space. Here, we present an overview of the current state of metagenomics through WGS sequencing, challenges frequently encountered, and up-to-date solutions. Several applications that are uniquely applicable to microbiome studies in reproductive and perinatal medicine are also discussed.

  17. Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic

    Directory of Open Access Journals (Sweden)

    Samantha B. Foley

    2015-01-01

    Full Text Available Despite the potential of whole-genome sequencing (WGS) to improve patient diagnosis and care, the empirical value of WGS in the cancer genetics clinic is unknown. We performed WGS on members of two cohorts of cancer genetics patients: those with BRCA1/2 mutations (n = 176) and those without (n = 82). Initial analysis of potentially pathogenic variants (PPVs, defined as nonsynonymous variants with allele frequency < 1% in ESP6500) in 163 clinically-relevant genes suggested that WGS will provide useful clinical results. This is despite the fact that a majority of PPVs were novel missense variants likely to be classified as variants of unknown significance (VUS). Furthermore, previously reported pathogenic missense variants did not always associate with their predicted diseases in our patients. This suggests that the clinical use of WGS will require large-scale efforts to consolidate WGS and patient data to improve accuracy of interpretation of rare variants. While loss-of-function (LoF) variants represented only a small fraction of PPVs, WGS identified additional cancer risk LoF PPVs in patients with known BRCA1/2 mutations and led to cancer risk diagnoses in 21% of non-BRCA cancer genetics patients after expanding our analysis to 3209 ClinVar genes. These data illustrate how WGS can be used to improve our ability to discover patients' cancer genetic risks.
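
    The PPV definition given in this abstract (a nonsynonymous variant with allele frequency below 1% in ESP6500, within a panel of clinically relevant genes) can be sketched as a filter. This is a minimal illustration under stated assumptions: the `Variant` record, the `is_ppv` function, the placeholder gene name GENE_X, and the example frequencies are all hypothetical, and variants absent from ESP6500 are assumed to carry a frequency of 0.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str             # gene symbol
    nonsynonymous: bool   # True if the variant changes the protein sequence
    esp6500_af: float     # allele frequency in ESP6500 as a fraction (0.0 if absent; assumption)


def is_ppv(variant: Variant, gene_panel: set, max_af: float = 0.01) -> bool:
    """Apply the abstract's PPV definition: nonsynonymous, AF < 1%, gene in panel."""
    return (
        variant.nonsynonymous
        and variant.esp6500_af < max_af
        and variant.gene in gene_panel
    )


# Hypothetical panel and variants, for illustration only.
panel = {"BRCA1", "BRCA2"}
variants = [
    Variant("BRCA1", True, 0.0005),   # rare, nonsynonymous, in panel
    Variant("BRCA2", True, 0.05),     # too common
    Variant("BRCA1", False, 0.0001),  # synonymous
    Variant("GENE_X", True, 0.0),     # outside the panel
]
ppvs = [v for v in variants if is_ppv(v, panel)]
print(len(ppvs))
```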

  18. Whole genome association study of rheumatoid arthritis using 27 039 microsatellites.

    Science.gov (United States)

    Tamiya, Gen; Shinya, Minori; Imanishi, Tadashi; Ikuta, Tomoki; Makino, Satoshi; Okamoto, Koichi; Furugaki, Koh; Matsumoto, Toshiko; Mano, Shuhei; Ando, Satoshi; Nozaki, Yasuyuki; Yukawa, Wataru; Nakashige, Ryo; Yamaguchi, Daisuke; Ishibashi, Hideo; Yonekura, Manabu; Nakami, Yuu; Takayama, Seiken; Endo, Takaho; Saruwatari, Takuya; Yagura, Masaru; Yoshikawa, Yoko; Fujimoto, Kei; Oka, Akira; Chiku, Suenori; Linsen, Samuel E V; Giphart, Marius J; Kulski, Jerzy K; Fukazawa, Toru; Hashimoto, Hiroshi; Kimura, Minoru; Hoshina, Yuuichi; Suzuki, Yasuo; Hotta, Tomomitsu; Mochida, Joji; Minezaki, Takatoshi; Komai, Koichiro; Shiozawa, Shunichi; Taniguchi, Atsuo; Yamanaka, Hisashi; Kamatani, Naoyuki; Gojobori, Takashi; Bahram, Seiamak; Inoko, Hidetoshi

    2005-08-15

    A major goal of current human genome-wide studies is to identify the genetic basis of complex disorders. However, the availability of an unbiased, reliable, cost efficient and comprehensive methodology to analyze the entire genome for complex disease association is still largely lacking or problematic. Therefore, we have developed a practical and efficient strategy for whole genome association studies of complex diseases by charting the human genome at 100 kb intervals using a collection of 27,039 microsatellites and the DNA pooling method in three successive genomic screens of independent case-control populations. The final step in our methodology consists of fine mapping of the candidate susceptible DNA regions by single nucleotide polymorphisms (SNPs) analysis. This approach was validated upon application to rheumatoid arthritis, a destructive joint disease affecting up to 1% of the population. A total of 47 candidate regions were identified. The top seven loci, withstanding the most stringent statistical tests, were dissected down to individual genes and/or SNPs on four chromosomes, including the previously known 6p21.3-encoded Major Histocompatibility Complex gene, HLA-DRB1. Hence, microsatellite-based genome-wide association analysis complemented by end stage SNP typing provides a new tool for genetic dissection of multifactorial pathologies including common diseases.

  19. Whole genome sequence of Staphylococcus saprophyticus reveals the pathogenesis of uncomplicated urinary tract infection.

    Science.gov (United States)

    Kuroda, Makoto; Yamashita, Atsushi; Hirakawa, Hideki; Kumano, Miyuki; Morikawa, Kazuya; Higashide, Masato; Maruyama, Atsushi; Inose, Yumiko; Matoba, Kimio; Toh, Hidehiro; Kuhara, Satoru; Hattori, Masahira; Ohta, Toshiko

    2005-09-13

    Staphylococcus saprophyticus is a uropathogenic Staphylococcus frequently isolated from young female outpatients presenting with uncomplicated urinary tract infections. We sequenced the whole genome of S. saprophyticus type strain ATCC 15305, which harbors a circular chromosome of 2,516,575 bp with 2,446 ORFs and two plasmids. Comparative genomic analyses with the strains of two other species, Staphylococcus aureus and Staphylococcus epidermidis, as well as experimental data, revealed the following characteristics of the S. saprophyticus genome. S. saprophyticus does not possess any virulence factors found in S. aureus, such as coagulase, enterotoxins, exoenzymes, and extracellular matrix-binding proteins, although it does have a remarkable paralog expansion of transport systems related to highly variable ion contents in the urinary environment. A further unique feature is that only a single ORF is predictable as a cell wall-anchored protein, and it shows positive hemagglutination and adherence to human bladder cell associated with initial colonization in the urinary tract. It also shows significantly high urease activity in S. saprophyticus. The uropathogenicity of S. saprophyticus can be attributed to its genome that is needed for its survival in the human urinary tract by means of novel cell wall-anchored adhesin and redundant uro-adaptive transport systems, together with urease.

  20. Rapid Whole-Genome Sequencing for Genetic Disease Diagnosis in Neonatal Intensive Care Units

    Science.gov (United States)

    Saunders, Carol Jean; Miller, Neil Andrew; Soden, Sarah Elizabeth; Dinwiddie, Darrell Lee; Noll, Aaron; Alnadi, Noor Abu; Andraws, Nevene; Patterson, Melanie LeAnn; Krivohlavek, Lisa Ann; Fellis, Joel; Humphray, Sean; Saffrey, Peter; Kingsbury, Zoya; Weir, Jacqueline Claire; Betley, Jason; Grocock, Russell James; Margulies, Elliott Harrison; Farrow, Emily Gwendolyn; Artman, Michael; Safina, Nicole Pauline; Petrikin, Joshua Erin; Hall, Kevin Peter; Kingsmore, Stephen Francis

    2014-01-01

    Monogenic diseases are frequent causes of neonatal morbidity and mortality, and disease presentations are often undifferentiated at birth. More than 3500 monogenic diseases have been characterized, but clinical testing is available for only some of them and many feature clinical and genetic heterogeneity. Hence, an immense unmet need exists for improved molecular diagnosis in infants. Because disease progression is extremely rapid, albeit heterogeneous, in newborns, molecular diagnoses must occur quickly to be relevant for clinical decision-making. We describe 50-hour differential diagnosis of genetic disorders by whole-genome sequencing (WGS) that features automated bioinformatic analysis and is intended to be a prototype for use in neonatal intensive care units. Retrospective 50-hour WGS identified known molecular diagnoses in two children. Prospective WGS disclosed potential molecular diagnosis of a severe GJB2-related skin disease in one neonate; BRAT1-related lethal neonatal rigidity and multifocal seizure syndrome in another infant; identified BCL9L as a novel, recessive visceral heterotaxy gene (HTX6) in a pedigree; and ruled out known candidate genes in one infant. Sequencing of parents or affected siblings expedited the identification of disease genes in prospective cases. Thus, rapid WGS can potentially broaden and foreshorten differential diagnosis, resulting in fewer empirical treatments and faster progression to genetic and prognostic counseling. PMID:23035047

  1. Single Cell Analysis of Dystrophin and SRY Gene by Using Whole Genome Amplification

    Institute of Scientific and Technical Information of China (English)

    徐晨明; 金帆; 黄荷凤; 陶冶; 叶英辉

    2001-01-01

    Objective: To develop a reliable and sensitive method for detection of sex and multiple loci of the Duchenne muscular dystrophy (DMD) gene in a single cell. Materials & methods: The whole genome of a single cell was amplified using 15-base random primers (primer extension preamplification, PEP), then a small aliquot of the PEP product was analyzed using locus-specific nested PCR amplification. The procedure was evaluated by detecting dystrophin exons 8, 17, 19, 44, 45, 48 and the human testis-determining gene (SRY) in single lymphocytes from known sources and single blastomeres from couples with no family history of DMD. Results: The amplification efficiency rates of six dystrophin exons from single lymphocytes and single blastomeres were 97.2% (175/180) and 100% (60/60), respectively. Results for SRY showed 100% (15/15) amplification in single male-derived lymphocytes and 0% (0/15) amplification in single female-derived lymphocytes. Conclusion: The technique of single cell PEP-nested PCR for dystrophin exons 8, 17, 19, 44, 45, 48 and SRY is highly specific. PEP-nested PCR is suitable for preimplantation genetic diagnosis (PGD) of DMD at the single cell level.
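
    The amplification efficiency rates reported in this abstract follow directly from the success counts it gives. The short sketch below reproduces that arithmetic; the helper name `efficiency_percent` is hypothetical, and only the counts come from the abstract.

```python
def efficiency_percent(successes: int, attempts: int) -> float:
    """Amplification efficiency as a percentage, rounded to one decimal place."""
    return round(100.0 * successes / attempts, 1)


# Counts as reported in the abstract.
print(efficiency_percent(175, 180))  # dystrophin exons, single lymphocytes
print(efficiency_percent(60, 60))    # dystrophin exons, single blastomeres
print(efficiency_percent(15, 15))    # SRY, male-derived lymphocytes
print(efficiency_percent(0, 15))     # SRY, female-derived lymphocytes
```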

  2. Construction and Evaluation of Desulfovibrio vulgaris Whole-Genome Oligonucleotide Microarrays

    Energy Technology Data Exchange (ETDEWEB)

    Z. He; Q. He; L. Wu; M.E. Clark; J.D. Wall; Jizhong Zhou; Matthew W. Fields

    2004-03-17

    Desulfovibrio vulgaris Hildenborough has been the focus of biochemical and physiological studies in the laboratory, and the metabolic versatility of this organism has been largely recognized, particularly the reduction of sulfate, fumarate, iron, uranium and chromium. In addition, a Desulfovibrio sp. has been shown to utilize uranium as the sole electron acceptor. D. vulgaris is a δ-Proteobacterium with a genome size of 3.6 Mb and 3584 ORFs. The whole-genome microarrays of D. vulgaris have been constructed using 70mer oligonucleotides. All ORFs in the genome were represented with 3471 (97.1%) unique probes and 103 (2.9%) non-specific probes that may have cross-hybridization with other ORFs. In preparation for use of the experimental microarrays, artificial probes and targets were designed to assess specificity and sensitivity and identify optimal hybridization conditions for oligonucleotide microarrays. The results indicated that for 50mer and 70mer oligonucleotide arrays, hybridization at 45 °C to 50 °C, washing at 37 °C and a wash time of 2.5 to 5 minutes obtained specific and strong hybridization signals. In order to evaluate the performance of the experimental microarrays, growth conditions were selected that were expected to give significant hybridization differences for different sets of genes. The initial evaluations were performed using D. vulgaris cells grown at logarithmic and stationary phases. Transcriptional analysis of D. vulgaris cells sampled during logarithmic phase growth indicated that 25% of annotated ORFs were up-regulated and 3% of annotated ORFs were down-regulated compared to stationary phase cells. The up-regulated genes included ORFs predicted to be involved with acyl chain biosynthesis, amino acid ABC transporter, translational initiation factors, and ribosomal proteins. In the stationary phase growth cells, the two most up-regulated ORFs (70-fold) were annotated as a carboxynorspermidine decarboxylase and a 2C-methyl-D-erythritol-2

  3. Genomic analysis of the basal lineage fungus Rhizopus oryzae reveals a whole-genome duplication.

    Directory of Open Access Journals (Sweden)

    Li-Jun Ma

    2009-07-01

    Full Text Available Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called "zygomycetes," R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99-880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin-proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14alpha-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel and R. oryzae-specific diagnostic and therapeutic treatments.

  4. Clinical application of whole-genome sequencing to inform treatment for multidrug-resistant tuberculosis cases.

    Science.gov (United States)

    Witney, Adam A; Gould, Katherine A; Arnold, Amber; Coleman, David; Delgado, Rachel; Dhillon, Jasvir; Pond, Marcus J; Pope, Cassie F; Planche, Tim D; Stoker, Neil G; Cosgrove, Catherine A; Butcher, Philip D; Harrison, Thomas S; Hinds, Jason

    2015-05-01

    The treatment of drug-resistant tuberculosis cases is challenging, as drug options are limited, and the existing diagnostics are inadequate. Whole-genome sequencing (WGS) has been used in a clinical setting to investigate six cases of suspected extensively drug-resistant Mycobacterium tuberculosis (XDR-TB) encountered at a London teaching hospital between 2008 and 2014. Sixteen isolates from six suspected XDR-TB cases were sequenced; five cases were analyzed in a clinically relevant time frame, with one case sequenced retrospectively. WGS identified mutations in the M. tuberculosis genes associated with antibiotic resistance that are likely to be responsible for the phenotypic resistance. Thus, an evidence base was developed to inform the clinical decisions made around antibiotic treatment over prolonged periods. All strains in this study belonged to the East Asian (Beijing) lineage, and the strain relatedness was consistent with the expectations from the case histories, confirming one contact transmission event. We demonstrate that WGS data can be produced in a clinically relevant time scale some weeks before drug sensitivity testing (DST) data are available, and they actively help clinical decision-making through the assessment of whether an isolate (i) has a particular resistance mutation where there are absent or contradictory DST results, (ii) has no further resistance markers and therefore is unlikely to be XDR, or (iii) is identical to an isolate of known resistance (i.e., a likely transmission event). A small number of discrepancies between the genotypic predictions and phenotypic DST results are discussed in the wider context of the interpretation and reporting of WGS results.

  5. Whole genome evaluation of horizontal transfers in the pathogenic fungus Aspergillus fumigatus

    Directory of Open Access Journals (Sweden)

    Deschavanne Patrick

    2010-03-01

    Full Text Available Abstract Background Numerous cases of horizontal transfers (HTs) have been described for eukaryote genomes, but in contrast to prokaryote genomes, no whole genome evaluation of HTs has been carried out. This is mainly due to a lack of parametric methods specially designed to take the intrinsic heterogeneity of eukaryote genomes into account. We applied a simple and tested method based on local variations of genomic signatures to analyze the genome of the pathogenic fungus Aspergillus fumigatus. Results We detected 189 atypical regions containing 214 genes, accounting for about 1 Mb of DNA sequences. However, the fraction of atypical DNA detected was smaller than the average amount detected under the same conditions in prokaryote genomes (3.1% vs 5.6%). About one third of these regions contained no annotated genes, a proportion far greater than in prokaryote genomes. When we analyzed the origin of these HTs by comparing their signatures to a home-made database of species signatures, 3 groups of donor species emerged: bacteria (40%), fungi (25%), and viruses (22%). Notably, although inter-domain exchanges are confirmed, we found evidence of very few exchanges between eukaryotic kingdoms. Conclusions We demonstrated that HTs are not negligible in eukaryote genomes, though they are less extensive than in prokaryote genomes, bearing in mind that under our stringent conditions the amount detected is a floor value. The biological mechanisms underlying those transfers remain to be elucidated, as do the biological functions of the transferred genes.

  6. Validation of Pooled Whole-Genome Re-Sequencing in Arabidopsis lyrata.

    Directory of Open Access Journals (Sweden)

    Marco Fracassetti

    Full Text Available Sequencing pooled DNA of multiple individuals from a population instead of sequencing individuals separately has become popular due to its cost-effectiveness and simple wet-lab protocol, although some criticism of this approach remains. Here we validated a protocol for pooled whole-genome re-sequencing (Pool-seq) of Arabidopsis lyrata libraries prepared with low amounts of DNA (1.6 ng per individual). The validation was based on comparing single nucleotide polymorphism (SNP) frequencies obtained by pooling with those obtained by individual-based Genotyping By Sequencing (GBS). Furthermore, we investigated the effect of sample number, sequencing depth per individual and variant caller on population SNP frequency estimates. For Pool-seq data, we compared frequency estimates from two SNP callers, VarScan and Snape; the former employs a frequentist SNP calling approach while the latter uses a Bayesian approach. Results revealed concordance correlation coefficients well above 0.8, confirming that Pool-seq is a valid method for acquiring population-level SNP frequency data. Higher accuracy was achieved by pooling more samples (25 compared to 14) and working with higher sequencing depth (4.1× per individual compared to 1.4× per individual), which increased the concordance correlation coefficient to 0.955. The Bayesian-based SNP caller produced somewhat higher concordance correlation coefficients, particularly at low sequencing depth. We recommend pooling at least 25 individuals combined with sequencing at a depth of 100× to produce satisfactory frequency estimates for common SNPs (minor allele frequency above 0.05).

  7. Is gene activity in plant cells affected by UMTS-irradiation? A whole genome approach

    Directory of Open Access Journals (Sweden)

    Julia C Engelmann

    2008-10-01

    Full Text Available Julia C Engelmann3,*, Rosalia Deeken1,*, Tobias Müller3, Günter Nimtz2, M Rob G Roelfsema1, Rainer Hedrich1. 1Molecular Plant Physiology and Biophysics, Julius-von-Sachs Institute for Biosciences; 2Institute of Physics II, University of Cologne, Cologne, Germany; 3Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany; *These authors contributed equally to this work. Abstract: Mobile phone technology makes use of radio frequency (RF) electromagnetic fields transmitted through a dense network of base stations in Europe. Possible harmful effects of RF fields on humans and animals are discussed, but their effect on plants has received little attention. In search of physiological processes of plant cells sensitive to RF fields, cell suspension cultures of Arabidopsis thaliana were exposed for 24 h to an RF field protocol representing typical microwave exposure in an urban environment. mRNA of exposed cultures and controls was used to hybridize Affymetrix-ATH1 whole genome microarrays. Differential expression analysis revealed significant changes in transcription of 10 genes, but they did not exceed a fold change of 2.5. Apart from the fact that 3 of them are dark-inducible, their functions do not point to any known responses of plants to environmental stimuli. The changes in transcription of these genes were compared with published microarray datasets and revealed a weak similarity of the microwave to light treatment experiments. Considering the large changes described in published experiments, it is questionable whether the small alterations caused by a 24 h continuous microwave exposure would have any impact on the growth and reproduction of whole plants. Keywords: suspension cultured plant cells, radio frequency electromagnetic fields, microarrays, Arabidopsis thaliana

  8. Microbiota present in cystic fibrosis lungs as revealed by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Philippe M Hauser

    Full Text Available Determination of the precise composition and variation of microbiota in cystic fibrosis lungs is crucial since chronic inflammation due to microorganisms leads to lung damage and ultimately, death. However, this constitutes a major technical challenge. Culturing of microorganisms does not provide a complete representation of a microbiota, even when using culturomics (high-throughput culture). So far, only PCR-based metagenomics have been investigated. However, these methods are biased towards certain microbial groups, and suffer from uncertain quantification of the different microbial domains. We have explored whole genome sequencing (WGS) using the Illumina high-throughput technology applied directly to DNA extracted from sputa obtained from two cystic fibrosis patients. To detect all microorganism groups, we used four procedures for DNA extraction, each with a different lysis protocol. We avoided biases due to whole DNA amplification thanks to the high efficiency of current Illumina technology. Phylogenomic classification of the reads by three different methods produced similar results. Our results suggest that WGS provides, in a single analysis, a better qualitative and quantitative assessment of microbiota compositions than cultures and PCRs. WGS identified a high quantity of Haemophilus spp. (patient 1) or Staphylococcus spp. plus Streptococcus spp. (patient 2) together with low amounts of anaerobic (Veillonella, Prevotella, Fusobacterium) and aerobic bacteria (Gemella, Moraxella, Granulicatella). WGS suggested that fungal members represented very low proportions of the microbiota, which were detected by cultures and PCRs because of their selectivity. The future increase in read sizes and decrease in cost should ensure the usefulness of WGS for the characterisation of microbiota.

  9. Colorectal Cancer and the Human Gut Microbiome: Reproducibility with Whole-Genome Shotgun Sequencing.

    Directory of Open Access Journals (Sweden)

    Emily Vogtmann

    Full Text Available Accumulating evidence indicates that the gut microbiota affects colorectal cancer development, but previous studies have varied in population, technical methods, and associations with cancer. Understanding these variations is needed for comparisons and for potential pooling across studies. Therefore, we performed whole-genome shotgun sequencing on fecal samples from 52 pre-treatment colorectal cancer cases and 52 matched controls from Washington, DC. We compared findings from a previously published 16S rRNA study to the metagenomics-derived taxonomy within the same population. In addition, metagenome-predicted genes, modules, and pathways in the Washington, DC cases and controls were compared to cases and controls recruited in France whose specimens were processed using the same platform. Associations between the presence of fecal Fusobacteria, Fusobacterium, and Porphyromonas with colorectal cancer detected by 16S rRNA were reproduced by metagenomics, whereas higher relative abundance of Clostridia in cancer cases based on 16S rRNA was merely borderline based on metagenomics. This demonstrated that within the same sample set, most, but not all taxonomic associations were seen with both methods. Considering significant cancer associations with the relative abundance of genes, modules, and pathways in a recently published French metagenomics dataset, statistically significant associations in the Washington, DC population were detected for four out of 10 genes, three out of nine modules, and seven out of 17 pathways. In total, colorectal cancer status in the Washington, DC study was associated with 39% of the metagenome-predicted genes, modules, and pathways identified in the French study. More within and between population comparisons are needed to identify sources of variation and disease associations that can be reproduced despite these variations. Future studies should have larger sample sizes or pool data across studies to have sufficient

  10. Utilization of touch preparations and whole genome amplification for loss of heterozygosity analysis in prostate cancer

    Energy Technology Data Exchange (ETDEWEB)

    Wick, M.J.; Halling, K.; Thibodeau, S.N. [Mayo Clinic and Foundation, Rochester, MN (United States)

    1994-09-01

    Loss of heterozygosity (LOH) analyses have been used extensively to identify tumor suppressor genes in a variety of tumor systems. In an effort to localize such genes in prostate cancer, we have examined tissue for LOH with the use of PCR-based assays for a variety of microsatellites. However, the highly infiltrative nature of prostate carcinoma makes it virtually impossible, by conventional methods, to obtain tumor DNA that is uncontaminated with DNA from normal cells. Thus, we have examined the use of touch preparations as a means to increase the percentage of tumor DNA for our LOH analyses. This method, which involves lightly touching the cut surface of fresh prostate tissue to the surface of a microscope slide, allows for selection of tumor cell clusters. DNA from these cells can then be used in a variety of PCR-based assays. In this study, we demonstrate that tumor cell clusters can be used effectively for LOH analysis. Our studies also demonstrate that use of the touch preparation technique reduces or eliminates normal cell contamination. However, the small quantity of DNA in these clusters prohibits analysis at multiple loci. Therefore, we have examined whole genome amplification (WGA) of tumor cell clusters as a method of avoiding this difficulty. Random 15-base oligonucleotides were used as primers for WGA of cell cluster DNA. Aliquots of the WGA product were then subjected to a second round of PCR in which microsatellite markers demonstrating allelic loss in prostate cancer were amplified. Our studies indicate that analysis of limited quantities of prostate tumor DNA at multiple loci can be accomplished through coupling of the touch preparation technique with WGA. This method may have ramifications for the analysis of tissue in which procurement of sufficient quantities of DNA is difficult.

  11. Whole Genome Sequencing for Surveillance of Antimicrobial Resistance in Actinobacillus pleuropneumoniae

    Science.gov (United States)

    Bossé, Janine T.; Li, Yanwen; Rogers, Jon; Fernandez Crespo, Roberto; Li, Yinghui; Chaudhuri, Roy R.; Holden, Matthew T. G.; Maskell, Duncan J.; Tucker, Alexander W.; Wren, Brendan W.; Rycroft, Andrew N.; Langford, Paul R.

    2017-01-01

    The aim of this study was to evaluate the correlation between antimicrobial resistance (AMR) profiles of 96 clinical isolates of Actinobacillus pleuropneumoniae, an important porcine respiratory pathogen, and the identification of AMR genes in whole genome sequence (wgs) data. Susceptibility of the isolates to nine antimicrobial agents (ampicillin, enrofloxacin, erythromycin, florfenicol, sulfisoxazole, tetracycline, tilmicosin, trimethoprim, and tylosin) was determined by agar dilution susceptibility test. Except for the macrolides tested, elevated MICs were highly correlated to the presence of AMR genes identified in wgs data using ResFinder or BLASTn. Of the isolates tested, 57% were resistant to tetracycline [MIC ≥ 4 mg/L; 94.8% with either tet(B) or tet(H)]; 48% to sulfisoxazole (MIC ≥ 256 mg/L or DD = 6; 100% with sul2), 20% to ampicillin (MIC ≥ 4 mg/L; 100% with blaROB-1), 17% to trimethoprim (MIC ≥ 32 mg/L; 100% with dfrA14), and 6% to enrofloxacin (MIC ≥ 0.25 mg/L; 100% with GyrAS83F). Only 33% of the isolates did not have detectable AMR genes, and were sensitive by MICs for the antimicrobial agents tested. Although 23 isolates had MIC ≥ 32 mg/L for tylosin, all isolates had MIC ≤ 16 mg/L for both erythromycin and tilmicosin, and no macrolide resistance genes or known point mutations were detected. Other than the GyrAS83F mutation, the AMR genes detected were mapped to potential plasmids. In addition to presence on plasmid(s), the tet(B) gene was also found chromosomally either as part of a 56 kb integrative conjugative element (ICEApl1) in 21, or as part of a Tn7 insertion in 15 isolates. Our results indicate that, with the exception of macrolides, wgs data can be used to accurately predict resistance of A. pleuropneumoniae to the tested antimicrobial agents and provides added value for routine surveillance.

  12. Whole Genome Duplications Shaped the Receptor Tyrosine Kinase Repertoire of Jawed Vertebrates.

    Science.gov (United States)

    Brunet, Frédéric G; Volff, Jean-Nicolas; Schartl, Manfred

    2016-06-03

    The receptor tyrosine kinase (RTK) gene family, involved primarily in cell growth and differentiation, comprises proteins with a common enzymatic tyrosine kinase intracellular domain adjacent to a transmembrane region. The amino-terminal portion of RTKs is extracellular and made of different domains, the combination of which characterizes each of the 20 RTK subfamilies among mammals. We analyzed a total of 7,376 RTK sequences among 143 vertebrate species to provide here the first comprehensive census of the jawed vertebrate repertoire. We ascertained the 58 genes previously described in the human and mouse genomes and established their phylogenetic relationships. We also identified five additional RTKs amounting to a total of 63 genes in jawed vertebrates. We found that the vertebrate RTK gene family has been shaped by the two successive rounds of whole genome duplications (WGD) called 1R and 2R (1R/2R) that occurred at the base of the vertebrates. In addition, the Vegfr and Ephrin receptor subfamilies were expanded by single gene duplications. In teleost fish, 23 additional RTK genes have been retained after another expansion through the fish-specific third round (3R) of WGD. Several lineage-specific gene losses were observed. For instance, birds have lost three RTKs, and different genes are missing in several fish sublineages. The RTK gene family presents an unusually high gene retention rate from the vertebrate WGDs (58.75% after 1R/2R, 64.4% after 3R), resulting in an expansion that might be correlated with the evolution of complexity of vertebrate cellular communication and intracellular signaling.

  13. Whole genome scan to detect quantitative trait loci for bovine milk protein composition.

    Science.gov (United States)

    Schopen, G C B; Koks, P D; van Arendonk, J A M; Bovenhuis, H; Visker, M H P W

    2009-08-01

    The objective of this study was to perform a whole genome scan to detect quantitative trait loci (QTL) for milk protein composition in 849 Holstein-Friesian cows originating from seven sires. One morning milk sample was analysed for the major milk proteins using capillary zone electrophoresis. A genetic map was constructed with 1341 single nucleotide polymorphisms, covering 2829 centimorgans (cM) and 95% of the cattle genome. The chromosomal regions most significantly related to milk protein composition (P(genome) casein, alpha(S2)-casein, beta-casein and kappa-casein. The QTL on BTA11 was found at 124 cM, and affected beta-lactoglobulin, and the QTL on BTA14 was found at 0 cM, and affected protein percentage. The proportion of phenotypic variance explained by the QTL was 3.6% for beta-casein and 7.9% for kappa-casein on BTA6, 28.3% for beta-lactoglobulin on BTA11, and 8.6% for protein percentage on BTA14. The QTL affecting alpha(S2)-casein on BTA6 and 17 showed a significant interaction. We investigated the extent to which the detected QTL affecting milk protein composition could be explained by known polymorphisms in beta-casein, kappa-casein, beta-lactoglobulin and DGAT1 genes. Correction for these polymorphisms decreased the proportion of phenotypic variance explained by the QTL previously found on BTA6, 11 and 14. Thus, several significant QTL affecting milk protein composition were found, of which some QTL could partially be explained by polymorphisms in milk protein genes.

  14. Whole Genome Analysis of Leptospira licerasiae Provides Insight into Leptospiral Evolution and Pathogenicity

    Science.gov (United States)

    Selengut, Jeremy D.; Harkins, Derek M.; Patra, Kailash P.; Moreno, Angelo; Lehmann, Jason S.; Purushe, Janaki; Sanka, Ravi; Torres, Michael; Webster, Nicholas J.; Vinetz, Joseph M.; Matthias, Michael A.

    2012-01-01

    The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010T and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprised of a 6-gene cluster that is unexpectedly short compared with L. interrogans in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present, also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for its infectiousness.

  15. Whole-genome synthesis and characterization of viable S13-like bacteriophages.

    Directory of Open Access Journals (Sweden)

    Yuchen Liu

    Full Text Available BACKGROUND: Unprecedented progress in high-throughput DNA sequencing and de novo gene synthesis technologies has allowed us to create living organisms in the absence of a natural template. METHODOLOGY/PRINCIPAL FINDINGS: The sequence of the wild-type S13 phage genome was downloaded from GenBank. Two synonymous mutations were introduced into the wt-S13 genome to generate the m1-S13 genome. Another mutant, the m2-S13 genome, was obtained by engineering two nonsynonymous mutations in the capsid protein coding region of the wt-S13 genome. A chimeric phage genome was designed by replacing the F capsid protein open reading frame (ORF) from phage S13 with the F capsid protein ORF from phage G4. The whole genomes of all four phages were assembled from a series of chemically synthesized short overlapping oligonucleotides. The linear synthesized genomes were circularized and electroporated into E. coli C, the standard laboratory host of S13 phage. All four phages were recovered and plaques were visualized. Sequencing confirmed the accuracy of these synthetic genomes. The synthetic phages were capable of lysing their bacterial host and tolerating general environmental conditions. While no phenotypic differences among the variant strains were observed when grown in LB medium with CaCl2, the S13/G4 chimera was found to be much more sensitive to the absence of calcium and to have a lower adsorption rate under calcium-free conditions. CONCLUSIONS/SIGNIFICANCE: The bacteriophage S13 and its variants can be chemically synthesized. The major capsid gene of phage G4 is functional in the phage S13 life cycle. These results support the proposed evolutionary hypothesis that a homologous recombination event involving gene F of quite divergent ancestral lineages should be included in the history of the microvirid family.

  16. Insight into Shiga toxin genes encoded by Escherichia coli O157 from whole genome sequencing

    Directory of Open Access Journals (Sweden)

    Philip M. Ashton

    2015-02-01

    Full Text Available The ability of Shiga toxin-producing Escherichia coli (STEC) to cause severe illness in humans is determined by multiple host factors and bacterial characteristics, including Shiga toxin (Stx) subtype. Given the link between Stx2a subtype and disease severity, we sought to identify the stx subtypes present in whole genome sequences (WGS) of 444 isolates of STEC O157. Difficulties in assembling the stx genes in some strains were overcome by using two complementary bioinformatics methods: mapping and de novo assembly. We compared the WGS analysis with the results obtained using a PCR approach and investigated the diversity within and between the subtypes. All strains of STEC O157 in this study had stx1a, stx2a or stx2c or a combination of these three genes. There was over 99% (442/444) concordance between PCR and WGS. When common source strains were excluded, 236/349 strains of STEC O157 had multiple copies of different Stx subtypes and 54 had multiple copies of the same Stx subtype. Of those strains harbouring multiple copies of the same Stx subtype, 33 had variants between the alleles while 21 had identical copies. Strains harbouring Stx2a only were most commonly found to have multiple alleles of the same subtype (42%). Both the PCR and WGS approach to stx subtyping provided a good level of sensitivity and specificity. In addition, the WGS data also showed there was a significant proportion of strains harbouring multiple alleles of the same Stx subtype associated with clinical disease in England.

  17. Genome-Wide Association Study of HIV Whole Genome Sequences Validated using Drug Resistance

    Science.gov (United States)

    Power, Robert A.; Davaniah, Siva; Derache, Anne; Wilkinson, Eduan; Tanser, Frank; Pillay, Deenan; de Oliveira, Tulio

    2016-01-01

    Background Genome-wide association studies (GWAS) have considerably advanced our understanding of human traits and diseases. With the increasing availability of whole genome sequences (WGS) for pathogens, it is important to establish whether GWAS of viral genomes could reveal important biological insights. Here we perform the first proof of concept viral GWAS examining drug resistance (DR), a phenotype with well understood genetics. Method We performed a GWAS of DR in a sample of 343 HIV subtype C patients failing 1st line antiretroviral treatment in rural KwaZulu-Natal, South Africa. The majority and minority variants within each sequence were called using PILON, and GWAS was performed within PLINK. HIV WGS from patients failing on different antiretroviral treatments were compared to sequences derived from individuals naïve to the respective treatment. Results GWAS methodology was validated by identifying five associations on a genetic level that led to amino acid changes known to cause DR. Further, we highlighted the ability of GWAS to identify epistatic effects, identifying two replicable variants within amino acid 68 of the reverse transcriptase protein previously described as potential fitness compensatory mutations. A possible additional DR variant within amino acid 91 of the matrix region of the Gag protein was associated with tenofovir failure, highlighting GWAS’s ability to identify variants outside classical candidate genes. Our results also suggest a polygenic component to DR. Conclusions These results validate the applicability of GWAS to HIV WGS data even in relatively small samples, and emphasise how high throughput sequencing can provide novel and clinically relevant insights. Further, they suggest that for viruses like HIV, population structure is only a minor concern compared to that seen in bacterial or parasite GWAS. Together with the small genome length and reduced burden for multiple testing, this makes HIV an ideal candidate for GWAS. PMID:27677172

  18. Whole genome analysis of Mycobacterium tuberculosis isolates from recurrent episodes of tuberculosis, Finland, 1995-2013.

    Science.gov (United States)

    Korhonen, V; Smit, P W; Haanperä, M; Casali, N; Ruutu, P; Vasankari, T; Soini, H

    2016-06-01

    Recurrent tuberculosis (TB) is caused by an endogenous re-activation of the same strain of Mycobacterium tuberculosis (relapse) or exogenous infection with a new strain (re-infection). Recurrence of TB in Finland was analysed in a population-based, 19-year study, and genotyping was used to define relapse and re-infection. The M. tuberculosis isolates from patients with suspected relapse were further analysed by whole genome sequencing (WGS) to determine the number and type of mutations occurring in the bacterial genome between the first and second disease episodes. In addition, publicly available tools (PhyResSE and SpolPred) were used to predict drug resistance and spoligotype profile from the WGS data. Of the 8299 notified TB cases, 48 (0.6%) patients had episodes classified as recurrent. Forty-two patients had more than one culture-confirmed TB episode, and isolates from two episodes in 21 patients were available for genotyping. In 18 patients, the M. tuberculosis isolates obtained from the first and second TB episodes had identical spoligotypes. The WGS analysis of the 36 M. tuberculosis isolates from the 18 suspected relapse patients (average time between isolates 2.8 years) revealed 0 to 38 single nucleotide polymorphisms (median 1, mean 3.78) between the first and second isolate. There seemed to be no direct relation between the number of years between the two isolates, or treatment outcome, and the number of single nucleotide polymorphisms. The results suggest that the mutation rate may depend on multiple host-, strain- and treatment-related factors.

  19. Utility of Whole-Genome Sequencing in Characterizing Acinetobacter Epidemiology and Analyzing Hospital Outbreaks.

    Science.gov (United States)

    Fitzpatrick, Margaret A; Ozer, Egon A; Hauser, Alan R

    2016-03-01

    Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence type (MLST) analysis, and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts.
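
    The SNP-threshold idea in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the isolate names, the toy sequences, the helper names `snp_distance` and `pairs_within`, the toy threshold of 2, and the use of "at or below" as the comparison are all hypothetical and are not taken from the study, which reports only the thresholds themselves (2,500 core SNPs between lineages, 2.5 for the outbreak).

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned core-SNP strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def pairs_within(isolates: dict[str, str], threshold: float) -> list[tuple[str, str, int]]:
    """Return isolate pairs whose SNP distance is at or below the threshold.

    Whether the study used <= or < is not stated; <= is an assumption.
    """
    hits = []
    for (n1, s1), (n2, s2) in combinations(sorted(isolates.items()), 2):
        d = snp_distance(s1, s2)
        if d <= threshold:
            hits.append((n1, n2, d))
    return hits


# Toy, made-up core-SNP strings (not data from the study).
toy = {"iso_A": "ACGTACGTAC", "iso_B": "ACGTACGTAT", "iso_C": "TTGTACCTAA"}

# Hypothetical threshold chosen only to fit the toy data.
linked = pairs_within(toy, threshold=2)
```

    With real data the strings would be concatenated core-genome SNP positions, and the threshold would be chosen per question (lineage assignment versus outbreak membership), as the abstract describes.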

  20. Whole-genome SNP association analysis of reproduction traits in the Finnish Landrace pig breed

    Directory of Open Access Journals (Sweden)

    Uimari Pekka

    2011-12-01

    Full Text Available Abstract Background Good genetic progress for pig reproduction traits has been achieved using a quantitative genetics-based multi-trait BLUP evaluation system. At present, whole-genome single nucleotide polymorphism (SNP) panels provide a new tool for pig selection. The purpose of this study was to identify SNP associated with reproduction traits in the Finnish Landrace pig breed using the Illumina PorcineSNP60 BeadChip. Methods Association of each SNP with different traits was tested with a weighted linear model, using SNP genotype as a covariate and animal as a random variable. Deregressed estimated breeding values of the progeny tested boars were used as the dependent variable and weights were based on their reliabilities. Statistical significance of the associations was based on Bonferroni-corrected P-values. Results Deregressed estimated breeding values were available for 328 genotyped boars. Of the 62 163 SNP in the chip, 57 868 SNP had a call rate > 0.9 and 7 632 SNP were monomorphic. Statistically significant results (P-value P-value P-value = 1.69E-08 more than unfavourable double homozygote animals. A region on chromosome 9 (66 Mb) was statistically significant for piglet mortality between birth and weaning in later parity (0.44 piglets between homozygotes, P-value = 6.94E-08). Conclusions Three separate regions on chromosome 9 gave significant results for litter size and pig mortality. The frequencies of favourable alleles of the significant SNP are moderate in the Finnish Landrace population and these SNP are thus valuable candidates for possible marker-assisted selection.
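
    The Bonferroni correction mentioned in the abstract above can be illustrated with a short Python sketch. The abstract does not state how many SNP were actually tested or which nominal alpha was used, so `n_tests = 50_000` and `alpha = 0.05` below are hypothetical placeholders, as are the function names; only the P-value 6.94E-08 is quoted from the record.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if a raw P-value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Hypothetical values: the number of SNP tested and alpha are assumptions.
n_tests = 50_000
alpha = 0.05
threshold = bonferroni_threshold(alpha, n_tests)

# P-value reported in the abstract for the chromosome 9 (66 Mb) region.
passes = is_significant(6.94e-08, alpha, n_tests)
```

    An equivalent formulation multiplies each raw P-value by the number of tests (capped at 1) and compares the result with alpha.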

  1. Whole-Genome Saliva and Blood DNA Methylation Profiling in Individuals with a Respiratory Allergy.

    Science.gov (United States)

    Langie, Sabine A S; Szarc Vel Szic, Katarzyna; Declerck, Ken; Traen, Sophie; Koppen, Gudrun; Van Camp, Guy; Schoeters, Greet; Vanden Berghe, Wim; De Boever, Patrick

    2016-01-01

    The etiology of respiratory allergies (RA) can be partly explained by DNA methylation changes caused by adverse environmental and lifestyle factors experienced early in life. Longitudinal, prospective studies can aid in the unravelment of the epigenetic mechanisms involved in the disease development. High compliance rates can be expected in these studies when data is collected using non-invasive and convenient procedures. Saliva is an attractive biofluid to analyze changes in DNA methylation patterns. We investigated in a pilot study the differential methylation in saliva of RA (n = 5) compared to healthy controls (n = 5) using the Illumina Methylation 450K BeadChip platform. We evaluated the results against the results obtained in mononuclear blood cells from the same individuals. Differences in methylation patterns from saliva and mononuclear blood cells were clearly distinguishable (PAdj0.2), though the methylation status of about 96% of the cg-sites was comparable between peripheral blood mononuclear cells and saliva. When comparing RA cases with healthy controls, the number of differentially methylated sites (DMS) in saliva and blood were 485 and 437 (P0.1), respectively, of which 216 were in common. The methylation levels of these sites were significantly correlated between blood and saliva. The absolute levels of methylation in blood and saliva were confirmed for 3 selected DMS in the PM20D1, STK32C, and FGFR2 genes using pyrosequencing analysis. The differential methylation could only be confirmed for DMS in PM20D1 and STK32C genes in saliva. We show that saliva can be used for genome-wide methylation analysis and that it is possible to identify DMS when comparing RA cases and healthy controls. The results were replicated in blood cells of the same individuals and confirmed by pyrosequencing analysis. This study provides proof-of-concept for the applicability of saliva-based whole-genome methylation analysis in the field of respiratory allergy.

  2. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    Science.gov (United States)

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.

  3. Genome-wide association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants

    DEFF Research Database (Denmark)

    Iso-Touru, T; Sahana, G; Guldbrandtsen, B;

    2016-01-01

    BACKGROUND: The Nordic Red Cattle consisting of three different populations from Finland, Sweden and Denmark are under a joint breeding value estimation system. The long history of recording of production and health traits offers a great opportunity to study production traits and identify causal variants behind them. In this study, we used whole genome sequence level data from 4280 progeny tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. RESULTS: Using a genome-wise significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were... traits via biological networks. CONCLUSION: This is the first time when whole genome sequence data is utilized to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence level data offers the possibility to study quantitative traits in detail but still cannot...

  4. Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea

    Directory of Open Access Journals (Sweden)

    Joon-Hee Han

    2016-06-01

    Full Text Available Colletotrichum acutatum is a destructive fungal pathogen which causes anthracnose in a wide range of crops. Here we report the whole genome sequence and annotation of C. acutatum strain KC05, isolated from an infected pepper in Kangwon, South Korea. Genomic DNA from the KC05 strain was used for the whole genome sequencing using a PacBio sequencer and the MiSeq system. The KC05 genome was determined to be 52,190,760 bp in size with a G + C content of 51.73% in 27 scaffolds and to contain 13,559 genes with an average length of 1516 bp. Gene prediction and annotation were performed by incorporating RNA-Seq data. The genome sequence of the KC05 was deposited at DDBJ/ENA/GenBank under the accession number LUXP00000000.

  5. A novel whole genome amplification method using type IIS restriction enzymes to create overhangs with random sequences.

    Science.gov (United States)

    Pan, Xiaoming; Wan, Baihui; Li, Chunchuan; Liu, Yu; Wang, Jing; Mou, Haijin; Liang, Xingguo

    2014-08-20

    Ligation-mediated polymerase chain reaction (LM-PCR) is a whole genome amplification (WGA) method in which genomic DNA is cleaved into numerous fragments and then all of the fragments are amplified by PCR after attaching a universal end sequence. However, self-ligation of these fragments can occur, which may bias amplification and restrict the method's application. To decrease the self-ligation probability, here we use type IIS restriction enzymes to digest genomic DNA into fragments with 4-5 nt long overhangs with random sequences. After ligating an adapter with random end sequences to these fragments, PCR is carried out and almost all DNA sequences present are amplified. In this study, the whole genome of Vibrio parahaemolyticus was amplified and the amplification efficiency was evaluated by quantitative PCR. The results suggested that our approach could provide sufficient genomic DNA of good quality to meet the requirements of various genetic analyses.

  6. Beyond the Whole-Genome Duplication: Phylogenetic Evidence for an Ancient Interspecies Hybridization in the Baker's Yeast Lineage.

    Directory of Open Access Journals (Sweden)

    Marina Marcet-Houben

    2015-08-01

    Full Text Available Whole-genome duplications have shaped the genomes of several vertebrate, plant, and fungal lineages. Earlier studies have focused on establishing when these events occurred and on elucidating their functional and evolutionary consequences, but we still lack sufficient understanding of how genome duplications first originated. We used phylogenomics to study the ancient genome duplication that occurred in the yeast Saccharomyces cerevisiae lineage and found compelling evidence for the existence of a contemporaneous interspecies hybridization. We propose that the genome doubling was a direct consequence of this hybridization and that it served to provide stability to the recently formed allopolyploid. This scenario provides a mechanism for the origin of this ancient duplication and the lineage that originated from it and brings a new perspective to the interpretation of the origin and consequences of whole-genome duplications.

  7. snpTree - a web-server to identify and construct SNP trees from whole genome sequence data

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Kaas, Rolf Sommer; Thomsen, Martin Christen Frølund;

    2012-01-01

    Background The advances and decreasing economical cost of whole genome sequencing (WGS) will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differe... from concatenated SNPs using FastTree and a Perl script. The online server was implemented by HTML, Java and Python script. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three...

  8. Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis

    OpenAIRE

    Ayele, Mulu; Haas, Brian J.; Kumar, Nikhil; Wu, Hank; Xiao, Yongli; Van Aken, Susan; Utterback, Teresa R.; WORTMAN, Jennifer R.; White, Owen R.; Town, Christopher D

    2005-01-01

    Through comparative studies of the model organism Arabidopsis thaliana and its close relative Brassica oleracea, we have identified conserved regions that represent potentially functional sequences overlooked by previous Arabidopsis genome annotation methods. A total of 454,274 whole genome shotgun sequences covering 283 Mb (0.44×) of the estimated 650 Mb Brassica genome were searched against the Arabidopsis genome, and conserved Arabidopsis genome sequences (CAGSs) were identified. Of these ...

  9. Whole-Genome Sequencing of Vibrio cholerae O1 El Tor Strains Isolated in Ukraine (2011) and Russia (2014)

    Science.gov (United States)

    Smirnova, Nina I.; Agafonova, Elena Y.; Shchelkanova, Elena Y.; Alkhova, Zhanna V.; Kutyrev, Vladimir V.

    2017-01-01

    ABSTRACT Here, we present the draft whole-genome sequence of Vibrio cholerae O1 El Tor strains 76 and M3265/80, isolated in Mariupol, Ukraine, and Moscow, Russia. The presence of various mutations detected in virulence-associated mobile elements indicates high genetic similarity of the strains reported here with new highly virulent variants of the cholera agent V. cholerae. PMID:28232438

  10. Whole-genome sequence of Sunxiuqinia dokdonensis DH1(T), isolated from deep sub-seafloor sediment in Dokdo Island.

    Science.gov (United States)

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-09-01

    Sunxiuqinia dokdonensis DH1(T) was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000.

  11. Whole-genome sequence of Sunxiuqinia dokdonensis DH1T, isolated from deep sub-seafloor sediment in Dokdo Island

    OpenAIRE

    Sooyeon Lim; Dong-Ho Chang; Byoung-Chan Kim

    2016-01-01

    Sunxiuqinia dokdonensis DH1T was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000.

  12. Whole-genome sequence of Sunxiuqinia dokdonensis DH1T, isolated from deep sub-seafloor sediment in Dokdo Island

    Directory of Open Access Journals (Sweden)

    Sooyeon Lim

    2016-09-01

    Full Text Available Sunxiuqinia dokdonensis DH1T was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000.

  13. Abductive Inference using Array-Based Logic

    DEFF Research Database (Denmark)

    Frisvad, Jeppe Revall; Falster, Peter; Møller, Gert L.;

    The notion of abduction has found its usage within a wide variety of AI fields. Computing abductive solutions has, however, been shown to be highly intractable in logic programming. To avoid this intractability we present a new approach to logic-based abduction; through the geometrical view of data...... employed in array-based logic we embrace abduction in a simple structural operation. We argue that a theory of abduction on this form allows for an implementation which, at runtime, can perform abductive inference quite efficiently on arbitrary rules of logic representing knowledge of finite domains....

  14. Colonization with methicillin-resistant Staphylococcus pseudintermedius in multi-dog households: A longitudinal study using whole genome sequencing.

    Science.gov (United States)

    Windahl, Ulrika; Gren, Joakim; Holst, Bodil S; Börjesson, Stefan

    2016-06-30

    Despite a worldwide increase in the presence of methicillin-resistant Staphylococcus pseudintermedius (MRSP) in dogs and its potential to cause serious canine health problems, the understanding of the transmission and long-term carriage of MRSP is limited. The objective of this study was to investigate the transmission of MRSP to contact dogs living in multiple dog households where one or more of the dogs had been diagnosed with a clinically apparent infection with MRSP. MRSP carriage was investigated over several months in 11 dogs living in four separate multiple dog households where an MRSP infection in a dog had been diagnosed. Whole-genome sequencing was used for genotypic characterization. Contact dogs were only MRSP-positive if the index dog was positive on the same sample occasion. Three contact dogs were consistently MRSP-negative. The data from whole genome sequencing showed similarities between isolates within each family group, indicating that MRSP was transmitted within each family. The results show that the risk of MRSP-colonization in dogs living with an MRSP-infected dog is reduced if the index dog becomes MRSP negative. Not all contact dogs carry MRSP continuously during the time the index dog is MRSP-positive. The information yielded from whole genome sequencing showed the methodology to be a promising additional tool in epidemiologic investigations of MRSP transmission.

  15. TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data

    Science.gov (United States)

    Roth, Andrew; Khattra, Jaswinder; Ho, Julie; Yap, Damian; Prentice, Leah M.; Melnyk, Nataliya; McPherson, Andrew; Bashashati, Ali; Laks, Emma; Biele, Justina; Ding, Jiarui; Le, Alan; Rosner, Jamie; Shumansky, Karey; Marra, Marco A.; Gilks, C. Blake; Huntsman, David G.; McAlpine, Jessica N.; Aparicio, Samuel

    2014-01-01

    The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN. PMID:25060187

  16. Construction of Whole Genome Microarrays, and Expression Analysis of Desulfovibrio vulgaris cells in Metal-Reducing Conditions (Uranium and Chromium)

    Energy Technology Data Exchange (ETDEWEB)

    Fields, Matthew W.

    2005-06-01

    One of the major goals of the project is to construct whole-genome microarrays for Desulfovibrio vulgaris. Previous whole-genome microarrays constructed at ORNL have been PCR-amplimer based, and we wanted to re-evaluate the type of microarrays being built because oligonucleotide probes have several advantages. Microarrays have been generally constructed with two types of probes, PCR-generated probes that typically range in size between 200 and 2000 bp, and oligonucleotide probes with typical size of 20-70 nt. Producing PCR product-based DNA arrays can be a time-consuming procedure that includes PCR primer design, amplification, size verification, product purification, and product quantification. Also, some ORFs are difficult to amplify and thus the construction of comprehensive arrays can be a challenge. Recently, to alleviate some of the problems associated with PCR product-based microarrays, oligonucleotide microarrays that contain probes longer than 40 nt have been evaluated and used for whole genome expression studies. These microarrays should have higher specificity and are easy to construct, and can thus provide an important alternative approach to monitor gene expression. However, due to the smaller probe size, it is expected that the detection sensitivity of oligonucleotide arrays will be lower than PCR product-based probes.

  17. High-Quality Exome Sequencing of Whole-Genome Amplified Neonatal Dried Blood Spot DNA.

    Directory of Open Access Journals (Sweden)

    Jesper Buchhave Poulsen

    Full Text Available Stored neonatal dried blood spot (DBS) samples from neonatal screening programmes are a valuable diagnostic and research resource. Combined with information from national health registries they can be used in population-based studies of genetic diseases. DNA extracted from neonatal DBSs can be amplified to obtain micrograms of an otherwise limited resource, referred to as whole-genome amplified DNA (wgaDNA). Here we investigate the robustness of exome sequencing of wgaDNA of neonatal DBS samples. We conducted three pilot studies of seven, eight and seven subjects, respectively. For each subject we analysed a neonatal DBS sample and corresponding adult whole-blood (WB) reference sample. Different DNA sample types were prepared for each of the subjects. Pilot 1: wgaDNA of 2x3.2 mm neonatal DBSs (DBS_2x3.2) and raw DNA extract of the WB reference sample (WB_ref). Pilot 2: DBS_2x3.2, WB_ref and a WB_ref replica sharing DNA extract with the WB_ref sample. Pilot 3: DBS_2x3.2, WB_ref, wgaDNA of 2x1.6 mm neonatal DBSs and wgaDNA of the WB reference sample. Following sequencing and data analysis, we compared pairwise variant calls to obtain a measure of similarity, the concordance rate. Concordance rates were slightly lower when comparing DBS vs WB sample types than for any two WB sample types of the same subject before filtering of the variant calls. The overall concordance rates were dependent on the variant type, with SNPs performing best. Post-filtering, the comparisons of DBS vs WB and WB vs WB sample types yielded similar concordance rates, with values close to 100%. WgaDNA of neonatal DBS samples performs with great accuracy and efficiency in exome sequencing. The wgaDNA performed similarly to matched high-quality reference, whole-blood DNA, based on concordance rates calculated from variant calls. No differences were observed substituting 2x3.2 with 2x1.6 mm discs, allowing for additional reduction of sample material in future projects.
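
    A pairwise concordance rate of the kind described above might be computed as in the following Python sketch. The abstract does not give the formula, so defining concordance as shared calls divided by the union of calls is an assumption, and the variant tuples, sample names and function name are hypothetical.

```python
# A variant call is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def concordance_rate(calls_a: set[Variant], calls_b: set[Variant]) -> float:
    """Fraction of variant calls shared by two samples, relative to all calls in either.

    Assumed definition (shared / union); the study may have used a different one.
    """
    union = calls_a | calls_b
    if not union:
        raise ValueError("no variant calls to compare")
    return len(calls_a & calls_b) / len(union)


# Made-up calls for one subject's DBS and whole-blood samples (not study data).
dbs_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
wb_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 10, "T", "C")}

rate = concordance_rate(dbs_calls, wb_calls)
```

    Computing the rate separately per variant type (for example SNPs versus indels), before and after filtering, would mirror the comparisons the abstract reports.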

  18. Whole Genome Sequencing Based Characterization of Extensively Drug-Resistant Mycobacterium tuberculosis Isolates from Pakistan

    KAUST Repository

    Ali, Asho

    2015-02-26

    Improved molecular diagnostic methods for detection of drug resistance in Mycobacterium tuberculosis (MTB) strains are required. Resistance to first- and second-line anti-tuberculous drugs has been associated with single nucleotide polymorphisms (SNPs) in particular genes. However, these SNPs can vary between MTB lineages; therefore, local data is required to describe different strain populations. We used whole genome sequencing (WGS) to characterize 37 extensively drug-resistant (XDR) MTB isolates from Pakistan and investigated 40 genes associated with drug resistance. Rifampicin resistance was attributable to SNPs in the rpoB hot-spot region. Isoniazid resistance was most commonly associated with the katG codon 315 (92%) mutation followed by inhA S94A (8%); however, one strain did not have SNPs in katG, inhA or oxyR-ahpC. All strains were pyrazinamide resistant but only 43% had pncA SNPs. Ethambutol resistant strains predominantly had embB codon 306 (62%) mutations, but additional SNPs at embB codons 406, 378 and 328 were also present. Fluoroquinolone resistance was associated with gyrA 91-94 codons in 81% of strains; four strains had only gyrB mutations, while others did not have SNPs in either gyrA or gyrB. Streptomycin resistant strains had mutations in ribosomal RNA genes; rpsL codon 43 (42%); rrs 500 region (16%), and gidB (34%) while six strains did not have mutations in any of these genes. Amikacin/kanamycin/capreomycin resistance was associated with SNPs in rrs at nt1401 (78%) and nt1484 (3%), except in seven (19%) strains. We estimate that if only the common hot-spot region targets of current commercial assays were used, the concordance between phenotypic and genotypic testing for these XDR strains would vary between rifampicin (100%), isoniazid (92%), fluoroquinolones (81%), aminoglycoside (78%) and ethambutol (62%); while pncA sequencing would provide genotypic resistance in less than half the isolates. This work highlights the importance of expanded

  19. Whole genome transcript profiling from fingerstick blood samples: a comparison and feasibility study

    Directory of Open Access Journals (Sweden)

    Williams Adam R

    2009-12-01

    Full Text Available Abstract Background Whole genome gene expression profiling has revolutionized research in the past decade especially with the advent of microarrays. Recently, there have been significant improvements in whole blood RNA isolation techniques which, through stabilization of RNA at the time of sample collection, avoid bias and artifacts introduced during sample handling. Despite these improvements, current human whole blood RNA stabilization/isolation kits are limited by the requirement of a venous blood sample of at least 2.5 mL. While fingerstick blood collection has been used for many different assays, there has yet to be a kit developed to isolate high quality RNA for use in gene expression studies from such small human samples. The clinical and field testing advantages of obtaining reliable and reproducible gene expression data from a fingerstick are many; it is less invasive, time saving, more mobile, and eliminates the need of a trained phlebotomist. Furthermore, this method could also be employed in small animal studies, i.e. mice, where larger sample collections often require sacrificing the animal. In this study, we offer a rapid and simple method to extract sufficient amounts of high quality total RNA from approximately 70 μl of whole blood collected via a fingerstick using a modified protocol of the commercially available Qiagen PAXgene RNA Blood Kit. Results From two sets of fingerstick collections, about 70 uL whole blood collected via finger lancet and capillary tube, we recovered an average of 252.6 ng total RNA with an average RIN of 9.3. The post-amplification yields for 50 ng of total RNA averaged at 7.0 ug cDNA. The cDNA hybridized to Affymetrix HG-U133 Plus 2.0 GeneChips had an average % Present call of 52.5%. Both fingerstick collections were highly correlated with r2 values ranging from 0.94 to 0.97. Similarly both fingerstick collections were highly correlated to the venous collection with r2 values ranging from 0.88 to 0

  20. Assessment of the Utility of Whole Genome Sequencing of Measles Virus in the Characterisation of Outbreaks.

    Directory of Open Access Journals (Sweden)

    Ana Raquel Penedos

    Full Text Available Measles is a highly infectious disease caused by measles virus (MeV). Despite the availability of a safe and cost-effective vaccine, measles is one of the world-leading causes of death in young children. Within Europe, there is a target for eliminating endemic measles in 2015, with molecular epidemiology required on 80% of cases for inclusion/exclusion of outbreak transmission chains. Currently, MeV is genotyped on the basis of a 450 nucleotide region of the nucleoprotein gene (N-450) and the hemagglutinin gene (H). However, this is not sufficiently informative for distinguishing endemic from imported MeV. We have developed an amplicon-based method for obtaining whole genome sequences (WGS) using NGS or Sanger methodologies from cell culture isolates or oral fluid specimens, and have sequenced over 60 samples, including 42 from the 2012 outbreak in the UK. Overall, NGS coverage was over 90% for approximately 71% of the samples tested. Analysis of 32 WGS excluding 3' and 5' termini (WGS-t) obtained from the outbreak indicates that the single nucleotide difference found between the two major groups of N-450 sequences detected during the outbreak is most likely a result of stochastic viral mutation during endemic transmission rather than of multiple importation events: earlier strains appear to have evolved into two distinct strain clusters in 2013, one containing strains with both outbreak-associated N-450 sequences. Additionally, phylogenetic analysis of each genomic region of MeV for the strains in this study suggests that the most information is acquired from the non-coding region located between the matrix and fusion protein genes (M/F NCR) and the N-450 genotyping sequence, an observation supported by entropy analysis across genotypes. We suggest that both M/F NCR and WGS-t could be used to complement the information from classical epidemiology and N-450 sequencing to address specific questions in the context of measles elimination.

  1. Whole-genome sequencing of individuals from a founder population identifies candidate genes for asthma.

    Science.gov (United States)

    Campbell, Catarina D; Mohajeri, Kiana; Malig, Maika; Hormozdiari, Fereydoun; Nelson, Benjamin; Du, Gaixin; Patterson, Kristen M; Eng, Celeste; Torgerson, Dara G; Hu, Donglei; Herman, Catherine; Chong, Jessica X; Ko, Arthur; O'Roak, Brian J; Krumm, Niklas; Vives, Laura; Lee, Choli; Roth, Lindsey A; Rodriguez-Cintron, William; Rodriguez-Santana, Jose; Brigino-Buenaventura, Emerita; Davis, Adam; Meade, Kelley; LeNoir, Michael A; Thyne, Shannon; Jackson, Daniel J; Gern, James E; Lemanske, Robert F; Shendure, Jay; Abney, Mark; Burchard, Esteban G; Ober, Carole; Eichler, Evan E

    2014-01-01

    Asthma is a complex genetic disease caused by a combination of genetic and environmental risk factors. We sought to test classes of genetic variants largely missed by genome-wide association studies (GWAS), including copy number variants (CNVs) and low-frequency variants, by performing whole-genome sequencing (WGS) on 16 individuals from asthma-enriched and asthma-depleted families. The samples were obtained from an extended 13-generation Hutterite pedigree with reduced genetic heterogeneity due to a small founding gene pool and reduced environmental heterogeneity as a result of a communal lifestyle. We sequenced each individual to an average depth of 13-fold, generated a comprehensive catalog of genetic variants, and tested the most severe mutations for association with asthma. We identified and validated 1960 CNVs, 19 nonsense or splice-site single nucleotide variants (SNVs), and 18 insertions or deletions that were out of frame. As follow-up, we performed targeted sequencing of 16 genes in 837 cases and 540 controls of Puerto Rican ancestry and found that controls carry a significantly higher burden of mutations in IL27RA (2.0% of controls; 0.23% of cases; nominal p = 0.004; Bonferroni p = 0.21). We also genotyped 593 CNVs in 1199 Hutterite individuals. We identified a nominally significant association (p = 0.03; Odds ratio (OR) = 3.13) between a 6 kbp deletion in an intron of NEDD4L and increased risk of asthma. We genotyped this deletion in an additional 4787 non-Hutterite individuals (nominal p = 0.056; OR = 1.69). NEDD4L is expressed in bronchial epithelial cells, and conditional knockout of this gene in the lung in mice leads to severe inflammation and mucus accumulation. Our study represents one of the early instances of applying WGS to complex disease with a large environmental component and demonstrates how WGS can identify risk variants, including CNVs and low-frequency variants, largely untested in GWAS.

  2. Whole-genome sequence and analysis of Xanthomonas euvesicatoria strains and reassessment of the species

    Directory of Open Access Journals (Sweden)

    Jeri D. Barak

    2016-12-01

    Full Text Available Multiple species of Xanthomonas cause bacterial spot of tomato (BST) and pepper. We sequenced five Xanthomonas euvesicatoria strains isolated from three continents (Africa, Asia, and South America) to provide a set of representative genomes with temporal and geographic diversity. LMG strains 667, 905, 909, and 933 were pathogenic on tomato and pepper, except LMG 918 which was pathogenic on pepper but elicited a hypersensitive reaction (HR) on tomato. Furthermore, LMG 667, 909, and 918 elicited a HR on Early Cal Wonder 30R containing Bs3. We examined pectolytic activity and starch hydrolysis, two tests which are useful in differentiating X. euvesicatoria from X. perforans, both causal agents of BST. LMG strains 905, 909, 918, and 933 were nonpectolytic while only LMG 918 was amylolytic. These results suggest that these strains are all atypical to both X. euvesicatoria and X. perforans. Sequence analysis of all the publicly available X. euvesicatoria and X. perforans strains comparing seven housekeeping genes identified seven haplotypes with few polymorphisms. Whole genome comparison by average nucleotide identity (ANI) resulted in values of >99% among the LMG strains 667, 905, 909, 918, and 933 and X. euvesicatoria strains and >99.6% among the LMG strains and a subset of X. perforans strains. These results suggest that X. euvesicatoria and X. perforans should be considered a single species. ANI values between strains of X. euvesicatoria, X. perforans, X. allii, X. alfalfae subsp. citrumelonis, X. dieffenbachiae, and a recently described pathogen of rose were >97.8% suggesting these pathogens should be a single species and recognized as X. euvesicatoria as well. Analysis of the newly sequenced X. euvesicatoria strains revealed interesting findings among the type 3 (T3) effectors, relatively ancient stepwise erosion of some T3 effectors, additional X. euvesicatoria-specific T3 effectors among the causal agents of BST, orthologs of avrBs3 and avrBs4, and

  3. Whole genome analysis of p38 SAPK-mediated gene expression upon stress

    Directory of Open Access Journals (Sweden)

    Lopez-Bigas Nuria

    2010-03-01

    Full Text Available Abstract Background Cells have the ability to respond and adapt to environmental changes through activation of stress-activated protein kinases (SAPKs). Although p38 SAPK signalling is known to participate in the regulation of gene expression, little is known about the molecular mechanisms used by this SAPK to regulate stress-responsive genes and the overall set of genes regulated by p38 in response to different stimuli. Results Here, we report a whole genome expression analysis on mouse embryonic fibroblasts (MEFs) treated with three different p38 SAPK activating-stimuli, namely osmostress, the cytokine TNFα and the protein synthesis inhibitor anisomycin. We have found that the activation kinetics of p38α SAPK in response to these insults is different and also leads to a complex gene pattern response specific for a given stress with a restricted set of overlapping genes. In addition, we have analysed the contribution of p38α, the major p38 family member present in MEFs, to the overall stress-induced transcriptional response by using both a chemical inhibitor (SB203580) and p38α-deficient (p38α-/-) MEFs. We show here that p38 SAPK dependency ranged between 60% and 88% depending on the treatments and that there is a very good overlap between the inhibitor treatment and the ko cells. Furthermore, we have found that the dependency of SAPK varies depending on the time the cells are subjected to osmostress. Conclusions Our genome-wide transcriptional analyses show a selective response to specific stimuli and a restricted common response of up to 20% of the stress up-regulated early genes that involves an important set of transcription factors, which might be critical for either cell adaptation or preparation for continuous extra-cellular changes. Interestingly, up to 85% of the up-regulated genes are under the transcriptional control of p38 SAPK. Thus, activation of p38 SAPK is critical to elicit the early gene expression program required for cell

  4. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery

    Directory of Open Access Journals (Sweden)

    Stothard Paul

    2011-11-01

    Full Text Available Abstract Background One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the total SNPs found in Holstein, Black Angus, and in both animals, 81%, 81%, and 75% respectively are novel. In-depth annotations of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten

  5. Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo)

    Directory of Open Access Journals (Sweden)

    Aslam Muhammad L

    2012-08-01

    whole genome SNP discovery study in turkey resulted in the detection of 5.49 million putative SNPs compared to the reference genome. All commercial lines appear to share a common origin. Presence of different alleles/haplotypes in the SM population highlights that specific haplotypes have been selected in the modern domesticated turkey.

  6. Whole-Genome Analysis of Antimicrobial-Resistant and Extraintestinal Pathogenic Escherichia coli in River Water.

    Science.gov (United States)

    Gomi, Ryota; Matsuda, Tomonari; Matsumura, Yasufumi; Yamamoto, Masaki; Tanaka, Michio; Ichiyama, Satoshi; Yoneda, Minoru

    2017-03-01

    Contamination of surface waters by antimicrobial-resistant bacteria and pathogenic bacteria is a great concern. In this study, 531 Escherichia coli isolates obtained from the Yamato River in Japan were evaluated phenotypically for resistance to 25 antimicrobials. Seventy-six isolates (14.3%) were multidrug resistant (MDR), 66 (12.4%) were nonsusceptible to one or two classes of agents, and 389 (73.3%) were susceptible. We performed whole-genome sequencing of selected strains by using Illumina technology. In total, the genome sequences of 155 strains were analyzed for antibiotic resistance determinants and phylogenetic characteristics. More than 50 different resistance determinants, including acquired resistance genes and chromosomal resistance mutations, were detected. Among the sequenced MDR strains (n = 66), sequence type 155 (ST155) complex (n = 9), ST10 complex (n = 9), and ST69 complex (n = 7) were prevalent. Among extraintestinal pathogenic E. coli (ExPEC) strains (n = 58), clinically important clonal groups, namely, ST95 complex (n = 18), ST127 complex (n = 8), ST12 complex (n = 6), ST14 complex (n = 6), and ST131 complex (n = 6), were prevalent, demonstrating the clonal distribution of environmental ExPEC strains. Typing of the fimH (type 1 fimbrial adhesin) gene revealed that ST131 complex strains carried fimH22 or fimH41, and no strains belonging to the fimH30 subgroup were detected. Fine-scale phylogenetic analysis and virulence gene content analysis of strains belonging to the ST95 complex (one of the major clonal ExPEC groups causing community-onset infections) revealed no significant differences between environmental and clinical strains. The results indicate contamination of surface waters by E. coli strains belonging to clinically important clonal groups. IMPORTANCE The prevalence of antimicrobial-resistant and pathogenic E. coli strains in surface waters is a concern because surface waters are used as sources for drinking water, irrigation, and
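
    The resistance breakdown quoted in this abstract can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the counts stated above; the variable names are illustrative and are not taken from the study.

```python
# Isolate counts as reported in the abstract (531 E. coli isolates in total).
counts = {
    "multidrug resistant": 76,
    "nonsusceptible to one or two classes": 66,
    "susceptible": 389,
}
total = sum(counts.values())

# Percentage of isolates in each category, rounded to one decimal place.
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]}/{total} = {pct}%")
```

    The three categories add up to the 531 isolates tested, and the rounded shares agree with the percentages given in the abstract.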

  7. Toward a Taxonomy for Multi-Omics Science? Terminology Development for Whole Genome Study Approaches by Omics Technology and Hierarchy.

    Science.gov (United States)

    Pirih, Nina; Kunej, Tanja

    2017-01-01

    Omics is a form of high-throughput systems science. However, taxonomies for omics studies are limited, inviting us to rethink new ways in which we classify, prioritize, and rank various omics systems science studies. In this overarching context, the genome-wide study approaches have proliferated in number and popularity over the past decade. However, their hierarchy is not well organized and the development of attendant terminology is not controlled. In the present study, we searched the literature in PubMed and the Web of Science databases published from March 1999 to September 2016 using the keywords, including genome-wide, association, whole genome, transcriptome-wide, metabolome, epigenome, and phenome. We identified the whole genome study approaches and sorted them according to the omics technology types (genomics, proteomics, and so on) and hierarchy. Thirty-four studies from over 90 publications were sorted into 10 omics groups: DNA level, transcriptomics, proteomics, interactomics, metabolomics, epigenomics, miRNomics/ncRNomics, phenomics, environmental omics, and pharmacogenomics. We suggest here modifications of terminology for study approaches, which share the same acronyms such as EWAS for epigenome-wide association and environment-wide association studies, and MWAS for methylome-wide association and metabolome-wide association studies. Taken together, our study presented here provides the first systematic review and analyses of whole genome approaches and presents a baseline for further controlled terminology development, with a view to a new taxonomy for omics and multi-omics studies in the future. Finally, we call for greater dialogue and collaboration across diverse omics knowledge domains and applications, for example, across plants, animals, clinical medicine, and ecology.

  8. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Science.gov (United States)

    Wilson, Mark R; Brown, Eric; Keys, Chris; Strain, Errol; Luo, Yan; Muruvanda, Tim; Grim, Christopher; Jean-Gilles Beaubrun, Junia; Jarvis, Karen; Ewing, Laura; Gopinath, Gopal; Hanes, Darcy; Allard, Marc W; Musser, Steven

    2016-01-01

    Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that isolates from several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts during future

  9. Whole Genome DNA Sequence Analysis of Salmonella subspecies enterica serotype Tennessee obtained from related peanut butter foodborne outbreaks.

    Directory of Open Access Journals (Sweden)

    Mark R Wilson

    Full Text Available Establishing an association between possible food sources and clinical isolates requires discriminating the suspected pathogen from an environmental background, and distinguishing it from other closely-related foodborne pathogens. We applied whole genome sequencing (WGS) to Salmonella subspecies enterica serotype Tennessee (S. Tennessee) to describe genomic diversity across the serovar as well as among and within outbreak clades of strains associated with contaminated peanut butter. We analyzed 71 isolates of S. Tennessee from disparate food, environmental, and clinical sources and 2 other closely-related Salmonella serovars as outgroups (S. Kentucky and S. Cubana), which were also shot-gun sequenced. A whole genome single nucleotide polymorphism (SNP) analysis was performed using a maximum likelihood approach to infer phylogenetic relationships. Several monophyletic lineages of S. Tennessee with limited SNP variability were identified that recapitulated several food contamination events. S. Tennessee clades were separated from outgroup salmonellae by more than sixteen thousand SNPs. Intra-serovar diversity of S. Tennessee was small compared to the chosen outgroups (1,153 SNPs), suggesting recent divergence of some S. Tennessee clades. Analysis of all 1,153 SNPs structuring an S. Tennessee peanut butter outbreak cluster revealed that isolates from several food, plant, and clinical isolates were very closely related, as they had only a few SNP differences between them. SNP-based cluster analyses linked specific food sources to several clinical S. Tennessee strains isolated in separate contamination events. Environmental and clinical isolates had very similar whole genome sequences; no markers were found that could be used to discriminate between these sources. Finally, we identified SNPs within variable S. Tennessee genes that may be useful markers for the development of rapid surveillance and typing methods, potentially aiding in traceback efforts

  10. epiG: statistical inference and profiling of DNA methylation from whole-genome bisulfite sequencing data.

    Science.gov (United States)

    Vincent, Martin; Mundbjerg, Kamilla; Skou Pedersen, Jakob; Liang, Gangning; Jones, Peter A; Ørntoft, Torben Falck; Dalsgaard Sørensen, Karina; Wiuf, Carsten

    2017-02-21

    The study of epigenetic heterogeneity at the level of individual cells and in whole populations is the key to understanding cellular differentiation, organismal development, and the evolution of cancer. We develop a statistical method, epiG, to infer and differentiate between different epi-allelic haplotypes, annotated with CpG methylation status and DNA polymorphisms, from whole-genome bisulfite sequencing data, and nucleosome occupancy from NOMe-seq data. We demonstrate the capabilities of the method by inferring allele-specific methylation and nucleosome occupancy in cell lines, and colon and tumor samples, and by benchmarking the method against independent experimental data.

  11. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats.

    Science.gov (United States)

    van der Weide, Robin H; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.
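
    As an illustration of the first step in such an analysis, unmapped reads can be pulled out of an alignment file by testing the "segment unmapped" bit (0x4) of the SAM FLAG field. The sketch below is a minimal plain-Python example on made-up records; it is not the authors' pipeline, and the function name and example reads are hypothetical.

```python
def unmapped_reads(sam_lines):
    """Yield (read name, sequence) for SAM records whose FLAG has bit 0x4 set."""
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignment record
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])     # column 2 of a SAM record is the bitwise FLAG
        if flag & 0x4:            # 0x4 means the read itself is unmapped
            yield fields[0], fields[9]  # column 10 is the read sequence


# Hypothetical records: read1 is aligned, read2 and read3 are unmapped.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAA\tIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGGCCCAA\tIIIIIIII",
]
result = list(unmapped_reads(example))
print(result)
```

    Any quality filtering or enrichment applied afterwards would depend on thresholds that the abstract does not state, so none is assumed here.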

  12. Two listeria outbreaks caused by smoked fish consumption-using whole-genome sequencing for outbreak investigations

    DEFF Research Database (Denmark)

    Gillesberg Lassen, S.; Ethelberg, S.; Björkman, J. T.

    2016-01-01

    production facilities by use of whole-genome sequencing and subsequent multilocus sequence type and single nucleotide polymorphism analysis. Outbreak cases shared outbreak strains, defined as Listeria monocytogenes isolates belonging to the same sequence type with fewer than five single nucleotide...... polymorphism differences. We performed routine food consumption interviews of L. monocytogenes patients and compared outbreak cases with sporadic cases. Two outbreaks were defined, each consisting of ten outbreak cases in the period 2013-15. Seven outbreak cases and a fetus in gestational week 38 died...

  13. Whole-Genome Sequence of a blaOXA-48-Harboring Raoultella ornithinolytica Clinical Isolate from Lebanon.

    Science.gov (United States)

    Al-Bayssari, Charbel; Olaitan, Abiola Olumuyiwa; Leangapichart, Thongpan; Okdah, Liliane; Dabboussi, Fouad; Hamze, Monzer; Rolain, Jean-Marc

    2016-04-01

    We analyzed the whole-genome sequence of a blaOXA-48-harboring Raoultella ornithinolytica clinical isolate from a patient in Lebanon. The size of the Raoultella ornithinolytica CMUL058 genome was 5,622,862 bp, with a G+C content of 55.7%. We deciphered all the molecular mechanisms of antibiotic resistance, and we compared our genome to other available R. ornithinolytica genomes in GenBank. The resistome consisted of 9 antibiotic resistance genes, including a plasmidic blaOXA-48 gene whose genetic organization is also described.

  14. Whole Genome Comparison of Campylobacter jejuni Human Isolates Using a Low-Cost Microarray Reveals Extensive Genetic Diversity

    OpenAIRE

    2001-01-01

    Campylobacter jejuni is the leading cause of bacterial food-borne diarrhoeal disease throughout the world, and yet is still a poorly understood pathogen. Whole genome microarray comparisons of 11 C. jejuni strains of diverse origin identified genes in up to 30 NCTC 11168 loci ranging from 0.7 to 18.7 kb that are either absent or highly divergent in these isolates. Many of these regions are associated with the biosynthesis of surface structures including flagella, lipo-oligosaccharide, and the...

  15. Whole genome sequencing identifies circulating Beijing-lineage Mycobacterium tuberculosis strains in Guatemala and an associated urban outbreak.

    Science.gov (United States)

    Saelens, Joseph W; Lau-Bonilla, Dalia; Moller, Anneliese; Medina, Narda; Guzmán, Brenda; Calderón, Maylena; Herrera, Raúl; Sisk, Dana M; Xet-Mull, Ana M; Stout, Jason E; Arathoon, Eduardo; Samayoa, Blanca; Tobin, David M

    2015-12-01

    Limited data are available regarding the molecular epidemiology of Mycobacterium tuberculosis (Mtb) strains circulating in Guatemala. Beijing-lineage Mtb strains have gained prevalence worldwide and are associated with increased virulence and drug resistance, but there have been only a few cases reported in Central America. Here we report the first whole genome sequencing of Central American Beijing-lineage strains of Mtb. We find that multiple Beijing-lineage strains, derived from independent founding events, are currently circulating in Guatemala, but overall still represent a relatively small proportion of disease burden. Finally, we identify a specific Beijing-lineage outbreak centered on a poor neighborhood in Guatemala City.

  16. Monitoring meticillin resistant Staphylococcus aureus and its spread in Copenhagen, Denmark, 2013, through routine whole genome sequencing

    DEFF Research Database (Denmark)

    Bartels, M D; Larner-Svensson, H; Meiniche, H;

    2015-01-01

    Typing of meticillin resistant Staphylococcus aureus (MRSA) by whole genome sequencing (WGS) is performed routinely in Copenhagen since January 2013. We describe the relatedness, based on WGS data and epidemiological data, of 341 MRSA isolates. These comprised all MRSA (n = 300) identified...... in Copenhagen in the first five months of 2013. Moreover, because MRSA of staphylococcal protein A (spa)-type 304 (t304), sequence type (ST) 6 had been associated with a continuous neonatal ward outbreak in Copenhagen starting in 2011, 41 t304 isolates collected in the city between 2010 and 2012 were also...

  17. Draft whole-genome sequence of the Diaporthe helianthi 7/96 strain, causal agent of sunflower stem canker

    Directory of Open Access Journals (Sweden)

    Riccardo Baroncelli

    2016-12-01

    Full Text Available Diaporthe helianthi is a fungus pathogenic to sunflower. Virulent strains of this fungus cause stem canker with important yield losses and reduction of oil content. Here we present the first draft whole-genome sequence of the highly virulent isolate D. helianthi strain 7/96, thus providing a useful platform for future research on stem canker of sunflower and fungal genomics. The genome sequence of the D. helianthi isolate 7/96 was deposited at DDBJ/ENA/GenBank under the accession number MAVT00000000 (BioProject PRJNA327798).

  18. Fast and low-cost decentralized surveillance of transmission of tuberculosis based on strain-specific PCRs tailored from whole genome sequencing data: a pilot study.

    Science.gov (United States)

    Pérez-Lago, L; Martínez Lirola, M; Herranz, M; Comas, I; Bouza, E; García-de-Viedma, D

    2015-03-01

    Molecular epidemiology has transformed our knowledge of how tuberculosis (TB) is transmitted. Whole genome sequencing (WGS) has reached unprecedented levels of accuracy. However, it has increased technical requirements and costs, and analysis of data delays results. Our objective was to find a way to reconcile speed and ease of implementation with the high resolution of WGS. The targeted regional allele-specific oligonucleotide PCR (TRAP) assay presented here is based on allele-specific PCR targeting strain-specific single nucleotide polymorphisms, identified from WGS, and makes it possible to track actively transmitted Mycobacterium tuberculosis strains. A TRAP assay was optimized to track the most actively transmitted strains in a population in Almería, Southeast Spain, with high rates of TB. TRAP was transferred to the local laboratory where transmission was occurring. It performed well from cultured isolates and directly from sputa, enabling new secondary cases of infection from the actively transmitted strains to be detected. TRAP constitutes a fast, simple and low-cost tool that could modify surveillance of TB transmission. This pilot study could help to define a new model to survey TB transmission based on a decentralized multinodal network of local laboratories applying fast and low-cost TRAPs, which are developed by central reference centres, tailored to the specific demands of transmission at each local node.
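
    The marker-selection idea behind such an assay, choosing SNPs that occur in the transmitted strain and in no other strain, can be sketched with set operations. The Python example below is a minimal illustration, not the published TRAP design procedure; the strain names, positions and alleles are hypothetical.

```python
def strain_specific_snps(snps_by_strain, target):
    """Return the SNPs called in `target` that are absent from every other strain."""
    seen_elsewhere = set()
    for strain, snps in snps_by_strain.items():
        if strain != target:
            seen_elsewhere |= snps
    return snps_by_strain[target] - seen_elsewhere


# Hypothetical SNP calls as (genome position, alternate base) pairs.
snps_by_strain = {
    "strain_A": {(1201, "T"), (45877, "G"), (903112, "C")},
    "strain_B": {(1201, "T"), (77310, "A")},
    "strain_C": {(45877, "G"), (2200451, "T")},
}

markers = strain_specific_snps(snps_by_strain, "strain_A")
print(markers)  # candidate targets for an allele-specific PCR against strain_A
```

    In this toy data set only the SNP at position 903112 is private to strain_A, so it would be the sole candidate marker for that strain.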

  19. Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    William P. Gilks

    2016-11-01

    Full Text Available As part of a study into the molecular genetics of sexually dimorphic complex traits, we used next-generation sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genome of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6), and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly-available in the NCBI Short Read Archive (Accession number SRP058502). We used Haplotype Caller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200bp). Additionally we detected and genotyped 167 large structural variants (1-100Kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly-available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).

  20. Construction of whole genome radiation hybrid panels and map of chromosome 5A of wheat using asymmetric somatic hybridization.

    Directory of Open Access Journals (Sweden)

    Chuanen Zhou

    Full Text Available To explore the feasibility of constructing a whole genome radiation hybrid (WGRH) map in plant species with large genomes, asymmetric somatic hybridization between wheat (Triticum aestivum L.) and Bupleurum scorzonerifolium Willd. was performed. The protoplasts of wheat were irradiated with ultraviolet light (UV) and gamma-ray and rescued by protoplast fusion using B. scorzonerifolium as the recipient. Assessment of SSR markers showed that the radiation hybrids have the average marker retention frequency of 15.5%. Two RH panels (RHPWI and RHPWII) that contained 92 and 184 radiation hybrids, respectively, were developed and used for mapping of 68 SSR markers in chromosome 5A of wheat. A total of 1557 and 2034 breaks were detected in each panel. The RH map of chromosome 5A based on RHPWII was constructed. The distance of the comprehensive map was 2103 cR and the approximate resolution was estimated to be ∼501.6 kb/break. The RH panels evaluated in this study enabled us to order the ESTs in a single deletion bin or in the multiple bins across the chromosome. These results demonstrated that RH mapping via protoplast fusion is feasible at the whole genome level for mapping purposes in wheat and the potential value of this mapping approach for the plant species with large genomes.

  1. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  2. Lessons learned from the application of whole-genome analysis to the treatment of patients with advanced cancers.

    Science.gov (United States)

    Laskin, Janessa; Jones, Steven; Aparicio, Samuel; Chia, Stephen; Ch'ng, Carolyn; Deyell, Rebecca; Eirew, Peter; Fok, Alexandra; Gelmon, Karen; Ho, Cheryl; Huntsman, David; Jones, Martin; Kasaian, Katayoon; Karsan, Aly; Leelakumari, Sreeja; Li, Yvonne; Lim, Howard; Ma, Yussanne; Mar, Colin; Martin, Monty; Moore, Richard; Mungall, Andrew; Mungall, Karen; Pleasance, Erin; Rassekh, S Rod; Renouf, Daniel; Shen, Yaoqing; Schein, Jacqueline; Schrader, Kasmintan; Sun, Sophie; Tinker, Anna; Zhao, Eric; Yip, Stephen; Marra, Marco A

    2015-10-01

    Given the success of targeted agents in specific populations it is expected that some degree of molecular biomarker testing will become standard of care for many, if not all, cancers. To facilitate this, cancer centers worldwide are experimenting with targeted "panel" sequencing of selected mutations. Recent advances in genomic technology enable the generation of genome-scale data sets for individual patients. Recognizing the risk, inherent in panel sequencing, of failing to detect meaningful somatic alterations, we sought to establish processes to integrate data from whole-genome analysis (WGA) into routine cancer care. Between June 2012 and August 2014, 100 adult patients with incurable cancers consented to participate in the Personalized OncoGenomics (POG) study. Fresh tumor and blood samples were obtained and used for whole-genome and RNA sequencing. Computational approaches were used to identify candidate driver mutations, genes, and pathways. Diagnostic and drug information were then sought based on these candidate "drivers." Reports were generated and discussed weekly in a multidisciplinary team setting. Other multidisciplinary working groups were assembled to establish guidelines on the interpretation, communication, and integration of individual genomic findings into patient care. Of 78 patients for whom WGA was possible, results were considered actionable in 55 cases. In 23 of these 55 cases, the patients received treatments motivated by WGA. Our experience indicates that a multidisciplinary team of clinicians and scientists can implement a paradigm in which WGA is integrated into the care of late stage cancer patients to inform systemic therapy decisions.

  3. Enterobacter asburiae Strain L1: Complete Genome and Whole Genome Optical Mapping Analysis of a Quorum Sensing Bacterium

    Directory of Open Access Journals (Sweden)

    Yin Yin Lau

    2014-07-01

    Full Text Available Enterobacter asburiae L1 is a quorum sensing bacterium isolated from lettuce leaves. In this study, for the first time, the complete genome of E. asburiae L1 was sequenced using the single molecule real time sequencer (PacBio RSII) and the whole genome sequence was verified by using optical genome mapping (OpGen) technology. In our previous study, E. asburiae L1 was reported to produce AHLs, suggesting the possibility of quorum sensing-dependent virulence factor regulation. This evoked our interest to study the genome of this bacterium and here we present the complete genome of E. asburiae L1, which carries the virulence factor gene virK, the N-acyl homoserine lactone-based QS transcriptional regulator gene luxR and the N-acyl homoserine lactone synthase gene, which we have named easI. The availability of the whole genome sequence of E. asburiae L1 will pave the way for the study of the QS-mediated gene expression in this bacterium. Hence, the importance and functions of these signaling molecules can be further studied in the hope of elucidating the mechanisms of QS-regulation in E. asburiae. To the best of our knowledge, this is the first documentation of both a complete genome sequence and the establishment of the molecular basis of QS properties of E. asburiae.

  4. Efficient Haplotype Inference Algorithms in One Whole Genome Scan for Pedigree Data with Non-genotyped Founders

    Institute of Scientific and Technical Information of China (English)

    Yongxi Cheng; Hadi Sabaa; Zhipeng Cai; Randy Goebel; Guohui Lin

    2009-01-01

    An efficient rule-based algorithm is presented for haplotype inference from general pedigree genotype data, with the assumption of no recombination. This algorithm generalizes previous algorithms to handle the cases where some pedigree founders are not genotyped, provided that for each nuclear family at least one parent is genotyped and each non-genotyped founder appears in exactly one nuclear family. The importance of this generalization lies in that such cases frequently happen in real data, because some founders may have passed away and their genotype data can no longer be collected. The algorithm runs in O(m^3 n^3) time, where m is the number of single nucleotide polymorphism (SNP) loci under consideration and n is the number of genotyped members in the pedigree. This zero-recombination haplotyping algorithm is extended to a maximum parsimony haplotyping algorithm in one whole genome scan to minimize the total number of breakpoint sites, or equivalently, the number of maximal zero-recombination chromosomal regions. We show that such a whole genome scan haplotyping algorithm can be implemented in O(m^3 n^3) time in a novel incremental fashion; here m denotes the total number of SNP loci along the chromosome.

  5. Evaluation of A Single-reaction Method for Whole Genome Sequencing of Influenza A Virus using Next Generation Sequencing

    Institute of Scientific and Technical Information of China (English)

    ZOU Xiao Hui; CHEN Wen Bing; ZHAO Xiang; ZHU Wen Fei; YANG Lei; WANG Da Yan; SHU Yue Long

    2016-01-01

    Objective To evaluate a single-reaction genome amplification method, the multisegment reverse transcription-PCR (M-RTPCR), for its sensitivity to full genome sequencing of influenza A virus, and the ability to differentiate mix-subtype virus, using the next generation sequencing (NGS) platform. Methods Virus genome copy was quantified and serially diluted to different titers, followed by amplification with the M-RTPCR method and sequencing on the NGS platform. Furthermore, we manually mixed two subtype viruses to different titer rate and amplified the mixed virus with the M-RTPCR protocol, followed by whole genome sequencing on the NGS platform. We also used clinical samples to test the method performance. Results The M-RTPCR method obtained complete genome of testing virus at 125 copies/reaction and determined the virus subtype at titer of 25 copies/reaction. Moreover, the two subtypes in the mixed virus could be discriminated, even though these two virus copies differed by 200-fold using this amplification protocol. The sensitivity of this protocol we detected using virus RNA was also confirmed with clinical samples containing low-titer virus. Conclusion The M-RTPCR is a robust and sensitive amplification method for whole genome sequencing of influenza A virus using NGS platform.

  6. Validation of whole genome amplification for analysis of the p53 tumor suppressor gene in limited amounts of tumor samples.

    Science.gov (United States)

    Hasmats, Johanna; Green, Henrik; Solnestam, Beata Werne; Zajac, Pawel; Huss, Mikael; Orear, Cedric; Validire, Pierre; Bjursell, Magnus; Lundeberg, Joakim

    2012-08-24

    Personalized cancer treatment requires molecular characterization of individual tumor biopsies. These samples are frequently only available in limited quantities, hampering genomic analysis. Several whole genome amplification (WGA) protocols have been developed with reported varying representation of genomic regions post amplification. In this study we investigate region dropout using a φ29 polymerase based WGA approach. DNA from 123 lung cancer specimens and corresponding normal tissue was used and evaluated by Sanger sequencing of the p53 exons 5-8. To enable comparative analysis of this scarce material, WGA samples were compared with unamplified material using a pooling strategy of the 123 samples. In addition, a more detailed analysis of exon 7 amplicons was performed, followed by extensive cloning and Sanger sequencing. Interestingly, by comparing data from the pooled samples to the individually sequenced exon 7, we demonstrate that mutations are more easily recovered from WGA pools and this was also supported by simulations of different sequencing coverage. Overall these data indicate a limited random loss of genomic regions, supporting the use of whole genome amplification for genomic analysis.

  7. Whole genome sequencing and phylogenetic characterization of brown bullhead (Ameiurus nebulosus) origin ranavirus strains from independent disease outbreaks.

    Science.gov (United States)

    Fehér, Enikő; Doszpoly, Andor; Horváth, Balázs; Marton, Szilvia; Forró, Barbara; Farkas, Szilvia L; Bányai, Krisztián; Juhász, Tamás

    2016-11-01

    Ranaviruses are emerging pathogens associated with high mortality diseases in fish, amphibians and reptiles. Here we describe the whole genome sequence of two ranavirus isolates from brown bullhead (Ameiurus nebulosus) specimens collected in 2012 at two different locations in Hungary during independent mass mortality events. The two Hungarian isolates were highly similar to each other at the genome sequence level (99.9% nucleotide identity) and to a European sheatfish (Silurus glanis) origin ranavirus (ESV, 99.7%-99.9% nucleotide identity). The coding potential of the genomes of both Hungarian isolates, with 136 putative proteins, was shared with that of the ESV. The core genes commonly used in phylogenetic analysis of ranaviruses were not useful to differentiate the two brown bullhead ESV strains. However, genome-wide distribution of point mutations and structural variations observed mainly in the non-coding regions of the genome suggested that the ranavirus disease outbreaks in Hungary were caused by different virus strains. At this moment, due to limited whole genome sequence data of ESV, it is unclear whether these genomic changes are useful in molecular epidemiological monitoring of ranavirus disease outbreaks. Therefore, complete genome sequencing of further isolates will be needed to identify adequate genetic markers, if any, and demonstrate their utility in disease control and prevention.

  8. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Alexander C Outhred

    Full Text Available Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster.

  9. From Wet‐Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing

    Science.gov (United States)

    Laurie, Steve; Fernandez‐Callejo, Marcos; Marco‐Sola, Santiago; Trotta, Jean‐Remi; Camps, Jordi; Chacón, Alejandro; Espinosa, Antonio; Gut, Marta; Gut, Ivo; Heath, Simon

    2016-01-01

    ABSTRACT As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next‐generation sequencing as standard practice in research and diagnostics. However, computing cost–performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state‐of‐the‐art read aligners (BWA‐MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results have been compared between them and against the NIST Genome in a Bottle (GIAB) variants reference dataset. We report differences in speed of up to 20 times in some steps of the process and have observed that SNV, and to a lesser extent InDel, detection is highly consistent in 70% of the genome. SNV, and especially InDel, detection is less reliable in 20% of the genome, and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools bearing in mind objectives, workload, and computing infrastructure available. PMID:27604516
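
    The comparison of call sets against a reference dataset described in this abstract can be illustrated with a minimal sketch. It assumes each call set is reduced to a set of (chromosome, position, ref, alt) tuples and scores exact matches only; the variant tuples, the `benchmark` and `jaccard` helpers, and the toy data are hypothetical and are not the authors' benchmarking code.

```python
from typing import NamedTuple, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


class Benchmark(NamedTuple):
    precision: float
    recall: float


def benchmark(calls: Set[Variant], truth: Set[Variant]) -> Benchmark:
    """Score a call set against a truth set by exact variant match."""
    true_pos = len(calls & truth)
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return Benchmark(precision, recall)


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Concordance between two call sets: shared calls over all calls."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data, invented for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
caller_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}
caller_b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA")}

result_a = benchmark(caller_a, truth)
result_b = benchmark(caller_b, truth)
agreement = jaccard(caller_a, caller_b)
```

    Exact tuple matching is the simplest possible choice; real InDel comparisons usually need allele normalization first, which may be one reason InDel concordance is reported as lower than SNV concordance.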

  10. Whole-genome resequencing analyses of five pig breeds, including Korean wild and native, and three European origin breeds.

    Science.gov (United States)

    Choi, Jung-Woo; Chung, Won-Hyong; Lee, Kyung-Tai; Cho, Eun-Seok; Lee, Si-Woo; Choi, Bong-Hwan; Lee, Sang-Heon; Lim, Wonjun; Lim, Dajeong; Lee, Yun-Gyeong; Hong, Joon-Ki; Kim, Doo-Wan; Jeon, Hyeon-Jeong; Kim, Jiwoong; Kim, Namshin; Kim, Tae-Hun

    2015-08-01

    Pigs have been one of the most important sources of meat for humans, and their productivity has been substantially improved by recent strong selection. Here, we present whole-genome resequencing analyses of 55 pigs of five breeds representing Korean native pigs, wild boar and three European origin breeds. 1,673.1 Gb of sequence reads were mapped to the Swine reference assembly, covering ∼99.2% of the reference genome, at an average of ∼11.7-fold coverage. We detected 20,123,573 single-nucleotide polymorphisms (SNPs), of which 25.5% were novel. We extracted 35,458 non-synonymous SNPs in 9,904 genes, which may contribute to traits of interest. The whole SNP sets were further used to assess the population structures of the breeds, using multiple methodologies, including phylogenetic, similarity matrix, and population structure analysis. They showed clear population clusters with respect to each breed. Furthermore, we scanned the whole genomes to identify signatures of selection throughout the genome. The result revealed several promising loci that might underlie economically important traits in pigs, such as the CLDN1 and TWIST1 genes. These discoveries provide useful genomic information for further study of the discrete genetic mechanisms associated with economically important traits in pigs.
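
    The coverage figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the ∼11.7-fold figure is a per-animal average and that sequence yield was spread evenly over the 55 animals; neither assumption is stated in the abstract, and the function names are illustrative.

```python
def per_sample_yield(total_gb: float, n_samples: int) -> float:
    """Average mapped sequence per sample, in Gb."""
    return total_gb / n_samples


def implied_genome_size(yield_gb: float, fold_coverage: float) -> float:
    """Genome size (Gb) implied by depth = yield / genome size."""
    return yield_gb / fold_coverage


total_gb = 1673.1   # total mapped sequence reported in the abstract (Gb)
n_animals = 55      # pigs sequenced
mean_depth = 11.7   # reported average fold coverage

per_animal_gb = per_sample_yield(total_gb, n_animals)
genome_gb = implied_genome_size(per_animal_gb, mean_depth)
```

    Under these assumptions each animal contributes about 30.4 Gb, which implies an effective reference size of about 2.6 Gb.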

  11. An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform

    Directory of Open Access Journals (Sweden)

    Zhang Tongwu

    2011-11-01

    Full Text Available Abstract Motivation Complete organellar genome sequences (chloroplasts and mitochondria) provide valuable resources and information for studying plant molecular ecology and evolution. As high-throughput sequencing technology advances, it becomes the norm that a shotgun approach is used to obtain complete genome sequences. Therefore, to assemble organellar sequences from the whole genome, shotgun reads are inevitable. However, associated techniques are often cumbersome, time-consuming, and difficult, because true organellar DNA is difficult to separate efficiently from nuclear copies, which have been transferred to the nucleus through the course of evolution. Results We report a new, rapid procedure for plant chloroplast and mitochondrial genome sequencing and assembly using the Roche/454 GS FLX platform. Plant cells can contain multiple copies of the organellar genomes, and there is a significant correlation between the depth of sequence reads in contigs and the number of copies of the genome. Without isolating organellar DNA from the mixture of nuclear and organellar DNA for sequencing, we retrospectively extracted assembled contigs of either chloroplast or mitochondrial sequences from the whole genome shotgun data. Moreover, the contig connection graph property of Newbler (a platform-specific sequence assembler) ensures an efficient final assembly. Using this procedure, we assembled both chloroplast and mitochondrial genomes of a resurrection plant, Boea hygrometrica, with high fidelity. We also present information and a minimal sequence dataset as a reference for the assembly of other plant organellar genomes.

  12. Genotyping performance assessment of whole genome amplified DNA with respect to multiplexing level of assay and its period of storage.

    Directory of Open Access Journals (Sweden)

    Daniel W H Ho

    Full Text Available Whole genome amplification can faithfully amplify genomic DNA (gDNA) with minimal bias and substantial genome coverage. Whole genome amplified DNA (wgaDNA) has been tested to be workable for high-throughput genotyping arrays. However, issues about whether wgaDNA would decrease genotyping performance at increasing multiplexing levels and whether the storage period of wgaDNA would reduce genotyping performance have not been examined. Using the Sequenom MassARRAY iPLEX Gold assays, we investigated 174 single nucleotide polymorphisms for 3 groups of matched samples: group 1 of 20 gDNA samples, group 2 of 20 freshly prepared wgaDNA samples, and group 3 of 20 stored wgaDNA samples that had been kept frozen at -70°C for 18 months. MassARRAY is a medium-throughput genotyping platform with reaction chemistry different from those of high-throughput genotyping arrays. The results showed that genotyping performance (efficiency and accuracy) of freshly prepared wgaDNA was similar to that of gDNA at various multiplexing levels (17-plex, 21-plex, 28-plex and 36-plex) of the MassARRAY assays. However, compared with gDNA or freshly prepared wgaDNA, stored wgaDNA was found to give diminished genotyping performance (efficiency and accuracy) due to potentially inferior quality. Consequently, no matter whether gDNA or wgaDNA was used, better genotyping efficiency would tend to have better genotyping accuracy.
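
    The two performance measures named in this abstract can be made concrete with a short sketch. It assumes that "efficiency" means call rate (fraction of SNPs with a successful call) and that "accuracy" means concordance with the matched gDNA genotype; the abstract does not define either term, and the SNP identifiers and genotypes below are invented.

```python
from typing import Dict, Optional

# SNP id -> genotype call; None marks a failed call.
Genotypes = Dict[str, Optional[str]]


def call_rate(calls: Genotypes) -> float:
    """Fraction of assayed SNPs that produced a genotype call."""
    return sum(g is not None for g in calls.values()) / len(calls)


def concordance(test: Genotypes, reference: Genotypes) -> float:
    """Fraction of SNPs called in both samples whose genotypes agree."""
    shared = [s for s in test
              if test[s] is not None and reference.get(s) is not None]
    if not shared:
        return float("nan")
    return sum(test[s] == reference[s] for s in shared) / len(shared)


# Hypothetical matched samples: native gDNA and stored wgaDNA.
gdna = {"snp1": "AA", "snp2": "AG", "snp3": "GG", "snp4": "CT"}
stored_wga = {"snp1": "AA", "snp2": "GG", "snp3": None, "snp4": "CT"}

efficiency = call_rate(stored_wga)
accuracy = concordance(stored_wga, gdna)
```

    Concordance is computed only over SNPs called in both samples, so a sample can show low efficiency and still score well on accuracy; the two measures have to be read together.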

  13. swDMR: A Sliding Window Approach to Identify Differentially Methylated Regions Based on Whole Genome Bisulfite Sequencing.

    Directory of Open Access Journals (Sweden)

    Zhen Wang

    Full Text Available DNA methylation is a widespread epigenetic modification that plays an essential role in gene expression through transcriptional regulation and chromatin remodeling. The emergence of whole genome bisulfite sequencing (WGBS) represents an important milestone in the detection of DNA methylation. Characterization of differentially methylated regions (DMRs) is fundamental as well for further functional analysis. In this study, we present swDMR (http://sourceforge.net/projects/swDMR/) for the comprehensive analysis of DMRs from whole genome methylation profiles by a sliding window approach. It is an integrated tool designed for WGBS data, which not only implements accessible statistical methods to perform hypothesis tests adapted to two or more samples without replicates, but also controls the false discovery rate by multiple test correction. Downstream analysis tools are also provided, including cluster, annotation and visualization modules. In summary, based on WGBS data, swDMR can produce abundant information on differentially methylated regions. As a convenient and flexible tool, we believe swDMR will bring us closer to unveiling the potential functional regions involved in epigenetic regulation.

  14. Spatiotemporal characterizations of dengue virus in mainland China: insights into the whole genome from 1978 to 2011.

    Science.gov (United States)

    Zhang, Hao; Zhang, Yanru; Hamoudi, Rifat; Yan, Guiyun; Chen, Xiaoguang; Zhou, Yuanping

    2014-01-01

    Temporal-spatial analyses of dengue virus (DENV) have been performed in previous epidemiological studies in mainland China, but few studies have examined the whole genome of the DENV. Herein, 40 whole genome sequences of DENVs isolated from mainland China were downloaded from GenBank. Phylogenetic analyses and evolutionary distances of the dengue serotypes 1 and 2 were calculated using 14 maximum likelihood trees created from individual genes and whole genome. Amino acid variations were also analyzed in the 40 sequences that included dengue serotypes 1, 2, 3 and 4, and they were grouped according to temporal and spatial differences. The results showed that none of the phylogenetic trees created from each individual gene were similar to the trees created using the complete genome and the evolutionary distances were variable with each individual gene. The number of amino acid variations was significantly different (p = 0.015) between DENV-1 and DENV-2 after 2001; seven mutations, the N290D, L402F and A473T mutations in the E gene region and the R101K, G105R, D340E and L349M mutations in the NS1 region of DENV-1, had significant substitutions, compared to the amino acids of DENV-2. Based on the spatial distribution using Guangzhou, including Foshan, as the indigenous area and the other regions as expanding areas, significant differences in the number of amino acid variations in the NS3 (p = 0.03) and NS1 (p = 0.024) regions and the NS2B (p = 0.016) and NS3 (p = 0.042) regions were found in DENV-1 and DENV-2. Recombination analysis showed no inter-serotype recombination events between the DENV-1 and DENV-2, while six and seven breakpoints were found in DENV-1 and DENV-2. In conclusion, the individual genes might not be suitable to analyze the evolution and selection pressure isolated in mainland China; the mutations in the amino acid residues in the E, NS1 and NS3 regions may play important roles in DENV-1 and DENV-2 epidemics.

  15. Spatiotemporal characterizations of dengue virus in mainland China: insights into the whole genome from 1978 to 2011.

    Directory of Open Access Journals (Sweden)

    Hao Zhang

    Full Text Available Temporal-spatial analyses of dengue virus (DENV) have been performed in previous epidemiological studies in mainland China, but few studies have examined the whole genome of the DENV. Herein, 40 whole genome sequences of DENVs isolated from mainland China were downloaded from GenBank. Phylogenetic analyses and evolutionary distances of the dengue serotypes 1 and 2 were calculated using 14 maximum likelihood trees created from individual genes and whole genome. Amino acid variations were also analyzed in the 40 sequences that included dengue serotypes 1, 2, 3 and 4, and they were grouped according to temporal and spatial differences. The results showed that none of the phylogenetic trees created from each individual gene were similar to the trees created using the complete genome and the evolutionary distances were variable with each individual gene. The number of amino acid variations was significantly different (p = 0.015) between DENV-1 and DENV-2 after 2001; seven mutations, the N290D, L402F and A473T mutations in the E gene region and the R101K, G105R, D340E and L349M mutations in the NS1 region of DENV-1, had significant substitutions, compared to the amino acids of DENV-2. Based on the spatial distribution using Guangzhou, including Foshan, as the indigenous area and the other regions as expanding areas, significant differences in the number of amino acid variations in the NS3 (p = 0.03) and NS1 (p = 0.024) regions and the NS2B (p = 0.016) and NS3 (p = 0.042) regions were found in DENV-1 and DENV-2. Recombination analysis showed no inter-serotype recombination events between the DENV-1 and DENV-2, while six and seven breakpoints were found in DENV-1 and DENV-2. In conclusion, the individual genes might not be suitable to analyze the evolution and selection pressure isolated in mainland China; the mutations in the amino acid residues in the E, NS1 and NS3 regions may play important roles in DENV-1 and DENV-2 epidemics.

  16. An evaluation of alternative methods for constructing phylogenies from whole genome sequence data: a case study with Salmonella

    Directory of Open Access Journals (Sweden)

    James B. Pettengill

    2014-10-01

    Full Text Available Comparative genomics based on whole genome sequencing (WGS) is increasingly being applied to investigate questions within evolutionary and molecular biology, as well as questions concerning public health (e.g., pathogen outbreaks). Given the impact that conclusions derived from such analyses may have, we have evaluated the robustness of clustering individuals based on WGS data to three key factors: (1) next-generation sequencing (NGS) platform (HiSeq, MiSeq, IonTorrent, 454, and SOLiD), (2) algorithms used to construct a SNP (single nucleotide polymorphism) matrix (reference-based and reference-free), and (3) phylogenetic inference method (FastTreeMP, GARLI, and RAxML). We carried out these analyses on 194 whole genome sequences representing 107 unique Salmonella enterica subsp. enterica ser. Montevideo strains. Reference-based approaches for identifying SNPs produced trees that were significantly more similar to one another than those produced under the reference-free approach. Topologies inferred using a core matrix (i.e., no missing data) were significantly more discordant than those inferred using a non-core matrix that allows for some missing data. However, allowing for too much missing data likely results in a high false discovery rate of SNPs. When analyzing the same SNP matrix, we observed that the more thorough inference methods implemented in GARLI and RAxML produced more similar topologies than FastTreeMP. Our results also confirm that reproducibility varies among NGS platforms where the MiSeq had the lowest number of pairwise differences among replicate runs. Our investigation into the robustness of clustering patterns illustrates the importance of carefully considering how data from different platforms are combined and analyzed. We found clear differences in the topologies inferred, and certain methods performed significantly better than others for discriminating between the highly clonal organisms investigated here. 
The methods supported by
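
    The distinction between a core and a non-core SNP matrix, and the pairwise differences used to compare replicate runs, can be sketched in a few lines. The sketch assumes a matrix stored as one allele string per sample with "-" marking a missing call; the sample names, alleles, and helper functions are hypothetical and unrelated to the tools evaluated in the study.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

MISSING = "-"
Matrix = Dict[str, str]  # sample name -> one allele character per SNP column


def core_columns(matrix: Matrix) -> List[int]:
    """Indices of SNP columns with no missing call in any sample."""
    length = len(next(iter(matrix.values())))
    return [i for i in range(length)
            if all(seq[i] != MISSING for seq in matrix.values())]


def pairwise_differences(
    matrix: Matrix, columns: Optional[List[int]] = None
) -> Dict[Tuple[str, str], int]:
    """Count differing SNP calls per sample pair, skipping missing calls."""
    result: Dict[Tuple[str, str], int] = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(matrix.items()), 2):
        idx = columns if columns is not None else range(len(seq_a))
        result[(name_a, name_b)] = sum(
            1 for i in idx
            if MISSING not in (seq_a[i], seq_b[i]) and seq_a[i] != seq_b[i]
        )
    return result


# Toy matrix: three replicate runs of one isolate plus a distinct isolate.
matrix = {
    "run1": "ACGTA",
    "run2": "ACGTA",
    "run3": "AC-TG",
    "iso4": "TCACG",
}
core = core_columns(matrix)
non_core_dist = pairwise_differences(matrix)
core_dist = pairwise_differences(matrix, core)
```

    In this toy case the core matrix drops the third column because one run lacks a call there, which removes a real difference between `iso4` and `run1`; that loss of signal is the trade-off against the false SNPs that a permissive non-core matrix may admit.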

  17. Whole Genome Sequencing in the Undergraduate Classroom: Outcomes and Lessons from a Pilot Course

    Directory of Open Access Journals (Sweden)

    Jennifer C. Drew

    2009-12-01

    Full Text Available The BIO2010 report challenged undergraduate institutions to prepare the next generation of researchers for the changing direction of biology that increasingly integrates advanced technologies, digital information, and large-scale analyses. In response, the Microbiology and Cell Science Department at the University of Florida developed a research-based course, “Bacterial Genome Sequencing.” The objectives were to teach undergraduates about genomics and original research by sequencing a bacterial genome, to develop scientific communication skills by writing and submitting the project results as a class effort, and to promote an interest in biological research, particularly genomics. The students worked together to sequence, assemble, and annotate the Enterobacter cloacae P101 genome. We assessed student learning, scientific communication skills, and student attitudes by a variety of methods including exams, writing assignments, oral presentations, pre- and postcourse surveys, and a final exit survey. Assessment results demonstrate student learning gains and positive attitudes regarding the course.

  18. Whole genome sequencing in the undergraduate classroom: outcomes and lessons from a pilot course.

    Science.gov (United States)

    Drew, Jennifer C; Triplett, Eric W

    2008-01-01

    The BIO2010 report challenged undergraduate institutions to prepare the next generation of researchers for the changing direction of biology that increasingly integrates advanced technologies, digital information, and large-scale analyses. In response, the Microbiology and Cell Science Department at the University of Florida developed a research-based course, "Bacterial Genome Sequencing." The objectives were to teach undergraduates about genomics and original research by sequencing a bacterial genome, to develop scientific communication skills by writing and submitting the project results as a class effort, and to promote an interest in biological research, particularly genomics. The students worked together to sequence, assemble, and annotate the Enterobacter cloacae P101 genome. We assessed student learning, scientific communication skills, and student attitudes by a variety of methods including exams, writing assignments, oral presentations, pre- and postcourse surveys, and a final exit survey. Assessment results demonstrate student learning gains and positive attitudes regarding the course.

  19. Transmission of Methicillin-Resistant Staphylococcus aureus via Deceased Donor Liver Transplantation Confirmed by Whole Genome Sequencing

    Science.gov (United States)

    Altman, D. R.; Sebra, R.; Hand, J.; Attie, O.; Deikus, G.; Carpini, K. W. D.; Patel, G.; Rana, M.; Arvelakis, A.; Grewal, P.; Dutta, J.; Rose, H.; Shopsin, B.; Daefler, S.; Schadt, E.; Kasarskis, A.; van Bakel, H.; Bashir, A.; Huprikar, S.

    2015-01-01

    Donor-derived bacterial infection is a recognized complication of solid organ transplantation (SOT). The present report describes the clinical details and successful outcome in a liver transplant recipient despite transmission of methicillin-resistant Staphylococcus aureus (MRSA) from a deceased donor with MRSA endocarditis and bacteremia. We further describe whole genome sequencing (WGS) and complete de novo assembly of the donor and recipient MRSA isolate genomes, which confirms that both isolates are genetically 100% identical. We propose that similar application of WGS techniques to future investigations of donor bacterial transmission would strengthen the definition of proven bacterial transmission in SOT, particularly in the presence of highly clonal bacteria such as MRSA. WGS will further improve our understanding of the epidemiology of bacterial transmission in SOT and the risk of adverse patient outcomes when it occurs. PMID:25250641

  20. Whole-genome amplification: a useful approach to characterize new genes in unculturable protozoan parasites such as Bonamia exitiosa.

    Science.gov (United States)

    Prado-Alvarez, Maria; Couraleau, Yann; Chollet, Bruno; Tourbiez, Delphine; Arzul, Isabelle

    2015-10-01

    Bonamia exitiosa is an intracellular parasite (Haplosporidia) that has been associated with mass mortalities in oyster populations in the Southern hemisphere. This parasite was recently detected in the Northern hemisphere including Europe. Some representatives of the Bonamia genus have not been well categorized yet due to the lack of genomic information. In the present work, we have applied Whole-Genome Amplification (WGA) technique in order to characterize the actin gene in the unculturable protozoan B. exitiosa. This is the first protein coding gene described in this species. Molecular analysis revealed that B. exitiosa actin is more similar to Bonamia ostreae actin gene-1. Actin phylogeny placed the Bonamia sp. infected oysters in the same clade where the herein described B. exitiosa actin resolved, offering novel information about the classification of the genus. Our results showed that WGA methodology is a promising and valuable technique to be applied to unculturable protozoans whose genomic material is limited.

  1. Inability of ‘Whole Genome Amplification’ to Improve Success Rates for the Biomolecular Detection of Tuberculosis in Archaeological Samples

    Science.gov (United States)

    Forst, Jannine; Brown, Terence A.

    2016-01-01

    We assessed the ability of whole genome amplification (WGA) to improve the efficiency of downstream polymerase chain reactions (PCRs) directed at ancient DNA (aDNA) of members of the Mycobacterium tuberculosis complex (MTBC). Using extracts from a variety of bones and a tooth from human skeletons with or without lesions indicative of tuberculosis, from multiple time periods, we obtained inconsistent results. We conclude that WGA does not provide any advantage in studies of MTBC aDNA. The sporadic nature of our results is probably due to the fact that WGA is itself a PCR-based procedure which, although designed to deal with fragmented DNA, might be inefficient with the low concentration of templates in an aDNA extract. As such, WGA is subject to similar, if not the same, restrictions as PCR when applied to aDNA. PMID:27654468

  2. Mycobacterium tuberculosis and whole genome sequencing: a practical guide and online tools available for the clinical microbiologist.

    Science.gov (United States)

    Satta, G; Atzeni, A; McHugh, T D

    2017-02-01

    Whole genome sequencing (WGS) has the potential to revolutionize the diagnosis of Mycobacterium tuberculosis infection but the lack of bioinformatic expertise among clinical microbiologists is a barrier for adoption. Software products for analysis should be simple, free of charge, able to accept data directly from the sequencer (FASTQ files) and to provide the basic functionalities all-in-one. The main aim of this narrative review is to provide a practical guide for the clinical microbiologist, with little or no practical experience of WGS analysis, with a specific focus on software products tailor-made for M. tuberculosis analysis. With sequencing performed by an external provider, it is now feasible to implement WGS analysis in the routine clinical practice of any microbiology laboratory, with the potential to detect resistance weeks before traditional phenotypic culture methods, but the clinical microbiologist should be aware of the limitations of this approach.

  3. Identification of RAN1 orthologue associated with sex determination through whole genome sequencing analysis in fig (Ficus carica L.).

    Science.gov (United States)

    Mori, Kazuki; Shirasawa, Kenta; Nogata, Hitoshi; Hirata, Chiharu; Tashiro, Kosuke; Habu, Tsuyoshi; Kim, Sangwan; Himeno, Shuichi; Kuhara, Satoru; Ikegami, Hidetoshi

    2017-01-25

    With the aim of identifying sex determinants of fig, we generated the first draft genome sequence of fig and conducted the subsequent analyses. Linkage analysis with a high-density genetic map established by a restriction-site associated sequencing technique, and genome-wide association study followed by whole-genome resequencing analysis identified two missense mutations in RESPONSIVE-TO-ANTAGONIST1 (RAN1) orthologue encoding copper-transporting ATPase completely associated with sex phenotypes of investigated figs. This result suggests that RAN1 is a possible sex determinant candidate in the fig genome. The genomic resources and genetic findings obtained in this study can contribute to general understanding of Ficus species and provide an insight into fig's and plant's sex determination system.

  4. Genetic Diversity and Fingerprint Profiles of Commercial Lentinula edodes Cultivars Based on SSR Markers Developed from the Whole Genome Sequence

    Institute of Scientific and Technical Information of China (English)

    ZHANG Dan; SONG Chunyan; ZHANG Lujun; WU Ping; BAO Dapeng; SHANG Xiaodong; TAN Qi

    2014-01-01

    Lentinula edodes is an important cultivated mushroom in China, and accurate and reliable identification of individual cultivars is a prerequisite for successful cultivation and variety protection. In this study, the whole genome sequence of L. edodes was used to generate 200 simple sequence repeat (SSR) markers for delineating 25 commercial cultivars and for determining their genetic diversity. Our data revealed a relatively high level of genetic similarity among the cultivars, with average, minimum and maximum genetic similarity coefficient values of 0.776, 0.567 and 1.000, respectively. Seven SSR primer pairs delineated eleven of the cultivars (Cr-02, Minfeng-1, Xianggu 241-4, Senyuan-1, Senyuan-8404, Xiang-9, Guangxiang-51, Huaxiang-5, L952, L9319 and L808) based on their unique multilocus SSR fingerprint profiles.

  5. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    Science.gov (United States)

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  6. Performance Evaluation of NIPT in Detection of Chromosomal Copy Number Variants Using Low-Coverage Whole-Genome Sequencing of Plasma DNA

    DEFF Research Database (Denmark)

    Liu, Hongtai; Gao, Ya; Hu, Zhiyang;

    2016-01-01

    Objectives The aim of this study was to assess the performance of noninvasive prenatal testing (NIPT) for fetal copy number variants (CNVs) in clinical samples, using a whole-genome sequencing method. Method A total of 919 archived maternal plasma samples with karyotyping/microarray results......, including 33 CNVs samples and 886 normal samples from September 1, 2011 to May 31, 2013, were enrolled in this study. The samples were randomly rearranged and blindly sequenced by low-coverage (about 7M reads) whole-genome sequencing of plasma DNA. Fetal CNVs were detected by Fetal Copy-number Analysis...... in the study. Ten false positive results and two false negative results were obtained. The sensitivity and specificity for detecting deletions/duplications were 84.21% and 98.42%, respectively. Conclusion Whole-genome sequencing-based NIPT has high performance in detecting genome-wide CNVs, in particular > 10Mb...

  7. Whole-genome sequences of influenza A(H3N2) viruses isolated from Brazilian patients with mild illness during the 2014 season

    Directory of Open Access Journals (Sweden)

    Paola Cristina Resende

    2015-02-01

    The influenza A(H3N2) virus has circulated worldwide for almost five decades and is the dominant subtype in most seasonal influenza epidemics, as occurred in the 2014 season in South America. In this study we evaluate five whole genome sequences of influenza A(H3N2) viruses detected in patients with mild illness collected from January-March 2014. To sequence the genomes, a new generation sequencing (NGS) protocol was performed using the Ion Torrent PGM platform. In addition to analysing the common genes, haemagglutinin, neuraminidase and matrix, our work also comprised internal genes. This was the first report of a whole genome analysis with Brazilian influenza A(H3N2) samples. Considerable amino acid variability was encountered in all gene segments, demonstrating the importance of studying the internal genes. NGS of whole genomes in this study will facilitate deeper virus characterisation, contributing to the improvement of influenza strain surveillance in Brazil.

  8. A rural worker infected with a bovine-prevalent genotype of Campylobacter fetus subsp. fetus supports zoonotic transmission and inconsistency of MLST and whole-genome typing.

    Science.gov (United States)

    Iraola, G; Betancor, L; Calleros, L; Gadea, P; Algorta, G; Galeano, S; Muxi, P; Greif, G; Pérez, R

    2015-08-01

    Whole-genome characterisation in clinical microbiology makes it possible to detect trends in infection dynamics and disease transmission. Here, we report a case of bacteraemia due to Campylobacter fetus subsp. fetus in a rural worker under cancer treatment who was diagnosed with cellulitis; the patient was treated with antibiotics and recovered. The routine typing methods were not able to identify the microorganism causing the infection, so it was further analysed by molecular methods and whole-genome sequencing. The multi-locus sequence typing (MLST) revealed the presence of the bovine-associated ST-4 genotype. Whole-genome comparisons with other C. fetus strains revealed an inconsistent phylogenetic position based on the core genome, discordant with previous ST-4 strains. To the best of our knowledge, this is the first C. fetus subsp. fetus carrying the ST-4 isolated from humans and represents a probable case of zoonotic transmission from cattle.

  9. Whole genome expression profiling shows that BRG1 transcriptionally regulates UV inducible genes and other novel targets in human cells.

    Science.gov (United States)

    Zhang, Ling; Nemzow, Leah; Chen, Hua; Hu, Jennifer J; Gong, Feng

    2014-01-01

    UV irradiation is known to cause cyclobutane pyrimidine dimers (CPDs) and pyrimidine (6-4) pyrimidone photoproducts (6-4PPs), and plays a large role in the development of cancer. Tumor suppression, through DNA repair and proper cell cycle regulation, is an integral factor in maintaining healthy cells and preventing development of cancer. Transcriptional regulation of the genes involved in the various tumor suppression pathways is essential for them to be expressed when needed and to function properly. BRG1, an ATPase catalytic subunit of the SWI/SNF chromatin remodeling complex, has been identified as a tumor suppressor protein, as it has been shown to play a role in Nucleotide Excision Repair (NER) of CPDs, suppress apoptosis, and restore checkpoint deficiency, in response to UV exposure. Although BRG1 has been shown to regulate transcription of some genes that are instrumental in proper DNA damage repair and cell cycle maintenance in response to UV, its role in transcriptional regulation of the whole genome in response to UV has not yet been elucidated. With whole genome expression profiling in SW13 cells, we show that upon UV induction, BRG1 regulates transcriptional expression of many genes involved in cell stress response. Additionally, our results also highlight BRG1's general role as a master regulator of the genome, as it transcriptionally regulates approximately 4.8% of the human genome, including expression of genes involved in many pathways. RT-PCR and ChIP were used to validate our genome expression analysis. Importantly, our study identifies several novel transcriptional targets of BRG1, such as ATF3. Thus, BRG1 has a larger impact on human genome expression than previously thought, and our studies will provide inroads for future analysis of BRG1's role in gene regulation.

  10. Reconstructing the demographic history of the human lineage using whole-genome sequences from human and three great apes.

    Science.gov (United States)

    Hara, Yuichiro; Imanishi, Tadashi; Satta, Yoko

    2012-01-01

    The demographic history of humans would provide helpful information for identifying the evolutionary events that shaped humanity, but it remains controversial even in the genomic era. To settle the controversies, we inferred the speciation times (T) and ancestral population sizes (N) in the lineage leading to human and great apes based on whole-genome alignment. A coalescence simulation determined the sizes of alignment blocks and intervals between them required to obtain recombination-free blocks with a high frequency. This simulation revealed that the size of the block strongly affects the parameter inference, indicating that recombination is an important factor for achieving optimum parameter inference. From the whole genome alignments (1.9 giga-bases) of human (H), chimpanzee (C), gorilla (G), and orangutan, 100-bp alignment blocks separated by ≥5-kb intervals were sampled and used to estimate τ = μT and θ = 4μgN using the Markov chain Monte Carlo method, where μ is the mutation rate and g is the generation time. Although the estimated τ(HC) differed across chromosomes, τ(HC) and τ(HCG) were strongly correlated across chromosomes, indicating that variation in τ is subject to variation in μ, rather than T, and thus, all chromosomes share a single speciation time. Subsequently, we estimated the speciation times (T) of the human lineage from chimpanzee, gorilla, and orangutan to be 6.0-7.6, 7.6-9.7, and 15-19 Ma, respectively, assuming variable μ across lineages and chromosomes. These speciation times were consistent with the fossil records. We conclude that the speciation times in our recombination-free analysis would be conclusive and the speciation between human and chimpanzee was a single event.
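
    The relations τ = μT and θ = 4μgN quoted in the abstract above can be inverted to recover T and N once μ and g are fixed. The following is a minimal Python sketch of that arithmetic only. It assumes μ is expressed per site per year and g in years, which the abstract does not state, and every numeric input is a hypothetical placeholder rather than a value from the study.

```python
def speciation_time(tau: float, mu: float) -> float:
    """Invert tau = mu * T; mu is assumed to be per site per year."""
    return tau / mu


def ancestral_population_size(theta: float, mu: float, g: float) -> float:
    """Invert theta = 4 * mu * g * N; g is assumed to be in years."""
    return theta / (4.0 * mu * g)


# Hypothetical inputs for illustration only (not taken from the abstract).
TAU = 0.0066    # scaled divergence, substitutions per site
THETA = 0.004   # scaled ancestral diversity
MU = 1e-9       # assumed mutation rate per site per year
G = 20.0        # assumed generation time in years

T_years = speciation_time(TAU, MU)
N_anc = ancestral_population_size(THETA, MU, G)
print(f"T = {T_years / 1e6:.1f} Myr, N = {N_anc:.0f}")
```

    Because τ is the product of μ and T, a chromosome with a higher μ gives a larger τ for the same T, which is the reasoning the abstract uses to attribute variation in τ across chromosomes to μ.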

  11. Capacitive DNA sensor for rapid and sensitive detection of whole genome human herpes virus-1 dsDNA in serum.

    Science.gov (United States)

    Cheng, Cheng; Oueslati, Rania; Wu, Jayne; Chen, Jiangang; Eda, Shigetoshi

    2017-03-22

    This work presents a rapid, highly sensitive, low-cost and specific capacitive DNA sensor for detection of whole genome human herpes virus-1 DNA. This sensor is capable of direct DNA detection with a response time of 30 seconds, and it can be used to test standard buffer or serum samples. The sensing approach for DNA detection is based on AC electrokinetics. By applying an inhomogeneous AC electric field on sensor electrodes, positive dielectrophoresis is induced to accelerate DNA hybridization. The same applied AC signal also directly measures the hybridization of target with the probe on the sensor surface. Experiments are conducted to optimize the AC signal, as well as the buffers for probe immobilization and target DNA hybridization. The assay is highly sensitive and specific, with no response to human herpes virus-2 DNA at 5 ng/mL and a limit of detection of 1.0 pg/mL (6.5 copies/μL or 10.7 aM) in standard buffer. When testing the dsDNA spiked in human serum samples, the sensor yields a limit of detection of 20.0 pg/mL (129.5 copies/μL or 0.21 fM) in neat serum. In this work, the target is whole genome dsDNA; consequently, the test can be performed without the use of enzymes or amplification, which considerably simplifies the sensor operation and is highly suitable for point-of-care disease diagnosis.
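
    The copy-number and molar figures quoted for the two detection limits above are related by Avogadro's number alone. The short Python sketch below reproduces that unit conversion; the function name is illustrative, and the results agree with the quoted 10.7 aM and 0.21 fM only to within rounding.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI definition)


def copies_per_ul_to_molar(copies_per_ul: float) -> float:
    """Convert a copy-number concentration (copies/uL) to mol/L."""
    copies_per_litre = copies_per_ul * 1e6  # 1 L = 1e6 uL
    return copies_per_litre / AVOGADRO


buffer_lod = copies_per_ul_to_molar(6.5)    # abstract quotes 10.7 aM
serum_lod = copies_per_ul_to_molar(129.5)   # abstract quotes 0.21 fM

# Report in attomolar (1e-18 M) and femtomolar (1e-15 M).
print(f"buffer: {buffer_lod / 1e-18:.1f} aM, serum: {serum_lod / 1e-15:.2f} fM")
```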

  12. Whole genome sequence of Treponema pallidum ssp. pallidum, strain Mexico A, suggests recombination between yaws and syphilis strains.

    Directory of Open Access Journals (Sweden)

    Helena Pětrošová

    BACKGROUND: Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, and Treponema pallidum ssp. pertenue (TPE), the causative agent of yaws, are closely related spirochetes causing diseases with distinct clinical manifestations. The TPA Mexico A strain was isolated in 1953 from a male with primary syphilis living in Mexico. Attempts to cultivate the TPA Mexico A strain under in vitro conditions have revealed lower growth potential compared to other tested TPA strains. METHODOLOGY/PRINCIPAL FINDINGS: The complete genome sequence of the TPA Mexico A strain was determined using the Illumina sequencing technique. The genome sequence assembly was verified using the whole genome fingerprinting technique and the final sequence was annotated. The genome size of the Mexico A strain was determined to be 1,140,038 bp with 1,035 predicted ORFs. The Mexico A genome sequence was compared to the whole genome sequences of three TPA (Nichols, SS14 and Chicago) and three TPE (CDC-2, Samoa D and Gauthier) strains. No large rearrangements in the Mexico A genome were found and the identified nucleotide changes occurred most frequently in genes encoding putative virulence factors. Nevertheless, the genome of the Mexico A strain revealed two genes, TPAMA_0326 (tp92) and TPAMA_0488 (mcp2-1), which combine TPA- and TPE-specific nucleotide sequences. Both genes were found to be under positive selection within TPA strains and also between TPA and TPE strains. CONCLUSIONS/SIGNIFICANCE: The observed mosaic character of the TPAMA_0326 and TPAMA_0488 loci is likely a result of inter-strain recombination between TPA and TPE strains during simultaneous infection of a single host suggesting horizontal gene transfer between treponemal subspecies.

  13. Rapid identification of genetic modifications in Bacillus anthracis using whole genome draft sequences generated by 454 pyrosequencing.

    Directory of Open Access Journals (Sweden)

    Peter E Chen

    BACKGROUND: The anthrax letter attacks of 2001 highlighted the need for rapid identification of biothreat agents not only for epidemiological surveillance of the intentional outbreak but also for implementing appropriate countermeasures, such as antibiotic treatment, in a timely manner to prevent further casualties. It is clear from the 2001 cases that survival may be markedly improved by administration of antimicrobial therapy during the early symptomatic phase of the illness; i.e., within 3 days of appearance of symptoms. Microbiological detection methods are feasible only for organisms that can be cultured in vitro and cannot detect all genetic modifications with the exception of antibiotic resistance. Currently available immuno or nucleic acid-based rapid detection assays utilize known, organism-specific proteins or genomic DNA signatures respectively. Hence, these assays lack the ability to detect novel natural variations or intentional genetic modifications that circumvent the targets of the detection assays or in the case of a biological attack using an antibiotic resistant or virulence enhanced Bacillus anthracis, to advise on therapeutic treatments. METHODOLOGY/PRINCIPAL FINDINGS: We show here that the Roche 454-based pyrosequencing can generate whole genome draft sequences of deep and broad enough coverage of a bacterial genome in less than 24 hours. Furthermore, using the unfinished draft sequences, we demonstrate that unbiased identification of known as well as heretofore-unreported genetic modifications that include indels and single nucleotide polymorphisms conferring antibiotic and phage resistances is feasible within the next 12 hours. CONCLUSIONS/SIGNIFICANCE: Second generation sequencing technologies have paved the way for sequence-based rapid identification of both known and previously undocumented genetic modifications in cultured, conventional and newly emerging biothreat agents.
Our findings have significant implications in

  14. Whole-Genome Resequencing and Transcriptomic Analysis to Identify Genes Involved in Leaf-Color Diversity in Ornamental Rice Plants

    Science.gov (United States)

    Shin, Younhee; Lim, Hye-Min; Lee, Gang-Seob; Kim, A-Ram; Lee, Tae-Ho; Lee, Jae-Hee; Park, Dong-Suk; Yoo, Seungil; Kim, Yong-Hwan; Kim, Yong-Kab

    2015-01-01

    Rice field art is a large-scale art form in which people design rice fields using various kinds of ornamental rice plants with different leaf colors. Leaf color-related genes play an important role in the study of chlorophyll biosynthesis, chloroplast structure and function, and anthocyanin biosynthesis. Despite the role of different metabolites in the traditional relationship between leaf and color, comprehensive color-specific metabolite studies of ornamental rice have been limited. We performed whole-genome resequencing and transcriptomic analysis of regulatory patterns and genetic diversity among different rice cultivars to discover new genetic mechanisms that promote enhanced levels of various leaf colors. We resequenced the genomes of 10 rice leaf-color accessions to an average of 40× read depth and >95% coverage and performed 30 RNA-seq experiments using the 10 rice accessions sampled at three developmental stages. The sequencing results yielded a total of 1,814 × 10^6 reads and identified an average of 713,114 SNPs per rice accession. Based on our analysis of the DNA variation and gene expression, we selected 47 candidate genes. We used an integrated analysis of the whole-genome resequencing data and the RNA-seq data to divide the candidate genes into two groups: genes related to macronutrient (i.e., magnesium and sulfur) transport and genes related to flavonoid pathways, including anthocyanidin biosynthesis. We verified the candidate genes with quantitative real-time PCR using transgenic T-DNA insertion mutants. Our study demonstrates the potential of integrated screening methods combined with genetic-variation and transcriptomic data to isolate genes involved in complex biosynthetic networks and pathways. PMID:25897514

  15. Whole genome sequencing of mutation accumulation lines reveals a low mutation rate in the social amoeba Dictyostelium discoideum.

    Directory of Open Access Journals (Sweden)

    Gerda Saxer

    Spontaneous mutations play a central role in evolution. Despite their importance, mutation rates are some of the most elusive parameters to measure in evolutionary biology. The combination of mutation accumulation (MA) experiments and whole-genome sequencing now makes it possible to estimate mutation rates by directly observing new mutations at the molecular level across the whole genome. We performed an MA experiment with the social amoeba Dictyostelium discoideum and sequenced the genomes of three randomly chosen lines using high-throughput sequencing to estimate the spontaneous mutation rate in this model organism. The mitochondrial mutation rate of 6.76×10^-9, with a Poisson confidence interval of 4.1×10^-9 to 9.5×10^-9, per nucleotide per generation is slightly lower than estimates for other taxa. The mutation rate estimate for the nuclear DNA of 2.9×10^-11, with a Poisson confidence interval ranging from 7.4×10^-13 to 1.6×10^-10, is the lowest reported for any eukaryote. These results are consistent with low microsatellite mutation rates previously observed in D. discoideum and low levels of genetic variation observed in wild D. discoideum populations. In addition, D. discoideum has been shown to be quite resistant to DNA damage, which suggests an efficient DNA-repair mechanism that could be an adaptation to life in soil and frequent exposure to intracellular and extracellular mutagenic compounds. The social aspect of the life cycle of D. discoideum and a large portion of the genome under relaxed selection during vegetative growth could also select for a low mutation rate. This hypothesis is supported by a significantly lower mutation rate per cell division in multicellular eukaryotes compared with unicellular eukaryotes.
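
    The Poisson confidence intervals quoted above can be illustrated with an exact (Garwood-style) interval for a Poisson mean. The Python sketch below is one common way to compute such an interval using only the standard library; it is an assumption that the authors used this construction, and the abstract does not report the underlying mutation counts.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def _bisect_decreasing(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of a monotonically decreasing function f on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(k: int, alpha: float = 0.05):
    """Exact two-sided interval for a Poisson mean, given an observed count k."""
    hi_bound = 10.0 * (k + 10)  # generous search ceiling for small counts
    # Lower limit solves P(X <= k - 1) = 1 - alpha/2 (zero when k == 0).
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda lam: poisson_cdf(k - 1, lam) - (1 - alpha / 2), 0.0, hi_bound)
    # Upper limit solves P(X <= k) = alpha/2.
    upper = _bisect_decreasing(
        lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, hi_bound)
    return lower, upper


low, high = exact_poisson_ci(1)
print(f"count 1: interval is {low:.4f} to {high:.3f} times the point estimate")
```

    For a count of one, the limits fall at roughly 0.025 and 5.6 times the point estimate. The nuclear interval in the abstract (7.4×10(-13 to 1.6×10(-10 around 2.9×10(-11) has similar ratios, which would be consistent with a very small number of observed nuclear mutations; that reading is an inference from the quoted numbers, not a statement from the abstract.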

  16. Whole-genome phylogenomic heterogeneity of Neisseria gonorrhoeae isolates with decreased cephalosporin susceptibility collected in Canada between 1989 and 2013.

    Science.gov (United States)

    Demczuk, Walter; Lynch, Tarah; Martin, Irene; Van Domselaar, Gary; Graham, Morag; Bharat, Amrita; Allen, Vanessa; Hoang, Linda; Lefebvre, Brigitte; Tyrrell, Greg; Horsman, Greg; Haldane, David; Garceau, Richard; Wylie, John; Wong, Tom; Mulvey, Michael R

    2015-01-01

    A large-scale, whole-genome comparison of Canadian Neisseria gonorrhoeae isolates with high-level cephalosporin MICs was used to demonstrate a genomic epidemiology approach to investigate strain relatedness and dynamics. Although current typing methods have been very successful in tracing short-chain transmission of gonorrheal disease, investigating the temporal evolutionary relationships and geographical dissemination of highly clonal lineages requires enhanced resolution only available through whole-genome sequencing (WGS). Phylogenomic cluster analysis grouped 169 Canadian strains into 12 distinct clades. While some N. gonorrhoeae multiantigen sequence types (NG-MAST) agreed with specific phylogenomic clades or subclades, other sequence types (ST) and closely related groups of ST were widely distributed among clades. Decreased susceptibility to extended-spectrum cephalosporins (ESC-DS) emerged among a group of diverse strains in Canada during the 1990s with a variety of nonmosaic penA alleles, followed in 2000/2001 with the penA mosaic X allele and then in 2007 with ST1407 strains with the penA mosaic XXXIV allele. Five genetically distinct ESC-DS lineages were associated with penA mosaic X, XXXV, and XXXIV alleles and nonmosaic XII and XIII alleles. ESC-DS with coresistance to azithromycin was observed in 5 strains with 23S rRNA C2599T or A2143G mutations. As the costs associated with WGS decline and analysis tools are streamlined, WGS can provide a more thorough understanding of strain dynamics, facilitate epidemiological studies to better resolve social networks, and improve surveillance to optimize treatment for gonorrheal infections.

  17. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody

    2016-03-23

    Background Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. Methods To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. Results The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drugs binding sites. Conclusions Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. 
These approaches could ultimately lead to novel resistance

  18. Whole-genome resequencing and transcriptomic analysis to identify genes involved in leaf-color diversity in ornamental rice plants.

    Directory of Open Access Journals (Sweden)

    Chang-Kug Kim

    Rice field art is a large-scale art form in which people design rice fields using various kinds of ornamental rice plants with different leaf colors. Leaf color-related genes play an important role in the study of chlorophyll biosynthesis, chloroplast structure and function, and anthocyanin biosynthesis. Despite the role of different metabolites in the traditional relationship between leaf and color, comprehensive color-specific metabolite studies of ornamental rice have been limited. We performed whole-genome resequencing and transcriptomic analysis of regulatory patterns and genetic diversity among different rice cultivars to discover new genetic mechanisms that promote enhanced levels of various leaf colors. We resequenced the genomes of 10 rice leaf-color accessions to an average of 40× read depth and >95% coverage and performed 30 RNA-seq experiments using the 10 rice accessions sampled at three developmental stages. The sequencing results yielded a total of 1,814 × 10^6 reads and identified an average of 713,114 SNPs per rice accession. Based on our analysis of the DNA variation and gene expression, we selected 47 candidate genes. We used an integrated analysis of the whole-genome resequencing data and the RNA-seq data to divide the candidate genes into two groups: genes related to macronutrient (i.e., magnesium and sulfur) transport and genes related to flavonoid pathways, including anthocyanidin biosynthesis. We verified the candidate genes with quantitative real-time PCR using transgenic T-DNA insertion mutants. Our study demonstrates the potential of integrated screening methods combined with genetic-variation and transcriptomic data to isolate genes involved in complex biosynthetic networks and pathways.

  19. Accurate Breakpoint Mapping in Apparently Balanced Translocation Families with Discordant Phenotypes Using Whole Genome Mate-Pair Sequencing

    Science.gov (United States)

    Aristidou, Constantia; Koufaris, Costas; Theodosiou, Athina; Bak, Mads; Mehrjouy, Mana M.; Behjati, Farkhondeh; Tanteles, George; Christophidou-Anastasiadou, Violetta; Tommerup, Niels

    2017-01-01

    Familial apparently balanced translocations (ABTs) segregating with discordant phenotypes are extremely challenging for interpretation and counseling due to the scarcity of publications and lack of routine techniques for quick investigation. Recently, next generation sequencing has emerged as an efficacious methodology for precise detection of translocation breakpoints. However, studies so far have mainly focused on de novo translocations. The present study focuses specifically on familial cases in order to shed some light on this diagnostic dilemma. Whole-genome mate-pair sequencing (WG-MPS) was applied to map the breakpoints in nine two-way ABT carriers from four families. Translocation breakpoints and patient-specific structural variants were validated by Sanger sequencing and quantitative Real Time PCR, respectively. Identical sequencing patterns and breakpoints were identified in affected and non-affected members carrying the same translocations. PTCD1, ATP5J2-PTCD1, CADPS2, and STPG1 were disrupted by the translocations in three families, rendering them initially as possible disease candidate genes. However, subsequent mutation screening and structural variant analysis did not reveal any pathogenic mutations or unique variants in the affected individuals that could explain the phenotypic differences between carriers of the same translocations. In conclusion, we suggest that NGS-based methods, such as WG-MPS, can be successfully used for detailed mapping of translocation breakpoints, which can also be used in routine clinical investigation of ABT cases. Unlike de novo translocations, no associations were determined here between familial two-way ABTs and the phenotype of the affected members, in which the presence of cryptic imbalances and complex chromosomal rearrangements has been excluded. Future whole-exome or whole-genome sequencing will potentially reveal unidentified mutations in the patients underlying the discordant phenotypes within each family. In

  20. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Directory of Open Access Journals (Sweden)

    Sathishkumar Natarajan

    Full Text Available Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  1. A dense linkage map for Chinook salmon (Oncorhynchus tshawytscha) reveals variable chromosomal divergence after an ancestral whole genome duplication event.

    Science.gov (United States)

    Brieuc, Marine S O; Waters, Charles D; Seeb, James E; Naish, Kerry A

    2014-03-20

    Comparisons between the genomes of salmon species reveal that they underwent extensive chromosomal rearrangements following whole genome duplication that occurred in their lineage 58-63 million years ago. Extant salmonids are diploid, but occasional pairing between homeologous chromosomes exists in males. The consequences of re-diploidization can be characterized by mapping the position of duplicated loci in such species. Linkage maps are also a valuable tool for genome-wide applications such as genome-wide association studies, quantitative trait loci mapping or genome scans. Here, we investigated chromosomal evolution in Chinook salmon (Oncorhynchus tshawytscha) after genome duplication by mapping 7146 restriction-site associated DNA loci in gynogenetic haploid, gynogenetic diploid, and diploid crosses. In the process, we developed a reference database of restriction-site associated DNA loci for Chinook salmon comprising 48528 non-duplicated loci and 6409 known duplicated loci, which will facilitate locus identification and data sharing. We created a very dense linkage map anchored to all 34 chromosomes for the species, and all arms were identified through centromere mapping. The map positions of 799 duplicated loci revealed that homeologous pairs have diverged at different rates following whole genome duplication, and that degree of differentiation along arms was variable. Many of the homeologous pairs with high numbers of duplicated markers appear conserved with other salmon species, suggesting that retention of conserved homeologous pairing in some arms preceded species divergence. As chromosome arms are highly conserved across species, the major resources developed for Chinook salmon in this study are also relevant for other related species.

  2. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Science.gov (United States)

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  3. Whole genome sequencing and evolutionary analysis of human respiratory syncytial virus A and B from Milwaukee, WI 1998-2010.

    Directory of Open Access Journals (Sweden)

    Cecilia Rebuffo-Scheer

    Full Text Available BACKGROUND: Respiratory Syncytial Virus (RSV) is the leading cause of lower respiratory-tract infections in infants and young children worldwide. Despite this, only six complete genome sequences of original strains have been previously published, the most recent of which dates back 35 and 26 years for RSV group A and group B respectively. METHODOLOGY/PRINCIPAL FINDINGS: We present a semi-automated sequencing method allowing for the sequencing of four RSV whole genomes simultaneously. We were able to sequence the complete coding sequences of 13 RSV A and 4 RSV B strains from Milwaukee collected from 1998-2010. Another 12 RSV A and 5 RSV B strains sequenced in this study cover the majority of the genome. All RSV A and RSV B sequences were analyzed by neighbor-joining, maximum parsimony and Bayesian phylogeny methods. Genetic diversity was high among RSV A viruses in Milwaukee including the circulation of multiple genotypes (GA1, GA2, GA5, GA7) with GA2 persisting throughout the 13 years of the study. However, RSV B genomes showed little variation with all belonging to the BA genotype. For RSV A, the same evolutionary patterns and clades were seen consistently across the whole genome including all intergenic, coding, and non-coding regions sequences. CONCLUSIONS/SIGNIFICANCE: The sequencing strategy presented in this work allows for RSV A and B genomes to be sequenced simultaneously in two working days and with a low cost. We have significantly increased the amount of genomic data that is available for both RSV A and B, providing the basic molecular characteristics of RSV strains circulating in Milwaukee over the last 13 years. This information can be used for comparative analysis with strains circulating in other communities around the world which should also help with the development of new strategies for control of RSV, specifically vaccine development and improvement of RSV diagnostics.

  4. A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome

    Science.gov (United States)

    Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a mor...

  5. Proficiency Testing for Bacterial Whole Genome Sequencing: An End-User Survey of Current Capabilities, Requirements and Priorities

    DEFF Research Database (Denmark)

    Moran-Gilad, Jacob; Sintchenko, Vitali; Karlsmose Pedersen, Susanne

    2015-01-01

    of costs. The priority pathogens reported by respondents reflected the key drivers for NGS use (high burden disease and ‘high profile’ pathogens). The performance of and participation in PT was perceived as important by most respondents. The wide range of sequencing and bioinformatics practices reported...

  6. Genetic signatures of Mycobacterium tuberculosis Nonthaburi genotype revealed by whole genome analysis of isolates from tuberculous meningitis patients in Thailand.

    Science.gov (United States)

    Coker, Olabisi Oluwabukola; Chaiprasert, Angkana; Ngamphiw, Chumpol; Tongsima, Sissades; Regmi, Sanjib Mani; Clark, Taane G; Ong, Rick Twee Hee; Teo, Yik-Ying; Prammananan, Therdsak; Palittapongarnpim, Prasit

    2016-01-01

    Genome sequencing plays a key role in understanding the genetic diversity of Mycobacterium tuberculosis (M.tb). The genotype-specific character of M. tb contributes to tuberculosis severity and emergence of drug resistance. Strains of M. tb complex can be classified into seven lineages. The Nonthaburi (NB) genotype, belonging to the Indo-Oceanic lineage (lineage 1), has a unique spoligotype and IS6110-RFLP pattern but has not previously undergone a detailed whole genome analysis. In addition, there is not much information available on the whole genome analysis of M. tb isolates from tuberculous meningitis (TBM) patients in public databases. Isolates CSF3053, 46-5069 and 43-13838 of NB genotype were obtained from the cerebrospinal fluids of TBM Thai patients in Siriraj Hospital, Bangkok. The whole genomes were subjected to high throughput sequencing. The sequence data of each isolate were assembled into draft genome. The sequences were also aligned to reference genome, to determine genomic variations. Single nucleotide polymorphisms (SNPs) were obtained and grouped according to the functions of the genes containing them. They were compared with SNPs from 1,601 genomes, representing the seven lineages of M. tb complex, to determine the uniqueness of NB genotype. Susceptibility to first-line, second-line and other antituberculosis drugs were determined and related to the SNPs previously reported in drug-resistant related genes. The assembled genomes have an average size of 4,364,461 bp, 4,154 genes, 48 RNAs and 64 pseudogenes. A 500 base pairs deletion, which includes ppe50, was found in all isolates. RD239, specific for members of Indo Oceanic lineage, and RD147c were identified. A total of 2,202 SNPs were common to the isolates and used to classify the NB strains as members of sublineage 1.2.1. Compared with 1,601 genomes from the seven lineages of M. tb complex, mutation G2342203C was found novel to the isolates in this study. Three mutations (T28910C, C1180580T

  7. Genetic signatures of Mycobacterium tuberculosis Nonthaburi genotype revealed by whole genome analysis of isolates from tuberculous meningitis patients in Thailand

    Directory of Open Access Journals (Sweden)

    Olabisi Oluwabukola Coker

    2016-04-01

    Full Text Available Genome sequencing plays a key role in understanding the genetic diversity of Mycobacterium tuberculosis (M.tb). The genotype-specific character of M. tb contributes to tuberculosis severity and emergence of drug resistance. Strains of M. tb complex can be classified into seven lineages. The Nonthaburi (NB) genotype, belonging to the Indo-Oceanic lineage (lineage 1), has a unique spoligotype and IS6110-RFLP pattern but has not previously undergone a detailed whole genome analysis. In addition, there is not much information available on the whole genome analysis of M. tb isolates from tuberculous meningitis (TBM) patients in public databases. Isolates CSF3053, 46-5069 and 43-13838 of NB genotype were obtained from the cerebrospinal fluids of TBM Thai patients in Siriraj Hospital, Bangkok. The whole genomes were subjected to high throughput sequencing. The sequence data of each isolate were assembled into draft genome. The sequences were also aligned to reference genome, to determine genomic variations. Single nucleotide polymorphisms (SNPs) were obtained and grouped according to the functions of the genes containing them. They were compared with SNPs from 1,601 genomes, representing the seven lineages of M. tb complex, to determine the uniqueness of NB genotype. Susceptibility to first-line, second-line and other antituberculosis drugs were determined and related to the SNPs previously reported in drug-resistant related genes. The assembled genomes have an average size of 4,364,461 bp, 4,154 genes, 48 RNAs and 64 pseudogenes. A 500 base pairs deletion, which includes ppe50, was found in all isolates. RD239, specific for members of Indo Oceanic lineage, and RD147c were identified. A total of 2,202 SNPs were common to the isolates and used to classify the NB strains as members of sublineage 1.2.1. Compared with 1,601 genomes from the seven lineages of M. tb complex, mutation G2342203C was found novel to the isolates in this study. Three mutations (T28910

  8. Whole-Genome Sequence of Leptospira interrogans Serovar Hardjo Subtype Hardjoprajitno Strain Norma, Isolated from Cattle in a Leptospirosis Outbreak in Brazil.

    Science.gov (United States)

    Cosate, M R V; Soares, S C; Mendes, T A; Raittz, R T; Moreira, E C; Leite, R; Fernandes, G R; Haddad, J P A; Ortega, J Miguel

    2015-11-05

    Leptospirosis is caused by pathogenic bacteria of the genus Leptospira spp. This neglected re-emergent disease has global distribution and relevance in veterinary production. Here, we report the whole-genome sequence and annotation of Leptospira interrogans serovar Hardjo subtype Hardjoprajitno strain Norma, isolated from cattle in a livestock leptospirosis outbreak in Brazil.

  9. Complete sequence of the first chimera genome constructed by cloning the whole genome of Synechocystis strain PCC6803 into the Bacillus subtilis 168 genome.

    Science.gov (United States)

    Watanabe, Satoru; Shiwa, Yuh; Itaya, Mitsuhiro; Yoshikawa, Hirofumi

    2012-12-01

    Genome synthesis of existing or designed genomes is made feasible by the first successful cloning of a cyanobacterium, Synechocystis PCC6803, in Gram-positive, endospore-forming Bacillus subtilis. Whole-genome sequence analysis of the isolate and parental B. subtilis strains provides clues for identifying single nucleotide polymorphisms (SNPs) in the 2 complete bacterial genomes in one cell.

  10. Development of a multiplex TaqMan real-time PCR assay for typing of Mycoplasma pneumoniae based on type-specific indels identified through whole genome sequencing.

    Science.gov (United States)

    Wolff, Bernard J; Benitez, Alvaro J; Desai, Heta P; Morrison, Shatavia S; Diaz, Maureen H; Winchell, Jonas M

    2017-03-01

    We developed a multiplex real-time PCR assay for simultaneously detecting M. pneumoniae and typing into historically-defined P1 types. Typing was achieved based on the presence of short type-specific indels identified through whole genome sequencing. This assay was 100% specific compared to existing methods and may be useful during epidemiologic investigations.

  11. Characterization of a novel blaIMP gene, blaIMP-58, using whole genome sequencing in a Pseudomonas putida isolate detected in Denmark

    DEFF Research Database (Denmark)

    Holmgaard, Dennis Back; Hansen, Frank; Hasman, Henrik;

    2017-01-01

    A multidrug-resistant strain of Pseudomonas putida was isolated from the urine of a 65-year-old woman hospitalized for serious clinical conditions. Using whole genome sequencing, a novel blaIMP gene, blaIMP-58, was discovered and characterized....

  12. Whole-Genome Sequence of Multidrug-Resistant Campylobacter coli Strain COL B1-266, Isolated from the Colombian Poultry Chain.

    Science.gov (United States)

    Bernal, Johan F; Donado-Godoy, Pilar; Arévalo, Alejandra; Duarte, Carolina; Realpe, María E; Díaz, Paula L; Gómez, Yolanda; Rodríguez, Fernando; Agarwala, Richa; Landsman, David; Mariño-Ramírez, Leonardo

    2016-03-17

    Campylobacter coli is considered one of the main causes of food-borne illness worldwide. We report here the whole-genome sequence of multidrug-resistant Campylobacter coli strain COL B1-266, isolated from the Colombian poultry chain. The genome sequences encode genes for a variety of antimicrobial resistance genes, including aminoglycosides, β-lactams, lincosamides, fluoroquinolones, and tetracyclines.

  13. Molecular diversity of Bacillus anthracis in the Netherlands: investigating the relationship to the worldwide population using whole-genome SNP discovery

    NARCIS (Netherlands)

    Derzelle, S.; Girault, G.; Roest, H.I.J.; Koene, M.G.J.

    2015-01-01

    Bacillus anthracis, the causative agent of anthrax, has been widely described as a clonal species. Here we report the use of both canonical SNP analysis and whole-genome sequencing to characterize the phylogenetic lineages of B. anthracis from the Netherlands. Eleven strains isolated over a 25-years

  14. Monitoring meticillin resistant Staphylococcus aureus and its spread in Copenhagen, Denmark, 2013, through routine whole genome sequencing

    DEFF Research Database (Denmark)

    Bartels, M D; Larner-Svensson, H; Meiniche, H;

    2015-01-01

    Typing of meticillin resistant Staphylococcus aureus (MRSA) by whole genome sequencing (WGS) has been performed routinely in Copenhagen since January 2013. We describe the relatedness, based on WGS data and epidemiological data, of 341 MRSA isolates. These comprised all MRSA (n = 300) identified in Cop...

  15. Whole genome sequence analysis of the arctic-lineage strain responsible for distemper in Italian wolves and dogs through a fast and robust next generation sequencing protocol.

    Science.gov (United States)

    Marcacci, Maurilia; Ancora, Massimo; Mangone, Iolanda; Teodori, Liana; Di Sabatino, Daria; De Massis, Fabrizio; Camma', Cesare; Savini, Giovanni; Lorusso, Alessio

    2014-06-01

    Dynamic surveillance and characterization of canine distemper virus (CDV) circulating strains are essential against possible vaccine breakthrough events. This study describes the setup of a fast and robust next-generation sequencing (NGS) Ion PGM™ protocol that was used to obtain the complete genome sequence of a CDV isolate (CDV2784/2013). CDV2784/2013 is the prototype of CDV strains responsible for severe clinical distemper in dogs and wolves in Italy during 2013. CDV2784/2013 was isolated on cell culture and total RNA was used for NGS sample preparation. A total of 112.3 Mb of reads were assembled de novo using MIRA version 4.0rc4, which yielded a total number of 403 contigs with 12.1% coverage. The whole genome (15,690 bp) was recovered successfully and compared to those of existing CDV whole genomes. CDV2784/2013 was shown to have 92% nt identity with the Onderstepoort vaccine strain. This study describes for the first time a fast and robust Ion PGM™ platform-based whole genome amplification protocol for non-segmented negative stranded RNA viruses starting from total cell-purified RNA. Additionally, this is the first study reporting the whole genome analysis of an Arctic lineage strain that is known to circulate widely in Europe, Asia and USA.

  16. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly

    NARCIS (Netherlands)

    Scheinin, I.; Sie, D.; Bengtsson, H.; Wiel, M.A. van de; Olshen, A.B.; Thuijl, H.F. van; Essen, H.F. van; Eijk, P.P.; Rustenburg, F.; Meijer, G.A.; Reijneveld, J.C.; Wesseling, P.; Pinkel, D.; Albertson, D.G.; Ylstra, B.

    2014-01-01

    Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraff

  17. Single site suppressors of a fission yeast temperature-sensitive mutant in cdc48 identified by whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    Irina N Marinova

    Full Text Available The protein called p97 in mammals and Cdc48 in budding and fission yeast is a homo-hexameric, ring-shaped, ubiquitin-dependent ATPase complex involved in a range of cellular functions, including protein degradation, vesicle fusion, DNA repair, and cell division. The cdc48+ gene is essential for viability in fission yeast, and point mutations in the human orthologue have been linked to disease. To analyze the function of p97/Cdc48 further, we performed a screen for cold-sensitive suppressors of the temperature-sensitive cdc48-353 fission yeast strain. In total, 29 independent pseudo revertants that had lost the temperature-sensitive growth defect of the cdc48-353 strain were isolated. Of these, 28 had instead acquired a cold-sensitive phenotype. Since the suppressors were all spontaneous mutants, and not the result of mutagenesis induced by chemicals or UV irradiation, we reasoned that the genome sequences of the 29 independent cdc48-353 suppressors were most likely identical with the exception of the acquired suppressor mutations. This prompted us to test if a whole genome sequencing approach would allow us to map the mutations. Indeed, genome sequencing unambiguously revealed that the cold-sensitive suppressors were all second site intragenic cdc48 mutants. Projecting these onto the Cdc48 structure revealed that while the original temperature-sensitive G338D mutation is positioned near the central pore in the hexameric ring, the suppressor mutations locate to subunit-subunit and inter-domain boundaries. This suggests that Cdc48-353 is structurally compromised at the restrictive temperature, but re-established in the suppressor mutants. The last suppressor was an extragenic frame shift mutation in the ufd1 gene, which encodes a known Cdc48 co-factor. In conclusion, we show, using a novel whole genome sequencing approach, that Cdc48-353 is structurally compromised at the restrictive temperature, but stabilized in the suppressors.

  18. Whole Genome Sequencing and a New Bioinformatics Platform Allow for Rapid Gene Identification in D. melanogaster EMS Screens

    Directory of Open Access Journals (Sweden)

    Jeannette Osterloh

    2012-12-01

    Full Text Available Forward genetic screens in Drosophila melanogaster using ethyl methanesulfonate (EMS) mutagenesis are a powerful approach for identifying genes that modulate specific biological processes in an in vivo setting. The mapping of genes that contain randomly-induced point mutations has become more efficient in Drosophila thanks to the maturation and availability of many types of genetic tools. However, classic approaches to gene mapping are relatively slow and ultimately require extensive Sanger sequencing of candidate chromosomal loci. With the advent of new high-throughput sequencing techniques, it is increasingly efficient to directly re-sequence the whole genome of model organisms. This approach, in combination with traditional chromosomal mapping, has the potential to greatly simplify and accelerate mutation identification in mutants generated in EMS screens. Here we show that next-generation sequencing (NGS) is an accurate and efficient tool for high-throughput sequencing and mutation discovery in Drosophila melanogaster. As a test case, mutant strains of Drosophila that exhibited long-term survival of severed peripheral axons were identified in a forward EMS mutagenesis. All mutants were recessive and fell into a single lethal complementation group, which suggested that a single gene was responsible for the protective axon degenerative phenotype. Whole genome sequencing of these genomes identified the underlying gene ect4. To improve the process of genome wide mutation identification, we developed Genomes Management Application (GEM.app, https://genomics.med.miami.edu), a graphical online user interface to a custom query framework. Using a custom GEM.app query, we were able to identify that each mutant carried a unique non-sense mutation in the gene ect4 (dSarm), which was recently shown by Osterloh et al. to be essential for the activation of axonal degeneration. Our results demonstrate the current advantages and limitations of NGS in Drosophila

  19. Whole genome HBV deletion profiles and the accumulation of preS deletion mutant during antiviral treatment

    Directory of Open Access Journals (Sweden)

    Zhang Dake

    2012-12-01

    Full Text Available Abstract Background Hepatitis B virus (HBV), because of its error-prone viral polymerase, has a high mutation rate leading to widespread substitutions, deletions, and insertions in the HBV genome. Deletions may significantly change viral biological features complicating the progression of liver diseases. However, the clinical conditions correlating to the accumulation of deleted mutants remain unclear. In this study, we explored HBV deletion patterns and their association with disease status and antiviral treatment by performing whole genome sequencing on samples from 51 hepatitis B patients and by monitoring changes in deletion variants during treatment. Clone sequencing was used to analyze preS regions in another cohort of 52 patients. Results Among the core, preS, and basic core promoter (BCP) deletion hotspots, we identified preS to have the highest frequency and the most complex deletion pattern using whole genome sequencing. Further clone sequencing analysis on preS identified 70 deletions which were classified into 4 types, the most common being preS2. Also, in contrast to the core and BCP regions, most preS deletions were in-frame. Most deletions interrupted viral surface epitopes, and are possibly involved in evading immuno-surveillance. Among various clinical factors examined, logistic regression showed that antiviral medication affected the accumulation of deletion mutants (OR = 6.81, 95% CI = 1.296 ~ 35.817, P = 0.023). In chronic carriers of the virus, and individuals with chronic hepatitis, the deletion rate was significantly higher in the antiviral treatment group (Fisher exact test, P = 0.007). Particularly, preS2 deletions were associated with the usage of nucleos(t)ide analog therapy (Fisher exact test, P = 0.023). Dynamic increases in preS1 or preS2 deletions were also observed in quasispecies from samples taken from patients before and after three months of ADV therapy. In vitro experiments demonstrated that
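    The odds ratio, confidence interval, and P value quoted in the record above can be sanity-checked against one another. The sketch below assumes the interval is a Wald interval, symmetric on the log-odds scale, with a critical value of 1.96; the abstract does not say how the interval was computed, so this is an illustration, not the authors' method. The function name `wald_check` is hypothetical.

```python
import math


def wald_check(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Back out the log-scale standard error and two-sided p-value
    implied by an odds ratio and its confidence interval.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    # For a Wald interval the point estimate is the geometric mean of the bounds.
    midpoint = math.exp((log_low + log_high) / 2)
    # Half the interval width on the log scale equals z_crit * SE.
    se = (log_high - log_low) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal p-value: erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return midpoint, se, z, p_two_sided


# Values as reported in the abstract: OR = 6.81, 95% CI = 1.296 to 35.817.
midpoint, se, z, p = wald_check(6.81, 1.296, 35.817)
print(f"implied OR = {midpoint:.2f}, SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

    Under these assumptions the geometric mean of the bounds falls close to the reported odds ratio and the implied P value is close to the reported 0.023, so the three figures are mutually consistent.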

  20. Copy number and loss of heterozygosity detected by SNP array of formalin-fixed tissues using whole-genome amplification.

    Science.gov (United States)

    Stokes, Angela; Drozdov, Ignat; Guerra, Eliete; Ouzounis, Christos A; Warnakulasuriya, Saman; Gleeson, Michael J; McGurk, Mark; Tavassoli, Mahvash; Odell, Edward W

    2011-01-01

    The requirement for large amounts of good quality DNA for whole-genome applications prohibits their use for small, laser capture micro-dissected (LCM), and/or rare clinical samples, which are also often formalin-fixed and paraffin-embedded (FFPE). Whole-genome amplification of DNA from these samples could, potentially, overcome these limitations. However, little is known about the artefacts introduced by amplification of FFPE-derived DNA with regard to genotyping, and subsequent copy number and loss of heterozygosity (LOH) analyses. Using a ligation adaptor amplification method, we present data from a total of 22 Affymetrix SNP 6.0 experiments, using matched paired amplified and non-amplified DNA from 10 LCM FFPE normal and dysplastic oral epithelial tissues, and an internal method control. An average of 76.5% of SNPs were called in both matched amplified and non-amplified DNA samples, and concordance was a promising 82.4%. Paired analysis for copy number, LOH, and both combined showed that copy number changes were reduced in amplified DNA but were 99.5% concordant when detected; amplifications were the changes most likely to be 'missed'; only 30% of non-amplified LOH changes were identified in amplified pairs; and when copy number and LOH were combined, ∼50% of gene changes detected in the unamplified DNA were also detected in the amplified DNA, and within these changes 86.5% were concordant for both copy number and LOH status. However, amplification also introduces changes, as ∼20% of changes in the amplified DNA are not detected in the non-amplified DNA. An integrative network biology approach revealed that changes in amplified DNA of dysplastic oral epithelium localize to topologically critical regions of the human protein-protein interaction network, suggesting their functional implication in the pathobiology of this disease. Taken together, our results support the use of amplification of FFPE-derived DNA, provided sufficient samples are used to increase power
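    The two summary figures in the record above (the share of SNPs called in both members of a matched pair, and the concordance among those SNPs) can be illustrated with a minimal sketch. It assumes concordance is measured only over SNPs called in both samples, which the abstract does not state explicitly. The function name and the genotype lists are hypothetical and are not data from the study.

```python
def call_rate_and_concordance(native, amplified):
    """Return (fraction of SNPs called in both samples,
    fraction of those jointly called SNPs with identical genotypes).

    Each argument is a list of genotype calls for the same SNPs in the
    same order; None marks a no-call.
    """
    if len(native) != len(amplified):
        raise ValueError("both samples must cover the same SNPs")
    # Keep only SNPs that received a call in both samples.
    both = [(a, b) for a, b in zip(native, amplified)
            if a is not None and b is not None]
    called_fraction = len(both) / len(native) if native else 0.0
    concordance = (sum(a == b for a, b in both) / len(both)) if both else 0.0
    return called_fraction, concordance


# Hypothetical calls for eight SNPs in a matched pair.
native_calls = ["AA", "AB", "BB", None, "AB", "AA", "BB", "AB"]
amplified_calls = ["AA", "AB", "AB", "AA", None, "AA", "BB", "BB"]

called, concordant = call_rate_and_concordance(native_calls, amplified_calls)
print(f"called in both: {called:.1%}, concordance: {concordant:.1%}")
```

    Restricting concordance to jointly called SNPs keeps no-calls from being counted as disagreements, which is why the two percentages are reported separately.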

  1. Scanning the landscape of genome architecture of non-O1 and non-O139 Vibrio cholerae by whole genome mapping reveals extensive population genetic diversity.

    Science.gov (United States)

    Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A; Awosika, Joy; Briska, Adam; Ptashkin, Ryan N; Wagner, Trevor; Rajanna, Chythanya; Tsang, Hsinyi; Johnson, Shannon L; Mokashi, Vishwesh P; Chain, Patrick S G; Sozhamannan, Shanmuga

    2015-01-01

    Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or their derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non-O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non-O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.

  2. Comprehensive genome characterization of solitary fibrous tumors using high-resolution array-based comparative genomic hybridization.

    Science.gov (United States)

    Bertucci, François; Bouvier-Labit, Corinne; Finetti, Pascal; Adélaïde, José; Metellus, Philippe; Mokhtari, Karima; Decouvelaere, Anne-Valérie; Miquel, Catherine; Jouvet, Anne; Figarella-Branger, Dominique; Pedeutour, Florence; Chaffanet, Max; Birnbaum, Daniel

    2013-02-01

    Solitary fibrous tumors (SFTs) are rare spindle cell tumors with limited therapeutic options. Their molecular basis is poorly known. No consistent cytogenetic abnormality has been reported. We used high-resolution whole-genome array-based comparative genomic hybridization (Agilent 244K oligonucleotide chips) to profile 47 samples, meningeal in >75% of cases. Few copy number aberrations (CNAs) were observed. Sixty-eight percent of samples did not show any gene CNA after exclusion of probes located in regions with referenced copy number variation (CNV). Only low-level CNAs were observed. The genomic profiles were very homogeneous among samples. No molecular class was revealed by clustering of DNA copy numbers. All cases displayed a "simplex" profile. No recurrent CNA was identified. Imbalances occurring in >20%, such as the gain of 8p11.23-11.22 region, contained known CNVs. The 13q14.11-13q31.1 region (lost in 4% of cases) was the largest altered region and contained the lowest percentage of genes with referenced CNVs. A total of 425 genes without CNV showed copy number transition in at least one sample, but only 1 in at least 10% of samples. The genomic profiles of meningeal and extra-meningeal cases did not show any differences.

  3. Genetic characterization of dogs via chromosomal analysis and array-based comparative genomic hybridization (aCGH).

    Science.gov (United States)

    Müller, M H; Reimann-Berg, N; Bullerdiek, J; Murua Escobar, H

    2012-01-01

    The results of cytogenetic and molecular cytogenetic investigations revealed similarities in genetic background and biological behaviour between tumours and genetic diseases of humans and dogs. These findings classify the dog as a good and accepted model for human cancers such as osteosarcomas, mammary carcinomas, oral melanomas and others. With the appearance of new studies and advances in canine genome sequencing, the number of known disease homologies between these species has risen and is expected to increase further. In this context, array-based comparative genomic hybridization (aCGH) provides a novel tool to rapidly characterize numerical aberrations in canine tumours or to detect copy number aberrations between different breeds. As it is possible to spot probes covering the whole genome on each chip to discover copy number aberrations of all chromosomes simultaneously, this method is time-saving and cost-effective, considering the relation of costs to the amount of data obtained. Complemented with traditional methods like karyotyping and fluorescence in situ hybridization (FISH) analyses, aCGH is able to provide new insights into the underlying causes of canine carcinogenesis.

  4. Whole Genome Sequencing Identifies a Missense Mutation in HES7 Associated with Short Tails in Asian Domestic Cats

    Science.gov (United States)

    Xu, Xiao; Sun, Xin; Hu, Xue-Song; Zhuang, Yan; Liu, Yue-Chen; Meng, Hao; Miao, Lin; Yu, He; Luo, Shu-Jin

    2016-01-01

    Domestic cats exhibit abundant variations in tail morphology and serve as an excellent model to study the development and evolution of vertebrate tails. Cats with shortened and kinked tails were first recorded in the Malayan archipelago by Charles Darwin in 1868 and remain quite common today in Southeast and East Asia. To elucidate the genetic basis of short tails in Asian cats, we built a pedigree of 13 cats segregating for the trait with a founder from southern China and performed linkage mapping based on whole genome sequencing data from the pedigree. The short-tailed trait was mapped to a 5.6 Mb region of Chr E1, within which the substitution c.5T>C in the somite segmentation-related gene HES7 was identified as the causal mutation resulting in a missense change (p.V2A). Validation in 245 unrelated cats confirmed the correlation between HES7-c.5T>C and Chinese short-tailed feral cats as well as the Japanese Bobtail breed, indicating a common genetic basis of the two. In addition, some of our sampled kinked-tailed cats could not be explained by either HES7 or the Manx-related T-box, suggesting at least three independent events in the evolution of domestic cats giving rise to short-tailed traits. PMID:27560986

  5. Whole Genome Re-Sequencing Identifies a Quantitative Trait Locus Repressing Carbon Reserve Accumulation during Optimal Growth in Chlamydomonas reinhardtii.

    Science.gov (United States)

    Goold, Hugh Douglas; Nguyen, Hoa Mai; Kong, Fantao; Beyly-Adriano, Audrey; Légeret, Bertrand; Billon, Emmanuelle; Cuiné, Stéphan; Beisson, Fred; Peltier, Gilles; Li-Beisson, Yonghua

    2016-05-04

    Microalgae have emerged as a promising source for biofuel production. Massive oil and starch accumulation in microalgae is possible, but occurs mostly when biomass growth is impaired. The molecular networks underlying the negative correlation between growth and reserve formation are not known. Thus isolation of strains capable of accumulating carbon reserves during optimal growth would be highly desirable. To this end, we screened an insertional mutant library of Chlamydomonas reinhardtii for alterations in oil content. A mutant accumulating five times more oil and twice as much starch as wild-type during optimal growth was isolated and named constitutive oil accumulator 1 (coa1). Growth in photobioreactors under highly controlled conditions revealed that the increase in oil and starch content in coa1 was dependent on light intensity. Genetic analysis and DNA hybridization pointed to a single insertional event responsible for the phenotype. Whole genome re-sequencing identified in coa1 a >200 kb deletion on chromosome 14 containing 41 genes. This study demonstrates that (1) the generation of algal strains accumulating higher reserve amounts without compromising biomass accumulation is feasible; (2) light is an important parameter in phenotypic analysis; and (3) a chromosomal region (Quantitative Trait Locus) acts as a suppressor of carbon reserve accumulation during optimal growth.

  6. Hyperlipidemia-associated gene variations and expression patterns revealed by whole-genome and transcriptome sequencing of rabbit models

    Science.gov (United States)

    Wang, Zhen; Zhang, Jifeng; Li, Hong; Li, Junyi; Niimi, Manabu; Ding, Guohui; Chen, Haifeng; Xu, Jie; Zhang, Hongjiu; Xu, Ze; Dai, Yulin; Gui, Tuantuan; Li, Shengdi; Liu, Zhi; Wu, Sujuan; Cao, Mushui; Zhou, Lu; Lu, Xingyu; Wang, Junxia; Yang, Jing; Fu, Yunhe; Yang, Dongshan; Song, Jun; Zhu, Tianqing; Li, Shen; Ning, Bo; Wang, Ziyun; Koike, Tomonari; Shiomi, Masashi; Liu, Enqi; Chen, Luonan; Fan, Jianglin; Chen, Y. Eugene; Li, Yixue

    2016-01-01

    The rabbit (Oryctolagus cuniculus) is an important experimental animal for studying human diseases, such as hypercholesterolemia and atherosclerosis. Despite this, genetic information and RNA expression profiling of laboratory rabbits are lacking. Here, we characterized the whole-genome variants of three breeds of the most popular experimental rabbits, New Zealand White (NZW), Japanese White (JW) and Watanabe heritable hyperlipidemic (WHHL) rabbits. Although the genetic diversity of WHHL rabbits was relatively low, they accumulated a large proportion of high-frequency deleterious mutations due to the small population size. Some of the deleterious mutations were associated with the pathophysiology of WHHL rabbits in addition to the LDLR deficiency. Furthermore, we conducted transcriptome sequencing of different organs of both WHHL and cholesterol-rich diet (Chol)-fed NZW rabbits. We found that gene expression profiles of the two rabbit models were essentially similar in the aorta, even though they exhibited different types of hypercholesterolemia. In contrast, Chol-fed rabbits, but not WHHL rabbits, exhibited pronounced inflammatory responses and abnormal lipid metabolism in the liver. These results provide valuable insights into identifying therapeutic targets of hypercholesterolemia and atherosclerosis with rabbit models. PMID:27245873

  7. A systematic study of normalization methods for Infinium 450K methylation data using whole-genome bisulfite sequencing data.

    Science.gov (United States)

    Wang, Ting; Guan, Weihua; Lin, Jerome; Boutaoui, Nadia; Canino, Glorisa; Luo, Jianhua; Celedón, Juan Carlos; Chen, Wei

    2015-01-01

    DNA methylation plays an important role in disease etiology. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a widely used platform in large-scale epidemiologic studies. This platform can efficiently and simultaneously measure methylation levels at ∼480,000 CpG sites in the human genome in multiple study samples. Due to the intrinsic chip design of 2 types of chemistry probes, data normalization or preprocessing is a critical step to consider before data analysis. To date, numerous methods and pipelines have been developed for this purpose, and some studies have been conducted to evaluate different methods. However, validation studies have often been limited to a small number of CpG sites to reduce the variability in technical replicates. In this study, we measured methylation on a set of samples using both whole-genome bisulfite sequencing (WGBS) and 450K chips. We used WGBS data as a gold standard of true methylation states in cells to compare the performances of 8 normalization methods for 450K data on a genome-wide scale. Analyses on our dataset indicate that the most effective methods are peak-based correction (PBC) and quantile normalization plus β-mixture quantile normalization (QN.BMIQ). To our knowledge, this is the first study to systematically compare existing normalization methods for Illumina 450K data using novel WGBS data. Our results provide a benchmark reference for the analysis of DNA methylation chip data, particularly in white blood cells.
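
    As a rough illustration of one ingredient named in this abstract, the sketch below implements plain quantile normalization across samples in pure Python. It is not the QN.BMIQ or PBC procedure evaluated by the authors (both add probe-type-specific corrections that the abstract does not describe), ties are broken by position rather than averaged, and the function name and toy β-values are hypothetical.

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Force every sample to share the same empirical distribution.

    `samples` holds one list of beta-values per sample, all covering the
    same probes in the same order. The reference distribution is the mean
    of the sorted values across samples at each rank.
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must cover the same probes")
    sorted_samples = [sorted(s) for s in samples]
    # Reference value for each rank: mean across samples of the value at that rank.
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Probe indices ordered from smallest to largest value in this sample.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical beta-values for three probes in two samples.
raw = [[0.9, 0.1, 0.5], [0.2, 0.6, 0.4]]
norm = quantile_normalize(raw)
print(norm)
```

    After normalization both samples contain the same set of values, and each probe keeps its within-sample rank; a benchmark like the one described above would then compare such normalized values against the WGBS measurements.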

  8. Whole-genome sequencing analysis of phenotypic heterogeneity and anticipation in Li-Fraumeni cancer predisposition syndrome.

    Science.gov (United States)

    Ariffin, Hany; Hainaut, Pierre; Puzio-Kuter, Anna; Choong, Soo Sin; Chan, Adelyne Sue Li; Tolkunov, Denis; Rajagopal, Gunaretnam; Kang, Wenfeng; Lim, Leon Li Wen; Krishnan, Shekhar; Chen, Kok-Siong; Achatz, Maria Isabel; Karsa, Mawar; Shamsani, Jannah; Levine, Arnold J; Chan, Chang S

    2014-10-28

    The Li-Fraumeni syndrome (LFS) and its variant form (LFL) is a familial predisposition to multiple forms of childhood, adolescent, and adult cancers associated with germ-line mutation in the TP53 tumor suppressor gene. Individual disparities in tumor patterns are compounded by acceleration of cancer onset with successive generations. It has been suggested that this apparent anticipation pattern may result from germ-line genomic instability in TP53 mutation carriers, causing increased DNA copy-number variations (CNVs) with successive generations. To address the genetic basis of phenotypic disparities of LFS/LFL, we performed whole-genome sequencing (WGS) of 13 subjects from two generations of an LFS kindred. Neither de novo CNV nor significant difference in total CNV was detected in relation with successive generations or with age at cancer onset. These observations were consistent with an experimental mouse model system showing that trp53 deficiency in the germ line of father or mother did not increase CNV occurrence in the offspring. On the other hand, individual records on 1,771 TP53 mutation carriers from 294 pedigrees were compiled to assess genetic anticipation patterns (International Agency for Research on Cancer TP53 database). No strictly defined anticipation pattern was observed. Rather, in multigeneration families, cancer onset was delayed in older compared with recent generations. These observations support an alternative model for apparent anticipation in which rare variants from noncarrier parents may attenuate constitutive resistance to tumorigenesis in the offspring of TP53 mutation carriers with late cancer onset.

  9. Exploring the diversity of Arcobacter butzleri from cattle in the UK using MLST and whole genome sequencing.

    Directory of Open Access Journals (Sweden)

    J Yvette Merga

    Full Text Available Arcobacter butzleri is considered to be an emerging human foodborne pathogen. The completion of an A. butzleri genome sequence along with microarray analysis of 13 isolates in 2007 revealed a surprising amount of diversity amongst A. butzleri isolates from humans, animals and food. In order to further investigate Arcobacter diversity, 792 faecal samples were collected from cattle on beef and dairy farms in the North West of England. Arcobacter was isolated from 42.5% of the samples and the diversity of the isolates was investigated using multilocus sequence typing. An A. butzleri whole genome sequence, obtained by 454 shotgun sequencing of an isolate from a clinically-healthy dairy cow, showed a number of differences when compared to the genome of a human-derived A. butzleri isolate. PCR-based prevalence assays for variable genes suggested some tentative evidence for source-related distributions. We also found evidence for phenotypic differences relating to growth capabilities between our representative human and cattle isolates. Our genotypic and phenotypic observations suggest that some level of niche adaptation may have occurred in A. butzleri.

  10. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    Science.gov (United States)

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  11. The roles of whole-genome and small-scale duplications in the functional specialization of Saccharomyces cerevisiae genes.

    Directory of Open Access Journals (Sweden)

    Mario A Fares

    Full Text Available Researchers have long been enthralled with the idea that gene duplication can generate novel functions, crediting this process with great evolutionary importance. Empirical data shows that whole-genome duplications (WGDs) are more likely to be retained than small-scale duplications (SSDs), though their relative contribution to the functional fate of duplicates remains unexplored. Using the map of genetic interactions and the re-sequencing of 27 Saccharomyces cerevisiae genomes evolving for 2,200 generations we show that SSD-duplicates lead to neo-functionalization while WGD-duplicates partition ancestral functions. This conclusion is supported by: (a) SSD-duplicates establish more genetic interactions than singletons and WGD-duplicates; (b) SSD-duplicates copies share more interaction-partners than WGD-duplicates copies; (c) WGD-duplicates interaction partners are more functionally related than SSD-duplicates partners; (d) SSD-duplicates gene copies are more functionally divergent from one another, while keeping more overlapping functions, and diverge in their sub-cellular locations more than WGD-duplicates copies; and (e) SSD-duplicates complement their functions to a greater extent than WGD-duplicates. We propose a novel model that uncovers the complexity of evolution after gene duplication.

  12. The Roles of Whole-Genome and Small-Scale Duplications in the Functional Specialization of Saccharomyces cerevisiae Genes

    Science.gov (United States)

    Fares, Mario A.; Keane, Orla M.; Toft, Christina; Carretero-Paulet, Lorenzo; Jones, Gary W.

    2013-01-01

    Researchers have long been enthralled with the idea that gene duplication can generate novel functions, crediting this process with great evolutionary importance. Empirical data shows that whole-genome duplications (WGDs) are more likely to be retained than small-scale duplications (SSDs), though their relative contribution to the functional fate of duplicates remains unexplored. Using the map of genetic interactions and the re-sequencing of 27 Saccharomyces cerevisiae genomes evolving for 2,200 generations we show that SSD-duplicates lead to neo-functionalization while WGD-duplicates partition ancestral functions. This conclusion is supported by: (a) SSD-duplicates establish more genetic interactions than singletons and WGD-duplicates; (b) SSD-duplicates copies share more interaction-partners than WGD-duplicates copies; (c) WGD-duplicates interaction partners are more functionally related than SSD-duplicates partners; (d) SSD-duplicates gene copies are more functionally divergent from one another, while keeping more overlapping functions, and diverge in their sub-cellular locations more than WGD-duplicates copies; and (e) SSD-duplicates complement their functions to a greater extent than WGD-duplicates. We propose a novel model that uncovers the complexity of evolution after gene duplication. PMID:23300483

  13. Whole genome duplication in coast redwood (Sequoia sempervirens) and its implications for explaining the rarity of polyploidy in conifers.

    Science.gov (United States)

    Scott, Alison Dawn; Stenz, Noah W M; Ingvarsson, Pär K; Baum, David A

    2016-07-01

    Polyploidy is common and an important evolutionary factor in most land plant lineages, but it is rare in gymnosperms. Coast redwood (Sequoia sempervirens) is one of just two polyploid conifer species and the only hexaploid. Evidence from fossil guard cell size suggests that polyploidy in Sequoia dates to the Eocene. Numerous hypotheses about the mechanism of polyploidy and parental genome donors have been proposed, based primarily on morphological and cytological data, but it remains unclear how Sequoia became polyploid and why this lineage overcame an apparent gymnosperm barrier to whole-genome duplication (WGD). We sequenced transcriptomes and used phylogenetic inference, Bayesian concordance analysis and paralog age distributions to resolve relationships among gene copies in hexaploid coast redwood and close relatives. Our data show that hexaploidy in coast redwood is best explained by autopolyploidy or, if there was allopolyploidy, it happened within the Californian redwood clade. We found that duplicate genes have more similar sequences than expected, given the age of the inferred polyploidization. Conflict between molecular and fossil estimates of WGD can be explained if diploidization occurred very slowly following polyploidization. We extrapolate from this to suggest that the rarity of polyploidy in gymnosperms may be due to slow diploidization in this clade.

  14. Exploiting Bacterial Whole-Genome Sequencing Data for Evaluation of Diagnostic Assays: Campylobacter Species Identification as a Case Study

    Science.gov (United States)

    Jansen van Rensburg, Melissa J.; Swift, Craig; Cody, Alison J.; Jenkins, Claire

    2016-01-01

    The application of whole-genome sequencing (WGS) to problems in clinical microbiology has had a major impact on the field. Clinical laboratories are now using WGS for pathogen identification, antimicrobial susceptibility testing, and epidemiological typing. WGS data also represent a valuable resource for the development and evaluation of molecular diagnostic assays, which continue to play an important role in clinical microbiology. To demonstrate this application of WGS, this study used publicly available genomic data to evaluate a duplex real-time PCR (RT-PCR) assay that targets mapA and ceuE for the detection of Campylobacter jejuni and Campylobacter coli, leading global causes of bacterial gastroenteritis. In silico analyses of mapA and ceuE primer and probe sequences from 1,713 genetically diverse C. jejuni and C. coli genomes, supported by RT-PCR testing, indicated that the assay was robust, with 1,707 (99.7%) isolates correctly identified. The high specificity of the mapA-ceuE assay was the result of interspecies diversity and intraspecies conservation of the target genes in C. jejuni and C. coli. Rare instances of a lack of specificity among C. coli isolates were due to introgression in mapA or sequence diversity in ceuE. The results of this study illustrate how WGS can be exploited to evaluate molecular diagnostic assays by using publicly available data, online databases, and open-source software. PMID:27733632
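
    The in silico step described in this abstract, screening primer and probe sequences against many genomes, can be sketched roughly as follows. This is a simplified assumption of how such a screen might work, not the authors' pipeline: it scans only the forward strand, counts substitutions only, and treats a genome as detected when at least one site matches within a mismatch budget. All sequences, isolate names, and function names are hypothetical.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def primer_hits(genome: str, primer: str, max_mismatches: int = 0) -> List[int]:
    """Start positions where the primer matches within the mismatch budget."""
    k = len(primer)
    return [
        i for i in range(len(genome) - k + 1)
        if hamming(genome[i:i + k], primer) <= max_mismatches
    ]


def fraction_detected(genomes: Dict[str, str], primer: str,
                      max_mismatches: int = 0) -> float:
    """Share of genomes with at least one acceptable primer binding site."""
    detected = sum(
        1 for seq in genomes.values()
        if primer_hits(seq, primer, max_mismatches)
    )
    return detected / len(genomes)


# Hypothetical toy sequences; real genomes and primers are far longer.
genomes = {
    "isolate_1": "TTGACCGTAGGCTA",   # exact site present
    "isolate_2": "TTGACCGTTGGCTA",   # site with one substitution
    "isolate_3": "AAAAAAAAAAAAAA",   # no plausible site
}
primer = "CCGTAGG"

print(primer_hits(genomes["isolate_1"], primer))
print(fraction_detected(genomes, primer, max_mismatches=0))
print(fraction_detected(genomes, primer, max_mismatches=1))
```

    A fuller screen would also check the reverse complement, the second primer, and the probe, and would compare the predicted result with the species assignment of each genome.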

  15. Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop.

    Science.gov (United States)

    Hazzouri, Khaled M; Flowers, Jonathan M; Visser, Hendrik J; Khierallah, Hussam S M; Rosas, Ulises; Pham, Gina M; Meyer, Rachel S; Johansen, Caryn K; Fresquez, Zoë A; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A; Thirkhill, Deborah; Markhand, Ghulam S; Krueger, Robert R; Zaid, Abdelouahhab; Purugganan, Michael D

    2015-11-09

    Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop.

  16. Whole-Genome Expression Analysis and Signal Pathway Screening of Synovium-Derived Mesenchymal Stromal Cells in Rheumatoid Arthritis

    Directory of Open Access Journals (Sweden)

    Jingyi Hou

    2016-01-01

    Full Text Available Synovium-derived mesenchymal stromal cells (SMSCs) may play an important role in the pathogenesis of rheumatoid arthritis (RA) and show promise for therapeutic applications in RA. In this study, a whole-genome microarray analysis was used to detect differential gene expression in SMSCs from RA patients and healthy donors (HDs). Our results showed that there were 4828 differentially expressed genes in the RA group compared to the HD group; 3117 genes were upregulated, and 1711 genes were downregulated. A Gene Ontology analysis showed significantly enriched terms of differentially expressed genes in the biological process, cellular component, and molecular function domains. A Kyoto Encyclopedia of Genes and Genomes analysis showed that the MAPK signaling and rheumatoid arthritis pathways were upregulated and that the p53 signaling pathway was downregulated in RA SMSCs. Quantitative real-time polymerase chain reaction was applied to verify the expression variations of the partial genes mentioned above, and a western blot analysis was used to determine the expression levels of p53, p-JNK, p-ERK, and p-p38. Our study found that differentially expressed genes in the MAPK signaling, rheumatoid arthritis, and p53 signaling pathways may help to explain the pathogenic mechanism of RA and lead to therapeutic RA SMSC applications.

  17. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy.

    Science.gov (United States)

    Zuo, Guanghong; Hao, Bailin

    2015-10-01

    A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.

  18. Yeast "make-accumulate-consume" life strategy evolved as a multi-step process that predates the whole genome duplication.

    Directory of Open Access Journals (Sweden)

    Arne Hagman

    Full Text Available When fruits ripen, microbial communities start a fierce competition for the freely available fruit sugars. Three yeast lineages, including baker's yeast Saccharomyces cerevisiae, have independently developed the metabolic activity to convert simple sugars into ethanol even under fully aerobic conditions. This fermentation capacity, named Crabtree effect, reduces the cell-biomass production but provides in nature a tool to out-compete other microorganisms. Here, we analyzed over forty Saccharomycetaceae yeasts, covering over 200 million years of the evolutionary history, for their carbon metabolism. The experiments were done under strictly controlled and uniform conditions, which has not been done before. We show that the origin of Crabtree effect in Saccharomycetaceae predates the whole genome duplication and became a settled metabolic trait after the split of the S. cerevisiae and Kluyveromyces lineages, and coincided with the origin of modern fruit bearing plants. Our results suggest that ethanol fermentation evolved progressively, involving several successive molecular events that have gradually remodeled the yeast carbon metabolism. While some of the final evolutionary events, like gene duplications of glucose transporters and glycolytic enzymes, have been deduced, the earliest molecular events initiating Crabtree effect are still to be determined.

  19. Evaluating alignment and variant-calling software for mutation identification in C. elegans by whole-genome sequencing.

    Science.gov (United States)

    Smith, Harold E; Yun, Sijung

    2017-01-01

    Whole-genome sequencing is a powerful tool for analyzing genetic variation on a global scale. One particularly useful application is the identification of mutations obtained by classical phenotypic screens in model species. Sequence data from the mutant strain is aligned to the reference genome, and then variants are called to generate a list of candidate alleles. A number of software pipelines for mutation identification have been targeted to C. elegans, with particular emphasis on ease of use, incorporation of mapping strain data, subtraction of background variants, and similar criteria. Although success is predicated upon the sensitive and accurate detection of candidate alleles, relatively little effort has been invested in evaluating the underlying software components that are required for mutation identification. Therefore, we have benchmarked a number of commonly used tools for sequence alignment and variant calling, in all pair-wise combinations, against both simulated and actual datasets. We compared the accuracy of those pipelines for mutation identification in C. elegans, and found that the combination of BBMap for alignment plus FreeBayes for variant calling offers the most robust performance.
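
    The benchmarking design described in this abstract, scoring every aligner and variant-caller pairing against a dataset with known mutations, can be sketched as below. The abstract does not state which accuracy metrics were used, so precision and recall are an assumption here; the tool labels, variant tuples, and call sets are hypothetical placeholders and do not represent results for any real software.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def score_calls(truth: Set[Variant], calls: Set[Variant]) -> Tuple[float, float]:
    """Return (precision, recall) of a call set against known variants."""
    true_positives = len(truth & calls)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical simulated truth set with three known mutations.
truth: Set[Variant] = {
    ("I", 100, "G", "A"),
    ("II", 2500, "C", "T"),
    ("X", 42, "T", "C"),
}

# Hypothetical output of two aligner + caller pairings on the same reads.
pipeline_calls: Dict[Tuple[str, str], Set[Variant]] = {
    ("aligner_A", "caller_X"): {("I", 100, "G", "A"), ("II", 2500, "C", "T")},
    ("aligner_A", "caller_Y"): {
        ("I", 100, "G", "A"),
        ("II", 2500, "C", "T"),
        ("X", 42, "T", "C"),
        ("V", 7, "A", "G"),  # false positive
    },
}

results = {combo: score_calls(truth, calls)
           for combo, calls in pipeline_calls.items()}
for (aligner, caller), (precision, recall) in sorted(results.items()):
    print(f"{aligner} + {caller}: precision={precision:.2f}, recall={recall:.2f}")
```

    Ranking the pairings by such scores on both simulated and real datasets is one way a comparison like the one reported above could be summarized.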

  20. Whole-Genome Resequencing Reveals Extensive Natural Variation in the Model Green Alga Chlamydomonas reinhardtii

    Science.gov (United States)

    Hazzouri, Khaled M.; Rosas, Ulises; Bahmani, Tayebeh; Nelson, David R.; Abdrabu, Rasha; Harris, Elizabeth H.; Salehi-Ashtiani, Kourosh; Purugganan, Michael D.

    2015-01-01

    We performed whole-genome resequencing of 12 field isolates and eight commonly studied laboratory strains of the model organism Chlamydomonas reinhardtii to characterize genomic diversity and provide a resource for studies of natural variation. Our data support previous observations that Chlamydomonas is among the most diverse eukaryotic species. Nucleotide diversity is ∼3% and is geographically structured in North America with some evidence of admixture among sampling locales. Examination of predicted loss-of-function mutations in field isolates indicates conservation of genes associated with core cellular functions, while genes in large gene families and poorly characterized genes show a greater incidence of major effect mutations. De novo assembly of unmapped reads recovered genes in the field isolates that are absent from the CC-503 assembly. The laboratory reference strains show a genomic pattern of polymorphism consistent with their origin as the recombinant progeny of a diploid zygospore. Large duplications or amplifications are a prominent feature of laboratory strains and appear to have originated under laboratory culture. Extensive natural variation offers a new source of genetic diversity for studies of Chlamydomonas, including naturally occurring alleles that may prove useful in studies of gene function and the dissection of quantitative genetic traits. PMID:26392080
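
    The nucleotide diversity figure quoted above (about 3%) is, by the usual textbook definition, the average number of pairwise differences per site. The abstract does not say how the authors estimated it, so the following is only a minimal sketch of that standard definition; the function name and the toy sequences are hypothetical, and missing data, ploidy, and alignment gaps are ignored.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and aligned to the same length")

    pairs = list(combinations(sequences, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy alignment (hypothetical data, not from the study).
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.4f}")
```

    On real resequencing data this quantity is normally computed from variant calls with per-site filtering rather than from full alignments; the sketch only illustrates what the statistic measures.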

  1. Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop

    Science.gov (United States)

    Hazzouri, Khaled M.; Flowers, Jonathan M.; Visser, Hendrik J.; Khierallah, Hussam S. M.; Rosas, Ulises; Pham, Gina M.; Meyer, Rachel S.; Johansen, Caryn K.; Fresquez, Zoë A.; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A.; Thirkhill, Deborah; Markhand, Ghulam S.; Krueger, Robert R.; Zaid, Abdelouahhab; Purugganan, Michael D.

    2015-01-01

    Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop. PMID:26549859

  2. Next-Gen phylogeography of rainforest trees: exploring landscape-level cpDNA variation from whole-genome sequencing.

    Science.gov (United States)

    van der Merwe, M; McPherson, H; Siow, J; Rossetto, M

    2014-01-01

    Standardized phylogeographic studies across codistributed taxa can identify important refugia and biogeographic barriers, and potentially uncover how changes in adaptive constraints through space and time impact on the distribution of genetic diversity. The combination of next-generation sequencing and methodologies that enable uncomplicated analysis of the full chloroplast genome may provide an invaluable resource for such studies. Here, we assess the potential of a shotgun-based method across twelve nonmodel rainforest trees sampled from two evolutionary distinct regions. Whole genomic shotgun sequencing libraries consisting of pooled individuals were used to assemble species-specific chloroplast references (in silico). For each species, the pooled libraries allowed for the detection of variation within and between data sets (each representing a geographic region). The potential use of nuclear rDNA as an additional marker from the NGS libraries was investigated by mapping reads against available references. We successfully obtained phylogeographically informative sequence data from a range of previously unstudied rainforest trees. Greater levels of diversity were found in northern refugial rainforests than in southern expansion areas. The genetic signatures of varying evolutionary histories were detected, and interesting associative patterns between functional characteristics and genetic diversity were identified. This approach can suit a wide range of landscape-level studies. As the key laboratory-based steps do not require prior species-specific knowledge and can be easily outsourced, the techniques described here are even suitable for researchers without access to wet-laboratory facilities, making evolutionary ecology questions increasingly accessible to the research community.

  3. Abundant sequence divergence in the native Japanese cattle Mishima-Ushi (Bos taurus) detected using whole-genome sequencing.

    Science.gov (United States)

    Tsuda, Kaoru; Kawahara-Miki, Ryouka; Sano, Satoshi; Imai, Misaki; Noguchi, Tatsuo; Inayoshi, Yousuke; Kono, Tomohiro

    2013-10-01

    The native Japanese cattle Mishima-Ushi, a designated national natural treasure, are bred on a remote island, which has resulted in the conservation of their genealogy. We examined the genetic characteristics of 8 Mishima-Ushi individuals by using single nucleotide polymorphisms (SNPs), insertions, and deletions obtained by whole-genome sequencing. Mapping analysis with various criteria showed that predicted heterozygous SNPs were more prevalent than predicted homozygous SNPs in the exonic region, especially non-synonymous SNPs. From the identified 6.54 million polymorphisms, we found 400 non-synonymous SNPs in 313 genes specific to each of the 8 Mishima-Ushi individuals. Additionally, 3,170,833 polymorphisms were found between the 8 Mishima-Ushi individuals. Phylogenetic analysis confirmed that the Mishima-Ushi population diverged from another strain of Japanese cattle. This study provides a framework for further genetic studies of Mishima-Ushi and research on the function of SNP-containing genes as well as understanding the genetic relationship between the domestic and native Japanese cattle breeds.

  4. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics.

    Science.gov (United States)

    Chowdhury, Salim Akhter; Shackney, Stanley E; Heselmeyer-Haddad, Kerstin; Ried, Thomas; Schäffer, Alejandro A; Schwartz, Russell

    2014-07-01

    We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies on data from potentially hundreds of single cells by this evolutionary model. We demonstrate and validate the methods on simulated data and published FISH data from cervical cancers and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.
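
    To make the notion of an edit distance over copy number states concrete, here is a brute-force sketch. It is not the authors' provably optimal method or their published software. It is a breadth-first search over a deliberately simplified event model, and every rule in it is an assumption made for illustration: a single probe gains or loses one copy, all probes on one chromosome gain or lose one copy, or every probe doubles; a probe at zero copies stays at zero; copy numbers are capped at a hypothetical `max_copies`.

```python
from collections import deque


def neighbors(state, chromosomes, max_copies):
    """Yield every state reachable from `state` by one assumed event."""

    def apply(indices, change):
        new_state = list(state)
        for i in indices:
            if new_state[i] == 0:
                continue  # assumption: a fully lost probe cannot be regained
            new_state[i] = change(new_state[i])
        if all(0 <= copies <= max_copies for copies in new_state):
            return tuple(new_state)
        return None

    candidates = []
    for i in range(len(state)):  # single-probe gain or loss
        candidates.append(apply([i], lambda c: c + 1))
        candidates.append(apply([i], lambda c: c - 1))
    for probe_indices in chromosomes:  # whole-chromosome gain or loss
        candidates.append(apply(probe_indices, lambda c: c + 1))
        candidates.append(apply(probe_indices, lambda c: c - 1))
    # Whole-genome duplication doubles every probe.
    candidates.append(apply(range(len(state)), lambda c: 2 * c))

    for candidate in candidates:
        if candidate is not None and candidate != state:
            yield candidate


def copy_number_distance(source, target, chromosomes, max_copies=8):
    """Minimum number of assumed events from source to target, or None."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        state, distance = queue.popleft()
        if state == target:
            return distance
        for nxt in neighbors(state, chromosomes, max_copies):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None  # target unreachable under these rules


# Hypothetical layout: probes 0 and 1 share a chromosome, probe 2 is alone.
layout = [[0, 1], [2]]
print(copy_number_distance((2, 2, 2), (5, 4, 4), layout))
```

    Exhaustive search like this is only workable for a handful of probes and small copy numbers, which is the kind of scaling problem that dedicated algorithms for this setting are meant to avoid.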

  5. Differential Gene Expression Analysis of Placentas with Increased Vascular Resistance and Pre-Eclampsia Using Whole-Genome Microarrays

    Directory of Open Access Journals (Sweden)

    M. Centlow

    2011-01-01

    Full Text Available Pre-eclampsia is a pregnancy complication characterized by hypertension and proteinuria. There are several factors associated with an increased risk of developing pre-eclampsia, one of which is increased uterine artery resistance, referred to as “notching”. However, some women do not progress into pre-eclampsia whereas others may have a higher risk of doing so. The placenta, central in pre-eclampsia pathology, may express genes associated with either protection or progression into pre-eclampsia. In order to search for genes associated with protection or progression, whole-genome profiling was performed. Placental tissue from 15 controls, 10 pre-eclamptic, 5 pre-eclampsia with notching, and 5 with notching only were analyzed using microarray and antibody microarrays to study some of the same gene product and functionally related ones. The microarray showed 148 genes to be significantly altered between the four groups. In the preeclamptic group compared to notch only, there was increased expression of genes related to chemotaxis and the NF-kappa B pathway and decreased expression of genes related to antigen processing and presentation, such as human leukocyte antigen B. Our results indicate that progression of pre-eclampsia from notching may involve the development of inflammation. Increased expression of antigen-presenting genes, as seen in the notch-only placenta, may prevent this inflammatory response and, thereby, protect the patient from developing pre-eclampsia.

  6. High-resolution Whole-Genome Analysis of Skull Base Chordomas Implicates FHIT Loss in Chordoma Pathogenesis

    Directory of Open Access Journals (Sweden)

    Roberto Jose Diaz

    2012-09-01

    Full Text Available Chordoma is a rare tumor arising in the sacrum, clivus, or vertebrae. It is often not completely resectable and shows a high incidence of recurrence and progression with shortened patient survival and impaired quality of life. Chemotherapeutic options are limited to investigational therapies at present. Therefore, adjuvant therapy for control of tumor recurrence and progression is of great interest, especially in skull base lesions where complete tumor resection is often not possible because of the proximity of cranial nerves. To understand the extent of genetic instability and associated chromosomal and gene losses or gains in skull base chordoma, we undertook whole-genome single-nucleotide polymorphism microarray analysis of flash frozen surgical chordoma specimens, 21 from the clivus and 1 from C1 to C2 vertebrae. We confirm the presence of a deletion at 9p involving CDKN2A, CDKN2B, and MTAP but at a much lower rate (22%) than previously reported for sacral chordoma. At a similar frequency (21%), we found aneuploidy of chromosome 3. Tissue microarray immunohistochemistry demonstrated absent or reduced fragile histidine triad (FHIT) protein expression in 98% of sacral chordomas and 67% of skull base chordomas. Our data suggest that chromosome 3 aneuploidy and epigenetic regulation of FHIT contribute to loss of the FHIT tumor suppressor in chordoma. The finding that FHIT is lost in a majority of chordomas provides new insight into chordoma pathogenesis and points to a potential new therapeutic target for this challenging neoplasm.

  7. Whole genome comparison of Campylobacter jejuni human isolates using a low-cost microarray reveals extensive genetic diversity.

    Science.gov (United States)

    Dorrell, N; Mangan, J A; Laing, K G; Hinds, J; Linton, D; Al-Ghusein, H; Barrell, B G; Parkhill, J; Stoker, N G; Karlyshev, A V; Butcher, P D; Wren, B W

    2001-10-01

    Campylobacter jejuni is the leading cause of bacterial food-borne diarrhoeal disease throughout the world, and yet is still a poorly understood pathogen. Whole genome microarray comparisons of 11 C. jejuni strains of diverse origin identified genes in up to 30 NCTC 11168 loci ranging from 0.7 to 18.7 kb that are either absent or highly divergent in these isolates. Many of these regions are associated with the biosynthesis of surface structures including flagella, lipo-oligosaccharide, and the newly identified capsule. Other strain-variable genes of known function include those responsible for iron acquisition, DNA restriction/modification, and sialylation. In fact, at least 21% of genes in the sequenced strain appear dispensable as they are absent or highly divergent in one or more of the isolates tested, thus defining 1300 C. jejuni core genes. Such core genes contribute mainly to metabolic, biosynthetic, cellular, and regulatory processes, but many virulence determinants are also conserved. Comparison of the capsule biosynthesis locus revealed conservation of all the genes in this region in strains with the same Penner serotype as strain NCTC 11168. By contrast, between 5 and 17 NCTC 11168 genes in this region are either absent or highly divergent in strains of a different serotype from the sequenced strain, providing further evidence that the capsule accounts for Penner serotype specificity. These studies reveal extensive genetic diversity among C. jejuni strains and pave the way toward identifying correlates of pathogenicity and developing improved epidemiological tools for this problematic pathogen.

  8. Whole Genome Sequence Analysis of Salmonella Typhi Isolated in Thailand before and after the Introduction of a National Immunization Program

    Science.gov (United States)

    Thanh, Duy Pham; Bodhidatta, Ladaporn; Mason, Carl Jeffries; Srijan, Apichai; Rabaa, Maia A.; Vinh, Phat Voong; Thanh, Tuyen Ha; Thwaites, Guy E.; Baker, Stephen; Holt, Kathryn E.

    2017-01-01

    Vaccines against Salmonella Typhi, the causative agent of typhoid fever, are commonly used by travellers, however, there are few examples of national immunization programs in endemic areas. There is therefore a paucity of data on the impact of typhoid immunization programs on localised populations of S. Typhi. Here we have used whole genome sequencing (WGS) to characterise 44 historical bacterial isolates collected before and after a national typhoid immunization program that was implemented in Thailand in 1977 in response to a large outbreak; the program was highly effective in reducing typhoid case numbers. Thai isolates were highly diverse, including 10 distinct phylogenetic lineages or genotypes. Novel prophage and plasmids were also detected, including examples that were previously only reported in Shigella sonnei and Escherichia coli. The majority of S. Typhi genotypes observed prior to the immunization program were not observed following it. Post-vaccine era isolates were more closely related to S. Typhi isolated from neighbouring countries than to earlier Thai isolates, providing no evidence for the local persistence of endemic S. Typhi following the national immunization program. Rather, later cases of typhoid appeared to be caused by the occasional importation of common genotypes from neighbouring Vietnam, Laos, and Cambodia. These data show the value of WGS in understanding the impacts of vaccination on pathogen populations and provide support for the proposal that large-scale typhoid immunization programs in endemic areas could result in lasting local disease elimination, although larger prospective studies are needed to test this directly. PMID:28060810

  9. Whole Genome Sequence Analysis of Salmonella Typhi Isolated in Thailand before and after the Introduction of a National Immunization Program.

    Science.gov (United States)

    Dyson, Zoe A; Thanh, Duy Pham; Bodhidatta, Ladaporn; Mason, Carl Jeffries; Srijan, Apichai; Rabaa, Maia A; Vinh, Phat Voong; Thanh, Tuyen Ha; Thwaites, Guy E; Baker, Stephen; Holt, Kathryn E

    2017-01-01

    Vaccines against Salmonella Typhi, the causative agent of typhoid fever, are commonly used by travellers, however, there are few examples of national immunization programs in endemic areas. There is therefore a paucity of data on the impact of typhoid immunization programs on localised populations of S. Typhi. Here we have used whole genome sequencing (WGS) to characterise 44 historical bacterial isolates collected before and after a national typhoid immunization program that was implemented in Thailand in 1977 in response to a large outbreak; the program was highly effective in reducing typhoid case numbers. Thai isolates were highly diverse, including 10 distinct phylogenetic lineages or genotypes. Novel prophage and plasmids were also detected, including examples that were previously only reported in Shigella sonnei and Escherichia coli. The majority of S. Typhi genotypes observed prior to the immunization program were not observed following it. Post-vaccine era isolates were more closely related to S. Typhi isolated from neighbouring countries than to earlier Thai isolates, providing no evidence for the local persistence of endemic S. Typhi following the national immunization program. Rather, later cases of typhoid appeared to be caused by the occasional importation of common genotypes from neighbouring Vietnam, Laos, and Cambodia. These data show the value of WGS in understanding the impacts of vaccination on pathogen populations and provide support for the proposal that large-scale typhoid immunization programs in endemic areas could result in lasting local disease elimination, although larger prospective studies are needed to test this directly.

  10. Whole-genome single-cell copy number profiling from formalin-fixed paraffin-embedded samples.

    Science.gov (United States)

    Martelotto, Luciano G; Baslan, Timour; Kendall, Jude; Geyer, Felipe C; Burke, Kathleen A; Spraggon, Lee; Piscuoglio, Salvatore; Chadalavada, Kalyani; Nanjangud, Gouri; Ng, Charlotte K Y; Moody, Pamela; D'Italia, Sean; Rodgers, Linda; Cox, Hilary; da Cruz Paula, Arnaud; Stepansky, Asya; Schizas, Michail; Wen, Hannah Y; King, Tari A; Norton, Larry; Weigelt, Britta; Hicks, James B; Reis-Filho, Jorge S

    2017-03-01

    A substantial proportion of tumors consist of genotypically distinct subpopulations of cancer cells. This intratumor genetic heterogeneity poses a substantial challenge for the implementation of precision medicine. Single-cell genomics constitutes a powerful approach to resolve complex mixtures of cancer cells by tracing cell lineages and discovering cryptic genetic variations that would otherwise be obscured in tumor bulk analyses. Because of the chemical alterations that result from formalin fixation, single-cell genomic approaches have largely remained limited to fresh or rapidly frozen specimens. Here we describe the development and validation of a robust and accurate methodology to perform whole-genome copy-number profiling of single nuclei obtained from formalin-fixed paraffin-embedded clinical tumor samples. We applied the single-cell sequencing approach described here to study the progression from in situ to invasive breast cancer, which revealed that ductal carcinomas in situ show intratumor genetic heterogeneity at diagnosis and that these lesions may progress to invasive breast cancer through a variety of evolutionary processes.

  11. Whole-genome sequence analysis and exploration of the zoonotic potential of a rat-borne Bartonella elizabethae.

    Science.gov (United States)

    Tay, S T; Kho, K L; Wee, W Y; Choo, S W

    2016-03-01

    Bartonella elizabethae has been known to cause endocarditis and neuroretinitis in humans. The genomic features and virulence profiles of a B. elizabethae strain (designated as BeUM) isolated from the spleen of a wild rat in Kuala Lumpur, Malaysia are described in this study. The BeUM strain has a genome size of 1,932,479 bp and a GC content of 38.3%. There is a high degree of conservation between the genome of strain BeUM and those of the B. elizabethae type strains (ATCC 49927 and F9251) and a rat-borne strain, Re6043vi. Of 2137 gene clusters identified from B. elizabethae strains, 2064 (96.6%) are indicated as the core gene clusters. Comparative genome analysis of B. elizabethae strains reveals virulence genes which are known in other pathogenic Bartonella species, including VirB2-11, vbhB2-B11, VirD4, trw, vapA2-5, hbpA-E, bepA-F, bepH, badA/vomp/brp, ialB, omp43/89 and korA-B. A putative intact prophage has been identified in the strain BeUM, in addition to an 8 kb pathogenicity island. The whole genome analysis supports the zoonotic potential of the rodent-borne B. elizabethae, and provides a basis for future functional and pathogenicity studies of B. elizabethae.

  12. Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries.

    Science.gov (United States)

    Carpenter, Meredith L; Buenrostro, Jason D; Valdiosera, Cristina; Schroeder, Hannes; Allentoft, Morten E; Sikora, Martin; Rasmussen, Morten; Gravel, Simon; Guillén, Sonia; Nekhrizov, Georgi; Leshtakov, Krasimir; Dimitrova, Diana; Theodossiev, Nikola; Pettener, Davide; Luiselli, Donata; Sandoval, Karla; Moreno-Estrada, Andrés; Li, Yingrui; Wang, Jun; Gilbert, M Thomas P; Willerslev, Eske; Greenleaf, William J; Bustamante, Carlos D

    2013-11-07

    Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062-147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217-73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples.
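
    The fold-enrichment figures above compare, per library, the fraction of reads mapping to the target genome after capture with the fraction before capture. The abstract does not spell out the exact metric (for example, whether duplicates are excluded after capture), so the sketch below assumes the simplest reading: a ratio of mapped-read fractions. The function names and read counts are hypothetical.

```python
def mapped_fraction(mapped_reads, total_reads):
    """Fraction of sequenced reads that map to the target genome."""
    if total_reads <= 0 or not 0 <= mapped_reads <= total_reads:
        raise ValueError("need 0 <= mapped_reads <= total_reads and total_reads > 0")
    return mapped_reads / total_reads


def fold_enrichment(pre_fraction, post_fraction):
    """Ratio of post-capture to pre-capture mapped fractions (assumed metric)."""
    if pre_fraction <= 0:
        raise ValueError("pre-capture fraction must be positive")
    return post_fraction / pre_fraction


# Hypothetical library: 0.5% of reads map before capture, 30% after.
pre = mapped_fraction(5_000, 1_000_000)
post = mapped_fraction(300_000, 1_000_000)
print(f"enrichment: {fold_enrichment(pre, post):.0f}-fold")
```

    Because enrichment is a per-library ratio, the average pre-capture fraction (1.2%) and the best post-capture fraction (59%) quoted in the abstract cannot simply be divided to recover the reported 6- to 159-fold range.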

  13. Characterization of Genomic Variants Associated with Scout and Recruit Behavioral Castes in Honey Bees Using Whole-Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Bruce R Southey

    Full Text Available Among forager honey bees, scouts seek new resources and return to the colony, enlisting recruits to collect these resources. Differentially expressed genes between these behaviors and genetic variability in scouting phenotypes have been reported. Whole-genome sequencing of 44 Apis mellifera scouts and recruits was undertaken to detect variants and further understand the genetic architecture underlying the behavioral differences between scouts and recruits. The median coverage depth in recruits and scouts was 10.01 and 10.7 X, respectively. Representation of bacterial species among the unmapped reads reflected a more diverse microbiome in scouts than recruits. Overall, 1,412,705 polymorphic positions were analyzed for associations with scouting behavior, and 212 significant (p-value 1000 bp apart from each other. A number of these variants were mapped to ncRNA LOC100578102, solute carrier family 12 member 6-like gene, and LOC100576965 (meprin and TRAF-C homology domain containing gene). Functional categories represented among the genes corresponding to significant variants included: neuronal function, exoskeleton, immune response, salivary gland development, and enzymatic food processing. These categories offer a glimpse into the molecular support to the behaviors of scouts and recruits. The level of association between genomic variants and scouting behavior observed in this study may be linked to the honey bee's genomic plasticity and fluidity of transition between castes.

  14. Implementation of High Resolution Whole Genome Array CGH in the Prenatal Clinical Setting: Advantages, Challenges, and Review of the Literature

    Directory of Open Access Journals (Sweden)

    Paola Evangelidou

    2013-01-01

    Full Text Available Array Comparative Genomic Hybridization analysis is replacing postnatal chromosomal analysis in cases of intellectual disabilities, and it has been postulated that it might also become the first-tier test in prenatal diagnosis. In this study, array CGH was applied in 64 prenatal samples with whole genome oligonucleotide arrays (BlueGnome, Ltd.) on DNA extracted from chorionic villi, amniotic fluid, foetal blood, and skin samples. Results were confirmed with Fluorescence In Situ Hybridization or Real-Time PCR. Fifty-three cases had normal karyotype and abnormal ultrasound findings, and seven samples had balanced rearrangements, five of which also had ultrasound findings. The value of array CGH in the characterization of previously known aberrations in five samples is also presented. Seventeen out of 64 samples carried copy number alterations, giving a detection rate of 26.5%. Ten of these represent benign variants or variants of unknown significance, giving the method a diagnostic capacity of 10.9%. If karyotyping is performed, the additional diagnostic capacity of the method is 5.1% (3/59). This study indicates the ability of array CGH to identify chromosomal abnormalities which cannot be detected during routine prenatal cytogenetic analysis, therefore increasing the overall detection rate. In addition, a thorough review of the literature is presented.
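
    The percentages in this abstract follow from simple ratios, reproduced in the sketch below as a check. The variable names are hypothetical, and the count of seven clinically significant findings is inferred here as 17 minus 10 rather than stated in the abstract.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


total_samples = 64
with_copy_number_alterations = 17
benign_or_unknown_significance = 10
# Inferred, not stated in the abstract: the remainder is taken as significant.
clinically_significant = with_copy_number_alterations - benign_or_unknown_significance

detection_rate = percent(with_copy_number_alterations, total_samples)
diagnostic_capacity = percent(clinically_significant, total_samples)
additional_capacity = percent(3, 59)  # the 3/59 figure quoted in the abstract

print(f"detection rate:       {detection_rate:.2f}%")
print(f"diagnostic capacity:  {diagnostic_capacity:.2f}%")
print(f"additional capacity:  {additional_capacity:.2f}%")
```

    The first ratio is 26.5625%, so the 26.5% quoted in the abstract appears to be truncated rather than rounded; the other two figures match the abstract when rounded to one decimal place.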

  15. Whole Genome Re-Sequencing Identifies a Quantitative Trait Locus Repressing Carbon Reserve Accumulation during Optimal Growth in Chlamydomonas reinhardtii

    Science.gov (United States)

    Goold, Hugh Douglas; Nguyen, Hoa Mai; Kong, Fantao; Beyly-Adriano, Audrey; Légeret, Bertrand; Billon, Emmanuelle; Cuiné, Stéphan; Beisson, Fred; Peltier, Gilles; Li-Beisson, Yonghua

    2016-01-01

    Microalgae have emerged as a promising source for biofuel production. Massive oil and starch accumulation in microalgae is possible, but occurs mostly when biomass growth is impaired. The molecular networks underlying the negative correlation between growth and reserve formation are not known. Thus isolation of strains capable of accumulating carbon reserves during optimal growth would be highly desirable. To this end, we screened an insertional mutant library of Chlamydomonas reinhardtii for alterations in oil content. A mutant accumulating five times more oil and twice more starch than wild-type during optimal growth was isolated and named constitutive oil accumulator 1 (coa1). Growth in photobioreactors under highly controlled conditions revealed that the increase in oil and starch content in coa1 was dependent on light intensity. Genetic analysis and DNA hybridization pointed to a single insertional event responsible for the phenotype. Whole genome re-sequencing identified in coa1 a >200 kb deletion on chromosome 14 containing 41 genes. This study demonstrates that, 1), the generation of algal strains accumulating higher reserve amount without compromising biomass accumulation is feasible; 2), light is an important parameter in phenotypic analysis; and 3), a chromosomal region (Quantitative Trait Locus) acts as suppressor of carbon reserve accumulation during optimal growth. PMID:27141848

  16. Monodisperse Picoliter Droplets for Low-Bias and Contamination-Free Reactions in Single-Cell Whole Genome Amplification.

    Directory of Open Access Journals (Sweden)

    Yohei Nishikawa

    Full Text Available Whole genome amplification (WGA) is essential for obtaining genome sequences from single bacterial cells because the quantity of template DNA contained in a single cell is very low. Multiple displacement amplification (MDA), using Phi29 DNA polymerase and random primers, is the most widely used method for single-cell WGA. However, single-cell MDA usually results in uneven genome coverage because of amplification bias, background amplification of contaminating DNA, and formation of chimeras by linking of non-contiguous chromosomal regions. Here, we present a novel MDA method, termed droplet MDA, that minimizes amplification bias and amplification of contaminants by using picoliter-sized droplets for compartmentalized WGA reactions. Extracted DNA fragments from a lysed cell in MDA mixture are divided into 10^5 droplets (67 pL) within minutes via flow through simple microfluidic channels. Compartmentalized genome fragments can be individually amplified in these droplets without the risk of encounter with reagent-borne or environmental contaminants. Following quality assessment of WGA products from single Escherichia coli cells, we showed that droplet MDA minimized unexpected amplification and improved the percentage of genome recovery from 59% to 89%. Our results demonstrate that microfluidic-generated droplets show potential as an efficient tool for effective amplification of low-input DNA for single-cell genomics and greatly reduce the cost and labor investment required for determination of nearly complete genome sequences of uncultured bacteria from environmental samples.

  17. Facile mutant identification via a single parental backcross method and application of whole genome sequencing based mapping pipelines

    Directory of Open Access Journals (Sweden)

    Robert Silas Allen

    2013-09-01

    Full Text Available Forward genetic screens have identified numerous genes involved in development and metabolism, and remain a cornerstone of biological research. However, to locate a causal mutation, the practice of crossing to a polymorphic background to generate a mapping population can be problematic if the mutant phenotype is difficult to recognise in the hybrid F2 progeny, or dependent on parental specific traits. Here, in a screen for leaf hyponasty mutants, we have performed a single backcross of an ethyl methanesulfonate (EMS) generated hyponastic mutant to its parent. Whole genome deep sequencing of a bulked homozygous F2 population and analysis via the Next Generation EMS mutation mapping pipeline (NGM) unambiguously determined the causal mutation to be a single nucleotide polymorphism (SNP) residing in HASTY, a previously characterised gene involved in microRNA biogenesis. We have evaluated the feasibility of this backcross approach using three additional SNP mapping pipelines: SHOREmap, the GATK pipeline, and the samtools pipeline. Although there was variance in the identification of EMS SNPs, all returned the same outcome in clearly identifying the causal mutation in HASTY. The simplicity of performing a single parental backcross and genome sequencing a small pool of segregating mutants has great promise for identifying mutations that may be difficult to map using conventional approaches.

  18. Characterization of Genomic Variants Associated with Scout and Recruit Behavioral Castes in Honey Bees Using Whole-Genome Sequencing.

    Science.gov (United States)

    Southey, Bruce R; Zhu, Ping; Carr-Markell, Morgan K; Liang, Zhengzheng S; Zayed, Amro; Li, Ruiqiang; Robinson, Gene E; Rodriguez-Zas, Sandra L

    2016-01-01

    Among forager honey bees, scouts seek new resources and return to the colony, enlisting recruits to collect these resources. Differentially expressed genes between these behaviors and genetic variability in scouting phenotypes have been reported. Whole-genome sequencing of 44 Apis mellifera scouts and recruits was undertaken to detect variants and further understand the genetic architecture underlying the behavioral differences between scouts and recruits. The median coverage depth in recruits and scouts was 10.01 and 10.7 X, respectively. Representation of bacterial species among the unmapped reads reflected a more diverse microbiome in scouts than recruits. Overall, 1,412,705 polymorphic positions were analyzed for associations with scouting behavior, and 212 significant (p-value 1000 bp apart from each other. A number of these variants were mapped to ncRNA LOC100578102, solute carrier family 12 member 6-like gene, and LOC100576965 (meprin and TRAF-C homology domain containing gene). Functional categories represented among the genes corresponding to significant variants included: neuronal function, exoskeleton, immune response, salivary gland development, and enzymatic food processing. These categories offer a glimpse into the molecular support to the behaviors of scouts and recruits. The level of association between genomic variants and scouting behavior observed in this study may be linked to the honey bee's genomic plasticity and fluidity of transition between castes.

  19. Disturbance of gene expression in primary human hepatocytes by hepatotoxic pyrrolizidine alkaloids: A whole genome transcriptome analysis.

    Science.gov (United States)

    Luckert, Claudia; Hessel, Stefanie; Lenze, Dido; Lampen, Alfonso

    2015-10-01

    1,2-unsaturated pyrrolizidine alkaloids (PA) are plant metabolites predominantly occurring in the plant families Asteraceae and Boraginaceae. Acute and chronic PA poisoning causes severe hepatotoxicity. So far, the molecular mechanisms of PA toxicity are not well understood. To analyze its mode of action, primary human hepatocytes were exposed to a non-cytotoxic dose of 100 μM of four structurally different PA: echimidine, heliotrine, senecionine, senkirkine. Changes in mRNA expression were analyzed by a whole genome microarray. Employing cut-off values with a |fold change| of 2 and a q-value of 0.01, data analysis revealed numerous changes in gene expression. In total, 4556, 1806, 3406 and 8623 genes were regulated by echimidine, heliotrine, senecionine and senkirkine, respectively. 1304 genes were identified as commonly regulated. PA affected pathways related to cell cycle regulation, cell death and cancer development. The transcription factors TP53, MYC, NFκB and NUPR1 were predicted to be activated upon PA treatment. Furthermore, gene expression data showed a considerable interference with lipid metabolism and bile acid flow. The associated transcription factors FXR, LXR, SREBF1/2, and PPARα/γ/δ were predicted to be inhibited. In conclusion, though structurally different, all four PA significantly regulated a great number of genes in common. This suggests similar molecular mechanisms, although the extent seems to differ between the analyzed PA as reflected by the potential hepatotoxicity and individual PA structure.
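
    The cut-offs described in this abstract can be read as a simple per-gene filter followed by a set intersection. The sketch below is a minimal illustration only, assuming signed fold-change values (negative for down-regulation) and inclusive thresholds; the abstract states neither, and the function names, data layout and gene labels are hypothetical.

```python
def regulated_genes(results, min_abs_fc=2.0, max_q=0.01):
    """Return genes passing both cut-offs.

    `results` maps gene -> (signed fold change, q-value); this layout is
    an assumption made for the sketch, not taken from the study.
    """
    return {
        gene
        for gene, (fold_change, q_value) in results.items()
        if abs(fold_change) >= min_abs_fc and q_value <= max_q
    }


def commonly_regulated(per_compound, min_abs_fc=2.0, max_q=0.01):
    """Intersect the regulated gene sets of all compounds."""
    gene_sets = [
        regulated_genes(results, min_abs_fc, max_q)
        for results in per_compound.values()
    ]
    return set.intersection(*gene_sets) if gene_sets else set()
```

    Applied to one result table per alkaloid, `commonly_regulated` would return the genes that pass both cut-offs for every compound, which is the kind of overlap the abstract reports as "commonly regulated".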

  20. Effect of long real space flight on the whole genome mRNA expression properties in medaka Oryzias latipes

    Science.gov (United States)

    Kozlova, Olga; Gusev, Oleg; Levinskikh, Margarita; Sychev, Vladimir; Poddubko, Svetlana

    The current study addresses the complex analysis of the whole genome mRNA expression profile and the formation of splicing variants in different organs of medaka fish exposed to prolonged space flight within the joint Russia-Japan research program “Aquarium-AQH”. The fish were kept in the AQH joint-aquariums system in October-December 2013, followed by fixation in RNA-preserving buffers and freezing during the space flight. The samples were returned to the Earth frozen in March 2013 and mRNAs from four fish were sequenced in an organ-specific manner using the Illumina HiSeq sequencing platform. Ground group fish treated in the same way were used as a control. The comparison between the groups revealed a space group-specific mRNA expression pattern. More than 50 genes (including several types of myosins) were down-regulated in the space group. Moreover, we found evidence for the formation of space group-specific splicing variants of mRNA. Taken together, the data suggest that in spite of the aquatic environment, space flight-associated factors have a strong effect on the activity of the fish genome. This work was supported in part by a subsidy of the Russian Government to support the Program of competitive growth of Kazan Federal University among world class academic centres and universities.

  1. Whole-genome sequencing to detect recent transmission of Mycobacterium tuberculosis in settings with a high burden of tuberculosis.

    Science.gov (United States)

    Luo, Tao; Yang, Chongguang; Peng, Ying; Lu, Liping; Sun, Guomei; Wu, Jie; Jin, Xiaoping; Hong, Jianjun; Li, Fabin; Mei, Jian; DeRiemer, Kathryn; Gao, Qian

    2014-07-01

    Whole genome sequencing (WGS) of Mycobacterium tuberculosis has been used to trace the transmission of M. tuberculosis, the causative agent of tuberculosis (TB). Previously published studies using WGS were conducted in developed countries with a low TB burden. We sought to evaluate the relative usefulness of traditional VNTR and SNP typing methods, WGS and epidemiological investigations to study the recent transmission of M. tuberculosis in a high TB burden country. We conducted epidemiological investigations of 42 TB patients whose M. tuberculosis isolates were classified into three clusters based on variable-number tandem repeat (VNTR) typing. We applied WGS to 32 (76.2%) of the 42 strains and calculated the pairwise genomic distances between strains within each cluster. Eighteen (56.3%) of the 32 strains had genomic differences ≥100 SNPs with every other strain, suggesting that direct transmission was unlikely to have occurred. Ten strains were grouped into four WGS-based clusters with genomic distances ≤5 SNPs within each cluster, and confirmed epidemiological links were identified in two of these clusters. Our results indicate that WGS provides reliable resolution for tracing the transmission of M. tuberculosis in high TB burden settings. The high resolution of WGS is particularly useful to confirm or exclude the possibility of direct transmission events defined by traditional typing methods.
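
    The pairwise-distance step in this abstract can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes each strain is represented by an equal-length string of bases at shared variable sites, and it uses single-linkage grouping with a union-find structure, which the abstract does not specify. The function names and the input layout are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def cluster_by_threshold(seqs: dict[str, str], max_snps: int = 5) -> list[set[str]]:
    """Group strains linked by a pairwise distance of at most `max_snps`.

    Single-linkage grouping is an assumption of this sketch. Strains with
    no partner within the threshold are left out of the result.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]
```

    With the default threshold of 5, strains separated from every other strain by a large distance (such as the ≥100 SNPs reported above) would not appear in any returned cluster.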

  2. Exploiting Bacterial Whole-Genome Sequencing Data for Evaluation of Diagnostic Assays: Campylobacter Species Identification as a Case Study.

    Science.gov (United States)

    Jansen van Rensburg, Melissa J; Swift, Craig; Cody, Alison J; Jenkins, Claire; Maiden, Martin C J

    2016-12-01

    The application of whole-genome sequencing (WGS) to problems in clinical microbiology has had a major impact on the field. Clinical laboratories are now using WGS for pathogen identification, antimicrobial susceptibility testing, and epidemiological typing. WGS data also represent a valuable resource for the development and evaluation of molecular diagnostic assays, which continue to play an important role in clinical microbiology. To demonstrate this application of WGS, this study used publicly available genomic data to evaluate a duplex real-time PCR (RT-PCR) assay that targets mapA and ceuE for the detection of Campylobacter jejuni and Campylobacter coli, leading global causes of bacterial gastroenteritis. In silico analyses of mapA and ceuE primer and probe sequences from 1,713 genetically diverse C. jejuni and C. coli genomes, supported by RT-PCR testing, indicated that the assay was robust, with 1,707 (99.7%) isolates correctly identified. The high specificity of the mapA-ceuE assay was the result of interspecies diversity and intraspecies conservation of the target genes in C. jejuni and C. coli. Rare instances of a lack of specificity among C. coli isolates were due to introgression in mapA or sequence diversity in ceuE. The results of this study illustrate how WGS can be exploited to evaluate molecular diagnostic assays by using publicly available data, online databases, and open-source software.

  3. CVTree3 Web Server for Whole-genome-based and Alignment-free Prokaryotic Phylogeny and Taxonomy

    Directory of Open Access Journals (Sweden)

    Guanghong Zuo

    2015-10-01

    Full Text Available A faithful phylogeny and an objective taxonomy for prokaryotes should agree with each other and ultimately follow the genome data. With the number of sequenced genomes reaching tens of thousands, both tree inference and detailed comparison with taxonomy are great challenges. We now provide one solution in the latest Release 3.0 of the alignment-free and whole-genome-based web server CVTree3. The server resides in a cluster of 64 cores and is equipped with an interactive, collapsible, and expandable tree display. It is capable of comparing the tree branching order with prokaryotic classification at all taxonomic ranks from domains down to species and strains. CVTree3 allows for inquiry by taxon names and trial on lineage modifications. In addition, it reports a summary of monophyletic and non-monophyletic taxa at all ranks as well as produces print-quality subtree figures. After giving an overview of retrospective verification of the CVTree approach, the power of the new server is described for the mega-classification of prokaryotes and determination of taxonomic placement of some newly-sequenced genomes. A few discrepancies between CVTree and 16S rRNA analyses are also summarized with regard to possible taxonomic revisions. CVTree3 is freely accessible to all users at http://tlife.fudan.edu.cn/cvtree3/ without login requirements.

  4. Rapid and inexpensive whole-genome genotyping-by-sequencing for crossover localization and fine-scale genetic mapping.

    Science.gov (United States)

    Rowan, Beth A; Patel, Vipul; Weigel, Detlef; Schneeberger, Korbinian

    2015-01-13

    The reshuffling of existing genetic variation during meiosis is important both during evolution and in breeding. The reassortment of genetic variants relies on the formation of crossovers (COs) between homologous chromosomes. The pattern of genome-wide CO distributions can be rapidly and precisely established by the short-read sequencing of individuals from F2 populations, which in turn are useful for quantitative trait locus (QTL) mapping. Although sequencing costs have decreased precipitously in recent years, the costs of library preparation for hundreds of individuals have remained high. To enable rapid and inexpensive CO detection and QTL mapping using low-coverage whole-genome sequencing of large mapping populations, we have developed a new method for library preparation along with Trained Individual GenomE Reconstruction, a probabilistic method for genotype and CO predictions for recombinant individuals. In an example case with hundreds of F2 individuals from two Arabidopsis thaliana accessions, we resolved most CO breakpoints to within 2 kb and reduced a major flowering time QTL to a 9-kb interval. In addition, an extended region of unusually low recombination revealed a 1.8-Mb inversion polymorphism on the long arm of chromosome 4. We observed no significant differences in the frequency and distribution of COs between F2 individuals with and without a functional copy of the DNA helicase gene RECQ4A. In summary, we present a new, cost-efficient method for large-scale, high-precision genotyping-by-sequencing.

  5. Whole genome analyses of a well-differentiated liposarcoma reveals novel SYT1 and DDR2 rearrangements.

    Science.gov (United States)

    Egan, Jan B; Barrett, Michael T; Champion, Mia D; Middha, Sumit; Lenkiewicz, Elizabeth; Evers, Lisa; Francis, Princy; Schmidt, Jessica; Shi, Chang-Xin; Van Wier, Scott; Badar, Sandra; Ahmann, Gregory; Kortuem, K Martin; Boczek, Nicole J; Fonseca, Rafael; Craig, David W; Carpten, John D; Borad, Mitesh J; Stewart, A Keith

    2014-01-01

    Liposarcoma is the most common soft tissue sarcoma, but little is known about the genomic basis of this disease. Given the low cell content of this tumor type, we utilized flow cytometry to isolate the diploid normal and aneuploid tumor populations from a well-differentiated liposarcoma prior to array comparative genomic hybridization and whole genome sequencing. This work revealed massive highly focal amplifications throughout the aneuploid tumor genome including MDM2, a gene that has previously been found to be amplified in well-differentiated liposarcoma. Structural analysis revealed massive rearrangement of chromosome 12 and 11 gene fusions, some of which may be part of double minute chromosomes commonly present in well-differentiated liposarcoma. We identified a hotspot of genomic instability localized to a region of chromosome 12 that includes a highly conserved, putative L1 retrotransposon element, LOC100507498, which resides within a gene cluster (NAV3, SYT1, PAWR) where 6 of the 11 fusion events occurred. Interestingly, a potential gene fusion was also identified in amplified DDR2, which is a potential therapeutic target of kinase inhibitors such as dasatinib that are not routinely used in the treatment of patients with liposarcoma. Furthermore, 7 somatic, damaging single nucleotide variants have also been identified, including D125N in the PTPRQ protein. In conclusion, this work is the first to report the entire genome of a well-differentiated liposarcoma with novel chromosomal rearrangements associated with amplification of therapeutically targetable genes such as MDM2 and DDR2.

  6. Whole genome analyses of a well-differentiated liposarcoma reveals novel SYT1 and DDR2 rearrangements.

    Directory of Open Access Journals (Sweden)

    Jan B Egan

    Full Text Available Liposarcoma is the most common soft tissue sarcoma, but little is known about the genomic basis of this disease. Given the low cell content of this tumor type, we utilized flow cytometry to isolate the diploid normal and aneuploid tumor populations from a well-differentiated liposarcoma prior to array comparative genomic hybridization and whole genome sequencing. This work revealed massive highly focal amplifications throughout the aneuploid tumor genome including MDM2, a gene that has previously been found to be amplified in well-differentiated liposarcoma. Structural analysis revealed massive rearrangement of chromosome 12 and 11 gene fusions, some of which may be part of double minute chromosomes commonly present in well-differentiated liposarcoma. We identified a hotspot of genomic instability localized to a region of chromosome 12 that includes a highly conserved, putative L1 retrotransposon element, LOC100507498, which resides within a gene cluster (NAV3, SYT1, PAWR) where 6 of the 11 fusion events occurred. Interestingly, a potential gene fusion was also identified in amplified DDR2, which is a potential therapeutic target of kinase inhibitors such as dasatinib that are not routinely used in the treatment of patients with liposarcoma. Furthermore, 7 somatic, damaging single nucleotide variants have also been identified, including D125N in the PTPRQ protein. In conclusion, this work is the first to report the entire genome of a well-differentiated liposarcoma with novel chromosomal rearrangements associated with amplification of therapeutically targetable genes such as MDM2 and DDR2.

  7. Whole genome characterization of a G6P[5] rotavirus A strain isolated from a stray cat in Japan.

    Science.gov (United States)

    Kaneko, Miho; Mochizuki, Masami; Nakagomi, Osamu; Nakagomi, Toyoko

    2016-05-30

    The whole genome of an unusual G6P[5] rotavirus A strain named FRV537, which was isolated from a stray cat in Japan, was characterized to determine its species of origin. The genotype constellation of FRV537 was G6-P[5]-I2-R2-C2-M2-A13-N2-T6-E2-H3. No known feline rotavirus has this genotype constellation; the Japanese equine strain OH-4 is the only known strain that does. While FRV537 shares the same genotype with some feline rotaviruses in all genes except those encoding VP4 and NSP1, none of these genes are closely related to those of known feline rotaviruses. By contrast, G6P[5] is almost exclusively present in bovine rotaviruses. The VP7 and VP4 genes of FRV537 formed a lineage with typical bovine rotaviruses with high bootstrap values. As to the internal capsid and nonstructural gene constellation, three bovine rotavirus strains had a constellation identical to that of FRV537. Moreover, each of the genotypes of FRV537 was found to be a common bovine genotype. In addition to the high nucleotide sequence identities between FRV537 and bovine rotaviruses in each genome segment (≥95%), phylogenetic analysis revealed a close relationship to bovine/artiodactyl rotaviruses. Thus, the molecular and phylogenetic evidence suggests that FRV537 isolated from a stray cat was of bovine rotavirus origin.

  8. Exploring the diversity of Arcobacter butzleri from cattle in the UK using MLST and whole genome sequencing.

    Science.gov (United States)

    Merga, J Yvette; Williams, Nicola J; Miller, William G; Leatherbarrow, Andrew J H; Bennett, Malcolm; Hall, Neil; Ashelford, Kevin E; Winstanley, Craig

    2013-01-01

    Arcobacter butzleri is considered to be an emerging human foodborne pathogen. The completion of an A. butzleri genome sequence along with microarray analysis of 13 isolates in 2007 revealed a surprising amount of diversity amongst A. butzleri isolates from humans, animals and food. In order to further investigate Arcobacter diversity, 792 faecal samples were collected from cattle on beef and dairy farms in the North West of England. Arcobacter was isolated from 42.5% of the samples and the diversity of the isolates was investigated using multilocus sequence typing. An A. butzleri whole genome sequence, obtained by 454 shotgun sequencing of an isolate from a clinically-healthy dairy cow, showed a number of differences when compared to the genome of a human-derived A. butzleri isolate. PCR-based prevalence assays for variable genes suggested some tentative evidence for source-related distributions. We also found evidence for phenotypic differences relating to growth capabilities between our representative human and cattle isolates. Our genotypic and phenotypic observations suggest that some level of niche adaptation may have occurred in A. butzleri.

  9. TreeSeq, a Fast and Intuitive Tool for Analysis of Whole Genome and Metagenomic Sequence Data.

    Directory of Open Access Journals (Sweden)

    Bastiaan Wintermans

    Full Text Available Next-generation sequencing is not yet commonly used in clinical laboratories because of a lack of simple and intuitive tools. We developed a software tool (TreeSeq) with a quaternary tree search structure for the analysis of sequence data. This permits rapid searches for sequences of interest in large datasets. We used TreeSeq to screen a gut microbiota metagenomic dataset and a whole genome sequencing (WGS) dataset of a strain of Klebsiella pneumoniae for antibiotic resistance genes and compared the results with BLAST and phenotypic resistance determination. TreeSeq was more than thirty times faster than BLAST and accurately detected resistance gene sequences in complex metagenomic data and resistance genes corresponding with the phenotypic resistance pattern of the Klebsiella strain. Resistance genes found by TreeSeq were visualized as a gene coverage heat map, aiding in the interpretation of results. TreeSeq brings analysis of metagenomic and WGS data within reach of clinical diagnostics.
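
    The abstract does not describe TreeSeq's internals beyond a "quaternary tree search structure". As a generic illustration of that idea, and not a reconstruction of TreeSeq itself, the sketch below builds a four-way trie over the DNA alphabet from the k-mers of reference genes and then counts how many k-mers of a read match each gene. The class name, function names, gene labels and the choice of k are all hypothetical.

```python
class QuaternaryTrie:
    """Trie in which every node has at most four children: A, C, G, T."""

    ALPHABET = "ACGT"

    def __init__(self):
        self.root = {}

    def insert(self, kmer, label):
        """Store `kmer` and tag its final node with `label`."""
        node = self.root
        for base in kmer:
            if base not in self.ALPHABET:
                raise ValueError(f"unsupported base: {base!r}")
            node = node.setdefault(base, {})
        node.setdefault("labels", set()).add(label)

    def lookup(self, kmer):
        """Return the labels stored for `kmer`, or an empty set."""
        node = self.root
        for base in kmer:
            node = node.get(base)
            if node is None:
                return set()
        return node.get("labels", set())


def build_index(genes, k):
    """Index every k-mer of every reference gene sequence."""
    trie = QuaternaryTrie()
    for name, seq in genes.items():
        for i in range(len(seq) - k + 1):
            trie.insert(seq[i:i + k], name)
    return trie


def screen_read(trie, read, k):
    """Count, per gene, the k-mers of `read` found in the index."""
    hits = {}
    for i in range(len(read) - k + 1):
        for name in trie.lookup(read[i:i + k]):
            hits[name] = hits.get(name, 0) + 1
    return hits
```

    Each lookup walks at most k nodes regardless of how many reference k-mers are indexed, which is the property that makes tree-based searches attractive for large read sets.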

  10. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer.

    Science.gov (United States)

    Gudmundsson, Julius; Sulem, Patrick; Gudbjartsson, Daniel F; Masson, Gisli; Agnarsson, Bjarni A; Benediktsdottir, Kristrun R; Sigurdsson, Asgeir; Magnusson, Olafur Th; Gudjonsson, Sigurjon A; Magnusdottir, Droplaug N; Johannsdottir, Hrefna; Helgadottir, Hafdis Th; Stacey, Simon N; Jonasdottir, Adalbjorg; Olafsdottir, Stefania B; Thorleifsson, Gudmar; Jonasson, Jon G; Tryggvadottir, Laufey; Navarrete, Sebastian; Fuertes, Fernando; Helfand, Brian T; Hu, Qiaoyan; Csiki, Irma E; Mates, Ioan N; Jinga, Viorel; Aben, Katja K H; van Oort, Inge M; Vermeulen, Sita H; Donovan, Jenny L; Hamdy, Freddy C; Ng, Chi-Fai; Chiu, Peter K F; Lau, Kin-Mang; Ng, Maggie C Y; Gulcher, Jeffrey R; Kong, Augustine; Catalona, William J; Mayordomo, Jose I; Einarsson, Gudmundur V; Barkardottir, Rosa B; Jonsson, Eirikur; Mates, Dana; Neal, David E; Kiemeney, Lambertus A; Thorsteinsdottir, Unnur; Rafnar, Thorunn; Stefansson, Kari

    2012-12-01

    In Western countries, prostate cancer is the most prevalent cancer of men and one of the leading causes of cancer-related death in men. Several genome-wide association studies have yielded numerous common variants conferring risk of prostate cancer. Here, we analyzed 32.5 million variants discovered by whole-genome sequencing 1,795 Icelanders. We identified a new low-frequency variant at 8q24 associated with prostate cancer in European populations, rs188140481[A] (odds ratio (OR) = 2.90; P(combined) = 6.2 × 10^-34), with an average risk allele frequency in controls of 0.54%. This variant is only very weakly correlated (r^2 ≤ 0.06) with previously reported risk variants at 8q24, and its association remains significant after adjustment for all known risk-associated variants. Carriers of rs188140481[A] were diagnosed with prostate cancer 1.26 years younger than non-carriers (P = 0.0059). We also report results for a previously described HOXB13 variant (rs138213197[T]), confirming it as a prostate cancer risk variant in populations from across Europe.

  11. Whole genome sequencing of field isolates reveals a common duplication of the Duffy binding protein gene in Malagasy Plasmodium vivax strains.

    Directory of Open Access Journals (Sweden)

    Didier Menard

    2013-11-01

    Full Text Available BACKGROUND: Plasmodium vivax is the most prevalent human malaria parasite, causing serious public health problems in malaria-endemic countries. Until recently the Duffy-negative blood group phenotype was considered to confer resistance to vivax malaria for most African ethnicities. We and others have reported that P. vivax strains in African countries from Madagascar to Mauritania display capacity to cause clinical vivax malaria in Duffy-negative people. New insights must now explain Duffy-independent P. vivax invasion of human erythrocytes. METHODS/PRINCIPAL FINDINGS: Through recent whole genome sequencing we obtained ≥ 70× coverage of the P. vivax genome from five field-isolates, resulting in ≥ 93% of the Sal I reference sequenced at coverage greater than 20×. Combined with sequences from one additional Malagasy field isolate and from five monkey-adapted strains, we describe here identification of DNA sequence rearrangements in the P. vivax genome, including discovery of a duplication of the P. vivax Duffy binding protein (PvDBP) gene. A survey of Malagasy patients infected with P. vivax showed that the PvDBP duplication was present in numerous locations in Madagascar and found in over 50% of infected patients evaluated. Extended geographic surveys showed that the PvDBP duplication was detected frequently in vivax patients living in East Africa and in some residents of non-African P. vivax-endemic countries. Additionally, the PvDBP duplication was observed in travelers seeking treatment of vivax malaria upon returning home. PvDBP duplication prevalence was highest in west-central Madagascar sites where the highest frequencies of P. vivax-infected, Duffy-negative people were reported. CONCLUSIONS/SIGNIFICANCE: The highly conserved nature of the sequence involved in the PvDBP duplication suggests that it has occurred in a recent evolutionary time frame. These data suggest that PvDBP, a merozoite surface protein involved in red cell adhesion

  12. Array-based techniques for fingerprinting medicinal herbs

    Directory of Open Access Journals (Sweden)

    Xue Charlie

    2011-05-01

    Full Text Available Abstract Poor quality control of medicinal herbs has led to instances of toxicity, poisoning and even deaths. The fundamental step in quality control of herbal medicine is accurate identification of herbs. Array-based techniques have recently been adapted to authenticate or identify herbal plants. This article reviews the current array-based techniques, e.g. oligonucleotide microarrays, gene-based probe microarrays, Suppression Subtractive Hybridization (SSH)-based arrays, Diversity Array Technology (DArT) and Subtracted Diversity Array (SDA). We further compare these techniques according to important parameters such as markers, polymorphism rates, restriction enzymes and sample type. The applicability of the array-based methods for fingerprinting depends on the availability of genomic and genetic information for the species to be fingerprinted. For species with little genome sequence information but high polymorphism rates, SDA techniques are particularly recommended because they require less labour and lower material cost.

  13. Whole Genome Sequencing

    Science.gov (United States)

    ... important role. It is important to maintain a healthy lifestyle in order to minimize your risk of disease. The growing field of gene-environment research focuses on how your lifestyle and environment ...

  14. Genetic characterization of 2006-2008 isolates of Chikungunya virus from Kerala, South India, by whole genome sequence analysis.

    Science.gov (United States)

    Sreekumar, E; Issac, Aneesh; Nair, Sajith; Hariharan, Ramkumar; Janki, M B; Arathy, D S; Regu, R; Mathew, Thomas; Anoop, M; Niyas, K P; Pillai, M R

    2010-02-01

    Chikungunya virus (CHIKV), a positive-stranded alphavirus, causes epidemic febrile infections characterized by severe and prolonged arthralgia. In the present study, six CHIKV isolates (2006 RGCB03, RGCB05; 2007 RGCB80, RGCB120; 2008 RGCB355, RGCB356) from three consecutive Chikungunya outbreaks in Kerala, South India, were analyzed for genetic variations by sequencing the 11798 bp whole genome of the virus. A total of 37 novel mutations were identified and they were predominant in the 2007 and 2008 isolates among the six isolates studied. The previously identified E1 A226V critical mutation, which enhances mosquito adaptability, was present in the 2007 and 2008 samples. An important observation was the presence of two coding region substitutions, leading to the nsP2 L539S and E2 K252Q changes. These were identified in three isolates (2007 RGCB80 and RGCB120; 2008 RGCB355) by full-genome analysis, and also in 13 of the 31 additional samples (42%), obtained from various parts of the state, by sequencing the corresponding genomic regions. These mutations showed 100% co-occurrence in all these samples. In phylogenetic analysis, formation of a new genetic clade by these isolates within the East, Central and South African (ECSA) genotypes was observed. Homology modeling followed by mapping revealed that at least 20 of the identified mutations fall into functionally significant domains of the viral proteins and are predicted to affect protein structure. Eighteen of the identified mutations in structural proteins, including the E2 K252Q change, are predicted to disrupt T-cell epitope immunogenicity. Our study reveals that CHIKV strains with novel genetic changes were present in the severe Chikungunya outbreaks in 2007 and 2008 in South India.

  15. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Lai-Ping Wong

    2014-05-01

    Full Text Available South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  16. Normalization of Illumina Infinium whole-genome SNP data improves copy number estimates and allelic intensity ratios

    Directory of Open Access Journals (Sweden)

    Juliusson Gunnar

    2008-10-01

    Full Text Available Abstract Background: Illumina Infinium whole genome genotyping (WGG) arrays are increasingly being applied in cancer genomics to study gene copy number alterations and allele-specific aberrations such as loss-of-heterozygosity (LOH). Methods developed for normalization of WGG arrays have mostly focused on diploid, normal samples. However, for cancer samples genomic aberrations may confound normalization and data interpretation. Therefore, we examined the effects of the conventionally used normalization method for Illumina Infinium arrays when applied to cancer samples. Results: We demonstrate an asymmetry in the detection of the two alleles for each SNP, which deleteriously influences both allelic proportions and copy number estimates. The asymmetry is caused by a remaining bias between the two dyes used in the Infinium II assay after using the normalization method in Illumina's proprietary software (BeadStudio). We propose a quantile normalization strategy for correction of this dye bias. We tested the normalization strategy using 535 individual hybridizations from 10 data sets from the analysis of cancer genomes and normal blood samples generated on Illumina Infinium II 300 k version 1 and 2, 370 k and 550 k BeadChips. We show that the proposed normalization strategy successfully removes asymmetry in estimates of both allelic proportions and copy numbers. Additionally, the normalization strategy reduces the technical variation for copy number estimates while retaining the response to copy number alterations. Conclusion: The proposed normalization strategy represents a valuable tool that improves the quality of data obtained from Illumina Infinium arrays, in particular when used for LOH and copy number variation studies.
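
    The abstract names quantile normalization as the correction for the dye bias but gives no procedural detail. The sketch below shows plain two-channel quantile normalization as a general technique, and it should not be read as the authors' exact strategy. It assumes the two dye channels are supplied as equal-length lists of intensities, and it resolves ties by input order; the function name is hypothetical.

```python
def quantile_normalize(channel_x, channel_y):
    """Force two intensity channels onto a shared distribution.

    The value at each rank is replaced by the mean of the two channels'
    values at that rank, so both outputs have identical sorted values
    while each channel keeps its own ordering of probes.
    """
    if len(channel_x) != len(channel_y):
        raise ValueError("channels must have the same number of values")
    n = len(channel_x)

    # Probe indices ordered from lowest to highest intensity per channel.
    order_x = sorted(range(n), key=channel_x.__getitem__)
    order_y = sorted(range(n), key=channel_y.__getitem__)

    x_norm = [0.0] * n
    y_norm = [0.0] * n
    for i, j in zip(order_x, order_y):
        # Shared reference value for this rank.
        reference = (channel_x[i] + channel_y[j]) / 2
        x_norm[i] = reference
        y_norm[j] = reference
    return x_norm, y_norm
```

    After normalization the two channels share one intensity distribution, so a systematic offset between the dyes no longer shifts allelic proportions toward one allele.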

  17. A Bayesian Approach for Analysis of Whole-Genome Bisulphite Sequencing Data Identifies Disease-Associated Changes in DNA Methylation.

    Science.gov (United States)

    Rackham, Owen J L; Langley, Sarah R; Oates, Thomas; Vradi, Eleni; Harmston, Nathan; Srivastava, Prashant K; Behmoaras, Jacques; Dellaportas, Petros; Bottolo, Leonardo; Petretto, Enrico

    2017-02-17

    DNA methylation is a key epigenetic modification involved in gene regulation whose contribution to disease susceptibility remains to be fully understood. Here, we present a novel Bayesian smoothing approach (called ABBA) to detect differentially methylated regions (DMRs) from whole-genome bisulphite sequencing (WGBS). We also show how this approach can be leveraged to identify disease-associated changes in DNA methylation, suggesting mechanisms through which these alterations might affect disease. From a data modeling perspective, ABBA has the distinctive feature of automatically adapting to different correlation structures in CpG methylation levels across the genome whilst taking into account the distance between CpG sites as a covariate. Our simulation study shows that ABBA has greater power to detect DMRs than existing methods, providing an accurate identification of DMRs in the large majority of simulated cases. To empirically demonstrate the method's efficacy in generating biological hypotheses, we performed WGBS of primary macrophages derived from an experimental rat system of glomerulonephritis and used ABBA to identify >1,000 disease-associated DMRs. Investigation of these DMRs revealed differential DNA methylation localized to a 600bp region in the promoter of the Ifitm3 gene. This was confirmed by ChIP-seq and RNA-seq analyses, showing differential transcription factor binding at the Ifitm3 promoter by JunD (an established determinant of glomerulonephritis) and a consistent change in Ifitm3 expression. Our ABBA analysis allowed us to propose a new role for Ifitm3 in the pathogenesis of glomerulonephritis via a mechanism involving promoter hypermethylation that is associated with Ifitm3 repression in the rat strain susceptible to glomerulonephritis.

  18. Whole-Genome Transcriptional Analysis of Chemolithoautotrophic Thiosulfate Oxidation by Thiobacillus denitrificans Under Aerobic vs. Denitrifying Conditions

    Energy Technology Data Exchange (ETDEWEB)

    Beller, H R; Letain, T E; Chakicherla, A; Kane, S R; Legler, T C; Coleman, M A

    2006-04-22

    Thiobacillus denitrificans is one of the few known obligate chemolithoautotrophic bacteria capable of energetically coupling thiosulfate oxidation to denitrification as well as aerobic respiration. As very little is known about the differential expression of genes associated with key chemolithoautotrophic functions (such as sulfur-compound oxidation and CO2 fixation) under aerobic versus denitrifying conditions, we conducted whole-genome, cDNA microarray studies to explore this topic systematically. The microarrays identified 277 genes (approximately ten percent of the genome) as differentially expressed using Robust Multi-array Average statistical analysis and a 2-fold cutoff. Genes upregulated (ca. 6- to 150-fold) under aerobic conditions included a cluster of genes associated with iron acquisition (e.g., siderophore-related genes), a cluster of cytochrome cbb3 oxidase genes, cbbL and cbbS (encoding the large and small subunits of form I ribulose 1,5-bisphosphate carboxylase/oxygenase, or RubisCO), and multiple molecular chaperone genes. Genes upregulated (ca. 4- to 95-fold) under denitrifying conditions included nar, nir, and nor genes (associated respectively with nitrate reductase, nitrite reductase, and nitric oxide reductase, which catalyze successive steps of denitrification), cbbM (encoding form II RubisCO), and genes involved with sulfur-compound oxidation (including two physically separated but highly similar copies of sulfide:quinone oxidoreductase and of dsrC, associated with dissimilatory sulfite reductase). Among genes associated with denitrification, relative expression levels (i.e., degree of upregulation with nitrate) tended to decrease in the order nar > nir > nor > nos. Reverse transcription, quantitative PCR analysis was used to validate these trends.
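
    As an illustration of what a 2-fold cutoff means in practice, here is a small sketch that flags genes whose expression differs by at least a chosen fold between two conditions. It is not the study's pipeline: the function name, the labels `narG` and `rpoB`, and all intensity values are made up for the example, and the inputs are assumed to be linear-scale summaries (RMA output is normally on a log2 scale, in which case the values would simply be subtracted).

```python
import math


def genes_passing_fold_cutoff(expr_a, expr_b, fold_cutoff=2.0):
    """Return {gene: log2 fold change} for genes differing by >= fold_cutoff.

    expr_a, expr_b: dicts mapping gene name -> positive linear-scale
    expression under condition A and condition B (hypothetical inputs).
    A positive log2 fold change means higher expression under condition A.
    """
    threshold = math.log2(fold_cutoff)
    selected = {}
    for gene in sorted(expr_a.keys() & expr_b.keys()):
        log2_fc = math.log2(expr_a[gene] / expr_b[gene])
        if abs(log2_fc) >= threshold:
            selected[gene] = log2_fc
    return selected


# Made-up intensities for three illustrative genes.
aerobic = {"cbbL": 600.0, "narG": 10.0, "rpoB": 120.0}
denitrifying = {"cbbL": 100.0, "narG": 950.0, "rpoB": 100.0}
hits = genes_passing_fold_cutoff(aerobic, denitrifying)
```

    In this toy input, `cbbL` passes as upregulated under the aerobic condition, `narG` passes as upregulated under the denitrifying condition, and `rpoB` (1.2-fold) is filtered out.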

  19. Whole genome comparisons suggest random distribution of Mycobacterium ulcerans genotypes in a Buruli ulcer endemic region of Ghana.

    Science.gov (United States)

    Ablordey, Anthony S; Vandelannoote, Koen; Frimpong, Isaac A; Ahortor, Evans K; Amissah, Nana Ama; Eddyani, Miriam; Durnez, Lies; Portaels, Françoise; de Jong, Bouke C; Leirs, Herwig; Porter, Jessica L; Mangas, Kirstie M; Lam, Margaret M C; Buultjens, Andrew; Seemann, Torsten; Tobias, Nicholas J; Stinear, Timothy P

    2015-03-01

    Efforts to control the spread of Buruli ulcer--an emerging ulcerative skin infection caused by Mycobacterium ulcerans--have been hampered by our poor understanding of reservoirs and transmission. To help address this issue, we compared whole genomes from 18 clinical M. ulcerans isolates from a 30 km2 region within the Asante Akim North District, Ashanti region, Ghana, with 15 other M. ulcerans isolates from elsewhere in Ghana and the surrounding countries of Ivory Coast, Togo, Benin and Nigeria. Contrary to our expectations of finding minor DNA sequence variations among isolates representing a single M. ulcerans circulating genotype, we found instead two distinct genotypes. One genotype was closely related to isolates from neighbouring regions of Amansie West and Densu, consistent with the predicted local endemic clone, but the second genotype (separated by 138 single nucleotide polymorphisms [SNPs] from other Ghanaian strains) most closely matched M. ulcerans from Nigeria, suggesting another introduction of M. ulcerans to Ghana, perhaps from that country. Both the exotic genotype and the local Ghanaian genotype displayed highly restricted intra-strain genetic variation, with less than 50 SNP differences across a 5.2 Mbp core genome within each genotype. Interestingly, there was no discernible spatial clustering of genotypes at the local village scale. Interviews revealed no obvious epidemiological links among BU patients who had been infected with identical M. ulcerans genotypes but lived in geographically separate villages. We conclude that M. ulcerans is spread widely across the region, with multiple genotypes present in any one area. These data give us new perspectives on the behaviour of possible reservoirs and subsequent transmission mechanisms of M. ulcerans. These observations also show for the first time that M. ulcerans can be mobilized, introduced to a new area and then spread within a population. Potential reservoirs of M. ulcerans thus might include

  20. Novel degenerate PCR method for whole genome amplification applied to Peru Margin (ODP Leg 201) subsurface samples

    Directory of Open Access Journals (Sweden)

    Amanda Martino

    2012-01-01

    Full Text Available A degenerate PCR-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples. The method, which we have called Random Amplification Metagenomic PCR (RAMP), involves the use of specific primers from Roche 454 amplicon sequencing, modified by the addition of a degenerate region at the 3’ end. It utilizes a PCR reaction, which resulted in no amplification from blanks, even after 50 cycles of PCR. After efforts to optimize experimental conditions, the method was tested with DNA extracted from cultured E. coli cells, and genome coverage was estimated after sequencing on three different occasions. Coverage did not vary greatly with the different experimental conditions tested, and was around 62% with a sequencing effort equivalent to a theoretical genome coverage of 14.10X. The GC content of the sequenced amplification product was within 2% of the predicted values for this strain of E. coli. The method was also applied to DNA extracted from marine subsurface samples from ODP Leg 201 site 1229 (Peru Margin), and results of a taxonomic analysis revealed microbial communities dominated by Proteobacteria, Chloroflexi, Firmicutes, Euryarchaeota, and Crenarchaeota, among others. These results were similar to those obtained previously for those samples; however, variations in the proportions of taxa show that community analysis can be sensitive to both the amplification technique used and the method of assigning sequences to taxonomic groups. Overall, we find that RAMP represents a valid methodology for amplifying metagenomes from low biomass samples.
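
    To put the two coverage figures in context, the sketch below separates theoretical coverage (sequenced bases divided by genome size) from the fraction of the genome expected to be covered if reads landed uniformly at random (the Lander-Waterman approximation, 1 - e^(-c)). This framing is an addition for illustration, not part of the abstract. The 4.6 Mb genome size is an assumed round figure for E. coli and the total base count is back-calculated from 14.10X, so both are placeholders rather than values reported by the study.

```python
import math


def theoretical_coverage(total_bases_sequenced, genome_size):
    """Average depth: total sequenced bases divided by genome length."""
    return total_bases_sequenced / genome_size


def expected_fraction_covered(depth):
    """Lander-Waterman estimate of the genome fraction hit by >= 1 read,
    assuming reads are placed uniformly at random (an idealization)."""
    return 1.0 - math.exp(-depth)


ASSUMED_GENOME_SIZE = 4_600_000      # bp, assumed round figure for E. coli
ASSUMED_TOTAL_BASES = 64_860_000     # bp, chosen so that depth is 14.10X

depth = theoretical_coverage(ASSUMED_TOTAL_BASES, ASSUMED_GENOME_SIZE)
ideal_fraction = expected_fraction_covered(depth)
```

    Under uniform random sampling, a depth near 14X would be expected to cover essentially the whole genome, so an observed coverage of around 62% suggests that the amplification samples some regions far more often than others. That reading is an inference from the idealized model, not a statement made in the abstract.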

  1. Whole genome sequence of Klebsiella pneumoniae U25, a hypermucoviscous, multidrug resistant, biofilm producing isolate from India

    Directory of Open Access Journals (Sweden)

    Zumaana Rafiq

    2016-02-01

    Full Text Available Klebsiella pneumoniae U25 is a multidrug resistant strain isolated from a tertiary care hospital in Chennai, India. Here, we report the complete annotated genome sequence of strain U25 obtained using PacBio RSII. This is the first report of the whole genome of K. pneumoniae species from Chennai. It consists of a single circular chromosome of size 5,491,870-bp and two plasmids of size 211,813 and 172,619-bp. The genes associated with multidrug resistance were identified. The chromosome of U25 was found to have eight antibiotic resistance genes [blaOXA-1, blaSHV-28, aac(6’)-Ib-cr, catB3, oqxAB, dfrA1]. The plasmid pMGRU25-001 was found to have only one resistance gene (catA1) while plasmid pMGRU25-002 had 20 resistance genes [strAB, aadA1, aac(6’)-Ib, aac(3)-IId, sul1,2, blaTEM-1A,1B, blaOXA-9, blaCTX-M-15, blaSHV-11, cmlA1, erm(B), mph(A)]. A mutation in the porin OmpK36 was identified which is likely to be associated with the intermediate resistance to carbapenems in the absence of carbapenemase genes. U25 is one of the few K. pneumoniae strains to harbour clustered regularly interspaced short palindromic repeats (CRISPR) systems. Two CRISPR arrays corresponding to Cas3 family helicase were identified in the genome. When compared to K. pneumoniae NTUHK2044, a transposase gene InsH of IS5-13 was found inserted.

  2. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Directory of Open Access Journals (Sweden)

    Param Priya Singh

    2015-07-01

    Full Text Available Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  3. Implementation of exon arrays: alternative splicing during T-cell proliferation as determined by whole genome analysis

    Directory of Open Access Journals (Sweden)

    Whistler Toni

    2010-09-01

    Full Text Available Abstract Background The contribution of alternative splicing and isoform expression to cellular response is emerging as an area of considerable interest, and the newly developed exon arrays allow for systematic study of these processes. We use this pilot study to report on the feasibility of exon array implementation looking to replace the 3' in vitro transcription expression arrays in our laboratory. One of the most widely studied models of cellular response is T-cell activation from exogenous stimulation. Microarray studies have contributed to our understanding of key pathways activated during T-cell stimulation. We use this system to examine whole genome transcription and alternate exon usage events that are regulated during lymphocyte proliferation in an attempt to evaluate the exon arrays. Results Peripheral blood mononuclear cells from healthy donors were activated using phytohemagglutinin, IL2 and ionomycin and harvested at 5 points over a 7 day period. Flow cytometry measured cell cycle events and the Affymetrix exon array platform was used to identify the gene expression and alternate exon usage changes. Gene expression changes were noted in a total of 2105 transcripts, and alternate exon usage identified in 472 transcript clusters. There was an overlap of 263 transcripts which showed both differential expression and alternate exon usage over time. Gene ontology enrichment analysis showed a broader range of biological changes in biological processes for the differentially expressed genes, which include cell cycle, cell division, cell proliferation, chromosome segregation, cell death, component organization and biogenesis and metabolic process ontologies. The alternate exon usage ontological enrichments are in metabolism and component organization and biogenesis. We focus on alternate exon usage changes in the transcripts of the spliceosome complex. The real-time PCR validation rates were 86% for transcript expression and 71% for

  4. Whole genome comparisons suggest random distribution of Mycobacterium ulcerans genotypes in a Buruli ulcer endemic region of Ghana.

    Directory of Open Access Journals (Sweden)

    Anthony S Ablordey

    2015-03-01

    Full Text Available Efforts to control the spread of Buruli ulcer--an emerging ulcerative skin infection caused by Mycobacterium ulcerans--have been hampered by our poor understanding of reservoirs and transmission. To help address this issue, we compared whole genomes from 18 clinical M. ulcerans isolates from a 30 km2 region within the Asante Akim North District, Ashanti region, Ghana, with 15 other M. ulcerans isolates from elsewhere in Ghana and the surrounding countries of Ivory Coast, Togo, Benin and Nigeria. Contrary to our expectations of finding minor DNA sequence variations among isolates representing a single M. ulcerans circulating genotype, we found instead two distinct genotypes. One genotype was closely related to isolates from neighbouring regions of Amansie West and Densu, consistent with the predicted local endemic clone, but the second genotype (separated by 138 single nucleotide polymorphisms [SNPs] from other Ghanaian strains) most closely matched M. ulcerans from Nigeria, suggesting another introduction of M. ulcerans to Ghana, perhaps from that country. Both the exotic genotype and the local Ghanaian genotype displayed highly restricted intra-strain genetic variation, with less than 50 SNP differences across a 5.2 Mbp core genome within each genotype. Interestingly, there was no discernible spatial clustering of genotypes at the local village scale. Interviews revealed no obvious epidemiological links among BU patients who had been infected with identical M. ulcerans genotypes but lived in geographically separate villages. We conclude that M. ulcerans is spread widely across the region, with multiple genotypes present in any one area. These data give us new perspectives on the behaviour of possible reservoirs and subsequent transmission mechanisms of M. ulcerans. These observations also show for the first time that M. ulcerans can be mobilized, introduced to a new area and then spread within a population. Potential reservoirs of M. ulcerans

  5. Resolving the question of trypanosome monophyly: a comparative genomics approach using whole genome data sets with low taxon sampling.

    Science.gov (United States)

    Leonard, Guy; Soanes, Darren M; Stevens, Jamie R

    2011-07-01

    Since the first attempts to classify the evolutionary history of trypanosomes, there have been conflicting reports regarding their true phylogenetic relationships and, in particular, their relationships with other vertebrate trypanosomatids, e.g. Leishmania sp., as well as with the many insect parasitising trypanosomatids. Perhaps the issue that has provided most debate is that concerning the monophyly (or otherwise) of genus Trypanosoma and, even with the advent of molecular methods, the findings of numerous studies have varied significantly depending on the gene sequences analysed, number of taxa included, choice of outgroup and phylogenetic methodology. While of arguably limited applied importance, resolution of the question as to whether or not trypanosomes are monophyletic is critical to accurate evaluation of competing, mutually exclusive evolutionary scenarios for these parasites, namely the 'vertebrate-first' or 'insect-first' hypotheses. Therefore, a new approach, which could overcome previous limitations was needed. At its most simple, the problem can be defined within the framework of a trifurcated tree with three hypothetical positions at which the root can be placed. Using BLASTp and whole-genome gene-by-gene phylogenetic analyses of Trypanosoma brucei, Trypanosoma cruzi, Leishmania major and Naegleria gruberi, we have identified 599 gene markers--putative homologues--that were shared between the genomes of these four taxa. Of these, 75 homologous gene families that demonstrate monophyly of the kinetoplastids were identified. We then used these data sets in combination with an additional outgroup, Euglena gracilis, coupled with large-scale gene concatenation and diverse phylogenetic techniques to investigate the relative branching order of T. brucei, T. cruzi and L. major. Our findings confirm the monophyly of genus Trypanosoma and demonstrate that <1% of the analysed gene markers shared between the genomes of T. brucei, T. cruzi and L. major reject

  6. Fast homozygosity mapping and identification of a zebrafish ENU-induced mutation by whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Marianne L Voz

    Full Text Available Forward genetics using zebrafish is a powerful tool for studying vertebrate development through large-scale mutagenesis. Nonetheless, the identification of the molecular lesion is still laborious and involves time-consuming genetic mapping. Here, we show that high-throughput sequencing of the whole zebrafish genome can directly locate the interval carrying the causative mutation and at the same time pinpoint the molecular lesion. The feasibility of this approach was validated by sequencing the m1045 mutant line that displays a severe hypoplasia of the exocrine pancreas. We generated 13 Gb of sequence, equivalent to an eightfold genomic coverage, from a pool of 50 mutant embryos obtained from a map-cross between the AB mutant carrier and the WIK polymorphic strain. The chromosomal region carrying the causal mutation was localized based on its unique property to display high levels of homozygosity among sequence reads as it derives exclusively from the initial AB mutated allele. We developed an algorithm identifying such a region by calculating a homozygosity score along all chromosomes. This highlighted an 8-Mb window on chromosome 5 with a score close to 1 in the m1045 mutants. The sequence analysis of all genes within this interval revealed a nonsense mutation in the snapc4 gene. Knockdown experiments confirmed the assertion that snapc4 is the gene whose mutation leads to exocrine pancreas hypoplasia. In conclusion, this study constitutes a proof-of-concept that whole-genome sequencing is a fast and effective alternative to the classical positional cloning strategies in zebrafish.
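
    The abstract does not define the homozygosity score, so the sketch below shows only one plausible way such a score could be computed from pooled read counts, under clearly hypothetical choices: the genome is cut into fixed windows, a polymorphic site counts as homozygous when its major allele accounts for at least 90% of the reads, and the window score is the fraction of homozygous sites. The function name, the threshold, and the toy read counts are all assumptions.

```python
def homozygosity_scores(sites, window_size, min_major_fraction=0.9):
    """Score fixed-size windows by the fraction of homozygous-looking sites.

    sites: iterable of (position, allele_a_reads, allele_b_reads) for known
           polymorphic positions on one chromosome (hypothetical input).
    Returns a list of (window_start, score) sorted by position; a score
    close to 1 marks a window where nearly every site is homozygous in the
    pool, as expected around the causal mutation in a mapping cross.
    """
    counts = {}  # window_start -> (sites_seen, homozygous_sites)
    for position, a_reads, b_reads in sites:
        depth = a_reads + b_reads
        if depth == 0:
            continue  # no reads, the site carries no information
        window_start = (position // window_size) * window_size
        is_homozygous = max(a_reads, b_reads) / depth >= min_major_fraction
        seen, homozygous = counts.get(window_start, (0, 0))
        counts[window_start] = (seen + 1, homozygous + int(is_homozygous))
    return [(start, homozygous / seen)
            for start, (seen, homozygous) in sorted(counts.items())]


# Made-up pooled read counts at six polymorphic sites.
toy_sites = [
    (100, 4, 4), (500, 5, 3),      # mixed alleles: unlinked region
    (1200, 8, 0), (1800, 0, 8),    # one allele only: candidate linked region
    (2100, 8, 0), (2900, 4, 4),    # partly mixed
]
scores = homozygosity_scores(toy_sites, window_size=1000)
```

    With these toy counts the middle window scores 1.0 while its neighbours score 0.0 and 0.5, which mirrors the kind of peak the abstract describes for the 8-Mb window on chromosome 5. A production version would at least need to restrict sites to markers that differ between the AB and WIK strains and to handle sequencing errors.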

  7. In silico detection of phylogenetic informative Y-chromosomal single nucleotide polymorphisms from whole genome sequencing data.

    Science.gov (United States)

    Van Geystelen, Anneleen; Wenseleers, Tom; Decorte, Ronny; Caspers, Maarten J L; Larmuseau, Maarten H D

    2014-11-01

    A state-of-the-art phylogeny of the human Y-chromosome is an essential tool for forensic genetics. The explosion of whole genome sequencing (WGS) data due to the rapid progress of next-generation sequencing facilities is useful to optimize and to increase the resolution of the phylogenetic Y-chromosomal tree. The most interesting Y-chromosomal variants for improving the phylogeny are SNPs (Y-SNPs), especially since the software to call them in WGS data and to genotype them in forensic assays has been optimized over the past years. The PENNY software presented here detects potentially phylogenetically interesting Y-SNPs in silico based on SNP calling data files and classifies them into different types according to their position in the currently used Y-chromosomal tree. The software utilized 790 available male WGS samples of which 172 had a high SNP calling quality. In total, 1269 Y-SNPs potentially capable of increasing the resolution of the Y-chromosomal phylogenetic tree were detected based on a first run with PENNY. Based on a test panel of 57 high-quality and 618 low-quality WGS samples, we could prove that these newly added Y-SNPs indeed increased the resolution of the phylogenetic Y-chromosomal analysis substantially. Finally, we performed a second run with PENNY in which all samples, including those of the test panel, were used, and this resulted in 509 additional phylogenetically promising Y-SNPs. By including these additional Y-SNPs, a final update of the present phylogenetic Y-chromosomal tree which is useful for forensic applications was generated. In order to find more convincing forensically interesting Y-SNPs with this PENNY software, the number of samples and the variety of the haplogroups to which these samples belong need to increase. The PENNY software (including the user manual) is freely available on the website http://bio.kuleuven.be/eeb/lbeg/software.

  8. Clinical whole-genome sequencing in severe early-onset epilepsy reveals new genes and improves molecular diagnosis.

    Science.gov (United States)

    Martin, Hilary C; Kim, Grace E; Pagnamenta, Alistair T; Murakami, Yoshiko; Carvill, Gemma L; Meyer, Esther; Copley, Richard R; Rimmer, Andrew; Barcia, Giulia; Fleming, Matthew R; Kronengold, Jack; Brown, Maile R; Hudspith, Karl A; Broxholme, John; Kanapin, Alexander; Cazier, Jean-Baptiste; Kinoshita, Taroh; Nabbout, Rima; Bentley, David; McVean, Gil; Heavin, Sinéad; Zaiwalla, Zenobia; McShane, Tony; Mefford, Heather C; Shears, Deborah; Stewart, Helen; Kurian, Manju A; Scheffer, Ingrid E; Blair, Edward; Donnelly, Peter; Kaczmarek, Leonard K; Taylor, Jenny C

    2014-06-15

    In severe early-onset epilepsy, precise clinical and molecular genetic diagnosis is complex, as many metabolic and electro-physiological processes have been implicated in disease causation. The clinical phenotypes share many features such as complex seizure types and developmental delay. Molecular diagnosis has historically been confined to sequential testing of candidate genes known to be associated with specific sub-phenotypes, but the diagnostic yield of this approach can be low. We conducted whole-genome sequencing (WGS) on six patients with severe early-onset epilepsy who had previously been refractory to molecular diagnosis, and their parents. Four of these patients had a clinical diagnosis of Ohtahara Syndrome (OS) and two patients had severe non-syndromic early-onset epilepsy (NSEOE). In two OS cases, we found de novo non-synonymous mutations in the genes KCNQ2 and SCN2A. In a third OS case, WGS revealed paternal isodisomy for chromosome 9, leading to identification of the causal homozygous missense variant in KCNT1, which produced a substantial increase in potassium channel current. The fourth OS patient had a recessive mutation in PIGQ that led to exon skipping and defective glycophosphatidyl inositol biosynthesis. The two patients with NSEOE had likely pathogenic de novo mutations in CBL and CSNK1G1, respectively. Mutations in these genes were not found among 500 additional individuals with epilepsy. This work reveals two novel genes for OS, KCNT1 and PIGQ. It also uncovers unexpected genetic mechanisms and emphasizes the power of WGS as a clinical tool for making molecular diagnoses, particularly for highly heterogeneous disorders.

  9. Whole Genome Sequencing of Mycobacterium africanum Strains from Mali Provides Insights into the Mechanisms of Geographic Restriction

    Science.gov (United States)

    Maiga, Mamoudou; Abeel, Thomas; Shea, Terrance; Desjardins, Christopher A.; Diarra, Bassirou; Baya, Bocar; Sanogo, Moumine; Diallo, Souleymane; Earl, Ashlee M.; Bishai, William R.

    2016-01-01

    Background Mycobacterium africanum, made up of lineages 5 and 6 within the Mycobacterium tuberculosis complex (MTC), causes up to half of all tuberculosis cases in West Africa, but is rarely found outside of this region. The reasons for this geographical restriction remain unknown. Possible reasons include a geographically restricted animal reservoir, a unique preference for hosts of West African ethnicity, and an inability to compete with other lineages outside of West Africa. These latter two hypotheses could be caused by loss of fitness or altered interactions with the host immune system. Methodology/Principal Findings We sequenced 92 MTC clinical isolates from Mali, including two lineage 5 and 24 lineage 6 strains. Our genome sequencing assembly, alignment, phylogeny and average nucleotide identity analyses enabled us to identify features that typify lineages 5 and 6 and made clear that these lineages do not constitute a distinct species within the MTC. We found that in Mali, lineage 6 and lineage 4 strains have similar levels of diversity and evolve drug resistance through similar mechanisms. In the process, we identified a putative novel streptomycin resistance mutation. In addition, we found evidence of person-to-person transmission of lineage 6 isolates and showed that lineage 6 is not enriched for mutations in virulence-associated genes. Conclusions This is the largest collection of lineage 5 and 6 whole genome sequences to date, and our assembly and alignment data provide valuable insights into what distinguishes these lineages from other MTC lineages. Lineages 5 and 6 do not appear to be geographically restricted due to an inability to transmit between West African hosts or to an elevated number of mutations in virulence-associated genes. However, lineage-specific mutations, such as mutations in cell wall structure, secretion systems and cofactor biosynthesis, provide alternative mechanisms that may lead to host specificity. PMID:26751217

  10. Changes in colorectal carcinoma genomes under anti-EGFR therapy identified by whole-genome plasma DNA sequencing.

    Directory of Open Access Journals (Sweden)

    Sumitra Mohan

    2014-03-01

    Full Text Available Monoclonal antibodies targeting the Epidermal Growth Factor Receptor (EGFR), such as cetuximab and panitumumab, have evolved into important therapeutic options in metastatic colorectal cancer (CRC). However, almost all patients with clinical response to anti-EGFR therapies show disease progression within a few months and little is known about mechanism and timing of resistance evolution. Here we analyzed plasma DNA from ten patients treated with anti-EGFR therapy by whole genome sequencing (plasma-Seq) and ultra-sensitive deep sequencing of genes associated with resistance to anti-EGFR treatment such as KRAS, BRAF, PIK3CA, and EGFR. Surprisingly, we observed that the development of resistance to anti-EGFR therapies was associated with acquired gains of KRAS in four patients (40%), which occurred either as novel focal amplifications (n = 3) or as high level polysomy of 12p (n = 1). In addition, we observed focal amplifications of other genes recently shown to be involved in acquired resistance to anti-EGFR therapies, such as MET (n = 2) and ERBB2 (n = 1). Overrepresentation of the EGFR gene was associated with a good initial anti-EGFR efficacy. Overall, we identified predictive biomarkers associated with anti-EGFR efficacy in seven patients (70%), which correlated well with treatment response. In contrast, ultra-sensitive deep sequencing of KRAS, BRAF, PIK3CA, and EGFR did not reveal the occurrence of novel, acquired mutations. Thus, plasma-Seq enables the identification of novel mutant clones and may therefore facilitate early adjustments of therapies that may delay or prevent disease progression.

  11. Klebsiella pneumoniae Carbapenemase (KPC)-Producing K. pneumoniae at a Single Institution: Insights into Endemicity from Whole-Genome Sequencing

    Science.gov (United States)

    Stoesser, Nicole; Sheppard, Anna E.; Pankhurst, Louise; Giess, Adam; Yeh, Anthony J.; Didelot, Xavier; Turner, Stephen D.; Sebra, Robert; Kasarskis, Andrew; Peto, Tim; Crook, Derrick; Sifri, Costi D.

    2015-01-01

    The global emergence of Klebsiella pneumoniae carbapenemase-producing K. pneumoniae (KPC-Kp) multilocus sequence type ST258 is widely recognized. Less is known about the molecular and epidemiological details of non-ST258 K. pneumoniae in the setting of an outbreak mediated by an endemic plasmid. We describe the interplay of blaKPC plasmids and K. pneumoniae strains and their relationship to the location of acquisition in a U.S. health care institution. Whole-genome sequencing (WGS) analysis was applied to KPC-Kp clinical isolates collected from a single institution over 5 years following the introduction of blaKPC in August 2007, as well as two plasmid transformants. KPC-Kp from 37 patients yielded 16 distinct sequence types (STs). Two novel conjugative blaKPC plasmids (pKPC_UVA01 and pKPC_UVA02), carried by the hospital index case, accounted for the presence of blaKPC in 21/37 (57%) subsequent cases. Thirteen (35%) isolates represented an emergent lineage, ST941, which contained pKPC_UVA01 in 5/13 (38%) and pKPC_UVA02 in 6/13 (46%) cases. Seven (19%) isolates were the epidemic KPC-Kp strain, ST258, mostly imported from elsewhere and not carrying pKPC_UVA01 or pKPC_UVA02. Using WGS-based analysis of clinical isolates and plasmid transformants, we demonstrate the unexpected dispersal of blaKPC to many non-ST258 lineages in a hospital through spread of at least two novel blaKPC plasmids. In contrast, ST258 KPC-Kp was imported into the institution on numerous occasions, with other blaKPC plasmid vectors and without sustained transmission. Instead, a newly recognized KPC-Kp strain, ST941, became associated with both novel blaKPC plasmids and spread locally, making it a future candidate for clinical persistence and dissemination. PMID:25561339

  12. Clonality and Resistome Analysis of KPC-Producing Klebsiella pneumoniae Strain Isolated in Korea Using Whole Genome Sequencing

    Science.gov (United States)

    Yong, Ji Hyun; Lee, Yeong Seon; Yoo, Jung Sik; Yong, Dongeun; Hong, Seong Geun; D'Souza, Roshan; Thomson, Kenneth S.; Lee, Kyungwon; Chong, Yunsop

    2014-01-01

    We analyzed the whole genome sequence and resistome of the outbreak Klebsiella pneumoniae strain MP14 and compared it with those of K. pneumoniae carbapenemase- (KPC-) producing isolates that showed high similarity in the NCBI genome database. A KPC-2-producing multidrug-resistant (MDR) K. pneumoniae clinical isolate was obtained from a patient admitted to a Korean hospital in 2011. The strain MP14 was resistant to all tested β-lactams including monobactam, amikacin, levofloxacin, and cotrimoxazole, but susceptible to tigecycline and colistin. Resistome analysis showed the presence of β-lactamase genes including blaKPC-2, blaSHV-11, blaTEM-169, and blaOXA-9. MP14 also possessed aac(6′-)Ib, aadA2, and aph(3′-)Ia as aminoglycoside resistance-encoding genes, mph(A) for macrolides, oqxA and oqxB for quinolone, catA1 for phenicol, sul1 for sulfonamide, and dfrA12 for trimethoprim. Both SNP tree and cgMLST analysis showed close relatedness to the KPC producers (KPNIH strains) isolated from an outbreak in the USA and colistin-resistant strains isolated in Italy. The plasmid-scaffold genes in plasmids pKpQil, pKpQil-IT, pKPN3, or pKPN-IT were identified in MP14, KPNIH, and Italian strains. The KPC-2-producing MDR K. pneumoniae ST258 strain isolated in Korea was highly clonally related to MDR K. pneumoniae strains from the USA and Italy. Global spread of KPC-producing K. pneumoniae is a worrying phenomenon. PMID:25105122

  13. Clonality and Resistome Analysis of KPC-Producing Klebsiella pneumoniae Strain Isolated in Korea Using Whole Genome Sequencing

    Directory of Open Access Journals (Sweden)

    Yangsoon Lee

    2014-01-01

    Full Text Available We analyzed the whole genome sequence and resistome of the outbreak Klebsiella pneumoniae strain MP14 and compared it with those of K. pneumoniae carbapenemase- (KPC-) producing isolates that showed high similarity in the NCBI genome database. A KPC-2-producing multidrug-resistant (MDR) K. pneumoniae clinical isolate was obtained from a patient admitted to a Korean hospital in 2011. The strain MP14 was resistant to all tested β-lactams including monobactam, amikacin, levofloxacin, and cotrimoxazole, but susceptible to tigecycline and colistin. Resistome analysis showed the presence of β-lactamase genes including blaKPC-2, blaSHV-11, blaTEM-169, and blaOXA-9. MP14 also possessed aac(6′-)Ib, aadA2, and aph(3′-)Ia as aminoglycoside resistance-encoding genes, mph(A) for macrolides, oqxA and oqxB for quinolone, catA1 for phenicol, sul1 for sulfonamide, and dfrA12 for trimethoprim. Both SNP tree and cgMLST analysis showed close relatedness to the KPC producers (KPNIH strains) isolated from an outbreak in the USA and colistin-resistant strains isolated in Italy. The plasmid-scaffold genes in plasmids pKpQil, pKpQil-IT, pKPN3, or pKPN-IT were identified in MP14, KPNIH, and Italian strains. The KPC-2-producing MDR K. pneumoniae ST258 strain isolated in Korea was highly clonally related to MDR K. pneumoniae strains from the USA and Italy. Global spread of KPC-producing K. pneumoniae is a worrying phenomenon.

  14. Whole-genome comparative analysis of virulence genes unveils similarities and differences between endophytes and other symbiotic bacteria

    Directory of Open Access Journals (Sweden)

    Sebastian eLòpez-Fernàndez

    2015-05-01

    Full Text Available Plant pathogens and endophytes co-exist and often interact with the host plant and within its microbial community. The outcome of these interactions may lead to healthy plants through beneficial interactions, or to disease through the inducible production of molecules known as virulence factors. Unravelling the role of virulence in endophytes may crucially improve our understanding of host-associated microbial communities and their correlation with host health. Virulence is the outcome of a complex network of interactions, and drawing the line between pathogens and endophytes has proven contentious, as strain-level differences in niche overlap, ecological interactions, the state of the host's immune system and environmental factors are seldom taken into account. Defining genomic differences between endophytes and plant pathogens is decisive for understanding the boundaries between these two groups. Here we describe the major differences at the genomic level between seven grapevine endophytic test bacteria and twelve reference strains. We describe the virulence factors detected in the genomes of the test group, as compared to endophytic and non-endophytic references, to better understand the distribution of these traits in endophytic genomes. To do this, we adopted a comparative whole-genome approach, encompassing BLAST-based searches through the GUI-based tools Mauve and BRIG as well as calculating the core and accessory genomes of three genera of enterobacteria. We outline divergences in metabolic pathways of these endophytes and reference strains, with the aid of the online platform RAST. We present a summary of the major differences that help in drawing the boundaries between harmless and harmful bacteria, in the spirit of contributing to a microbiological definition of endophyte.
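    The core/accessory split mentioned in this abstract can be illustrated with plain set operations. The sketch below is not the authors' pipeline: it assumes each genome has already been reduced to a set of gene-family identifiers (the names `famA`, `endophyte_1` and so on are hypothetical), and it uses the common convention that the core genome is the intersection of all sets while the accessory genome is everything else in their union.

```python
def core_and_accessory(gene_sets):
    """Split gene families into core (present in every genome) and accessory.

    gene_sets: dict mapping a genome name to an iterable of gene-family IDs.
    Returns (core, accessory) as two sets.
    """
    sets = [set(families) for families in gene_sets.values()]
    if not sets:
        return set(), set()
    core = set.intersection(*sets)   # families shared by all genomes
    pan = set.union(*sets)           # every family seen in any genome
    return core, pan - core


# Toy input with hypothetical genome names and gene-family identifiers.
genomes = {
    "endophyte_1": {"famA", "famB", "famC"},
    "endophyte_2": {"famA", "famB", "famD"},
    "reference_1": {"famA", "famC", "famE"},
}
core, accessory = core_and_accessory(genomes)
```

    In practice the hard part is deciding which genes belong to the same family (ortholog clustering), which this sketch takes as given.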

  15. Whole Genome Sequencing of Mycobacterium africanum Strains from Mali Provides Insights into the Mechanisms of Geographic Restriction.

    Directory of Open Access Journals (Sweden)

    Kathryn Winglee

    2016-01-01

    Full Text Available Mycobacterium africanum, made up of lineages 5 and 6 within the Mycobacterium tuberculosis complex (MTC), causes up to half of all tuberculosis cases in West Africa, but is rarely found outside of this region. The reasons for this geographical restriction remain unknown. Possible reasons include a geographically restricted animal reservoir, a unique preference for hosts of West African ethnicity, and an inability to compete with other lineages outside of West Africa. These latter two hypotheses could be caused by loss of fitness or altered interactions with the host immune system. We sequenced 92 MTC clinical isolates from Mali, including two lineage 5 and 24 lineage 6 strains. Our genome sequencing assembly, alignment, phylogeny and average nucleotide identity analyses enabled us to identify features that typify lineages 5 and 6 and made clear that these lineages do not constitute a distinct species within the MTC. We found that in Mali, lineage 6 and lineage 4 strains have similar levels of diversity and evolve drug resistance through similar mechanisms. In the process, we identified a putative novel streptomycin resistance mutation. In addition, we found evidence of person-to-person transmission of lineage 6 isolates and showed that lineage 6 is not enriched for mutations in virulence-associated genes. This is the largest collection of lineage 5 and 6 whole genome sequences to date, and our assembly and alignment data provide valuable insights into what distinguishes these lineages from other MTC lineages. Lineages 5 and 6 do not appear to be geographically restricted due to an inability to transmit between West African hosts or to an elevated number of mutations in virulence-associated genes. However, lineage-specific mutations, such as mutations in cell wall structure, secretion systems and cofactor biosynthesis, provide alternative mechanisms that may lead to host specificity.

  16. Attitudes of non-African American focus group participants toward return of results from exome and whole genome sequencing.

    Science.gov (United States)

    Yu, Joon-Ho; Crouch, Julia; Jamal, Seema M; Bamshad, Michael J; Tabor, Holly K

    2014-09-01

    Exome sequencing and whole genome sequencing (ES/WGS) present individuals with the opportunity to benefit from a broad scope of genetic results of clinical and personal utility. Yet, it is unclear which genetic results people want to receive (i.e., what type of genetic information they want to learn about themselves) or conversely not receive, and how they want to receive or manage results over time. Very little is known about whether and how attitudes toward receiving individual results from ES/WGS vary among racial/ethnic populations. We conducted 13 focus groups with a racially and ethnically diverse parent population (n = 76) to investigate attitudes toward return of individual results from WGS. We report on our findings for non-African American (non-AA) participants. Non-AA participants were primarily interested in genetic results on which they could act or "do something about." They defined "actionability" broadly to include individual medical treatment and disease prevention. The ability to plan for the future was both a motivation for and an expected benefit of receiving results. Their concerns focused on the meaning of results, specifically the potential inaccuracy and uncertainty of results. Non-AA participants expected healthcare providers to be involved in results management by helping them interpret results in the context of their own health and by providing counseling support. We compare and contrast these themes with those we previously reported from our analysis of African American (AA) perspectives to highlight the importance of varying preferences for results, characterize the central role of temporal orientation in framing expectations about the possibility of receiving ES/WGS results, and identify potential avenues by which genomic healthcare disparities may be inadvertently perpetuated.

  17. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  18. Whole Genome Sequence and Phylogenetic Analysis Show Helicobacter pylori Strains from Latin America Have Followed a Unique Evolution Pathway

    Science.gov (United States)

    Muñoz-Ramírez, Zilia Y.; Mendez-Tenorio, Alfonso; Kato, Ikuko; Bravo, Maria M.; Rizzato, Cosmeri; Thorell, Kaisa; Torres, Roberto; Aviles-Jimenez, Francisco; Camorlinga, Margarita; Canzian, Federico; Torres, Javier

    2017-01-01

    Helicobacter pylori (HP) genetics may determine its clinical outcomes. Despite high prevalence of HP infection in Latin America (LA), there have been no phylogenetic studies in the region. We aimed to understand the structure of HP populations in LA mestizo individuals, where gastric cancer incidence remains high. The genomes of 107 HP strains from Mexico, Nicaragua and Colombia were analyzed with 59 publicly available worldwide genomes. To study bacterial relationships at the whole-genome level we propose a virtual hybridization technique using thousands of high-entropy 13 bp DNA probes to generate fingerprints. Phylogenetic virtual genome fingerprint (VGF) analysis was compared with Multi Locus Sequence Analysis (MLST) and with phylogenetic analyses of cagPAI virulence island sequences. With MLST some Nicaraguan and Mexican strains clustered close to African isolates, whereas European isolates were spread without clustering and intermingled with LA isolates. VGF analysis resulted in increased resolution of populations, separating European from LA strains. Furthermore, clusters with exclusively Colombian, Mexican, or Nicaraguan strains were observed, where the Colombian cluster separated from Europe, Asia, and Africa, while Nicaraguan and Mexican clades grouped close to Africa. In addition, a mixed large LA cluster including Mexican, Colombian, Nicaraguan, Peruvian, and Salvadorian strains was observed; all LA clusters separated from the Amerind clade. With cagPAI sequence analyses LA clades clearly separated from Europe, Asia and Amerind, and Colombian strains formed a single cluster. A NeighborNet analysis suggested frequent and recent recombination events, particularly among LA strains. Results suggest that in the New World, H. pylori has evolved to fit mestizo LA populations, already 500 years after the Spanish colonization. This co-adaptation may account for regional variability in gastric cancer risk. PMID:28293542
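    The idea of a probe-based genome fingerprint can be sketched in a few lines. This is an illustration only, not the VGF method as published: it assumes exact matching of each probe (or its reverse complement) against the genome, whereas the abstract does not say how hybridization is scored, and the Jaccard distance used to compare fingerprints is a choice made here. The probes and genome strings are toy values.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def fingerprint(genome, probes):
    """One 0/1 flag per probe: 1 if the probe or its reverse complement
    occurs exactly in the genome (assumption: no mismatches allowed)."""
    genome = genome.upper()
    return tuple(
        int(p in genome or reverse_complement(p) in genome) for p in probes
    )


def jaccard_distance(fp_a, fp_b):
    """1 - (probes hit in both) / (probes hit in either)."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


# Toy 13 bp probes and two short toy "genomes".
probes = ["ACGTACGTACGTA", "TTTTTGGGGGCCC", "GATTACAGATTAC"]
genome_a = "CCACGTACGTACGTAGGGATTACAGATTAC"
genome_b = "GGGCCCCCAAAAATACGTACGTACGTA"

fp_a = fingerprint(genome_a, probes)
fp_b = fingerprint(genome_b, probes)
distance = jaccard_distance(fp_a, fp_b)
```

    A matrix of such pairwise distances could then be passed to any standard tree-building or clustering method; with thousands of probes the fingerprints become long bit vectors rather than three-element tuples.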

  19. Whole genome sequence of Klebsiella pneumoniae U25, a hypermucoviscous, multidrug resistant, biofilm producing isolate from India.

    Science.gov (United States)

    Rafiq, Zumaana; Sam, Nithin; Vaidyanathan, Rama

    2016-02-01

    Klebsiella pneumoniae U25 is a multidrug resistant strain isolated from a tertiary care hospital in Chennai, India. Here, we report the complete annotated genome sequence of strain U25 obtained using PacBio RSII. This is the first report of the whole genome of K. pneumoniae species from Chennai. It consists of a single circular chromosome of size 5,491,870-bp and two plasmids of size 211,813 and 172,619-bp. The genes associated with multidrug resistance were identified. The chromosome of U25 was found to have eight antibiotic resistant genes [blaOXA-1, blaSHV-28, aac(6')Ib-cr, catB3, oqxAB, dfrA1]. The plasmid pMGRU25-001 was found to have only one resistant gene (catA1) while plasmid pMGRU25-002 had 20 resistant genes [strAB, aadA1, aac(6')-Ib, aac(3)-IId, sul1,2, blaTEM-1A,1B, blaOXA-9, blaCTX-M-15, blaSHV-11, cmlA1, erm(B), mph(A)]. A mutation in the porin OmpK36 was identified which is likely to be associated with the intermediate resistance to carbapenems in the absence of carbapenemase genes. U25 is one of the few K. pneumoniae strains to harbour clustered regularly interspaced short palindromic repeats (CRISPR) systems. Two CRISPR arrays corresponding to Cas3 family helicase were identified in the genome. When compared to K. pneumoniae NTUHK2044, a transposase gene InsH of IS5-13 was found inserted.

  20. Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants.

    Science.gov (United States)

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-04-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs.

  1. Clonality and Resistome analysis of KPC-producing Klebsiella pneumoniae strain isolated in Korea using whole genome sequencing.

    Science.gov (United States)

    Lee, Yangsoon; Kim, Bong-Soo; Chun, Jongsik; Yong, Ji Hyun; Lee, Yeong Seon; Yoo, Jung Sik; Yong, Dongeun; Hong, Seong Geun; D'Souza, Roshan; Thomson, Kenneth S; Lee, Kyungwon; Chong, Yunsop

    2014-01-01

    We analyzed the whole genome sequence and resistome of the outbreak Klebsiella pneumoniae strain MP14 and compared it with those of K. pneumoniae carbapenemase- (KPC-) producing isolates that showed high similarity in the NCBI genome database. A KPC-2-producing multidrug-resistant (MDR) K. pneumoniae clinical isolate was obtained from a patient admitted to a Korean hospital in 2011. The strain MP14 was resistant to all tested β-lactams including monobactam, amikacin, levofloxacin, and cotrimoxazole, but susceptible to tigecycline and colistin. Resistome analysis showed the presence of β-lactamase genes including blaKPC-2, blaSHV-11, blaTEM-169, and blaOXA-9. MP14 also possessed aac(6'-)Ib, aadA2, and aph(3'-)Ia as aminoglycoside resistance-encoding genes, mph(A) for macrolides, oqxA and oqxB for quinolone, catA1 for phenicol, sul1 for sulfonamide, and dfrA12 for trimethoprim. Both SNP tree and cgMLST analysis showed close relatedness to the KPC producers (KPNIH strains) isolated from an outbreak in the USA and colistin-resistant strains isolated in Italy. The plasmid-scaffold genes in plasmids pKpQil, pKpQil-IT, pKPN3, or pKPN-IT were identified in MP14, KPNIH, and Italian strains. The KPC-2-producing MDR K. pneumoniae ST258 strain isolated in Korea was highly clonally related to MDR K. pneumoniae strains from the USA and Italy. Global spread of KPC-producing K. pneumoniae is a worrying phenomenon.

  2. Whole genome sequencing of an African American family highlights toll like receptor 6 variants in Kawasaki disease susceptibility

    Science.gov (United States)

    Veeraraghavan, Narayanan; Levy, Eric; Ribeiro dos Santos, Andre M.; Yang, Hai; Hibberd, Martin L.; Tremoulet, Adriana H.; Harismendy, Olivier; Ohno-Machado, Lucila; Burns, Jane C.

    2017-01-01

    Kawasaki disease (KD) is the most common acquired pediatric heart disease. We analyzed Whole Genome Sequences (WGS) from a 6-member African American family in which KD affected two of four children. We sought rare, potentially causative genotypes by sequentially applying the following WGS filters: sequence quality scores, inheritance model (recessive homozygous and compound heterozygous), predicted deleteriousness, allele frequency, genes in KD-associated pathways or with significant associations in published KD genome-wide association studies (GWAS), and with differential expression in KD blood transcriptomes. Biologically plausible genotypes were identified in twelve variants in six genes in the two affected children. The affected siblings were compound heterozygous for the rare variants p.Leu194Pro and p.Arg247Lys in Toll-like receptor 6 (TLR6), which affect TLR6 signaling. The affected children were also homozygous for three common, linked (r2 = 1) intronic single nucleotide variants (SNVs) in TLR6 (rs56245262, rs56083757 and rs7669329), that have previously shown association with KD in cohorts of European descent. Using transcriptome data from pre-treatment whole blood of KD subjects (n = 146), expression quantitative trait loci (eQTL) analyses were performed. Subjects homozygous for the intronic risk allele (A allele of TLR6 rs56245262) had differential expression of Interleukin-6 (IL-6) as a function of genotype (p = 0.0007) and a higher erythrocyte sedimentation rate at diagnosis. TLR6 plays an important role in pathogen-associated molecular pattern recognition, and sequence variations may affect binding affinities that in turn influence KD susceptibility. This integrative genomic approach illustrates how the analysis of WGS in multiplex families with a complex genetic disease allows examination of both the common disease–common variant and common disease–rare variant hypotheses. PMID:28151979
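    The sequential filtering described in this abstract (call quality, predicted deleteriousness, allele frequency, then an inheritance model) can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: the `Variant` fields, the thresholds `min_quality=30.0` and `max_allele_freq=0.01`, and all gene, variant and sample names are hypothetical, and only the compound heterozygous model is shown.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    name: str
    quality: float       # hypothetical call-quality score
    allele_freq: float   # population allele frequency
    deleterious: bool    # predicted-deleterious flag from some annotator
    genotypes: dict      # sample name -> number of alternate alleles (0, 1, 2)


def passes_basic_filters(v, min_quality=30.0, max_allele_freq=0.01):
    """Quality, rarity and deleteriousness filters (thresholds are assumptions)."""
    return (v.quality >= min_quality
            and v.allele_freq <= max_allele_freq
            and v.deleterious)


def compound_het_genes(variants, affected, mother, father):
    """Genes where every affected child is heterozygous for at least one
    maternally inherited and one paternally inherited filtered variant."""
    by_gene = {}
    for v in variants:
        if passes_basic_filters(v) and all(v.genotypes[s] == 1 for s in affected):
            by_gene.setdefault(v.gene, []).append(v)
    hits = {}
    for gene, vs in by_gene.items():
        maternal = [v for v in vs
                    if v.genotypes[mother] == 1 and v.genotypes[father] == 0]
        paternal = [v for v in vs
                    if v.genotypes[father] == 1 and v.genotypes[mother] == 0]
        if maternal and paternal:
            hits[gene] = sorted(v.name for v in maternal + paternal)
    return hits


# Toy family: two parents and two affected children (all values invented).
variants = [
    Variant("GENE_A", "a1", 50, 0.001, True, {"mom": 1, "dad": 0, "kid1": 1, "kid2": 1}),
    Variant("GENE_A", "a2", 45, 0.002, True, {"mom": 0, "dad": 1, "kid1": 1, "kid2": 1}),
    Variant("GENE_B", "b1", 60, 0.200, True, {"mom": 1, "dad": 0, "kid1": 1, "kid2": 1}),
    Variant("GENE_B", "b2", 60, 0.001, True, {"mom": 0, "dad": 1, "kid1": 1, "kid2": 1}),
    Variant("GENE_C", "c1", 55, 0.001, True, {"mom": 1, "dad": 0, "kid1": 1, "kid2": 1}),
]
candidates = compound_het_genes(variants, ["kid1", "kid2"], "mom", "dad")
```

    Here only `GENE_A` survives: `GENE_B` loses its maternal variant to the allele-frequency filter, and `GENE_C` has no paternal partner variant. A fuller version would also check that unaffected siblings do not carry the same pair and would add the recessive homozygous model.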

  3. Whole genome transcription profiling of Anaplasma phagocytophilum in human and tick host cells by tiling array analysis

    Directory of Open Access Journals (Sweden)

    Chavez Adela

    2008-07-01

    Full Text Available Abstract Background Anaplasma phagocytophilum (Ap) is an obligate intracellular bacterium and the agent of human granulocytic anaplasmosis, an emerging tick-borne disease. Ap alternately infects ticks and mammals and a variety of cell types within each. Understanding the biology behind such versatile cellular parasitism may be derived through the use of tiling microarrays to establish high resolution, genome-wide transcription profiles of the organism as it infects cell lines representative of its life cycle (tick; ISE6) and pathogenesis (human; HL-60 and HMEC-1). Results Detailed, host cell specific transcriptional behavior was revealed. There was extensive differential Ap gene transcription between the tick (ISE6) and the human (HL-60 and HMEC-1) cell lines, with far fewer differentially transcribed genes between the human cell lines, and all disproportionately represented by membrane or surface proteins. There were Ap genes exclusively transcribed in each cell line, apparent human- and tick-specific operons and paralogs, and anti-sense transcripts that suggest novel expression regulation processes. Seven virB2 paralogs (of the bacterial type IV secretion system) showed human or tick cell dependent transcription. Previously unrecognized genes and coding sequences were identified, as were the expressed p44/msp2 (major surface proteins) paralogs (of 114 total), through elevated signal produced to the unique hypervariable region of each: 2/114 in HL-60, 3/114 in HMEC-1, and none in ISE6. Conclusion Using these methods, whole genome transcription profiles can likely be generated for Ap, as well as other obligate intracellular organisms, in any host cells and for all stages of the cell infection process. Visual representation of comprehensive transcription data alongside an annotated map of the genome renders complex transcription into discernible patterns.

  4. Modelling human regulatory variation in mouse: finding the function in genome-wide association studies and whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Jean-François Schmouth

    Full Text Available An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limits our ability to fully benefit from thi